On Thu, Aug 28, 2025 at 05:28:26PM +0800, Li Nan wrote:
>
>
> On 2025/8/27 16:10, Ming Lei wrote:
> > On Wed, Aug 27, 2025 at 11:22:06AM +0800, Li Nan wrote:
> > >
> > >
> > > On 2025/8/27 9:35, Ming Lei wrote:
> > > > On Wed, Aug 27, 2025 at 09:04:45AM +0800, Yu Kuai wrote:
> > > > > Hi,
> > > > >
> > > > > On 2025/08/27 8:58, Ming Lei wrote:
> > > > > > On Tue, Aug 26, 2025 at 04:48:54PM +0800, linan666@xxxxxxxxxxxxxxx wrote:
> > > > > > > From: Li Nan <linan122@xxxxxxxxxx>
> > > > > > >
> > > > > > > In __blk_mq_update_nr_hw_queues() the return value of
> > > > > > > blk_mq_sysfs_register_hctxs() is not checked. If sysfs creation for hctx
> > > > > >
> > > > > > Looks like we should check its return value and handle the failure in
> > > > > > both the call site and blk_mq_sysfs_register_hctxs().
> > > > >
> > > > > In __blk_mq_update_nr_hw_queues(), the old hctxs are already
> > > > > unregistered, and the function is void; we failed to register the new
> > > > > hctxs because of a memory allocation failure. I really don't know how
> > > > > to handle the failure here, do you have any suggestions?
> > > >
> > > > Since it is out of memory, I think it is fine to do whatever leaves the
> > > > queue state intact instead of making it `partially workable`, such as:
> > > >
> > > > - try to update nr_hw_queues to 1
> > > >
> > > > - if that still fails, delete the disk & mark the queue as dead if a
> > > >   disk is attached
> > > >
> > >
> > > If we ignore these non-critical sysfs creation failures, the disk remains
> > > usable with no loss of functionality. Deleting the disk seems to escalate
> > > the error?
> >
> > It is more like a workaround, by ignoring the sysfs register failure. And
> > if the issue needs to be fixed in this way, you have to document it.
> >
> > OOM usually means the system isn't usable any more. But this is a NOIO
> > allocation, and the typical use case is error recovery in nvme pci, so it
> > may be that only NOIO allocation is short of pages. Is that the reason for
> > ignoring the sysfs register failure in blk_mq_update_nr_hw_queues()?
> >
> > But NVMe has been pretty fragile in this area, by using non-owner queue
> > freeze and calling blk_mq_update_nr_hw_queues() on a frozen queue, so is
> > it really necessary to take it into account?
>
> I agree with your points about NOIO and NVMe.
>
> I hit this issue in null_blk during fuzz testing with memory-fault
> injection. Changing the number of hardware queues under OOM is extremely
> rare in real-world usage. So I think adding a workaround and documenting it
> is sufficient. What do you think?

Looks fine to me.

Thanks,
Ming
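
[Editor's note: for reference, a minimal sketch of the workaround the thread
settles on: check the return value of blk_mq_sysfs_register_hctxs() in the
re-registration loop of __blk_mq_update_nr_hw_queues(), warn, and carry on.
The exact loop context and the pr_warn() message are illustrative
assumptions, not taken from the thread or from any posted patch:]

	list_for_each_entry(q, &set->tag_list, tag_set_list) {
		/*
		 * Registration typically fails only on memory
		 * allocation failure in NOIO context (e.g. nvme
		 * error recovery).  The hctx sysfs attributes are
		 * purely informational, so the disk stays fully
		 * usable without them: warn and continue rather
		 * than tearing the queue down.
		 */
		if (blk_mq_sysfs_register_hctxs(q))
			pr_warn("blk-mq: failed to register hctx sysfs attributes\n");
	}

[This matches the thread's conclusion: under NOIO/OOM the failure is
non-critical, so logging and documenting it beats escalating to disk
deletion.]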