On Thu, Jul 17, 2025 at 10:12 PM Nilay Shroff <nilay@xxxxxxxxxxxxx> wrote:
>
> On 7/17/25 5:32 AM, Ming Lei wrote:
> > On Thu, Jul 17, 2025 at 12:54:31AM +0530, Nilay Shroff wrote:
> >>
> >> On 7/16/25 4:10 PM, Ming Lei wrote:
> >>> On Wed, Jul 16, 2025 at 03:50:34PM +0800, Yu Kuai wrote:
> >>>> Hi,
> >>>>
> >>>> On 2025/07/16 9:54, Jens Axboe wrote:
> >>>>> unreferenced object 0xffff8882e7fbb000 (size 2048):
> >>>>>   comm "check", pid 10460, jiffies 4324980514
> >>>>>   hex dump (first 32 bytes):
> >>>>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>>>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>>>>   backtrace (crc c47e6a37):
> >>>>>     __kvmalloc_node_noprof+0x55d/0x7a0
> >>>>>     sbitmap_init_node+0x15a/0x6a0
> >>>>>     kyber_init_hctx+0x316/0xb90
> >>>>>     blk_mq_init_sched+0x416/0x580
> >>>>>     elevator_switch+0x18b/0x630
> >>>>>     elv_update_nr_hw_queues+0x219/0x2c0
> >>>>>     __blk_mq_update_nr_hw_queues+0x36a/0x6f0
> >>>>>     blk_mq_update_nr_hw_queues+0x3a/0x60
> >>>>>     find_fallback+0x510/0x540 [nbd]
> >>>>
> >>>> This is weird. I checked the code, and it's impossible for
> >>>> blk_mq_update_nr_hw_queues() to be called from find_fallback().
> >>>
> >>> Yes.
> >>>
> >>>> Does kmemleak show a wrong backtrace?
> >>>
> >>> I tried to run blktests block/005 over nbd, but couldn't reproduce this
> >>> kmemleak report after setting up the detector.
> >>
> >> I have analyzed this bug and found the root cause:
> >>
> >> The issue arises while we run the nr_hw_queues update. Specifically, we
> >> first reallocate the hardware contexts (hctx) via
> >> __blk_mq_realloc_hw_ctxs(), and then later invoke elevator_switch()
> >> (assuming q->elevator is not NULL). The elevator switch code first exits
> >> the old elevator (elevator_exit) and then switches to the new elevator.
> >> elevator_exit loops through each hctx and invokes the elevator's
> >> per-hctx exit method ->exit_hctx(), which releases the resources
> >> allocated during ->init_hctx().
> >>
> >> This memleak manifests when we reduce the number of hardware queues: for
> >> example, when the initial update sets the number of queues to X, and a
> >> later update reduces it to Y, where Y < X. In this case, we lose access
> >> to the old hctxs by the time we reach the elevator exit code, because
> >> __blk_mq_realloc_hw_ctxs() has already released them. As we no longer
> >> have any reference to the old hctxs, we have no way to free the
> >> scheduler resources (which are allocated in ->init_hctx()), and
> >> kmemleak complains about it.
> >>
> >> Regarding reproduction, I was also not able to recreate it using
> >> block/005, but then I wrote a script using the null-blk driver which
> >> updates nr_hw_queues from X to Y (where Y < X), and I encountered this
> >> memleak. So this is not an issue with the nbd driver.
> >>
> >> I've implemented a potential fix for the above issue and I'm unit
> >> testing it now. I will post a formal patch shortly.
> >
> > Great!
> >
> > It looks like it was introduced in commit 596dce110b7d ("block: simplify
> > elevator reattachment for updating nr_hw_queues"), but it is easy to
> > cause a panic with that patchset.
>
> Yeah, correct.
>
> > One simple fix is to restore the original two-stage elevator switch,
> > while saving the elevator name in an xarray to avoid adding the
> > boilerplate code back.
>
> Agreed. I implemented the same approach, and the fix is on its way...
>
> Thanks,
> --Nilay
>

Hi Nilay,

How about updating the patch with the trace below, which doesn't have the
nbd info:

unreferenced object 0xffff8881b82f7400 (size 512):
  comm "check", pid 68454, jiffies 4310588881
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc 5bac8b34):
    __kvmalloc_node_noprof+0x55d/0x7a0
    sbitmap_init_node+0x15a/0x6a0
    kyber_init_hctx+0x316/0xb90
    blk_mq_init_sched+0x419/0x580
    elevator_switch+0x18b/0x630
    elv_update_nr_hw_queues+0x219/0x2c0
    __blk_mq_update_nr_hw_queues+0x36a/0x6f0
    blk_mq_update_nr_hw_queues+0x3a/0x60
    0xffffffffc09ceb80
    0xffffffffc09d7e0b
    configfs_write_iter+0x2b1/0x470
    vfs_write+0x527/0xe70
    ksys_write+0xff/0x200
    do_syscall_64+0x98/0x3c0
    entry_SYSCALL_64_after_hwframe+0x76/0x7e
unreferenced object 0xffff8881b82f6000 (size 512):
  comm "check", pid 68454, jiffies 4310588881
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc 5bac8b34):
    __kvmalloc_node_noprof+0x55d/0x7a0
    sbitmap_init_node+0x15a/0x6a0
    kyber_init_hctx+0x316/0xb90
    blk_mq_init_sched+0x419/0x580
    elevator_switch+0x18b/0x630
    elv_update_nr_hw_queues+0x219/0x2c0
    __blk_mq_update_nr_hw_queues+0x36a/0x6f0
    blk_mq_update_nr_hw_queues+0x3a/0x60
    0xffffffffc09ceb80
    0xffffffffc09d7e0b
    configfs_write_iter+0x2b1/0x470
    vfs_write+0x527/0xe70
    ksys_write+0xff/0x200
    do_syscall_64+0x98/0x3c0
    entry_SYSCALL_64_after_hwframe+0x76/0x7e
unreferenced object 0xffff8881b82f5800 (size 512):
  comm "check", pid 68454, jiffies 4310588881
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace (crc 5bac8b34):
    __kvmalloc_node_noprof+0x55d/0x7a0
    sbitmap_init_node+0x15a/0x6a0
    kyber_init_hctx+0x316/0xb90
    blk_mq_init_sched+0x419/0x580
    elevator_switch+0x18b/0x630
    elv_update_nr_hw_queues+0x219/0x2c0
    __blk_mq_update_nr_hw_queues+0x36a/0x6f0
    blk_mq_update_nr_hw_queues+0x3a/0x60
    0xffffffffc09ceb80
    0xffffffffc09d7e0b
    configfs_write_iter+0x2b1/0x470
    vfs_write+0x527/0xe70
    ksys_write+0xff/0x200
    do_syscall_64+0x98/0x3c0
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

--
Best Regards,
  Yi Zhang