On Thu, Jul 17, 2025 at 12:54:31AM +0530, Nilay Shroff wrote:
>
>
> On 7/16/25 4:10 PM, Ming Lei wrote:
> > On Wed, Jul 16, 2025 at 03:50:34PM +0800, Yu Kuai wrote:
> >> Hi,
> >>
> >> On 2025/07/16 9:54, Jens Axboe wrote:
> >>> unreferenced object 0xffff8882e7fbb000 (size 2048):
> >>>   comm "check", pid 10460, jiffies 4324980514
> >>>   hex dump (first 32 bytes):
> >>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>>   backtrace (crc c47e6a37):
> >>>     __kvmalloc_node_noprof+0x55d/0x7a0
> >>>     sbitmap_init_node+0x15a/0x6a0
> >>>     kyber_init_hctx+0x316/0xb90
> >>>     blk_mq_init_sched+0x416/0x580
> >>>     elevator_switch+0x18b/0x630
> >>>     elv_update_nr_hw_queues+0x219/0x2c0
> >>>     __blk_mq_update_nr_hw_queues+0x36a/0x6f0
> >>>     blk_mq_update_nr_hw_queues+0x3a/0x60
> >>>     find_fallback+0x510/0x540 [nbd]
> >>
> >> This is weird. I checked the code, and it's impossible for
> >> blk_mq_update_nr_hw_queues() to be called from find_fallback().
> >
> > Yes.
> >
> >> Does kmemleak show wrong backtrace?
> >
> > I tried to run blktests block/005 over nbd, but can't reproduce this
> > kmemleak report after setting up the detector.
>
> I have analyzed this bug and found the root cause:
>
> The issue arises while we run the nr_hw_queues update. Specifically, we
> first reallocate hardware contexts (hctx) via __blk_mq_realloc_hw_ctxs(),
> and then later invoke elevator_switch() (assuming q->elevator is not NULL).
> The elevator switch code first exits the old elevator (elevator_exit)
> and then switches to the new elevator. elevator_exit loops through
> each hctx and invokes the elevator’s per-hctx exit method ->exit_hctx(),
> which releases resources allocated during ->init_hctx().
>
> This memleak manifests when we reduce the number of h/w queues - for
> example, when the initial update sets the number of queues to X, and a
> later update reduces it to Y, where Y < X. In this case, we lose access
> to the old hctxs by the time we get to the elevator exit code, because
> __blk_mq_realloc_hw_ctxs() has already released the old hctxs. As we no
> longer have any reference to the old hctxs, we have no way to free the
> scheduler resources (which are allocated in ->init_hctx()), and kmemleak
> complains about it.
>
> Regarding reproduction, I was also not able to recreate it using block/005,
> but then I wrote a script using the null-blk driver which updates
> nr_hw_queues from X to Y (where Y < X), and I encountered this memleak.
> So this is not an issue with the nbd driver.
>
> I've implemented a potential fix for the above issue and I'm unit
> testing it now. I will post a formal patch in some time.

Great!

Looks like it was introduced in commit 596dce110b7d ("block: simplify
elevator reattachment for updating nr_hw_queues"), but it is easy to cause
a panic with that patchset.

One simple fix is to restore the original two-stage elevator switch, while
saving the elevator name in an xarray to avoid adding the boilerplate code
back.

Thanks,
Ming
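
To make the ordering problem described above concrete, here is a toy userspace model of it. It is only a sketch, not kernel code: every `toy_*` name, the 64-byte allocation, and the `live_sched_allocs` counter are made up for illustration, and freezing, locking, and the xarray that would hold the elevator name are all left out. It assumes, as the analysis above says, that shrinking the hctx array frees the dropped hctxs without calling the per-hctx exit method.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins for kernel objects; every name here is hypothetical. */
struct toy_hctx {
	void *sched_data;	/* set by init_hctx, released by exit_hctx */
};

struct toy_queue {
	struct toy_hctx **hctxs;
	int nr_hw_queues;
	int live_sched_allocs;	/* outstanding per-hctx scheduler allocations */
};

void toy_init_hctx(struct toy_queue *q, struct toy_hctx *h)
{
	h->sched_data = malloc(64);
	assert(h->sched_data);
	q->live_sched_allocs++;
}

void toy_exit_hctx(struct toy_queue *q, struct toy_hctx *h)
{
	if (!h->sched_data)
		return;
	free(h->sched_data);
	h->sched_data = NULL;
	q->live_sched_allocs--;
}

/* Walks only the hctxs the queue can still reach. */
void toy_elevator_exit(struct toy_queue *q)
{
	for (int i = 0; i < q->nr_hw_queues; i++)
		toy_exit_hctx(q, q->hctxs[i]);
}

void toy_elevator_init(struct toy_queue *q)
{
	for (int i = 0; i < q->nr_hw_queues; i++)
		toy_init_hctx(q, q->hctxs[i]);
}

/* Resizes the hctx array; dropped hctxs are freed without exit_hctx. */
void toy_realloc_hw_ctxs(struct toy_queue *q, int new_nr)
{
	for (int i = new_nr; i < q->nr_hw_queues; i++)
		free(q->hctxs[i]);
	q->hctxs = realloc(q->hctxs, (size_t)new_nr * sizeof(*q->hctxs));
	assert(q->hctxs);
	for (int i = q->nr_hw_queues; i < new_nr; i++) {
		q->hctxs[i] = calloc(1, sizeof(struct toy_hctx));
		assert(q->hctxs[i]);
	}
	q->nr_hw_queues = new_nr;
}

void toy_queue_init(struct toy_queue *q, int nr)
{
	q->hctxs = NULL;
	q->nr_hw_queues = 0;
	q->live_sched_allocs = 0;
	toy_realloc_hw_ctxs(q, nr);
	toy_elevator_init(q);
}

/*
 * two_stage == 0: realloc first, then exit + init (the leaking order).
 * two_stage != 0: exit first, realloc, then init.
 */
void toy_update_nr_hw_queues(struct toy_queue *q, int new_nr, int two_stage)
{
	if (two_stage) {
		toy_elevator_exit(q);
		toy_realloc_hw_ctxs(q, new_nr);
	} else {
		toy_realloc_hw_ctxs(q, new_nr);
		toy_elevator_exit(q);
	}
	toy_elevator_init(q);
}

/* Scheduler allocations that no reachable hctx owns any more. */
int toy_leaked(const struct toy_queue *q)
{
	return q->live_sched_allocs - q->nr_hw_queues;
}
```

In this model, shrinking a queue from 8 hctxs to 2 with `two_stage == 0` leaves `toy_leaked()` at 6, one orphaned allocation per dropped hctx, while the same update with `two_stage != 0` leaves it at 0, because the scheduler data is released while all 8 hctxs are still reachable.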