On 7/16/25 4:10 PM, Ming Lei wrote:
> On Wed, Jul 16, 2025 at 03:50:34PM +0800, Yu Kuai wrote:
>> Hi,
>>
>> On 2025/07/16 9:54, Jens Axboe wrote:
>>> unreferenced object 0xffff8882e7fbb000 (size 2048):
>>>   comm "check", pid 10460, jiffies 4324980514
>>>   hex dump (first 32 bytes):
>>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>   backtrace (crc c47e6a37):
>>>     __kvmalloc_node_noprof+0x55d/0x7a0
>>>     sbitmap_init_node+0x15a/0x6a0
>>>     kyber_init_hctx+0x316/0xb90
>>>     blk_mq_init_sched+0x416/0x580
>>>     elevator_switch+0x18b/0x630
>>>     elv_update_nr_hw_queues+0x219/0x2c0
>>>     __blk_mq_update_nr_hw_queues+0x36a/0x6f0
>>>     blk_mq_update_nr_hw_queues+0x3a/0x60
>>>     find_fallback+0x510/0x540 [nbd]
>>
>> This is weird, and I check the code that it's impossible
>> blk_mq_update_nr_hw_queues() can be called from find_fallback().
>
> Yes.
>
>> Does kmemleak show wrong backtrace?
>
> I tried to run blktests block/005 over nbd, but can't reproduce this
> kmemleak report after setting up the detector.

I have analyzed this bug and found the root cause.

The issue arises during an nr_hw_queues update. We first reallocate the
hardware contexts (hctx) via __blk_mq_realloc_hw_ctxs() and only later
invoke elevator_switch() (assuming q->elevator is not NULL). The
elevator switch code first exits the old elevator (elevator_exit) and
then switches to the new elevator. elevator_exit loops through each
hctx and invokes the elevator's per-hctx exit method ->exit_hctx(),
which releases the resources allocated during ->init_hctx().

The memleak manifests when we reduce the number of h/w queues, for
example when the initial update sets the number of queues to X and a
later update reduces it to Y, where Y < X. In this case we have already
lost access to the old hctxs by the time we reach the elevator exit
code, because __blk_mq_realloc_hw_ctxs() has already released them.
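To make the ordering problem concrete, here is a toy userspace model of
it. This is only a sketch and not kernel code: every struct and function
name in it (toy_hctx, toy_sched_exit, and so on) is hypothetical, the
per-hctx scheduler data is reduced to a flag plus a counter, and only
the shrink case (Y <= X, X > 0) is modelled. The second function exits
before shrinking purely for contrast; it is not meant to represent the
fix mentioned in this mail.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy userspace model; every name here is hypothetical, not a kernel API. */
struct toy_hctx {
	int has_sched_data;	/* set by "init_hctx", cleared by "exit_hctx" */
};

struct toy_queue {
	struct toy_hctx **hctxs;
	int nr_hw_queues;
};

/* Scheduler allocations that were initialised but not yet exited. */
static int live_sched_data;

/* Stand-in for calling ->init_hctx() on every current hctx. */
static void toy_sched_init(struct toy_queue *q)
{
	for (int i = 0; i < q->nr_hw_queues; i++) {
		q->hctxs[i]->has_sched_data = 1;
		live_sched_data++;
	}
}

/* Stand-in for elevator exit: it can only walk the hctxs still in q. */
static void toy_sched_exit(struct toy_queue *q)
{
	for (int i = 0; i < q->nr_hw_queues; i++) {
		if (q->hctxs[i]->has_sched_data) {
			q->hctxs[i]->has_sched_data = 0;
			live_sched_data--;
		}
	}
}

/* Shrink only: release hctxs [new_nr, nr) without touching sched data. */
static void toy_realloc_hw_ctxs(struct toy_queue *q, int new_nr)
{
	for (int i = new_nr; i < q->nr_hw_queues; i++) {
		free(q->hctxs[i]);
		q->hctxs[i] = NULL;
	}
	q->nr_hw_queues = new_nr;
}

static void toy_queue_setup(struct toy_queue *q, int nr)
{
	q->hctxs = calloc(nr, sizeof(*q->hctxs));
	assert(q->hctxs);
	for (int i = 0; i < nr; i++) {
		q->hctxs[i] = calloc(1, sizeof(**q->hctxs));
		assert(q->hctxs[i]);
	}
	q->nr_hw_queues = nr;
	live_sched_data = 0;
	toy_sched_init(q);
}

static void toy_queue_teardown(struct toy_queue *q)
{
	toy_realloc_hw_ctxs(q, 0);
	free(q->hctxs);
}

/* Order described above: shrink X -> Y first, then exit the scheduler. */
int toy_leak_shrink_then_exit(int x, int y)
{
	struct toy_queue q;

	toy_queue_setup(&q, x);
	toy_realloc_hw_ctxs(&q, y);
	toy_sched_exit(&q);	/* sees only the Y surviving hctxs */
	toy_queue_teardown(&q);
	return live_sched_data;	/* sched data that was never exited */
}

/* Contrast only: exit the scheduler while all X hctxs are reachable. */
int toy_leak_exit_then_shrink(int x, int y)
{
	struct toy_queue q;

	toy_queue_setup(&q, x);
	toy_sched_exit(&q);
	toy_realloc_hw_ctxs(&q, y);
	toy_queue_teardown(&q);
	return live_sched_data;
}
```

In this model, shrinking before the exit leaves X - Y entries that are
never exited, while exiting before the shrink leaves none.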
As we no longer have any reference to the old hctxs, we have no way to
free the scheduler resources (which are allocated in ->init_hctx()),
and kmemleak complains about it.

Regarding reproduction, I was also not able to recreate it using
block/005, but I then wrote a script using the null-blk driver which
updates nr_hw_queues from X to Y (where Y < X), and with it I hit this
memleak. So this is not an issue with the nbd driver.

I've implemented a potential fix for the above issue and I'm unit
testing it now. I will post a formal patch in some time.

Thanks,
--Nilay