On Thu, Jul 17, 2025 at 8:02 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>
> On Thu, Jul 17, 2025 at 12:54:31AM +0530, Nilay Shroff wrote:
> >
> >
> > On 7/16/25 4:10 PM, Ming Lei wrote:
> > > On Wed, Jul 16, 2025 at 03:50:34PM +0800, Yu Kuai wrote:
> > >> Hi,
> > >>
> > >> On 2025/07/16 9:54, Jens Axboe wrote:
> > >>> unreferenced object 0xffff8882e7fbb000 (size 2048):
> > >>>   comm "check", pid 10460, jiffies 4324980514
> > >>>   hex dump (first 32 bytes):
> > >>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > >>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > >>>   backtrace (crc c47e6a37):
> > >>>     __kvmalloc_node_noprof+0x55d/0x7a0
> > >>>     sbitmap_init_node+0x15a/0x6a0
> > >>>     kyber_init_hctx+0x316/0xb90
> > >>>     blk_mq_init_sched+0x416/0x580
> > >>>     elevator_switch+0x18b/0x630
> > >>>     elv_update_nr_hw_queues+0x219/0x2c0
> > >>>     __blk_mq_update_nr_hw_queues+0x36a/0x6f0
> > >>>     blk_mq_update_nr_hw_queues+0x3a/0x60
> > >>>     find_fallback+0x510/0x540 [nbd]
> > >>
> > >> This is weird. I checked the code, and it's impossible for
> > >> blk_mq_update_nr_hw_queues() to be called from find_fallback().
> > >
> > > Yes.
> > >
> > >> Does kmemleak show the wrong backtrace?
> > >
> > > I tried to run blktests block/005 over nbd, but can't reproduce this
> > > kmemleak report after setting up the detector.
> >
> > I have analyzed this bug and found the root cause:
> >
> > The issue arises while we run the nr_hw_queue update. Specifically, we first
> > reallocate hardware contexts (hctx) via __blk_mq_realloc_hw_ctxs(), and
> > then later invoke elevator_switch() (assuming q->elevator is not NULL).
> > The elevator switch code first exits the old elevator (elevator_exit)
> > and then switches to the new elevator. elevator_exit loops through
> > each hctx and invokes the elevator's per-hctx exit method ->exit_hctx(),
> > which releases resources allocated during ->init_hctx().
> >
> > This memleak manifests when we reduce the number of h/w queues - for example,
> > when the initial update sets the number of queues to X, and a later update
> > reduces it to Y, where Y < X. In this case, we lose access to the old
> > hctxs by the time we get to the elevator exit code, because
> > __blk_mq_realloc_hw_ctxs() has already released the old hctxs. As we no
> > longer have any reference to the old hctxs, we have no way to free the
> > scheduler resources (which are allocated in ->init_hctx()), and kmemleak
> > complains about it.
> >
> > Regarding reproduction, I was also not able to recreate it using block/005,
> > but then I wrote a script using the null-blk driver which updates nr_hw_queue
> > from X to Y (where Y < X), and I encountered this memleak. So this is not
> > an issue with the nbd driver.
> >
> > I've implemented a potential fix for the above issue and I'm unit
> > testing it now. I will post a formal patch soon.
>
> Great!
>
> Looks like it was introduced in commit 596dce110b7d ("block: simplify elevator
> reattachment for updating nr_hw_queues"), but it is easy to cause a panic with
> that patchset.
>
> One simple fix is to restore the original two-stage elevator switch, while
> saving the elevator name in an xarray to avoid adding the boilerplate code back.
>
>
> Thanks,
> Ming
>

Sorry for the late response, it took me some time to find which test case
triggered the kmemleak.

It turns out that block/040 [1] triggered the kmemleak report. Running [2]
right after block/040 does not trigger the report immediately; we have to
wait a while longer.

[1]
[ 458.175983] null_blk: disk nullb0 created
[ 458.180035] null_blk: module loaded
[ 458.397994] run blktests block/040 at 2025-07-16 20:31:20
[ 458.571488] null_blk: disk nullb1 created
[ 874.620574] kmemleak: 522 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

[2]
echo scan >/sys/kernel/debug/kmemleak

-- 
Best Regards,
  Yi Zhang
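
For readers following the thread, the ordering problem Nilay describes above can be modeled in a small userspace C sketch. This is only a toy model under stated assumptions, not kernel code: every toy_* name is hypothetical, toy_shrink_hw_ctxs() stands in for the shrinking half of __blk_mq_realloc_hw_ctxs() as described in the quoted analysis, and toy_elevator_exit() stands in for the per-hctx ->exit_hctx() loop. The "two-stage" variant simply tears the scheduler down before the hctxs are released, which is a simplification of the fix Ming suggests and not the actual patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a hardware queue context (not struct blk_mq_hw_ctx). */
struct toy_hctx {
	void *sched_data;	/* scheduler data owned by this hctx */
};

struct toy_queue {
	struct toy_hctx **hctxs;
	int nr_hw_queues;
};

/* Scheduler allocations that have not been freed yet. */
static int live_sched_allocs;

static void *toy_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (!p)
		abort();
	return p;
}

/* Models ->init_hctx(): allocate per-hctx scheduler data. */
static void toy_init_hctx(struct toy_hctx *hctx)
{
	hctx->sched_data = toy_alloc(64);
	live_sched_allocs++;
}

/* Models ->exit_hctx(): free what toy_init_hctx() allocated. */
static void toy_exit_hctx(struct toy_hctx *hctx)
{
	if (!hctx->sched_data)
		return;
	free(hctx->sched_data);
	hctx->sched_data = NULL;
	live_sched_allocs--;
}

static struct toy_queue *toy_queue_create(int nr)
{
	struct toy_queue *q = toy_alloc(sizeof(*q));

	assert(nr > 0);
	q->hctxs = toy_alloc((size_t)nr * sizeof(*q->hctxs));
	q->nr_hw_queues = nr;
	for (int i = 0; i < nr; i++) {
		q->hctxs[i] = toy_alloc(sizeof(struct toy_hctx));
		toy_init_hctx(q->hctxs[i]);
	}
	return q;
}

/* Models elevator_exit(): only reaches hctxs the queue still references. */
static void toy_elevator_exit(struct toy_queue *q)
{
	for (int i = 0; i < q->nr_hw_queues; i++)
		toy_exit_hctx(q->hctxs[i]);
}

/* Models shrinking the hctx array: drops hctxs [new_nr, nr_hw_queues). */
static void toy_shrink_hw_ctxs(struct toy_queue *q, int new_nr)
{
	assert(new_nr > 0 && new_nr <= q->nr_hw_queues);
	for (int i = new_nr; i < q->nr_hw_queues; i++) {
		free(q->hctxs[i]);	/* sched_data is deliberately not freed here */
		q->hctxs[i] = NULL;
	}
	q->nr_hw_queues = new_nr;
}

static void toy_queue_destroy(struct toy_queue *q)
{
	toy_elevator_exit(q);
	for (int i = 0; i < q->nr_hw_queues; i++)
		free(q->hctxs[i]);
	free(q->hctxs);
	free(q);
}

/* Problematic order: shrink first, then exit. Returns leaked allocations. */
static int toy_update_buggy(int old_nr, int new_nr)
{
	struct toy_queue *q;
	int leaked;

	live_sched_allocs = 0;
	q = toy_queue_create(old_nr);
	toy_shrink_hw_ctxs(q, new_nr);	/* old hctxs are gone now */
	toy_elevator_exit(q);		/* cannot reach their sched_data */
	leaked = live_sched_allocs;
	toy_queue_destroy(q);
	return leaked;
}

/* Two-stage order: exit on the full set first, then shrink. */
static int toy_update_two_stage(int old_nr, int new_nr)
{
	struct toy_queue *q;
	int leaked;

	live_sched_allocs = 0;
	q = toy_queue_create(old_nr);
	toy_elevator_exit(q);
	toy_shrink_hw_ctxs(q, new_nr);
	leaked = live_sched_allocs;
	toy_queue_destroy(q);
	return leaked;
}
```

In this model, toy_update_buggy(4, 2) reports two scheduler allocations left behind (one per dropped hctx, and the toy program really does leak them), while toy_update_two_stage(4, 2) reports none. When the queue count does not shrink, both orders free everything, which matches the observation that the leak only shows up for Y < X.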