Re: [bug report] kmemleak issue observed during blktests

Yi Zhang <yi.zhang@xxxxxxxxxx> · Thu, 17 Jul 2025 08:46:06 +0800

On Thu, Jul 17, 2025 at 8:02 AM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>
> On Thu, Jul 17, 2025 at 12:54:31AM +0530, Nilay Shroff wrote:
> >
> >
> > On 7/16/25 4:10 PM, Ming Lei wrote:
> > > On Wed, Jul 16, 2025 at 03:50:34PM +0800, Yu Kuai wrote:
> > >> Hi,
> > >>
> > >> On 2025/07/16 9:54, Jens Axboe wrote:
> > >>> unreferenced object 0xffff8882e7fbb000 (size 2048):
> > >>>    comm "check", pid 10460, jiffies 4324980514
> > >>>    hex dump (first 32 bytes):
> > >>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > >>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> > >>>    backtrace (crc c47e6a37):
> > >>>      __kvmalloc_node_noprof+0x55d/0x7a0
> > >>>      sbitmap_init_node+0x15a/0x6a0
> > >>>      kyber_init_hctx+0x316/0xb90
> > >>>      blk_mq_init_sched+0x416/0x580
> > >>>      elevator_switch+0x18b/0x630
> > >>>      elv_update_nr_hw_queues+0x219/0x2c0
> > >>>      __blk_mq_update_nr_hw_queues+0x36a/0x6f0
> > >>>      blk_mq_update_nr_hw_queues+0x3a/0x60
> > >>>      find_fallback+0x510/0x540 [nbd]
> > >>
> > >> This is weird. I checked the code, and it is impossible for
> > >> blk_mq_update_nr_hw_queues() to be called from find_fallback().
> > >
> > > Yes.
> > >
> > >> Does kmemleak show a wrong backtrace?
> > >
> > > I tried to run blktests block/005 over nbd, but can't reproduce this
> > > kmemleak report after setting up the detector.
> >
> > I have analyzed this bug and found the root cause:
> >
> > The issue arises while we run an nr_hw_queues update. Specifically, we first
> > reallocate hardware contexts (hctx) via __blk_mq_realloc_hw_ctxs(), and
> > then later invoke elevator_switch() (assuming q->elevator is not NULL).
> > The elevator switch code first exits the old elevator (elevator_exit)
> > and then switches to the new elevator. elevator_exit loops through
> > each hctx and invokes the elevator’s per-hctx exit method ->exit_hctx(),
> > which releases resources allocated during ->init_hctx().
> >
> > This memleak manifests when we reduce the number of h/w queues, for example
> > when the initial update sets the number of queues to X and a later update
> > reduces it to Y, where Y < X. In this case, we lose access to the old
> > hctxs by the time we get to the elevator exit code, because
> > __blk_mq_realloc_hw_ctxs() has already released them. As we no longer
> > have any reference to the old hctxs, we have no way to free the scheduler
> > resources (which are allocated in ->init_hctx()), and kmemleak complains
> > about it.
> >
> > Regarding reproduction, I was also not able to recreate it using block/005,
> > but then I wrote a script using the null-blk driver which updates
> > nr_hw_queues from X to Y (where Y < X), and I encountered this memleak.
> > So this is not an issue with the nbd driver.
> >
> > I've implemented a potential fix for the above issue and I'm unit
> > testing it now. I will post a formal patch in some time.
>
> Great!
>
> It looks like this was introduced in commit 596dce110b7d ("block: simplify
> elevator reattachment for updating nr_hw_queues"), but it is easy to cause
> a panic with that patchset.
>
> One simple fix is to restore the original two-stage elevator switch, while
> saving the elevator name in an xarray to avoid adding the boilerplate code back.
>
>
> Thanks,
> Ming
>

Sorry for the late response; it took me some time to find which test case
triggered the kmemleak report.
It turns out that block/040 [1] triggers it, and running [2] right after
block/040 does not produce the report immediately. We have to wait for
some more time.
[1]
[  458.175983] null_blk: disk nullb0 created
[  458.180035] null_blk: module loaded
[  458.397994] run blktests block/040 at 2025-07-16 20:31:20
[  458.571488] null_blk: disk nullb1 created
[  874.620574] kmemleak: 522 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
[2]
echo scan >/sys/kernel/debug/kmemleak
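
For reference, the ordering problem described in the analysis quoted above
can be modelled in user space. What follows is only a toy sketch, not kernel
code: every toy_* name is hypothetical, the scheduler data is reduced to a
flag plus a counter, and the "two_stage" path is just one reading of the
two-stage switch suggested above (exit the elevator while all old hctxs are
still reachable, then re-attach after the reallocation). The xarray that
saves the elevator name is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_MAX_HCTX 64

/* Hypothetical stand-in for a hardware context. */
struct toy_hctx {
	bool has_sched_data;	/* models what ->init_hctx() allocates */
};

struct toy_queue {
	struct toy_hctx *hctx[TOY_MAX_HCTX];
	int nr_hw_queues;
};

/* Scheduler allocations that were made but not yet released. */
static int toy_outstanding;

/* Models the per-hctx ->init_hctx() loop. */
static void toy_init_sched(struct toy_queue *q)
{
	for (int i = 0; i < q->nr_hw_queues; i++) {
		if (!q->hctx[i]->has_sched_data) {
			q->hctx[i]->has_sched_data = true;
			toy_outstanding++;
		}
	}
}

/* Models the per-hctx ->exit_hctx() loop; it only sees current hctxs. */
static void toy_exit_sched(struct toy_queue *q)
{
	for (int i = 0; i < q->nr_hw_queues; i++) {
		if (q->hctx[i]->has_sched_data) {
			q->hctx[i]->has_sched_data = false;
			toy_outstanding--;
		}
	}
}

/* Models the hctx reallocation: grow or shrink the hctx table. */
static void toy_realloc_hctxs(struct toy_queue *q, int new_nr)
{
	for (int i = q->nr_hw_queues; i < new_nr; i++) {
		q->hctx[i] = calloc(1, sizeof(*q->hctx[i]));
		assert(q->hctx[i]);
	}
	/* Shrinking drops the hctx without touching its scheduler data. */
	for (int i = new_nr; i < q->nr_hw_queues; i++) {
		free(q->hctx[i]);
		q->hctx[i] = NULL;
	}
	q->nr_hw_queues = new_nr;
}

/* Returns the number of scheduler allocations left unreleased. */
int toy_update_nr_hw_queues(int old_nr, int new_nr, bool two_stage)
{
	struct toy_queue q = { .nr_hw_queues = 0 };

	assert(old_nr >= 0 && old_nr <= TOY_MAX_HCTX);
	assert(new_nr >= 0 && new_nr <= TOY_MAX_HCTX);

	toy_outstanding = 0;
	toy_realloc_hctxs(&q, old_nr);
	toy_init_sched(&q);

	if (two_stage)
		toy_exit_sched(&q);	/* all old hctxs still reachable */
	toy_realloc_hctxs(&q, new_nr);
	if (!two_stage)
		toy_exit_sched(&q);	/* too late for the dropped hctxs */
	toy_init_sched(&q);		/* attach scheduler to the new set */

	/* Normal teardown of whatever is still reachable. */
	toy_exit_sched(&q);
	toy_realloc_hctxs(&q, 0);
	return toy_outstanding;
}
```

In this model, toy_update_nr_hw_queues(8, 2, false) returns 6 (one
unreleased allocation per dropped hctx), while the same call with
two_stage set to true returns 0. Growing the queue count (2 to 8) returns
0 either way, which matches the observation that only Y < X leaks.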

-- 
Best Regards,
  Yi Zhang