Re: [bug report] kmemleak issue observed during blktests

Nilay Shroff <nilay@xxxxxxxxxxxxx> · Thu, 17 Jul 2025 00:54:31 +0530

On 7/16/25 4:10 PM, Ming Lei wrote:
> On Wed, Jul 16, 2025 at 03:50:34PM +0800, Yu Kuai wrote:
>> Hi,
>>
>> On 2025/07/16 9:54, Jens Axboe wrote:
>>> unreferenced object 0xffff8882e7fbb000 (size 2048):
>>>    comm "check", pid 10460, jiffies 4324980514
>>>    hex dump (first 32 bytes):
>>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>    backtrace (crc c47e6a37):
>>>      __kvmalloc_node_noprof+0x55d/0x7a0
>>>      sbitmap_init_node+0x15a/0x6a0
>>>      kyber_init_hctx+0x316/0xb90
>>>      blk_mq_init_sched+0x416/0x580
>>>      elevator_switch+0x18b/0x630
>>>      elv_update_nr_hw_queues+0x219/0x2c0
>>>      __blk_mq_update_nr_hw_queues+0x36a/0x6f0
>>>      blk_mq_update_nr_hw_queues+0x3a/0x60
>>>      find_fallback+0x510/0x540 [nbd]
>>
>> This is weird. I checked the code, and it's impossible for
>> blk_mq_update_nr_hw_queues() to be called from find_fallback().
> 
> Yes.
> 
>> Does kmemleak show wrong backtrace?
> 
> I tried to run blktests block/005 over nbd, but can't reproduce this
> kmemleak report after setting up the detector.

I have analyzed this bug and found the root cause:

The issue arises during an nr_hw_queues update. Specifically, we first
reallocate hardware contexts (hctx) via __blk_mq_realloc_hw_ctxs(), and
then later invoke elevator_switch() (assuming q->elevator is not NULL).
The elevator switch code first exits the old elevator (elevator_exit)
and then switches to the new elevator. elevator_exit loops through
each hctx and invokes the elevator's per-hctx exit method ->exit_hctx(),
which releases resources allocated during ->init_hctx().

This memleak manifests when we reduce the number of h/w queues: for example,
when the initial update sets the number of queues to X, and a later update
reduces it to Y, where Y < X. In this case, we lose access to the old
hctxs by the time we get to the elevator exit code, because
__blk_mq_realloc_hw_ctxs() has already released them. As we no longer have
any reference to the old hctxs, we have no way to free the scheduler
resources (which are allocated in ->init_hctx()), and kmemleak complains
about it.
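
To make the ordering problem concrete, here is a toy userspace model in C.
It is only a sketch of the sequence described above, not kernel code: every
toy_* name is hypothetical, the per-hctx scheduler data is reduced to a flag
plus a counter, and only the shrinking case is modelled. The "exit first"
ordering is there purely for contrast and is not meant to represent the
actual fix.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: all names here are hypothetical, not kernel APIs. */
struct toy_hctx {
	int has_sched_data;	/* stands in for what ->init_hctx() allocates */
};

struct toy_queue {
	struct toy_hctx **hctxs;
	int nr_hw_queues;
};

/* Number of per-hctx scheduler allocations that were never released. */
static int live_sched_allocs;

static void *toy_calloc(size_t n, size_t size)
{
	void *p = calloc(n, size);

	if (!p)
		abort();
	return p;
}

/* Models ->init_hctx(): attach scheduler data to one hctx. */
static void toy_init_hctx(struct toy_hctx *h)
{
	h->has_sched_data = 1;
	live_sched_allocs++;
}

/* Models ->exit_hctx(): release the scheduler data of one hctx. */
static void toy_exit_hctx(struct toy_hctx *h)
{
	if (h->has_sched_data) {
		h->has_sched_data = 0;
		live_sched_allocs--;
	}
}

static struct toy_queue *toy_queue_create(int nr)
{
	struct toy_queue *q = toy_calloc(1, sizeof(*q));
	int i;

	q->hctxs = toy_calloc(nr, sizeof(*q->hctxs));
	q->nr_hw_queues = nr;
	for (i = 0; i < nr; i++) {
		q->hctxs[i] = toy_calloc(1, sizeof(**q->hctxs));
		toy_init_hctx(q->hctxs[i]);
	}
	return q;
}

/* Models the shrink: hctxs past new_nr are dropped without ->exit_hctx(). */
static void toy_realloc_hw_ctxs(struct toy_queue *q, int new_nr)
{
	int i;

	for (i = new_nr; i < q->nr_hw_queues; i++) {
		free(q->hctxs[i]);	/* any scheduler data is now unreachable */
		q->hctxs[i] = NULL;
	}
	q->nr_hw_queues = new_nr;
}

/* Models elevator exit: it can only walk the hctxs the queue still has. */
static void toy_elevator_exit(struct toy_queue *q)
{
	int i;

	for (i = 0; i < q->nr_hw_queues; i++)
		toy_exit_hctx(q->hctxs[i]);
}

/* Shrink from x to y queues (y <= x); return the leaked allocation count. */
static int toy_leaked_after_shrink(int x, int y, int exit_first)
{
	struct toy_queue *q;
	int i, leaked;

	live_sched_allocs = 0;
	q = toy_queue_create(x);
	if (exit_first) {
		toy_elevator_exit(q);
		toy_realloc_hw_ctxs(q, y);
	} else {
		toy_realloc_hw_ctxs(q, y);	/* ordering described above */
		toy_elevator_exit(q);
	}
	leaked = live_sched_allocs;

	for (i = 0; i < q->nr_hw_queues; i++)
		free(q->hctxs[i]);
	free(q->hctxs);
	free(q);
	return leaked;
}
```

In this model, toy_leaked_after_shrink(4, 2, 0) returns 2, i.e. X - Y
allocations are left with no owner, while toy_leaked_after_shrink(4, 2, 1)
returns 0.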

Regarding reproduction, I was also not able to recreate it using block/005,
but I then wrote a script using the null-blk driver which updates
nr_hw_queues from X to Y (where Y < X), and I encountered this memleak. So
this is not an issue with the nbd driver.

I've implemented a potential fix for the above issue and am unit
testing it now. I will post a formal patch soon.

Thanks,
--Nilay