Re: [bug report] kmemleak issue observed during blktests

Nilay Shroff <nilay@xxxxxxxxxxxxx> · Thu, 17 Jul 2025 19:41:30 +0530

On 7/17/25 5:32 AM, Ming Lei wrote:
> On Thu, Jul 17, 2025 at 12:54:31AM +0530, Nilay Shroff wrote:
>>
>>
>> On 7/16/25 4:10 PM, Ming Lei wrote:
>>> On Wed, Jul 16, 2025 at 03:50:34PM +0800, Yu Kuai wrote:
>>>> Hi,
>>>>
>>>> On 2025/07/16 9:54, Jens Axboe wrote:
>>>>> unreferenced object 0xffff8882e7fbb000 (size 2048):
>>>>>    comm "check", pid 10460, jiffies 4324980514
>>>>>    hex dump (first 32 bytes):
>>>>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>>    backtrace (crc c47e6a37):
>>>>>      __kvmalloc_node_noprof+0x55d/0x7a0
>>>>>      sbitmap_init_node+0x15a/0x6a0
>>>>>      kyber_init_hctx+0x316/0xb90
>>>>>      blk_mq_init_sched+0x416/0x580
>>>>>      elevator_switch+0x18b/0x630
>>>>>      elv_update_nr_hw_queues+0x219/0x2c0
>>>>>      __blk_mq_update_nr_hw_queues+0x36a/0x6f0
>>>>>      blk_mq_update_nr_hw_queues+0x3a/0x60
>>>>>      find_fallback+0x510/0x540 [nbd]
>>>>
>>>> This is weird; I checked the code, and it's impossible for
>>>> blk_mq_update_nr_hw_queues() to be called from find_fallback().
>>>
>>> Yes.
>>>
>>>> Does kmemleak show the wrong backtrace?
>>>
>>> I tried running blktests block/005 over nbd, but couldn't reproduce this
>>> kmemleak report after setting up the detector.
>>
>> I have analyzed this bug and found the root cause:
>>
>> The issue arises while we run the nr_hw_queues update. Specifically, we first
>> reallocate the hardware contexts (hctx) via __blk_mq_realloc_hw_ctxs(), and
>> then later invoke elevator_switch() (assuming q->elevator is not NULL).
>> The elevator switch code first exits the old elevator (elevator_exit())
>> and then switches to the new elevator. elevator_exit() loops through
>> each hctx and invokes the elevator's per-hctx exit method, ->exit_hctx(),
>> which releases the resources allocated during ->init_hctx().
>>
>> This memleak manifests when we reduce the number of hardware queues: for
>> example, when an initial update sets the number of queues to X and a later
>> update reduces it to Y, where Y < X. In this case, we lose access to the
>> surplus old hctxs by the time we reach the elevator exit code, because
>> __blk_mq_realloc_hw_ctxs() has already released them. Since no reference to
>> those hctxs is left, there is no way to free the scheduler resources
>> allocated in ->init_hctx(), and kmemleak complains about it.
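The ordering problem can be illustrated with a small user-space model in C. This is a sketch under stated assumptions, not kernel code: every model_* name is hypothetical and only loosely stands in for __blk_mq_realloc_hw_ctxs(), elevator_exit() and the scheduler's ->init_hctx()/->exit_hctx(), and a fixed pool of slots replaces the real allocator. Because the reallocation runs first, the exit loop never visits the hctxs that were just freed.

```c
#include <stdlib.h>

/*
 * Hypothetical user-space model, not kernel code. A fixed pool stands in for
 * the slab allocator, so a "leak" here is a pool slot that stays in use after
 * nothing references it anymore.
 */
#define POOL_SLOTS 16
static int pool_in_use[POOL_SLOTS];

struct model_hctx {
	int sched_slot;			/* from ->init_hctx(); -1 if none */
};

struct model_queue {
	struct model_hctx **hctxs;
	int nr_hw_queues;
};

static int live_sched_allocs(void)
{
	int n = 0;
	for (int i = 0; i < POOL_SLOTS; i++)
		n += pool_in_use[i];
	return n;
}

static void model_init_hctx(struct model_hctx *h)	/* ~ ->init_hctx() */
{
	for (int i = 0; i < POOL_SLOTS; i++) {
		if (!pool_in_use[i]) {
			pool_in_use[i] = 1;
			h->sched_slot = i;
			return;
		}
	}
	abort();			/* pool exhausted: outside this model */
}

static void model_exit_hctx(struct model_hctx *h)	/* ~ ->exit_hctx() */
{
	if (h->sched_slot >= 0) {
		pool_in_use[h->sched_slot] = 0;
		h->sched_slot = -1;
	}
}

/* Both walk only the hctxs the queue owns at the time of the call. */
static void model_elevator_init(struct model_queue *q)
{
	for (int i = 0; i < q->nr_hw_queues; i++)
		model_init_hctx(q->hctxs[i]);
}

static void model_elevator_exit(struct model_queue *q)
{
	for (int i = 0; i < q->nr_hw_queues; i++)
		model_exit_hctx(q->hctxs[i]);
}

/* Surplus hctxs are freed without ->exit_hctx() being called; nr > 0 assumed. */
static void model_realloc_hw_ctxs(struct model_queue *q, int nr)
{
	struct model_hctx **p;

	for (int i = nr; i < q->nr_hw_queues; i++)
		free(q->hctxs[i]);	/* its sched_slot is now unreachable */
	p = realloc(q->hctxs, (size_t)nr * sizeof(*p));
	if (!p)
		abort();
	q->hctxs = p;
	for (int i = q->nr_hw_queues; i < nr; i++) {
		q->hctxs[i] = malloc(sizeof(**p));
		if (!q->hctxs[i])
			abort();
		q->hctxs[i]->sched_slot = -1;
	}
	q->nr_hw_queues = nr;
}

/* Ordering described above: reallocate first, then switch the elevator. */
static void model_update_nr_hw_queues(struct model_queue *q, int nr)
{
	model_realloc_hw_ctxs(q, nr);
	model_elevator_exit(q);		/* never sees the hctxs freed above */
	model_elevator_init(q);
}

static void model_queue_free(struct model_queue *q)
{
	model_elevator_exit(q);
	for (int i = 0; i < q->nr_hw_queues; i++)
		free(q->hctxs[i]);
	free(q->hctxs);
	q->hctxs = NULL;
	q->nr_hw_queues = 0;
}
```

In this model, starting with four hctxs and a scheduler attached, shrinking to two and then freeing the queue leaves two pool slots in use: the scheduler data of the two hctxs that were freed before the elevator exit ran.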
>>
>> Regarding reproduction, I was also unable to recreate it using block/005,
>> but I then wrote a script using the null-blk driver that updates nr_hw_queues
>> from X to Y (where Y < X), and I encountered this memleak. So this is not
>> an issue with the nbd driver.
>>
>> I've implemented a potential fix for the above issue and I'm unit
>> testing it now. I will post a formal patch shortly.
> 
> Great!
> 
> It looks like it was introduced in commit 596dce110b7d ("block: simplify elevator
> reattachment for updating nr_hw_queues"), but it's easy to cause a panic with that patchset.
> 
Yeah, correct.

> One simple fix is to restore the original two-stage elevator switch, while saving
> the elevator name in an xarray to avoid adding the boilerplate code back.

Agreed; I implemented the same approach, and the fix is on its way...
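The two-stage idea can be sketched as follows: save each queue's elevator name and tear the elevator down before the hctxs are reallocated, then re-attach it by name afterwards. This is a hypothetical user-space model, not the actual patch: the model_* names are made up, hctx reallocation is reduced to changing a count, and a plain array indexed by queue id stands in for the kernel xarray.

```c
#include <stdlib.h>

/*
 * Hypothetical model of a two-stage elevator switch around an nr_hw_queues
 * update; not kernel code. saved_elevator[] stands in for the xarray that
 * would hold the saved elevator names, keyed by queue id.
 */
#define MAX_QUEUES 8			/* queue ids are assumed < MAX_QUEUES */

struct model_queue {
	int id;
	int nr_hw_queues;		/* assumed > 0 */
	void **sched_data;		/* one entry per hctx while attached */
	const char *elevator;		/* NULL means "none" */
};

static int live_sched_allocs;
static const char *saved_elevator[MAX_QUEUES];

static void model_elv_attach(struct model_queue *q, const char *name)
{
	q->sched_data = calloc((size_t)q->nr_hw_queues, sizeof(void *));
	if (!q->sched_data)
		abort();
	for (int i = 0; i < q->nr_hw_queues; i++) {	/* ~ ->init_hctx() */
		q->sched_data[i] = malloc(64);
		if (!q->sched_data[i])
			abort();
		live_sched_allocs++;
	}
	q->elevator = name;
}

static void model_elv_detach(struct model_queue *q)
{
	if (!q->elevator)
		return;
	for (int i = 0; i < q->nr_hw_queues; i++) {	/* ~ ->exit_hctx() */
		free(q->sched_data[i]);
		live_sched_allocs--;
	}
	free(q->sched_data);
	q->sched_data = NULL;
	q->elevator = NULL;
}

static void model_update_nr_hw_queues(struct model_queue **qs, int nq, int nr)
{
	/* Stage 1: save each elevator name, then tear the elevator down while
	 * the hctx count is still the one it was set up with. */
	for (int i = 0; i < nq; i++) {
		saved_elevator[qs[i]->id] = qs[i]->elevator;
		model_elv_detach(qs[i]);
	}

	/* Reallocate hardware contexts (reduced here to changing the count). */
	for (int i = 0; i < nq; i++)
		qs[i]->nr_hw_queues = nr;

	/* Stage 2: re-attach the saved elevator on the new set of hctxs. */
	for (int i = 0; i < nq; i++) {
		const char *name = saved_elevator[qs[i]->id];

		saved_elevator[qs[i]->id] = NULL;
		if (name)
			model_elv_attach(qs[i], name);
	}
}
```

With one queue using a scheduler and one without, shrinking from four to two hctxs frees all four scheduler allocations in stage 1 and creates two new ones in stage 2, so nothing is left unreferenced and the queue without an elevator stays without one.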

Thanks,
--Nilay