On 7/17/25 5:32 AM, Ming Lei wrote:
> On Thu, Jul 17, 2025 at 12:54:31AM +0530, Nilay Shroff wrote:
>>
>>
>> On 7/16/25 4:10 PM, Ming Lei wrote:
>>> On Wed, Jul 16, 2025 at 03:50:34PM +0800, Yu Kuai wrote:
>>>> Hi,
>>>>
>>>> On 2025/07/16 9:54, Jens Axboe wrote:
>>>>> unreferenced object 0xffff8882e7fbb000 (size 2048):
>>>>>   comm "check", pid 10460, jiffies 4324980514
>>>>>   hex dump (first 32 bytes):
>>>>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>>   backtrace (crc c47e6a37):
>>>>>     __kvmalloc_node_noprof+0x55d/0x7a0
>>>>>     sbitmap_init_node+0x15a/0x6a0
>>>>>     kyber_init_hctx+0x316/0xb90
>>>>>     blk_mq_init_sched+0x416/0x580
>>>>>     elevator_switch+0x18b/0x630
>>>>>     elv_update_nr_hw_queues+0x219/0x2c0
>>>>>     __blk_mq_update_nr_hw_queues+0x36a/0x6f0
>>>>>     blk_mq_update_nr_hw_queues+0x3a/0x60
>>>>>     find_fallback+0x510/0x540 [nbd]
>>>>
>>>> This is weird. I checked the code, and it's impossible for
>>>> blk_mq_update_nr_hw_queues() to be called from find_fallback().
>>>
>>> Yes.
>>>
>>>> Does kmemleak show wrong backtrace?
>>>
>>> I tried to run blktests block/005 over nbd, but can't reproduce this
>>> kmemleak report after setting up the detector.
>>
>> I have analyzed this bug and found the root cause:
>>
>> The issue arises while we run the nr_hw_queue update. Specifically, we first
>> reallocate hardware contexts (hctx) via __blk_mq_realloc_hw_ctxs(), and
>> then later invoke elevator_switch() (assuming q->elevator is not NULL).
>> The elevator switch code would first exit the old elevator (elevator_exit)
>> and then switch to the new elevator. The elevator_exit loops through
>> each hctx and invokes the elevator’s per-hctx exit method ->exit_hctx(),
>> which releases resources allocated during ->init_hctx().
>>
>> This memleak manifests when we reduce the number of h/w queues - for example,
>> when the initial update sets the number of queues to X, and a later update
>> reduces it to Y, where Y < X. In this case, we'd lose access to the old
>> hctxs by the time we get to the elevator exit code, because
>> __blk_mq_realloc_hw_ctxs() would have already released the old hctxs. As we
>> no longer have any reference left to the old hctxs, we don't have any way to
>> free the scheduler resources (which are allocated in ->init_hctx()) and
>> kmemleak complains about it.
>>
>> Regarding reproduction, I was also not able to recreate it using block/005,
>> but then I wrote a script using the null-blk driver which updates nr_hw_queue
>> from X to Y (where Y < X) and I encountered this memleak. So this is not
>> an issue with the nbd driver.
>>
>> I've implemented a potential fix for the above issue and I'm unit
>> testing it now. I will post a formal patch in some time.
>
> Great!
>
> Looks like it was introduced in commit 596dce110b7d ("block: simplify elevator
> reattachment for updating nr_hw_queues"), but easy to cause panic with that patchset.
>

Yeah, correct.

> One simple fix is to restore the original two-stage elevator switch, meanwhile saving
> the elevator name in an xarray to avoid adding boilerplate code back.

Agreed, I did implement the same and the fix is on its way...

Thanks,
--Nilay
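
P.S. For anyone following the thread, the ordering problem described above can be modelled in user space. What follows is only a minimal sketch under stated assumptions, not kernel code and not the actual fix: struct hctx, struct queue, sched_init(), sched_exit(), realloc_hw_ctxs() and simulate_update() are all hypothetical stand-ins, and a simple counter takes the place of the real per-hctx scheduler allocations.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_HCTX 64

/* Hypothetical stand-in for a hardware context (not struct blk_mq_hw_ctx). */
struct hctx {
    bool has_sched_data;    /* set by the modelled ->init_hctx() */
};

/* Hypothetical stand-in for a request queue. */
struct queue {
    struct hctx hctxs[MAX_HCTX];
    int nr_hw_queues;
    int outstanding;        /* scheduler allocations not yet released */
};

/* Models ->init_hctx() being called for every current hctx. */
static void sched_init(struct queue *q)
{
    for (int i = 0; i < q->nr_hw_queues; i++) {
        q->hctxs[i].has_sched_data = true;
        q->outstanding++;
    }
}

/* Models ->exit_hctx(): it can only reach hctxs that are still referenced. */
static void sched_exit(struct queue *q)
{
    for (int i = 0; i < q->nr_hw_queues; i++) {
        if (q->hctxs[i].has_sched_data) {
            q->hctxs[i].has_sched_data = false;
            q->outstanding--;
        }
    }
}

/* Models shrinking or growing the hctx table. Dropped hctxs are forgotten
 * without releasing their scheduler data, so 'outstanding' is not adjusted. */
static void realloc_hw_ctxs(struct queue *q, int new_nr)
{
    for (int i = new_nr; i < q->nr_hw_queues; i++)
        q->hctxs[i].has_sched_data = false;   /* reference lost */
    q->nr_hw_queues = new_nr;
}

/* Returns the number of scheduler allocations that were never released.
 * exit_first == false: realloc, then exit/init (order described in the thread).
 * exit_first == true:  exit, realloc, init (a two-stage switch). */
int simulate_update(int old_nr, int new_nr, bool exit_first)
{
    struct queue q = { .nr_hw_queues = old_nr };

    assert(old_nr >= 0 && old_nr <= MAX_HCTX);
    assert(new_nr >= 0 && new_nr <= MAX_HCTX);

    sched_init(&q);
    if (exit_first) {
        sched_exit(&q);
        realloc_hw_ctxs(&q, new_nr);
    } else {
        realloc_hw_ctxs(&q, new_nr);
        sched_exit(&q);
    }
    sched_init(&q);

    sched_exit(&q);         /* final teardown of the new scheduler state */
    return q.outstanding;
}
```

In this model, simulate_update(4, 2, false) leaves the allocations of the two dropped hctxs unreleased, while simulate_update(4, 2, true) releases everything, which matches the reasoning for going back to a two-stage switch. Growing the queue count (Y > X) does not lose any reference in either order.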