On Wed, Jul 2, 2025 at 10:17 PM Nilay Shroff <nilay@xxxxxxxxxxxxx> wrote:
>
> On 7/2/25 7:23 PM, Yi Zhang wrote:
> > Hi Nilay
> >
> > With the patch on the latest linux-block/for-next, I reproduced the
> > following WARNING with blktests block/005, here is the full log:
> >
> > [ 342.845331] run blktests block/005 at 2025-07-02 09:48:55
> >
> > [ 343.835605] ======================================================
> > [ 343.841783] WARNING: possible circular locking dependency detected
> > [ 343.847966] 6.16.0-rc4.fix+ #3 Not tainted
> > [ 343.852073] ------------------------------------------------------
> > [ 343.858250] check/1365 is trying to acquire lock:
> > [ 343.862957] ffffffff98141db0 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x8eb/0xd70
> > [ 343.871587]
> > but task is already holding lock:
> > [ 343.877421] ffff888300cfb040 (&q->elevator_lock){+.+.}-{4:4}, at: elevator_change+0x152/0x530
> > [ 343.885958]
> > which lock already depends on the new lock.
> >
> > [ 343.894131]
> > the existing dependency chain (in reverse order) is:
> > [ 343.901609]
> > -> #3 (&q->elevator_lock){+.+.}-{4:4}:
> > [ 343.907891]   __lock_acquire+0x6f1/0xc00
> > [ 343.912259]   lock_acquire.part.0+0xb6/0x240
> > [ 343.916966]   __mutex_lock+0x17b/0x1690
> > [ 343.921247]   elevator_change+0x152/0x530
> > [ 343.925692]   elv_iosched_store+0x205/0x2f0
> > [ 343.930312]   queue_attr_store+0x23b/0x300
> > [ 343.934853]   kernfs_fop_write_iter+0x357/0x530
> > [ 343.939829]   vfs_write+0x9bc/0xf60
> > [ 343.943763]   ksys_write+0xf3/0x1d0
> > [ 343.947695]   do_syscall_64+0x8c/0x3d0
> > [ 343.951883]   entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [ 343.957462]
> > -> #2 (&q->q_usage_counter(io)#4){++++}-{0:0}:
> > [ 343.964440]   __lock_acquire+0x6f1/0xc00
> > [ 343.968799]   lock_acquire.part.0+0xb6/0x240
> > [ 343.973507]   blk_alloc_queue+0x5c5/0x710
> > [ 343.977959]   blk_mq_alloc_queue+0x14e/0x240
> > [ 343.982666]   __blk_mq_alloc_disk+0x15/0xd0
> > [ 343.987294]   nvme_alloc_ns+0x208/0x1690 [nvme_core]
> > [ 343.992727]   nvme_scan_ns+0x362/0x4c0 [nvme_core]
> > [ 343.997978]   async_run_entry_fn+0x96/0x4f0
> > [ 344.002599]   process_one_work+0x8cd/0x1950
> > [ 344.007226]   worker_thread+0x58d/0xcf0
> > [ 344.011499]   kthread+0x3d8/0x7a0
> > [ 344.015259]   ret_from_fork+0x406/0x510
> > [ 344.019532]   ret_from_fork_asm+0x1a/0x30
> > [ 344.023980]
> > -> #1 (fs_reclaim){+.+.}-{0:0}:
> > [ 344.029654]   __lock_acquire+0x6f1/0xc00
> > [ 344.034015]   lock_acquire.part.0+0xb6/0x240
> > [ 344.038727]   fs_reclaim_acquire+0x103/0x150
> > [ 344.043433]   prepare_alloc_pages+0x15f/0x600
> > [ 344.048230]   __alloc_frozen_pages_noprof+0x14a/0x3a0
> > [ 344.053722]   __alloc_pages_noprof+0xd/0x1d0
> > [ 344.058438]   pcpu_alloc_pages.constprop.0+0x104/0x420
> > [ 344.064017]   pcpu_populate_chunk+0x38/0x80
> > [ 344.068644]   pcpu_alloc_noprof+0x650/0xd70
> > [ 344.073265]   iommu_dma_init_fq+0x183/0x730
> > [ 344.077893]   iommu_dma_init_domain+0x566/0x990
> > [ 344.082866]   iommu_setup_dma_ops+0xca/0x230
> > [ 344.087571]   bus_iommu_probe+0x1f8/0x4a0
> > [ 344.092020]   iommu_device_register+0x153/0x240
> > [ 344.096993]   iommu_init_pci+0x53c/0x1040
> > [ 344.101447]   amd_iommu_init_pci+0xb6/0x5c0
> > [ 344.106066]   state_next+0xaf7/0xff0
> > [ 344.110080]   iommu_go_to_state+0x21/0x80
> > [ 344.114535]   amd_iommu_init+0x15/0x70
> > [ 344.118728]   pci_iommu_init+0x29/0x70
> > [ 344.122914]   do_one_initcall+0x100/0x5a0
> > [ 344.127361]   do_initcalls+0x138/0x1d0
> > [ 344.131556]   kernel_init_freeable+0x8b7/0xbd0
> > [ 344.136442]   kernel_init+0x1b/0x1f0
> > [ 344.140456]   ret_from_fork+0x406/0x510
> > [ 344.144735]   ret_from_fork_asm+0x1a/0x30
> > [ 344.149182]
> > -> #0 (pcpu_alloc_mutex){+.+.}-{4:4}:
> > [ 344.155379]   check_prev_add+0xf1/0xce0
> > [ 344.159653]   validate_chain+0x470/0x580
> > [ 344.164019]   __lock_acquire+0x6f1/0xc00
> > [ 344.168378]   lock_acquire.part.0+0xb6/0x240
> > [ 344.173085]   __mutex_lock+0x17b/0x1690
> > [ 344.177365]   pcpu_alloc_noprof+0x8eb/0xd70
> > [ 344.181984]   kyber_queue_data_alloc+0x16d/0x660
> > [ 344.187047]   kyber_init_sched+0x14/0x90
> > [ 344.191413]   blk_mq_init_sched+0x264/0x4e0
> > [ 344.196033]   elevator_switch+0x186/0x6a0
> > [ 344.200478]   elevator_change+0x305/0x530
> > [ 344.204924]   elv_iosched_store+0x205/0x2f0
> > [ 344.209545]   queue_attr_store+0x23b/0x300
> > [ 344.214084]   kernfs_fop_write_iter+0x357/0x530
> > [ 344.219051]   vfs_write+0x9bc/0xf60
> > [ 344.222976]   ksys_write+0xf3/0x1d0
> > [ 344.226902]   do_syscall_64+0x8c/0x3d0
> > [ 344.231088]   entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [ 344.236660]
>
> Thanks for the report!
>
> I see that the above warning is different from the one addressed by the
> current patchset. In the warning you've reported, the kyber elevator
> allocates per-CPU data after acquiring ->elevator_lock, which makes
> pcpu_alloc_mutex dependent on ->elevator_lock.
>
> In contrast, the current patchset addresses a separate issue [1] that
> arises from elevator tag allocation. That allocation happens while both
> ->freeze_lock and ->elevator_lock are held. Internally, elevator tag
> allocation sets up the per-CPU sbitmap->alloc_hint, which introduces a
> similar per-CPU lock dependency on ->elevator_lock.
>
> That said, I plan to address the issue you've just reported in a separate
> patch, once the current patchset is merged.

OK, issue [1] was easy to reproduce with blktests block/005 in my environment,
which is why I re-ran the test on this patchset, and this new WARNING was
triggered. Anyway, thanks for the info; I will test your new patch later,
thanks.
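To spell out the shape of the cycle for anyone skimming the thread: the new
edge is kyber taking pcpu_alloc_mutex (inside the per-CPU allocator) while
->elevator_lock is already held, and the chain recorded at boot already leads
from pcpu_alloc_mutex, via fs_reclaim and q_usage_counter(io), back to
->elevator_lock. Collapsed to its two endpoints it is just an inverted lock
order. The sketch below is only an analogy, not kernel code: two userspace
pthread mutexes stand in for the kernel locks, the function names are made up
for illustration, and the intermediate fs_reclaim/q_usage_counter links are
dropped.

/* build: gcc -pthread lock_order_sketch.c -o lock_order_sketch */
#include <pthread.h>
#include <stdio.h>

/*
 * Stand-ins for the two endpoints of the cycle in the splat:
 * "elevator_lock" for q->elevator_lock and "pcpu_lock" for pcpu_alloc_mutex.
 */
static pthread_mutex_t elevator_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t pcpu_lock     = PTHREAD_MUTEX_INITIALIZER;

/* Path A (the new edge): elevator switch to kyber takes elevator_lock first,
 * then the per-CPU allocation nests pcpu_lock inside it. */
static void *switch_to_kyber(void *unused)
{
    pthread_mutex_lock(&elevator_lock);
    pthread_mutex_lock(&pcpu_lock);      /* per-CPU alloc under elevator_lock */
    pthread_mutex_unlock(&pcpu_lock);
    pthread_mutex_unlock(&elevator_lock);
    return NULL;
}

/* Path B (the pre-existing chain, collapsed): per-CPU allocation takes
 * pcpu_lock first and, through reclaim and queue freezing, eventually
 * reaches elevator_lock, i.e. the reverse order. */
static void *percpu_alloc_path(void *unused)
{
    pthread_mutex_lock(&pcpu_lock);
    pthread_mutex_lock(&elevator_lock);  /* elevator_lock under pcpu_lock */
    pthread_mutex_unlock(&elevator_lock);
    pthread_mutex_unlock(&pcpu_lock);
    return NULL;
}

int main(void)
{
    /* Run the two orderings back to back so the demo always terminates;
     * if they ran concurrently under contention they could deadlock,
     * which is the possibility lockdep reports. */
    pthread_t t;

    pthread_create(&t, NULL, switch_to_kyber, NULL);
    pthread_join(t, NULL);
    pthread_create(&t, NULL, percpu_alloc_path, NULL);
    pthread_join(t, NULL);
    printf("exercised elevator_lock->pcpu_lock and pcpu_lock->elevator_lock\n");
    return 0;
}

Run as written it always completes because the two paths execute sequentially;
the deadlock only becomes possible once the two orderings can race, which is
exactly what the lockdep report above is warning about.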
>
> Thanks,
> --Nilay
>
> [1]https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@xxxxxxxxxxxxx/

--
Best Regards,
Yi Zhang