On Wed, Jul 2, 2025 at 10:17 PM Nilay Shroff <nilay@xxxxxxxxxxxxx> wrote:
>
> On 7/2/25 7:23 PM, Yi Zhang wrote:
> > Hi Nilay
> >
> > With the patch on the latest linux-block/for-next, I reproduced the
> > following WARNING with blktests block/005, here is the full log:
> >
> > [ 342.845331] run blktests block/005 at 2025-07-02 09:48:55
> >
> > [ 343.835605] ======================================================
> > [ 343.841783] WARNING: possible circular locking dependency detected
> > [ 343.847966] 6.16.0-rc4.fix+ #3 Not tainted
> > [ 343.852073] ------------------------------------------------------
> > [ 343.858250] check/1365 is trying to acquire lock:
> > [ 343.862957] ffffffff98141db0 (pcpu_alloc_mutex){+.+.}-{4:4}, at: pcpu_alloc_noprof+0x8eb/0xd70
> > [ 343.871587]
> > but task is already holding lock:
> > [ 343.877421] ffff888300cfb040 (&q->elevator_lock){+.+.}-{4:4}, at: elevator_change+0x152/0x530
> > [ 343.885958]
> > which lock already depends on the new lock.
> >
> > [ 343.894131]
> > the existing dependency chain (in reverse order) is:
> > [ 343.901609]
> > -> #3 (&q->elevator_lock){+.+.}-{4:4}:
> > [ 343.907891]   __lock_acquire+0x6f1/0xc00
> > [ 343.912259]   lock_acquire.part.0+0xb6/0x240
> > [ 343.916966]   __mutex_lock+0x17b/0x1690
> > [ 343.921247]   elevator_change+0x152/0x530
> > [ 343.925692]   elv_iosched_store+0x205/0x2f0
> > [ 343.930312]   queue_attr_store+0x23b/0x300
> > [ 343.934853]   kernfs_fop_write_iter+0x357/0x530
> > [ 343.939829]   vfs_write+0x9bc/0xf60
> > [ 343.943763]   ksys_write+0xf3/0x1d0
> > [ 343.947695]   do_syscall_64+0x8c/0x3d0
> > [ 343.951883]   entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [ 343.957462]
> > -> #2 (&q->q_usage_counter(io)#4){++++}-{0:0}:
> > [ 343.964440]   __lock_acquire+0x6f1/0xc00
> > [ 343.968799]   lock_acquire.part.0+0xb6/0x240
> > [ 343.973507]   blk_alloc_queue+0x5c5/0x710
> > [ 343.977959]   blk_mq_alloc_queue+0x14e/0x240
> > [ 343.982666]   __blk_mq_alloc_disk+0x15/0xd0
> > [ 343.987294]   nvme_alloc_ns+0x208/0x1690 [nvme_core]
> > [ 343.992727]   nvme_scan_ns+0x362/0x4c0 [nvme_core]
> > [ 343.997978]   async_run_entry_fn+0x96/0x4f0
> > [ 344.002599]   process_one_work+0x8cd/0x1950
> > [ 344.007226]   worker_thread+0x58d/0xcf0
> > [ 344.011499]   kthread+0x3d8/0x7a0
> > [ 344.015259]   ret_from_fork+0x406/0x510
> > [ 344.019532]   ret_from_fork_asm+0x1a/0x30
> > [ 344.023980]
> > -> #1 (fs_reclaim){+.+.}-{0:0}:
> > [ 344.029654]   __lock_acquire+0x6f1/0xc00
> > [ 344.034015]   lock_acquire.part.0+0xb6/0x240
> > [ 344.038727]   fs_reclaim_acquire+0x103/0x150
> > [ 344.043433]   prepare_alloc_pages+0x15f/0x600
> > [ 344.048230]   __alloc_frozen_pages_noprof+0x14a/0x3a0
> > [ 344.053722]   __alloc_pages_noprof+0xd/0x1d0
> > [ 344.058438]   pcpu_alloc_pages.constprop.0+0x104/0x420
> > [ 344.064017]   pcpu_populate_chunk+0x38/0x80
> > [ 344.068644]   pcpu_alloc_noprof+0x650/0xd70
> > [ 344.073265]   iommu_dma_init_fq+0x183/0x730
> > [ 344.077893]   iommu_dma_init_domain+0x566/0x990
> > [ 344.082866]   iommu_setup_dma_ops+0xca/0x230
> > [ 344.087571]   bus_iommu_probe+0x1f8/0x4a0
> > [ 344.092020]   iommu_device_register+0x153/0x240
> > [ 344.096993]   iommu_init_pci+0x53c/0x1040
> > [ 344.101447]   amd_iommu_init_pci+0xb6/0x5c0
> > [ 344.106066]   state_next+0xaf7/0xff0
> > [ 344.110080]   iommu_go_to_state+0x21/0x80
> > [ 344.114535]   amd_iommu_init+0x15/0x70
> > [ 344.118728]   pci_iommu_init+0x29/0x70
> > [ 344.122914]   do_one_initcall+0x100/0x5a0
> > [ 344.127361]   do_initcalls+0x138/0x1d0
> > [ 344.131556]   kernel_init_freeable+0x8b7/0xbd0
> > [ 344.136442]   kernel_init+0x1b/0x1f0
> > [ 344.140456]   ret_from_fork+0x406/0x510
> > [ 344.144735]   ret_from_fork_asm+0x1a/0x30
> > [ 344.149182]
> > -> #0 (pcpu_alloc_mutex){+.+.}-{4:4}:
> > [ 344.155379]   check_prev_add+0xf1/0xce0
> > [ 344.159653]   validate_chain+0x470/0x580
> > [ 344.164019]   __lock_acquire+0x6f1/0xc00
> > [ 344.168378]   lock_acquire.part.0+0xb6/0x240
> > [ 344.173085]   __mutex_lock+0x17b/0x1690
> > [ 344.177365]   pcpu_alloc_noprof+0x8eb/0xd70
> > [ 344.181984]   kyber_queue_data_alloc+0x16d/0x660
> > [ 344.187047]   kyber_init_sched+0x14/0x90
> > [ 344.191413]   blk_mq_init_sched+0x264/0x4e0
> > [ 344.196033]   elevator_switch+0x186/0x6a0
> > [ 344.200478]   elevator_change+0x305/0x530
> > [ 344.204924]   elv_iosched_store+0x205/0x2f0
> > [ 344.209545]   queue_attr_store+0x23b/0x300
> > [ 344.214084]   kernfs_fop_write_iter+0x357/0x530
> > [ 344.219051]   vfs_write+0x9bc/0xf60
> > [ 344.222976]   ksys_write+0xf3/0x1d0
> > [ 344.226902]   do_syscall_64+0x8c/0x3d0
> > [ 344.231088]   entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > [ 344.236660]
>
> Thanks for the report!
>
> I see that the above warning is different from the one addressed by the
> current patchset. In the warning you've reported, the kyber elevator
> allocates per-CPU data after acquiring ->elevator_lock, which makes
> pcpu_alloc_mutex dependent on ->elevator_lock.
>
> In contrast, the current patchset addresses a separate issue [1] that
> arises from elevator tag allocation. That allocation happens while both
> ->freeze_lock and ->elevator_lock are held. Internally, elevator tag
> allocation sets up the per-CPU sbitmap->alloc_hint, which introduces a
> similar per-CPU lock dependency on ->elevator_lock.
>
> That said, I plan to address the issue you've just reported in a separate
> patch, once the current patchset is merged.

OK, issue [1] was easy to reproduce with blktests block/005 in my environment,
which is why I re-ran the test on this patchset, and this new WARNING was
triggered. Anyway, thanks for the info; I will test your new patch later,
thanks.
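To spell out the shape of the cycle for anyone skimming the thread: the new
edge is kyber taking pcpu_alloc_mutex (inside the per-CPU allocator) while
->elevator_lock is already held, and the chain recorded at boot already leads
from pcpu_alloc_mutex, via fs_reclaim and q_usage_counter(io), back to
->elevator_lock. Collapsed to its two endpoints it is just an inverted lock
order. The sketch below is only an analogy, not kernel code: two userspace
pthread mutexes stand in for the kernel locks, the function names are made up
for illustration, and the intermediate fs_reclaim/q_usage_counter links are
dropped.

/* build: gcc -pthread lock_order_sketch.c -o lock_order_sketch */
#include <pthread.h>
#include <stdio.h>

/*
 * Stand-ins for the two endpoints of the cycle in the splat:
 * "elevator_lock" for q->elevator_lock and "pcpu_lock" for pcpu_alloc_mutex.
 */
static pthread_mutex_t elevator_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t pcpu_lock     = PTHREAD_MUTEX_INITIALIZER;

/* Path A (the new edge): elevator switch to kyber takes elevator_lock first,
 * then the per-CPU allocation nests pcpu_lock inside it. */
static void *switch_to_kyber(void *unused)
{
    pthread_mutex_lock(&elevator_lock);
    pthread_mutex_lock(&pcpu_lock);      /* per-CPU alloc under elevator_lock */
    pthread_mutex_unlock(&pcpu_lock);
    pthread_mutex_unlock(&elevator_lock);
    return NULL;
}

/* Path B (the pre-existing chain, collapsed): per-CPU allocation takes
 * pcpu_lock first and, through reclaim and queue freezing, eventually
 * reaches elevator_lock, i.e. the reverse order. */
static void *percpu_alloc_path(void *unused)
{
    pthread_mutex_lock(&pcpu_lock);
    pthread_mutex_lock(&elevator_lock);  /* elevator_lock under pcpu_lock */
    pthread_mutex_unlock(&elevator_lock);
    pthread_mutex_unlock(&pcpu_lock);
    return NULL;
}

int main(void)
{
    /* Run the two orderings back to back so the demo always terminates;
     * if they ran concurrently under contention they could deadlock,
     * which is the possibility lockdep reports. */
    pthread_t t;

    pthread_create(&t, NULL, switch_to_kyber, NULL);
    pthread_join(t, NULL);
    pthread_create(&t, NULL, percpu_alloc_path, NULL);
    pthread_join(t, NULL);
    printf("exercised elevator_lock->pcpu_lock and pcpu_lock->elevator_lock\n");
    return 0;
}

Run as written it always completes because the two paths execute sequentially;
the deadlock only becomes possible once the two orderings can race, which is
exactly what the lockdep report above is warning about.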
>
> Thanks,
> --Nilay
>
> [1]https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@xxxxxxxxxxxxx/

--
Best Regards,
Yi Zhang