Re: [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop

Hi Jens,

On 8/6/25 7:14 AM, Yu Kuai wrote:
> Hi,
> 
> On 2025/08/06 9:28, Jens Axboe wrote:
>> On 8/4/25 10:58 PM, Nilay Shroff wrote:
>>>
>>>
>>> On 8/4/25 7:12 PM, Ming Lei wrote:
>>>> On Mon, Aug 04, 2025 at 05:51:09PM +0530, Nilay Shroff wrote:
>>>>> This patchset replaces the use of a static key in the I/O path
>>>>> (rq_qos_xxx()) with an atomic queue flag (QUEUE_FLAG_QOS_ENABLED). This
>>>>> change is made to eliminate a potential deadlock introduced by the use
>>>>> of static keys in the blk-rq-qos infrastructure, as reported by lockdep
>>>>> during blktests block/005 [1].
>>>>>
>>>>> The original static key approach was introduced to avoid unnecessary
>>>>> dereferencing of q->rq_qos when no blk-rq-qos module (e.g., blk-wbt or
>>>>> blk-iolatency) is configured. While efficient, enabling a static key at
>>>>> runtime requires taking cpu_hotplug_lock and jump_label_mutex, which
>>>>> becomes problematic if the queue is already frozen — causing a reverse
>>>>> dependency on ->freeze_lock. This results in a lockdep splat indicating
>>>>> a potential deadlock.
>>>>>
>>>>> To resolve this, we now gate q->rq_qos access with a q->queue_flags
>>>>> bitop (QUEUE_FLAG_QOS_ENABLED), avoiding the static key and the associated
>>>>> locking altogether.
>>>>>
>>>>> I compared both static key and atomic bitop implementations using ftrace
>>>>> function graph tracer over ~50 invocations of rq_qos_issue() while ensuring
>>>>> blk-wbt/blk-iolatency were disabled (i.e., no QoS functionality). For
>>>>> easy comparison, I made rq_qos_issue() noinline. The comparison was
>>>>> made on a PowerPC machine.
>>>>>
>>>>> Static Key (disabled : QoS is not configured):
>>>>> 5d0: 00 00 00 60     nop    # patched in by static key framework (not taken)
>>>>> 5d4: 20 00 80 4e     blr    # return (branch to link register)
>>>>>
>>>>> Only a nop and blr (branch to link register) are executed — very lightweight.
>>>>>
>>>>> atomic bitop (QoS is not configured):
>>>>> 5d0: 20 00 23 e9     ld      r9,32(r3)     # load q->queue_flags
>>>>> 5d4: 00 80 29 71     andi.   r9,r9,32768   # check QUEUE_FLAG_QOS_ENABLED (bit 15)
>>>>> 5d8: 20 00 82 4d     beqlr                 # return if bit not set
>>>>>
>>>>> This performs an ld and an andi. before returning. Slightly more work,
>>>>> but q->queue_flags is typically hot in cache during I/O submission.
>>>>>
>>>>> With Static Key (disabled):
>>>>> Duration (us): min=0.668 max=0.816 avg≈0.750
>>>>>
>>>>> With atomic bitop QUEUE_FLAG_QOS_ENABLED (bit not set):
>>>>> Duration (us): min=0.684 max=0.834 avg≈0.759
>>>>>
>>>>> As expected, both versions are nearly identical in cost. The added
>>>>> latency from the extra ld and andi. is in the range of ~9ns.
>>>>>
>>>>> There're two patches in the series. The first patch replaces static key
>>>>> with QUEUE_FLAG_QOS_ENABLED. The second patch ensures that we disable
>>>>> the QUEUE_FLAG_QOS_ENABLED when the queue no longer has any associated
>>>>> rq_qos policies.
>>>>>
>>>>> As usual, feedback and review comments are welcome!
>>>>>
>>>>> [1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>>>>
>>>>
>>>> Another approach is to call memalloc_noio_save() in cpu hotplug code...
>>>>
>>> Yes, that would help fix this. However, per the general usage of the
>>> GFP_NOIO scope in the kernel, it is used when we're performing memory
>>> allocations in a context where I/O must not be initiated, because doing
>>> so could cause deadlocks or recursion.
>>>
>>> So we typically use GFP_NOIO in a code path that is already doing I/O,
>>> such as:
>>> - Block layer context, during request submission.
>>> - Filesystem writeback or swap-out.
>>> - Memory reclaim or writeback triggered by memory pressure.
>>>
>>> The cpu hotplug code may not be running in any of the above contexts. So
>>> IMO, adding memalloc_noio_save() in the cpu hotplug code would not be a
>>> good idea, would it?
>>
>> Please heed Ming's advice; moving this from a static key to an atomic
>> queue flag op is pointless, may as well kill it at that point.
> 
> Nilay already tested this and replied that it's a dead end :(
> 
> I don't quite understand why it's pointless; if rq_qos is never enabled,
> an atomic queue_flag is still a minor optimization, isn't it?
> 
>>
>> I see v2 is out now with the exact same approach.
>>
As mentioned earlier, I tried Ming's original recommendation, but it didn’t
resolve the issue. In a separate thread, Ming agreed that using an atomic queue
flag is a reasonable approach and would avoid the lockdep problem while still
keeping a minor fast-path optimization.
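
For anyone skimming the thread: the flag-gated fast path boils down to
something like the sketch below (in the spirit of the patches rather than
the literal diff; helper names follow block/blk-rq-qos.h):

	/*
	 * Sketch of the gated fast path: the guard is a plain
	 * load-and-test of q->queue_flags instead of a patched
	 * jump label.
	 */
	static inline void rq_qos_issue(struct request_queue *q,
					struct request *rq)
	{
		if (test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags) &&
		    q->rq_qos)
			__rq_qos_issue(q->rq_qos, rq);
	}

When the flag is clear, this compiles down to exactly the ld/andi./beqlr
sequence quoted above.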

That leaves us with two options:
- Use an atomic queue flag (the sketch above), or
- Remove the static key entirely (sketched below).
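
If the static key were simply removed, rq_qos_issue() would degrade to the
plain pointer check it had before the key was introduced, i.e. every
submission dereferences q->rq_qos even when no policy is attached:

	/* Option 2 sketch: no guard at all. */
	static inline void rq_qos_issue(struct request_queue *q,
					struct request *rq)
	{
		if (q->rq_qos)
			__rq_qos_issue(q->rq_qos, rq);
	}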

So before I send v3, do you prefer the atomic queue flag approach, or would you rather
see the static key removed altogether? My preference is for the atomic queue flag, 
as it maintains a lightweight check without the static key’s locking concerns. 
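
For completeness, the enable/disable side needs nothing beyond the existing
queue-flag helpers, which is exactly why it avoids the freeze_lock vs.
cpu_hotplug_lock/jump_label_mutex inversion (again a sketch, not the
literal diff):

	/*
	 * Enabling is a single atomic bitop: unlike
	 * static_branch_enable(), it takes no cpu_hotplug_lock or
	 * jump_label_mutex, so it is safe against a frozen queue.
	 * This would sit in rq_qos_add().
	 */
	blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);

	/*
	 * Patch 2's counterpart in rq_qos_del(): once the last policy
	 * is removed and q->rq_qos is NULL, clear the flag so the fast
	 * path goes back to a single failed test.
	 */
	if (!q->rq_qos)
		blk_queue_flag_clear(QUEUE_FLAG_QOS_ENABLED, q);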

Thanks,
--Nilay



