Re: [PATCH for-6.18/block 04/10] blk-mq: convert to serialize updating nr_requests with update_nr_hwq_lock

Nilay Shroff <nilay@xxxxxxxxxxxxx> · Tue, 9 Sep 2025 12:22:49 +0530



On 9/9/25 12:08 PM, Yu Kuai wrote:
> Hi,
> 
> 在 2025/09/09 14:29, Nilay Shroff 写道:
>>
>>
>> On 9/8/25 11:45 AM, Yu Kuai wrote:
>>> From: Yu Kuai <yukuai3@xxxxxxxxxx>
>>>
>>> request_queue->nr_requests can be changed by:
>>>
>>> a) switching elevator by update nr_hw_queues
>>> b) switching elevator by elevator sysfs attribute
>>> c) configue queue sysfs attribute nr_requests
>>>
>>> Current lock order is:
>>>
>>> 1) update_nr_hwq_lock, case a,b
>>> 2) freeze_queue
>>> 3) elevator_lock, cas a,b,c
>>>
>>> And update nr_requests is seriablized by elevator_lock() already,
>>> however, in the case c), we'll have to allocate new sched_tags if
>>> nr_requests grow, and do this with elevator_lock held and queue
>>> freezed has the risk of deadlock.
>>>
>>> Hence use update_nr_hwq_lock instead, make it possible to allocate
>>> memory if tags grow, meanwhile also prevent nr_requests to be changed
>>> concurrently.
>>>
>>> Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
>>> ---
>>>   block/blk-sysfs.c | 12 +++++++++---
>>>   1 file changed, 9 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>>> index f99519f7a820..7ea15bf68b4b 100644
>>> --- a/block/blk-sysfs.c
>>> +++ b/block/blk-sysfs.c
>>> @@ -68,13 +68,14 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count)
>>>       int ret, err;
>>>       unsigned int memflags;
>>>       struct request_queue *q = disk->queue;
>>> +    struct blk_mq_tag_set *set = q->tag_set;
>>>         ret = queue_var_store(&nr, page, count);
>>>       if (ret < 0)
>>>           return ret;
>>>   -    memflags = blk_mq_freeze_queue(q);
>>> -    mutex_lock(&q->elevator_lock);
>>> +    /* serialize updating nr_requests with switching elevator */
>>> +    down_write(&set->update_nr_hwq_lock);
>>>   
>> For serializing update of nr_requests with switching elevator,
>> we should use disable_elv_switch(). So with this change we
>> don't need to acquire ->update_nr_hwq_lock in writer context
>> while running blk_mq_update_nr_requests but instead it can run
>> acquiring ->update_nr_hwq_lock in the reader context.
>>
>> So the code flow should be,
>>
>> disable_elv_switch  => this would set QUEUE_FLAG_NO_ELV_SWITCH
>> ...
>> down_read ->update_nr_hwq_lock
>> acquire ->freeze_lock
>> acquire ->elevator_lock;
>> ...
>> ...
>> release ->elevator_lock;
>> release ->freeze_lock
>>
>> clear QUEUE_FLAG_NO_ELV_SWITCH
>> up_read ->update_nr_hwq_lock
>>
> 
> Yes, this make sense, however, there is also an implied condition that
> we should serialize queue_requests_store() with itself, what if a
> concurrent caller succeed the disable_elv_switch() before the
> down_read() in this way?
> 
> t1:
> disable_elv_switch
>         t2:
>         disable_elv_switch
> 
> down_read    down_read
> 
I believe this is already protected with the kernfs internal
mutex locks. So you shouldn't be able to run two sysfs store 
operations concurrently on the same attribute file.

Thanks,
--Nilay