On 9/9/25 12:08 PM, Yu Kuai wrote: > Hi, > > 在 2025/09/09 14:29, Nilay Shroff 写道: >> >> >> On 9/8/25 11:45 AM, Yu Kuai wrote: >>> From: Yu Kuai <yukuai3@xxxxxxxxxx> >>> >>> request_queue->nr_requests can be changed by: >>> >>> a) switching elevator by update nr_hw_queues >>> b) switching elevator by elevator sysfs attribute >>> c) configue queue sysfs attribute nr_requests >>> >>> Current lock order is: >>> >>> 1) update_nr_hwq_lock, case a,b >>> 2) freeze_queue >>> 3) elevator_lock, cas a,b,c >>> >>> And update nr_requests is seriablized by elevator_lock() already, >>> however, in the case c), we'll have to allocate new sched_tags if >>> nr_requests grow, and do this with elevator_lock held and queue >>> freezed has the risk of deadlock. >>> >>> Hence use update_nr_hwq_lock instead, make it possible to allocate >>> memory if tags grow, meanwhile also prevent nr_requests to be changed >>> concurrently. >>> >>> Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx> >>> --- >>> block/blk-sysfs.c | 12 +++++++++--- >>> 1 file changed, 9 insertions(+), 3 deletions(-) >>> >>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c >>> index f99519f7a820..7ea15bf68b4b 100644 >>> --- a/block/blk-sysfs.c >>> +++ b/block/blk-sysfs.c >>> @@ -68,13 +68,14 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count) >>> int ret, err; >>> unsigned int memflags; >>> struct request_queue *q = disk->queue; >>> + struct blk_mq_tag_set *set = q->tag_set; >>> ret = queue_var_store(&nr, page, count); >>> if (ret < 0) >>> return ret; >>> - memflags = blk_mq_freeze_queue(q); >>> - mutex_lock(&q->elevator_lock); >>> + /* serialize updating nr_requests with switching elevator */ >>> + down_write(&set->update_nr_hwq_lock); >>> >> For serializing update of nr_requests with switching elevator, >> we should use disable_elv_switch(). So with this change we >> don't need to acquire ->update_nr_hwq_lock in writer context >> while running blk_mq_update_nr_requests but instead it can run >> acquiring ->update_nr_hwq_lock in the reader context. >> >> So the code flow should be, >> >> disable_elv_switch => this would set QUEUE_FLAG_NO_ELV_SWITCH >> ... >> down_read ->update_nr_hwq_lock >> acquire ->freeze_lock >> acquire ->elevator_lock; >> ... >> ... >> release ->elevator_lock; >> release ->freeze_lock >> >> clear QUEUE_FLAG_NO_ELV_SWITCH >> up_read ->update_nr_hwq_lock >> > > Yes, this make sense, however, there is also an implied condition that > we should serialize queue_requests_store() with itself, what if a > concurrent caller succeed the disable_elv_switch() before the > down_read() in this way? > > t1: > disable_elv_switch > t2: > disable_elv_switch > > down_read down_read > I believe this is already protected with the kernfs internal mutex locks. So you shouldn't be able to run two sysfs store operations concurrently on the same attribute file. Thanks, --Nilay