On Tue, Apr 22, 2025 at 02:57:01PM +0530, Nilay Shroff wrote:
> 
> 
> On 4/22/25 1:43 PM, Ming Lei wrote:
> > On Tue, Apr 22, 2025 at 11:44:59AM +0530, Nilay Shroff wrote:
> >>
> >>
> >> On 4/21/25 12:57 PM, Ming Lei wrote:
> >>> On Sat, Apr 19, 2025 at 08:09:04PM +0530, Nilay Shroff wrote:
> >>>>
> >>>>
> >>>> On 4/18/25 10:07 PM, Ming Lei wrote:
> >>>>> The scheduler's ->exit() is called with the queue frozen and the
> >>>>> elevator lock held, and wbt_enable_default() can't be called with
> >>>>> the queue frozen, otherwise the following lockdep warning is
> >>>>> triggered:
> >>>>>
> >>>>> #6 (&q->rq_qos_mutex){+.+.}-{4:4}:
> >>>>> #5 (&eq->sysfs_lock){+.+.}-{4:4}:
> >>>>> #4 (&q->elevator_lock){+.+.}-{4:4}:
> >>>>> #3 (&q->q_usage_counter(io)#3){++++}-{0:0}:
> >>>>> #2 (fs_reclaim){+.+.}-{0:0}:
> >>>>> #1 (&sb->s_type->i_mutex_key#3){+.+.}-{4:4}:
> >>>>> #0 (&q->debugfs_mutex){+.+.}-{4:4}:
> >>>>>
> >>>>> Fix the issue by moving wbt_enable_default() out of bfq's exit(),
> >>>>> and calling it from elevator_change_done().
> >>>>>
> >>>>> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> >>>>> ---
> >>>>>  block/bfq-iosched.c | 2 +-
> >>>>>  block/elevator.c    | 5 +++++
> >>>>>  block/elevator.h    | 1 +
> >>>>>  3 files changed, 7 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
> >>>>> index 40e4106a71e7..310ce1d8c41e 100644
> >>>>> --- a/block/bfq-iosched.c
> >>>>> +++ b/block/bfq-iosched.c
> >>>>> @@ -7211,7 +7211,7 @@ static void bfq_exit_queue(struct elevator_queue *e)
> >>>>>  
> >>>>>  	blk_stat_disable_accounting(bfqd->queue);
> >>>>>  	blk_queue_flag_clear(QUEUE_FLAG_DISABLE_WBT, bfqd->queue);
> >>>>> -	wbt_enable_default(bfqd->queue->disk);
> >>>>> +	set_bit(ELEVATOR_FLAG_ENABLE_WBT_ON_EXIT, &e->flags);
> >>>>>  
> >>>>>  	kfree(bfqd);
> >>>>>  }
> >>>>> diff --git a/block/elevator.c b/block/elevator.c
> >>>>> index 8652fe45a2db..378553fce5d8 100644
> >>>>> --- a/block/elevator.c
> >>>>> +++ b/block/elevator.c
> >>>>> @@ -687,8 +687,13 @@ int elevator_change_done(struct request_queue *q, struct elv_change_ctx *ctx)
> >>>>>  	int ret = 0;
> >>>>>  
> >>>>>  	if (ctx->old) {
> >>>>> +		bool enable_wbt = test_bit(ELEVATOR_FLAG_ENABLE_WBT_ON_EXIT,
> >>>>> +				&ctx->old->flags);
> >>>>> +
> >>>>>  		elv_unregister_queue(q, ctx->old);
> >>>>>  		kobject_put(&ctx->old->kobj);
> >>>>> +		if (enable_wbt)
> >>>>> +			wbt_enable_default(q->disk);
> >>>>>  	}
> >>>>>  	if (ctx->new) {
> >>>>>  		ret = elv_register_queue(q, ctx->new, ctx->uevent);
> >>>>> diff --git a/block/elevator.h b/block/elevator.h
> >>>>> index 486be0690499..b14c611c74b6 100644
> >>>>> --- a/block/elevator.h
> >>>>> +++ b/block/elevator.h
> >>>>> @@ -122,6 +122,7 @@ struct elevator_queue
> >>>>>  
> >>>>>  #define ELEVATOR_FLAG_REGISTERED	0
> >>>>>  #define ELEVATOR_FLAG_DYING	1
> >>>>> +#define ELEVATOR_FLAG_ENABLE_WBT_ON_EXIT	2
> >>>>>  
> >>>>>  /* Holding context data for changing elevator */
> >>>>>  struct elv_change_ctx {
> >>>>
> >>>> It seems invoking wbt_enable_default from elevator_change_done could
> >>>> probably still race with ioc_qos_write or queue_wb_lat_store. Both
> >>>> ioc_qos_write and queue_wb_lat_store run with ->freeze_lock and
> >>>> ->elevator_lock protection.
> >>>
> >>> Actually wbt_enable_default() and wbt_init() don't need the above
> >>> protection, especially since patch 2/20 removes the q->elevator use
> >>> in wbt_enable_default().
> >>>
> >> Yes, agreed. As I understand it, XXX_FLAG_DISABLE_WBT was earlier in
> >> elevator_queue->flags, but now (with patch 2/20) it has been moved to
> >> request_queue->flags. elevator_change_done first puts the
> >> elevator_queue object, which potentially releases/frees it, so by the
> >> time we enable wbt (in elevator_change_done) we may no longer have
> >> access to the elevator_queue object; that is why we now reference
> >> QUEUE_FLAG_DISABLE_WBT through request_queue->flags. That, I believe,
> >> is the purpose of patch 2/20.
> >>
> >> However, even with the patch 2/20 change, elevator_change_done and
> >> ioc_qos_write or queue_wb_lat_store may still run in parallel, may
> >> they not?
> >>
> >> thread1:
> >> blk_mq_update_nr_hw_queues
> >> -> __blk_mq_update_nr_hw_queues
> >> -> elevator_change_done
> >> -> wbt_enable_default
> >> -> wbt_init
> >> -> wbt_update_limits
> > 
> > Here wbt_update_limits() is called on an un-attached `struct rq_wb`
> > instance, which is just allocated from the heap.
> > 
> >> thread2:
> >> queue_wb_lat_store
> >> -> wbt_set_min_lat
> >> -> wbt_update_limits
> > 
> > The above one runs on the attached `struct rq_wb` instance.
> > 
> > There can only be one attached `struct rq_wb` instance, so the above
> > race doesn't exist, because attaching wbt to the queue/disk is covered
> > by `q->rq_qos_mutex`.
> > 
> Yes, you are correct. However, what if throttling is already
> enabled/attached to the queue? In that case we'd race updating
> rq_wb->enable_state, no? For instance,
> 
> thread1:
> blk_mq_update_nr_hw_queues
> -> elevator_change_done
> -> wbt_enable_default ==> (updates ->enable_state)
> 
> thread2:
> queue_wb_lat_store
> -> wbt_set_min_lat ==> (updates ->enable_state)
> 
> thread3:
> ioc_qos_write
> -> wbt_disable_default ==> (updates ->enable_state)

OK, that is one race, but it should be handled by one rqos-dedicated
lock instead of ->elevator_lock.


Thanks,
Ming