On Tue, Apr 22, 2025 at 04:36:37PM +0530, Nilay Shroff wrote:
> 
> 
> On 4/22/25 4:23 PM, Ming Lei wrote:
> > On Sat, Apr 19, 2025 at 07:25:31PM +0530, Nilay Shroff wrote:
> >> 
> >> 
> >> On 4/18/25 10:06 PM, Ming Lei wrote:
> >>> Move elv_register[unregister]_queue out of ->elevator_lock & queue
> >>> freezing, so we can kill many lockdep warnings.
> >>> 
> >>> elv_register[unregister]_queue() is serialized, and it only deals with
> >>> sysfs/debugfs things, so it doesn't need to be done with the queue
> >>> frozen.
> >>> 
> >>> With this change, the elevator's ->exit() is called before
> >>> elv_unregister_queue(), so the user may still call into the
> >>> ->show()/->store() methods of the elevator's sysfs attributes; this
> >>> case is covered by adding `ELEVATOR_FLAG_DYNG`.
> >>> 
> >>> For blk-mq debugfs, hctx->sched_tags is always checked under
> >>> ->elevator_lock by the debugfs code, and hctx->sched_tags is also
> >>> updated with ->elevator_lock held, so there is no such issue there.
> >>> 
> >>> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> >>> ---
> >>>  block/blk-mq.c   |  9 ++++----
> >>>  block/blk.h      |  1 +
> >>>  block/elevator.c | 58 ++++++++++++++++++++++++++++++++++--------------
> >>>  block/elevator.h |  5 +++++
> >>>  4 files changed, 52 insertions(+), 21 deletions(-)
> >>> 
> >> 
> >> [...]
> >> 
> >>> +static void elevator_exit(struct request_queue *q)
> >>> +{
> >>> +	__elevator_exit(q);
> >>> +	kobject_put(&q->elevator->kobj);
> >>>  }
> >> 
> >> [...]
> >>> +int elevator_change_done(struct request_queue *q, struct elv_change_ctx *ctx)
> >>> +{
> >>> +	int ret = 0;
> >>> +
> >>> +	if (ctx->old) {
> >>> +		elv_unregister_queue(q, ctx->old);
> >>> +		kobject_put(&ctx->old->kobj);
> >>> +	}
> >>> +	if (ctx->new) {
> >>> +		ret = elv_register_queue(q, ctx->new, ctx->uevent);
> >>> +		if (ret) {
> >>> +			unsigned memflags = blk_mq_freeze_queue(q);
> >>> +
> >>> +			mutex_lock(&q->elevator_lock);
> >>> +			elevator_exit(q);
> >>> +			mutex_unlock(&q->elevator_lock);
> >>> +			blk_mq_unfreeze_queue(q, memflags);
> >>> +		}
> >>> +	}
> >>> +	return 0;
> >>> +}
> >>> +
> >> It seems that we still enter sysfs/kernfs code while holding
> >> ->elevator_lock via the above elevator_exit() call. I think we want to
> >> move kobject_put(&q->elevator->kobj) out of elevator_exit() and invoke
> >> it after releasing ->elevator_lock in elevator_change_done().
> > 
> > q->elevator_lock is owned by the request queue, not the elevator queue,
> > so it shouldn't be an issue, or can you explain in detail?
> > 
> I meant that one objective of this patchset is to ensure that we never
> grab any block/queue lock and then enter kernfs/sysfs/debugfs code.
> However, here in elevator_change_done() we invoke elevator_exit() after
> grabbing q->elevator_lock, and elevator_exit() in turn invokes
> kobject_put(&q->elevator->kobj). That means we enter kernfs/sysfs code
> while holding q->elevator_lock.
> 
> So my point was to move kobject_put() out of elevator_exit() and call
> it after we release q->elevator_lock, maybe something like shown below:
> 
> elevator_change_done()
> {
>         ...
>         ...
>         if (ctx->new) {
>                 ret = elv_register_queue(q, ctx->new, ctx->uevent);
>                 if (ret) {
>                         unsigned memflags = blk_mq_freeze_queue(q);
> 
>                         mutex_lock(&q->elevator_lock);
>                         __elevator_exit(q);
>                         mutex_unlock(&q->elevator_lock);
> 
>                         kobject_put(&q->elevator->kobj);

This way is correct, but not strictly necessary, because elevator_release()
contributes nothing to the current lock problem.

However, it aligns with the normal code path with respect to elevator_lock
and queue freezing, and it lets us drop the elevator_exit() helper, so I
will take it in V3.

Thanks,
Ming
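
P.S. For completeness, here is a minimal sketch of how the whole of
elevator_change_done() could look with kobject_put() moved out of the
locked/frozen section, combining the quoted patch with the suggestion
above. This is only an illustration of the idea, not the actual V3 code;
in particular, returning ret instead of 0 at the end is my own assumption.

int elevator_change_done(struct request_queue *q, struct elv_change_ctx *ctx)
{
	int ret = 0;

	if (ctx->old) {
		/* sysfs/debugfs teardown of the old elevator, done without
		 * elevator_lock and without freezing the queue
		 */
		elv_unregister_queue(q, ctx->old);
		kobject_put(&ctx->old->kobj);
	}
	if (ctx->new) {
		ret = elv_register_queue(q, ctx->new, ctx->uevent);
		if (ret) {
			unsigned memflags = blk_mq_freeze_queue(q);

			/* registration failed: tear down the new elevator
			 * under elevator_lock with the queue frozen
			 */
			mutex_lock(&q->elevator_lock);
			__elevator_exit(q);
			mutex_unlock(&q->elevator_lock);
			blk_mq_unfreeze_queue(q, memflags);

			/* drop the kobject reference only after the lock is
			 * released, so kernfs/sysfs code never runs while
			 * q->elevator_lock is held
			 */
			kobject_put(&q->elevator->kobj);
		}
	}
	return ret;
}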