On Tue, Apr 22, 2025 at 04:36:37PM +0530, Nilay Shroff wrote:
> 
> 
> On 4/22/25 4:23 PM, Ming Lei wrote:
> > On Sat, Apr 19, 2025 at 07:25:31PM +0530, Nilay Shroff wrote:
> >> 
> >> 
> >> On 4/18/25 10:06 PM, Ming Lei wrote:
> >>> Move elv_register[unregister]_queue out of ->elevator_lock & queue
> >>> freezing, so we can kill many lockdep warnings.
> >>> 
> >>> elv_register[unregister]_queue() is serialized, and it only deals with
> >>> sysfs/debugfs things, so it doesn't need to be done with the queue
> >>> frozen.
> >>> 
> >>> With this change, the elevator's ->exit() is called before
> >>> elv_unregister_queue(), so the user may still call into the
> >>> ->show()/->store() methods of the elevator's sysfs attributes; this
> >>> case is covered by adding `ELEVATOR_FLAG_DYNG`.
> >>> 
> >>> For blk-mq debugfs, hctx->sched_tags is always checked under
> >>> ->elevator_lock by the debugfs code, and hctx->sched_tags is also
> >>> updated with ->elevator_lock held, so there is no such issue there.
> >>> 
> >>> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> >>> ---
> >>>  block/blk-mq.c   |  9 ++++----
> >>>  block/blk.h      |  1 +
> >>>  block/elevator.c | 58 ++++++++++++++++++++++++++++++++++--------------
> >>>  block/elevator.h |  5 +++++
> >>>  4 files changed, 52 insertions(+), 21 deletions(-)
> >>> 
> >> 
> >> [...]
> >> 
> >>> +static void elevator_exit(struct request_queue *q)
> >>> +{
> >>> +	__elevator_exit(q);
> >>> +	kobject_put(&q->elevator->kobj);
> >>>  }
> >> 
> >> [...]
> >>> +int elevator_change_done(struct request_queue *q, struct elv_change_ctx *ctx)
> >>> +{
> >>> +	int ret = 0;
> >>> +
> >>> +	if (ctx->old) {
> >>> +		elv_unregister_queue(q, ctx->old);
> >>> +		kobject_put(&ctx->old->kobj);
> >>> +	}
> >>> +	if (ctx->new) {
> >>> +		ret = elv_register_queue(q, ctx->new, ctx->uevent);
> >>> +		if (ret) {
> >>> +			unsigned memflags = blk_mq_freeze_queue(q);
> >>> +
> >>> +			mutex_lock(&q->elevator_lock);
> >>> +			elevator_exit(q);
> >>> +			mutex_unlock(&q->elevator_lock);
> >>> +			blk_mq_unfreeze_queue(q, memflags);
> >>> +		}
> >>> +	}
> >>> +	return 0;
> >>> +}
> >>> +
> >> It seems that we still enter sysfs/kernfs code while holding
> >> ->elevator_lock via the above elevator_exit() call. I think we want to
> >> move kobject_put(&q->elevator->kobj) out of elevator_exit() and invoke
> >> it after releasing ->elevator_lock in elevator_change_done().
> > 
> > q->elevator_lock is owned by the request queue, not the elevator queue,
> > so it shouldn't be an issue, or can you explain in detail?
> > 
> I meant that one objective of this patchset is to ensure that we never
> grab any block/queue lock and then enter kernfs/sysfs/debugfs code.
> However, here in elevator_change_done() we invoke elevator_exit() after
> grabbing q->elevator_lock, and elevator_exit() in turn invokes
> kobject_put(&q->elevator->kobj). That means we enter kernfs/sysfs code
> while holding q->elevator_lock.
> 
> So my point was to move kobject_put() out of elevator_exit() and call
> it after we release q->elevator_lock, maybe something like shown below:
> 
> elevator_change_done()
> {
>         ...
>         ...
>         if (ctx->new) {
>                 ret = elv_register_queue(q, ctx->new, ctx->uevent);
>                 if (ret) {
>                         unsigned memflags = blk_mq_freeze_queue(q);
> 
>                         mutex_lock(&q->elevator_lock);
>                         __elevator_exit(q);
>                         mutex_unlock(&q->elevator_lock);
> 
>                         kobject_put(&q->elevator->kobj);

This way is correct, but not strictly necessary, because elevator_release()
contributes nothing to the current lock problem.

However, it aligns with the normal code path with respect to elevator_lock
and queue freezing, and it lets us drop the elevator_exit() helper, so I
will take it in V3.

Thanks,
Ming
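
P.S. For completeness, here is a minimal sketch of how the whole of
elevator_change_done() could look with kobject_put() moved out of the
locked/frozen section, combining the quoted patch with the suggestion
above. This is only an illustration of the idea, not the actual V3 code;
in particular, returning ret instead of 0 at the end is my own assumption.

int elevator_change_done(struct request_queue *q, struct elv_change_ctx *ctx)
{
	int ret = 0;

	if (ctx->old) {
		/* sysfs/debugfs teardown of the old elevator, done without
		 * elevator_lock and without freezing the queue
		 */
		elv_unregister_queue(q, ctx->old);
		kobject_put(&ctx->old->kobj);
	}
	if (ctx->new) {
		ret = elv_register_queue(q, ctx->new, ctx->uevent);
		if (ret) {
			unsigned memflags = blk_mq_freeze_queue(q);

			/* registration failed: tear down the new elevator
			 * under elevator_lock with the queue frozen
			 */
			mutex_lock(&q->elevator_lock);
			__elevator_exit(q);
			mutex_unlock(&q->elevator_lock);
			blk_mq_unfreeze_queue(q, memflags);

			/* drop the kobject reference only after the lock is
			 * released, so kernfs/sysfs code never runs while
			 * q->elevator_lock is held
			 */
			kobject_put(&q->elevator->kobj);
		}
	}
	return ret;
}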