Re: [PATCH V2 16/20] block: move elv_register[unregister]_queue out of elevator_lock

Nilay Shroff <nilay@xxxxxxxxxxxxx> · Tue, 22 Apr 2025 16:36:37 +0530



On 4/22/25 4:23 PM, Ming Lei wrote:
> On Sat, Apr 19, 2025 at 07:25:31PM +0530, Nilay Shroff wrote:
>>
>>
>> On 4/18/25 10:06 PM, Ming Lei wrote:
>>> Move elv_register[unregister]_queue out of ->elevator_lock & queue freezing,
>>> so we can kill many lockdep warnings.
>>>
>>> elv_register[unregister]_queue() is serialized, and just dealing with sysfs/
>>> debugfs things, no need to be done with queue frozen.
>>>
>>> With this change, elevator's ->exit() is called before calling
>>> elv_unregister_queue, then user may call into ->show()/store() of elevator's
>>> sysfs attributes, and we have covered this issue by adding `ELEVATOR_FLAG_DYNG`.
>>>
>>> For blk-mq debugfs, hctx->sched_tags is always checked with ->elevator_lock by
>>> debugfs code, meantime hctx->sched_tags is updated with ->elevator_lock, so
>>> there isn't such issue.
>>>
>>> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
>>> ---
>>>  block/blk-mq.c   |  9 ++++----
>>>  block/blk.h      |  1 +
>>>  block/elevator.c | 58 ++++++++++++++++++++++++++++++++++--------------
>>>  block/elevator.h |  5 +++++
>>>  4 files changed, 52 insertions(+), 21 deletions(-)
>>>
>>
>> [...]
>>
>>> +static void elevator_exit(struct request_queue *q)
>>> +{
>>> +	__elevator_exit(q);
>>> +	kobject_put(&q->elevator->kobj);
>>>  }
>>
>> [...]  
>>> +int elevator_change_done(struct request_queue *q, struct elv_change_ctx *ctx)
>>> +{
>>> +	int ret = 0;
>>> +
>>> +	if (ctx->old) {
>>> +		elv_unregister_queue(q, ctx->old);
>>> +		kobject_put(&ctx->old->kobj);
>>> +	}
>>> +	if (ctx->new) {
>>> +		ret = elv_register_queue(q, ctx->new, ctx->uevent);
>>> +		if (ret) {
>>> +			unsigned memflags = blk_mq_freeze_queue(q);
>>> +
>>> +			mutex_lock(&q->elevator_lock);
>>> +			elevator_exit(q);
>>> +			mutex_unlock(&q->elevator_lock);
>>> +			blk_mq_unfreeze_queue(q, memflags);
>>> +		}
>>> +	}
>>> +	return 0;
>>> +}
>>> +
>> It seems that we're still leaking ->elevator_lock in sysfs/kernfs with
>> the above elevator_exit call. I think we probably want to move out 
>> kobject_put(&q->elevator->kobj) from elevator_exit and invoke it 
>> after we release ->elevator_lock in elevator_change_done.
> 
> q->elevator_lock is owned by request queue instead of elevator queue, so it
> shouldn't be one issue, or can you explain in details?
> 
I meant that one of the objective of this patchset is to ensure that
we don't grab any block/queue lock and enter the kernefs/sysfs/debugfs 
code. However here in elevator_change_done() we invoke elevator_exit()
after grabbing q->elevator_lock. Now if we look at the definition of
elevator_exit() then it appears that we invoke kobject_put(&ctx->old->kobj)
from it. So that means that while holding q->elevator_lock we enter in 
kernfs/sysfs code.

So my point was to move kobject_put() out of the elevator_exit() and 
call it after we release q->elevator_lock, maybe something like shown
below:

elevator_change_done()
{
        ...
        ...
	if (ctx->new) {
		ret = elv_register_queue(q, ctx->new, ctx->uevent);
		if (ret) {
			unsigned memflags = blk_mq_freeze_queue(q);

			mutex_lock(&q->elevator_lock);
			__elevator_exit(q);
			mutex_unlock(&q->elevator_lock);

			kobject_put(&q->elevator->kobj);

			blk_mq_unfreeze_queue(q, memflags);
		}
	}    
}

Thanks,
--Nilay