On Fri, Apr 25, 2025 at 06:18:33PM +0530, Nilay Shroff wrote:
> 
> 
> On 4/24/25 8:51 PM, Ming Lei wrote:
> > Elevator switch code is another `nr_hw_queue` reader in non-fast-IO code
> > path, so it can't be done if updating `nr_hw_queues` is in-progress.
> > 
> > Take same approach with not allowing add/del disk when updating
> > nr_hw_queues is in-progress, by grabbing read lock of
> > set->update_nr_hwq_sema.
> > 
> > Take the nested variant for avoiding the following false positive
> > splat[1], and this way is correct because:
> > 
> > - the read lock in elv_iosched_store() is not overlapped with the read lock
> >   in adding/deleting disk:
> > 
> >   - kobject attribute is only available after the kobject is added and
> >     before it is deleted
> > 
> > -> #4 (&q->q_usage_counter(queue){++++}-{0:0}:
> > -> #3 (&q->limits_lock){+.+.}-{4:4}:
> > -> #2 (&disk->open_mutex){+.+.}-{4:4}:
> > -> #1 (&set->update_nr_hwq_lock){.+.+}-{4:4}:
> > -> #0 (kn->active#103){++++}-{0:0}:
> > 
> > Link: https://lore.kernel.org/linux-block/aAWv3NPtNIKKvJZc@fedora/ [1]
> > Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@xxxxxxx>
> > Closes: https://lore.kernel.org/linux-block/mz4t4tlwiqjijw3zvqnjb7ovvvaegkqganegmmlc567tt5xj67@xal5ro544cnc/
> > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > ---
> >  block/elevator.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/block/elevator.c b/block/elevator.c
> > index 4400eb8fe54f..56da6ab7691a 100644
> > --- a/block/elevator.c
> > +++ b/block/elevator.c
> > @@ -723,6 +723,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
> >  	int ret;
> >  	unsigned int memflags;
> >  	struct request_queue *q = disk->queue;
> > +	struct blk_mq_tag_set *set = q->tag_set;
> >  
> >  	/*
> >  	 * If the attribute needs to load a module, do it before freezing the
> > @@ -734,6 +735,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
> >  
> >  	elv_iosched_load_module(name);
> >  
> > +	down_read_nested(&set->update_nr_hwq_sema, 1);
> 
> Why do we need to add nested read lock here? The lockdep splat[1] which
> you reported earlier is possibly due to the same reader lock being acquired
> recursively in elv_iosched_store and then elevator_change?

The splat isn't related to the nested read lock. If you replace
down_read_nested() with down_read(), the same splat can be triggered again
when running `blktests block/001`.

> 
> On another note, if we suspect possible one-depth recursion for the same
> class of lock then we should use SINGLE_DEPTH_NESTING (instead of using
> 1 here) for subclass. But still I am not clear why this lock needs nesting.

It is just a false positive, because elv_iosched_store() can't run while
the disk is being added.

Thanks,
Ming
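
For illustration, here is a minimal, self-contained sketch of the locking
pattern under discussion; it is not the actual block layer code, and the
names demo_sema, demo_add_disk() and demo_store() are made-up stand-ins for
set->update_nr_hwq_sema, add_disk() and elv_iosched_store(). The second
reader takes the rwsem with down_read_nested() and a non-zero subclass, so
lockdep keeps that acquisition out of the dependency chain recorded for the
subclass-0 readers, which is what silences the false-positive splat.

	/*
	 * Illustrative sketch only -- not the real block layer code.
	 * demo_sema stands in for set->update_nr_hwq_sema, demo_add_disk()
	 * for add_disk() and demo_store() for elv_iosched_store().
	 */
	#include <linux/module.h>
	#include <linux/rwsem.h>
	#include <linux/lockdep.h>

	static DECLARE_RWSEM(demo_sema);

	/* Writer side, e.g. the nr_hw_queues update path. */
	static void demo_update(void)
	{
		down_write(&demo_sema);
		/* ... update state that the readers below depend on ... */
		up_write(&demo_sema);
	}

	/* Reader #1: lockdep records this acquisition in subclass 0. */
	static void demo_add_disk(void)
	{
		down_read(&demo_sema);
		/* ... create the sysfs attribute, etc. ... */
		up_read(&demo_sema);
	}

	/*
	 * Reader #2: can only run after demo_add_disk() has completed (the
	 * sysfs attribute exists only then), so the two read sections never
	 * overlap.  Taking the lock in subclass SINGLE_DEPTH_NESTING (== 1)
	 * keeps lockdep from chaining this acquisition to the dependencies
	 * built around reader #1, which is what produces the false positive.
	 */
	static void demo_store(void)
	{
		down_read_nested(&demo_sema, SINGLE_DEPTH_NESTING);
		/* ... switch the elevator ... */
		up_read(&demo_sema);
	}

	static int __init demo_init(void)
	{
		demo_update();
		demo_add_disk();
		demo_store();
		return 0;
	}
	module_init(demo_init);
	MODULE_LICENSE("GPL");

Passing SINGLE_DEPTH_NESTING (which is defined as 1) instead of a bare 1
would express the same annotation more readably, as Nilay suggests above.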