On 4/18/25 10:06 PM, Ming Lei wrote: > Both adding/deleting disk code are reader of `nr_hw_queues`, so we can't > allow them in-progress when updating nr_hw_queues, kernel panic and > kasan has been reported in [1]. > > Prevent adding/deleting disk during updating nr_hw_queues by setting > set->updating_nr_hwq, and use SRCU to fail & retry to add/delete disk. > > This way avoids lot of trouble. > > Reported-by: Nilay Shroff <nilay@xxxxxxxxxxxxx> > Closes: https://lore.kernel.org/linux-block/a5896cdb-a59a-4a37-9f99-20522f5d2987@xxxxxxxxxxxxx/ > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx> > --- [...] [...] > > +static int retry_on_updating_nr_hwq(struct gendisk_data *data, > + int (*cb)(struct gendisk_data *data)) > +{ > + struct gendisk *disk = data->disk; > + struct blk_mq_tag_set *set; > + > + if (!queue_is_mq(disk->queue)) > + return cb(data); > + > + set = disk->queue->tag_set; > + do { > + int idx, ret; > + > + idx = srcu_read_lock(&set->update_nr_hwq_srcu); > + if (set->updating_nr_hwq) { > + srcu_read_unlock(&set->update_nr_hwq_srcu, idx); > + goto wait; > + } > + ret = cb(data); > + srcu_read_unlock(&set->update_nr_hwq_srcu, idx); > + return ret; > + wait: > + wait_event_interruptible(set->update_nr_hwq_wq, > + !set->updating_nr_hwq); > + } while (true); > +} > + Yes using srcu with wait event would fix this. However after looking through changes it seems to me that the current changes are now modeled as below: - Writer(blk_mq_update_nr_hw_queues) must wait until all readers are done (or prior SRCU read-side critical-section complete) - Readers (add/del disk) must wait until a writer(blk_mq_update_nr_hw_queues) finishes And IMO, this behavior is best modeled with a reader-writer lock, rather than (S)RCU. In my view, RCU is optimized for read-heavy workloads with: - Non-blocking readers - Deferred updates by writers - Writers don’t block readers and readers don’t block writers. So I believe the simpler choice shall be using reader-writer lock instead of srcu. With reader-writer lock we should be also able to get away with implementing wait queue for add/del disk. Moreover, the reader-writer lock shall also work well with the another reader elv_iosched_store. Thanks, --Nilay