Re: [PATCHv6 3/3] block: fix potential deadlock while running nr_hw_queue update

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 6/30/25 11:54 AM, Hannes Reinecke wrote:
> On 6/30/25 07:21, Nilay Shroff wrote:
>> Move scheduler tags (sched_tags) allocation and deallocation outside
>> both the ->elevator_lock and ->freeze_lock when updating nr_hw_queues.
>> This change breaks the dependency chain from the percpu allocator lock
>> to the elevator lock, helping to prevent potential deadlocks, as
>> observed in the reported lockdep splat[1].
>>
>> This commit introduces batch allocation and deallocation helpers for
>> sched_tags, which are now used from within __blk_mq_update_nr_hw_queues
>> routine while iterating through the tagset.
>>
>> With this change, all sched_tags memory management is handled entirely
>> outside the ->elevator_lock and the ->freeze_lock context, thereby
>> eliminating the lock dependency that could otherwise manifest during
>> nr_hw_queues updates.
>>
>> [1] https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@xxxxxxxxxxxxx/
>>
>> Reported-by: Stefan Haberland <sth@xxxxxxxxxxxxx>
>> Closes: https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@xxxxxxxxxxxxx/
>> Signed-off-by: Nilay Shroff <nilay@xxxxxxxxxxxxx>
>> ---
>>   block/blk-mq-sched.c | 63 ++++++++++++++++++++++++++++++++++++++++++++
>>   block/blk-mq-sched.h |  4 +++
>>   block/blk-mq.c       | 11 +++++++-
>>   block/blk.h          |  2 +-
>>   block/elevator.c     |  4 +--
>>   5 files changed, 80 insertions(+), 4 deletions(-)
>>
>> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
>> index 2d6d1ebdd8fb..da802df34a8c 100644
>> --- a/block/blk-mq-sched.c
>> +++ b/block/blk-mq-sched.c
>> @@ -427,6 +427,30 @@ void blk_mq_free_sched_tags(struct elevator_tags *et,
>>       kfree(et);
>>   }
>>   +void blk_mq_free_sched_tags_batch(struct xarray *et_table,
>> +        struct blk_mq_tag_set *set)
>> +{
>> +    struct request_queue *q;
>> +    struct elevator_tags *et;
>> +
>> +    lockdep_assert_held_write(&set->update_nr_hwq_lock);
>> +
>> +    list_for_each_entry(q, &set->tag_list, tag_set_list) {
>> +        /*
>> +         * Accessing q->elevator without holding q->elevator_lock is
>> +         * safe because we're holding here set->update_nr_hwq_lock in
>> +         * the writer context. So, scheduler update/switch code (which
>> +         * acquires the same lock but in the reader context) can't run
>> +         * concurrently.
>> +         */
>> +        if (q->elevator) {
>> +            et = xa_load(et_table, q->id);
>> +            if (et)
>> +                blk_mq_free_sched_tags(et, set);
>> +        }
>> +    }
>> +}
>> +
> Odd. Do we allow empty sched_tags?
> Why isn't this an error (or at least a warning)?
> 
We don't allow empty sched_tags and xa_load is not supposed to return an
empty sched_tags. It's just a paranoia guard that we evaluate whether @et
is non-NULL. As this function returns nothing we don't return any error 
code to the caller here. But yes, we may add a warning here if that seems
reasonable. I'd add a 'WARN_ON(!et)' and update it in the next patch.

Thanks,
--Nilay




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux