Re: [PATCHv2] block: restore two stage elevator switch while running nr_hw_queue update

Nilay Shroff <nilay@xxxxxxxxxxxxx> · Wed, 23 Jul 2025 11:54:33 +0530

On 7/23/25 6:07 AM, Yu Kuai wrote:
>>>>    static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>>>>                                int nr_hw_queues)
>>>>    {
>>>> @@ -4973,6 +5029,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>>>>        int prev_nr_hw_queues = set->nr_hw_queues;
>>>>        unsigned int memflags;
>>>>        int i;
>>>> +    struct xarray elv_tbl;
>>>
>>> Perhaps:
>>>
>>> DEFINE_XARRAY(elv_tbl)
>> It may not work because then we have to initialize it at compile time
>> at file scope. Then if we've multiple threads running nr_hw_queue update
>> (for different tagsets) then we can't use that shared copy of elv_tbl
>> as is and we've to protect it with another lock. So, IMO, instead creating
>> xarray locally here makes sense.
> 
> I'm confused, I don't add static and this should still be a stack value,
> I mean use this help to initialize it is simpler :)

Using DEFINE_XARRAY() to initialize a local(stack) variable is not safe because
the xarray structure embeds a spinlock (.xa_lock), and initializing spinlocks
in local scope can cause issues when lockdep is enabled.
Lockdep expects lock objects to be initialized at static file scope to correctly
track lock dependencies, specifically, when locks are initialized at compile time.
Please note that lockdep assigns unique keys to lock object created at compile time 
which it can use for analysis. This mechanism does not work properly with local
compile time initialization, and attempting to do so triggers warnings or errors
from lockdep.

Therefore, DEFINE_XARRAY() should only be used for static/global declarations. For
local usage, it's safer to use xa_init() or xa_init_flags() to initialize the xarray
dynamically at runtime.

>>> BTW, this is not related to this patch. Should we handle fall_back
>>> failure like blk_mq_sysfs_register_hctxs()?
>>>
>> OKay I guess you meant here handle failure case by unwinding the
>> queue instead of looping through it from start to end. If yes, then
>> it could be done but again we may not want to do it the bug fix patch.
>>
> 
> Not like that, actually I don't have any ideas for now, the hctxs is
> unregistered first, and if register failed, for example, due to -ENOMEM,
> I can't find a way to fallback :(
> 
If registering new hctxs fails, we fall back to the previous value of 
nr_hw_queues (prev_nr_hw_queues). When prev_nr_hw_queues is less than
the new nr_hw_queues, we do not reallocate memory for the existing hctxs—
instead, we reuse the memory that was already allocated.

Memory allocation is only attempted for the additional hctxs beyond
prev_nr_hw_queues. Therefore, if memory allocation for these new hctxs
fails, we can safely fall back to prev_nr_hw_queues because the memory
of the previously allocated hctxs remains intact.

Thanks,
--Nilay