On 6/9/25 7:43 AM, Ming Lei wrote: > On Thu, Jun 05, 2025 at 11:22:17AM +0530, Nilay Shroff wrote: >> >> >> On 6/3/25 2:07 PM, Ming Lei wrote: >>> On Wed, May 28, 2025 at 06:03:58PM +0530, Nilay Shroff wrote: >> >>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c >>>> index 0cb1e9873aab..51d406da4abf 100644 >>>> --- a/block/bfq-iosched.c >>>> +++ b/block/bfq-iosched.c >>>> @@ -7232,17 +7232,12 @@ static void bfq_init_root_group(struct bfq_group *root_group, >>>> root_group->sched_data.bfq_class_idle_last_service = jiffies; >>>> } >>>> >>>> -static int bfq_init_queue(struct request_queue *q, struct elevator_type *e) >>>> +static int bfq_init_queue(struct request_queue *q, struct elevator_queue *eq) >>>> { >>>> struct bfq_data *bfqd; >>>> - struct elevator_queue *eq; >>>> unsigned int i; >>>> struct blk_independent_access_ranges *ia_ranges = q->disk->ia_ranges; >>>> >>>> - eq = elevator_alloc(q, e); >>>> - if (!eq) >>>> - return -ENOMEM; >>>> - >>> >>> Please make the above elevator interface change be one standalone patch if possible, >>> which may help review much. >> >> Yeah you're right, in fact I considered to make elevator interface changes into >> a separate patch however the changes are so much tightly coupled with block layer >> change that I couldn't make it independent. As you may see here I replaced second >> argument of ->init_sched from "struct elevator_type *" to "struct elevator_queue *". >> And as you might have noticed we allocate the "struct elevator_queue *" very early >> while switching the elevator and that's passed down to this ->init_sched function >> through elv_change_ctx. So making elevator interface change a standalone patch >> is not possible. > > You can make a patch to move elevator_queue allocation into > blk_mq_init_sched() first, then pass it to all ->init_sched() > implementation first. And you can avoid to pass 'elevator_queue *' > from 'struct elv_change_ctx'. > Hmm okay, so we'd allocate elevator_queue, while we're holding ->elevator_lock and ->freeze_lock, from within blk_mq_init_sched(). As you mentioned it's safe because elevator_queue dynamic allocation uses a plain kmalloc, I'd change this in the next patch. >> >>>> >>>> +int blk_mq_alloc_elevator_and_sched_tags(struct request_queue *q, >>>> + struct elv_change_ctx *ctx) >>>> +{ >>>> + unsigned long i; >>>> + struct elevator_queue *elevator; >>>> + struct elevator_type *e; >>>> + struct blk_mq_tag_set *set = q->tag_set; >>>> + gfp_t gfp = GFP_NOIO | __GFP_NOWARN | __GFP_NORETRY; >>>> + >>>> + if (strncmp(ctx->new.name, "none", 4)) { >>> >>> You can return early if new name is "none", then all indent in the code >>> block can be avoided. >> Yeah agreed, will do it in the next patch. >> >>> >>>> + >>>> + e = elevator_find_get(ctx->new.name); >>>> + if (!e) >>>> + return -EINVAL; >>>> + >>>> + elevator = ctx->new.elevator = elevator_alloc(q, e); >>> >>> You needn't to allocate elevator queue here, then the big elv_change_ctx change >>> may be avoided by passing sched tags directly. >> >> Hmm, okay so you recommend keeping allocation of elevator queue in ->init_sched >> method as it's today? > > I meant you can move elevator queue allocation into blk_mq_init_sched(), then > you just need to pass allocated sched tag via `elv_change_ctx` or function > parameter. Yeah got it! Will do it in the next patch. > >> If so then yes it could be done but then as I see the elvator >> queue is being freed in elevator_change_done() after we release ->elevator_lock >> So I thought it's also quite possible in theory to allocate elevator queue outside >> of ->elevator_lock. Moreover, we shall not require holding ->elevator_lock or >> ->freeze_lock while allocating elevator queue, and thus probably avoid holding >> block layer locks while perfroming dynamic allocation, isn't it? >> >>> >>> Also it may be fragile to allocate elevator queue here without holding elevator >>> lock. You may have to document this way is safe by allocating elevator >>> queue with specified elevator type lockless. >>> >> Yes I'd document it. And looking through this code path it appears safe because >> we first get reference to the elevator type before assiging elevator type to >> the elevator queue in elevator_alloc(). Do you see any potential issue with >> this, if we were to allocate elevator queue lockless here? > > The correctness depends that our current implementation disables > updating nr_hw_queues by holding set->update_nr_hwq_lock, otherwise you > have to double check nr_hw_queues after grabbing elevator_lock, so this > point might need to document. > Yep! In fact you might have seen I commented about it in elv_update_nr_hw_queues() just before we call blk_mq_alloc_elevator_and_sched_tags(). But now anyways, as discussed above, we're moving elevator_queue allocation under blk_mq_init_sched(), we may not need to document this. >> >>>> diff --git a/block/blk-mq-sched.h b/block/blk-mq-sched.h >>>> index 1326526bb733..6c0f1936b81c 100644 >>>> --- a/block/blk-mq-sched.h >>>> +++ b/block/blk-mq-sched.h >>>> @@ -7,6 +7,26 @@ >>>> >>>> #define MAX_SCHED_RQ (16 * BLKDEV_DEFAULT_RQ) >>>> >>>> +/* Holding context data for changing elevator */ >>>> +struct elv_change_ctx { >>>> + /* for unregistering/freeing old elevator */ >>>> + struct { >>>> + struct elevator_queue *elevator; >>>> + } old; >>>> + >>>> + /* for registering/allocating new elevator */ >>>> + struct { >>>> + const char *name; >>>> + bool no_uevent; >>>> + unsigned long nr_requests; >>>> + struct elevator_queue *elevator; >>>> + int inited; >>>> + } new; >>>> + >>>> + /* elevator switch status */ >>>> + int status; >>>> +}; >>>> + >>> >>> As I mentioned, 'elv_change_ctx' becomes more complicated, which may be >>> avoided. >> Yes I agreed, however as I mentioned above if we choose to allocate >> elevator queue before we acquire ->elevator_lock or ->freeze_lock > > Allocating elevator queue is just plain kmalloc(), which can be done > under ->elevator_lock or ->freeze_lock, since we have called > memalloc_noio_save(). > Yes agreed! >> then we may need to keep this updated elv_change_ctx. So it depends >> on whether we choose to allocate elevator queue befor acquiring locks or >> keeping allocation under ->init_sched. >> >>> Queue is still frozen, so the dependency between freeze lock and >>> the global percpu allocation lock isn't cut completely. >>> >> Yes correct and so in the commit message I mentioned that this depedency is >> currenlty cut only while calling elevator switch from elv_iosched_store context >> through sysfs. We do need to still cut it in the conetxt of elevator switch >> being called from blk_mq_update_nr_hw_queues context or called while unregistering >> queue from del_gendisk. And this I thought should be handled in a seperate patch. >> Anyways, I would explicitly call this out in the commit message. > > OK, looks fine, I'd suggest to address them all in single patchset, because > it is same problem, and helps us to review with single context. > Alright, I'd make a patchset and handle all cases. Thanks, --Nilay