On 6/23/25 11:40 AM, Christoph Hellwig wrote: > On Mon, Jun 16, 2025 at 11:02:26PM +0530, Nilay Shroff wrote: >> +static void blk_mq_init_sched_tags(struct request_queue *q, >> + struct blk_mq_hw_ctx *hctx, >> + unsigned int hctx_idx, >> + struct elevator_queue *eq) >> { >> if (blk_mq_is_shared_tags(q->tag_set->flags)) { >> hctx->sched_tags = q->sched_shared_tags; >> + return; >> } >> >> + hctx->sched_tags = eq->tags->u.tags[hctx_idx]; >> } > > Given how trivial this function now is, please inline it in the only > caller. That should also allow moving the blk_mq_is_shared_tags > shared out of the loop over all hw contexts, and merge it with the > check right next to it. > okay, will do it the next patchset. >> +static void blk_mq_init_sched_shared_tags(struct request_queue *queue, >> + struct elevator_queue *eq) >> { >> + queue->sched_shared_tags = eq->tags->u.shared_tags; >> blk_mq_tag_update_sched_shared_tags(queue); >> } > > This helper can also just be open coded in the caller now. Yes agreed... > >> + if (blk_mq_is_shared_tags(set->flags)) { >> + if (tags->u.shared_tags) { >> + blk_mq_free_rqs(set, tags->u.shared_tags, >> + BLK_MQ_NO_HCTX_IDX); >> + blk_mq_free_rq_map(tags->u.shared_tags); >> + } >> + goto out; >> + } >> + >> + if (!tags->u.tags) >> + goto out; >> + >> + for (i = 0; i < tags->nr_hw_queues; i++) { >> + if (tags->u.tags[i]) { >> + blk_mq_free_rqs(set, tags->u.tags[i], i); >> + blk_mq_free_rq_map(tags->u.tags[i]); >> + } >> + } > > Maybe restructucture this a bit: > > if (blk_mq_is_shared_tags(set->flags)) { > .. > } else if (tags->u.tags) { > } > > kfree(tags); > > to have a simpler flow and remove the need for the "goto out"? > Okay but as you know I am rewriting this logic using Xarray as Ming pointed in last email. So this code will be restructured and I will ensure that your above comment will be addressed in new logic. >> + tags = kcalloc(1, sizeof(struct elevator_tags), gfp); > > This can use plain kzalloc. > >> + if (blk_mq_is_shared_tags(set->flags)) { >> + >> + tags->u.shared_tags = blk_mq_alloc_map_and_rqs(set, > > The empty line above is a bit odd. Yes I would address this in next patchset. > >> + BLK_MQ_NO_HCTX_IDX, >> + MAX_SCHED_RQ); >> + if (!tags->u.shared_tags) >> + goto out; >> + >> + return tags; >> + } >> + >> + tags->u.tags = kcalloc(nr_hw_queues, sizeof(struct blk_mq_tags *), gfp); >> + if (!tags->u.tags) >> + goto out; >> + >> + tags->nr_hw_queues = nr_hw_queues; >> + for (i = 0; i < nr_hw_queues; i++) { >> + tags->u.tags[i] = blk_mq_alloc_map_and_rqs(set, i, >> + tags->nr_requests); >> + if (!tags->u.tags[i]) >> + goto out; >> + } >> + >> + return tags; >> + >> +out: >> + __blk_mq_free_sched_tags(set, tags); > > Is __blk_mq_free_sched_tags really the right thing here vs just unwinding > what this function did? > Hmm, __blk_mq_free_sched_tags actually releases everything which is allocated in this function (so there won't be any leak), however, it really _dose_not_ free resources in the reverse order. It just loops over starting from the first nr_hw_queue index and releases whatsoever allocated. But I think you're right, we could probably have more structured way of unwinding stuff here instead of calling __blk_mq_free_sched_tags. I will handle this in the next patch. >> + /* >> + * Accessing q->elevator without holding q->elevator_lock is safe >> + * because we're holding here set->update_nr_hwq_lock in the writer >> + * context. So, scheduler update/switch code (which acquires the same >> + * lock but in the reader context) can't run concurrently. >> + */ >> + list_for_each_entry(q, &set->tag_list, tag_set_list) { >> + if (q->elevator) >> + count++; >> + } > > Maybe add a helper for this code and the comment? Yeah but as I mentioned above with Xarray code being added, we may not need the above loop. So lets see, if needed I will add a macro and add proper comment in next patch. > >> - lockdep_assert_held(&q->tag_set->update_nr_hwq_lock); >> + lockdep_assert_held(&set->update_nr_hwq_lock); >> + >> + if (strncmp(ctx->name, "none", 4)) { > > This is a check for not having an elevator so far, right? Wouldn't > !q->elevator be the more obvious check for that? Or am I missing > something why that's not safe here? > This code runs in the context of an elevator switch, not as part of an nr_hw_queues update. Hence, at this point, q->elevator has not yet been updated to the new elevator we’re switching to, so accessing q->elevator here would be incorrect. Since we've already stored the name of the target elevator in ctx->name, we use that instead of referencing q->elevator here. >> diff --git a/block/elevator.h b/block/elevator.h >> index a4de5f9ad790..0b92121005cf 100644 >> --- a/block/elevator.h >> +++ b/block/elevator.h >> @@ -23,6 +23,17 @@ enum elv_merge { >> struct blk_mq_alloc_data; >> struct blk_mq_hw_ctx; >> >> +/* Holding context data for changing elevator */ >> +struct elv_change_ctx { >> + const char *name; >> + bool no_uevent; >> + >> + /* for unregistering old elevator */ >> + struct elevator_queue *old; >> + /* for registering new elevator */ >> + struct elevator_queue *new; >> +}; > > No need to move this, it is still only used in elevator.c. > Yes agreed. I'm moved it back to its original place in elevator.c >> extern struct elevator_queue *elevator_alloc(struct request_queue *, >> - struct elevator_type *); >> + struct elevator_type *, struct elevator_tags *); > > Drop the extern while you're at it. > Yeah sure. Thanks, --Nilay