On Mon, Jun 30, 2025 at 10:51:55AM +0530, Nilay Shroff wrote:
> Recent lockdep reports [1] have revealed a potential deadlock caused by a
> lock dependency between the percpu allocator lock and the elevator lock.
> This issue can be avoided by ensuring that the allocation and release of
> scheduler tags (sched_tags) are performed outside the elevator lock.
> Furthermore, the queue does not need to remain frozen during these
> operations.
>
> To address this, move all sched_tags allocations and deallocations outside
> of both the ->elevator_lock and the ->freeze_lock. Since the lifetime of
> the elevator queue and its associated sched_tags is closely tied, the
> allocated sched_tags are now stored in the elevator queue structure. Then,
> during the actual elevator switch (which runs under ->freeze_lock and
> ->elevator_lock), the pre-allocated sched_tags are assigned to the
> appropriate q->hctx. Once the elevator switch is complete and the locks
> are released, the old elevator queue and its associated sched_tags are
> freed.
>
> This commit specifically addresses the allocation/deallocation of
> sched_tags during elevator switching. Note that sched_tags may also be
> allocated in other contexts, such as during nr_hw_queues updates.
> Supporting that use case will require batch allocation/deallocation,
> which will be handled in a follow-up patch.
>
> This restructuring ensures that sched_tags memory management occurs
> entirely outside of the ->elevator_lock and ->freeze_lock context,
> eliminating the lock dependency problem seen during scheduler updates.
>
> [1] https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@xxxxxxxxxxxxx/
>
> Reported-by: Stefan Haberland <sth@xxxxxxxxxxxxx>
> Closes: https://lore.kernel.org/all/0659ea8d-a463-47c8-9180-43c719e106eb@xxxxxxxxxxxxx/
> Signed-off-by: Nilay Shroff <nilay@xxxxxxxxxxxxx>

Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx>

Thanks,
Ming
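
For readers following the thread, the ordering described in the quoted commit message (allocate first, swap under the locks, free after unlocking) might be sketched in userspace roughly as below. This is only an illustration and not the kernel code: `ElevatorQueue`, `Queue`, `elevator_switch` and the string "tags" are hypothetical stand-ins, and ->freeze_lock is modelled as a plain mutex, which is a simplification of queue freezing.

```python
import threading


class ElevatorQueue:
    """Hypothetical stand-in for the elevator queue; it owns its sched_tags."""

    def __init__(self, name, nr_hw_queues):
        # Allocation happens in the constructor, so callers must build
        # this object before taking any lock.
        self.name = name
        self.sched_tags = [f"{name}-tags-{i}" for i in range(nr_hw_queues)]


class Queue:
    """Hypothetical stand-in for a request queue with per-hctx sched_tags."""

    def __init__(self, nr_hw_queues):
        self.freeze_lock = threading.Lock()    # simplification of queue freeze
        self.elevator_lock = threading.Lock()
        self.hctx_sched_tags = [None] * nr_hw_queues
        self.elevator = None


def elevator_switch(q, new_name):
    # Step 1: allocate the new elevator queue and its tags with no lock held.
    new_eq = ElevatorQueue(new_name, len(q.hctx_sched_tags))

    # Step 2: under both locks, only assign the pre-allocated tags.
    with q.freeze_lock, q.elevator_lock:
        old_eq = q.elevator
        q.elevator = new_eq
        for i, tags in enumerate(new_eq.sched_tags):
            q.hctx_sched_tags[i] = tags

    # Step 3: release the old elevator's tags after both locks are dropped.
    if old_eq is not None:
        old_eq.sched_tags.clear()
    return old_eq
```

The point of the sketch is that the critical section contains assignments only; anything that could take an allocator lock runs before or after it, which is the property the commit message relies on to break the reported dependency.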