Hi Nilay, On Tue, May 20, 2025 at 04:03:49PM +0530, Nilay Shroff wrote: > Recent lockdep reports [1] have indicated a potential deadlock arising > from the dependency between the percpu allocator lock and the elevaor > lock. This issue can be mitigated by ensuring that elevator/sched tags > allocation and release occur outside the eleavtor lock. Moreover, we also > don't require queue remains frozen while we allocate/release sched tags. > So this patch addresses this problem by moving the allocation and de- > allocation of elevator sched tags outside the elevator switch path that > is protected by the ->freeze_lock and ->elevator_lock. Specifically, new > elevator sched tags are now allocated before switching the elevator and > outside the freeze section and elevator lock section. The old elevator's > sched tags are then freed after the elevator lock is released and queue > is unfrozen. > > To support this, the elv_change_ctx structure is extended to hold relevant > data needed for allocation and deferred release of sched tags during > elevator switching. With these changes, all sched tag allocations and > releases are performed outside both ->freeze_lock and ->elevator_lock, > preventing the lock ordering issue when elv_iosched_store is triggered > via sysfs. Not dig into this implementation, will look into later. I guess it should work by extending elv_change_ctx. However we have other elevator_queue lifetime issue, that is why ->elevator_lock is used almost everywhere. Another solution is to move all `sched_data` into 'struct elevator_queue': I feel it may be simpler in concept: - sched data and elevator queue share same lifetime - kobject_put(&eq->kobj) is already called without holding ->elevator_lock & queue isn't freezed - replace unnecessary ->elevator_lock by blk_get_elevator()/blk_put_elevator() But it needs some cleanup/refactor on scheduler interface. What do you think about the above way? Thanks, Ming