On 8/5/25 2:58 PM, Yu Kuai wrote:
> Hi,
>
> On 2025/08/04 20:21, Nilay Shroff wrote:
>> This patchset replaces the use of a static key in the I/O path (rq_qos_
>> xxx()) with an atomic queue flag (QUEUE_FLAG_QOS_ENABLED). This change
>> is made to eliminate a potential deadlock introduced by the use of static
>> keys in the blk-rq-qos infrastructure, as reported by lockdep during
>> blktests block/005 [1].
>>
>> The original static key approach was introduced to avoid unnecessary
>> dereferencing of q->rq_qos when no blk-rq-qos module (e.g., blk-wbt or
>> blk-iolatency) is configured. While efficient, enabling a static key at
>> runtime requires taking cpu_hotplug_lock and jump_label_mutex, which
>> becomes problematic if the queue is already frozen — causing a reverse
>> dependency on ->freeze_lock. This results in a lockdep splat indicating
>> a potential deadlock.
>
> Take a look at the report, the static key is from:
>
> elevator_change_done
>  wbt_init
>
> And looks like the queue is not frozen in this context, am I missing
> something?

We freeze the queue from rq_qos_add() before we increment the static key.

> However, wbt_init() from queue_wb_lat_store() is indeed under
> blk_mq_freeze_queue(), I understand the deadlock here, however, I'm
> still confused about the report.
>
If you follow the report, you'll notice that thread #0 blocks on
cpu_hotplug_lock after freezing the queue, in other words after it has
acquired ->freeze_lock (as mentioned above, rq_qos_add() first freezes
the queue and then increments the static key). Thread #1 then blocks on
->fs_reclaim after having acquired cpu_hotplug_lock. And the last
thread, #3, waits for the queue to be unfrozen (the queue has been
frozen by thread #1). So this creates a dependency between
cpu_hotplug_lock and ->freeze_lock. Hope this helps clarify your doubt.

> And for the deadlock, looks like blk-iocost and blk-iolatency, that
> rq_qos_add is called from cgroupfs path, where queue is not freezed,

We have the following code paths (including blk-iocost) from where we
invoke the rq_qos_xxx() APIs with the queue already frozen:

ioc_qos_write()
  -> blkg_conf_open_bdev_frozen()  => freezes queue
  -> blk_iocost_init()
     -> rq_qos_add()               => again freezes queue
        -> static_branch_inc()     => acquires cpu_hotplug_lock

queue_wb_lat_store()               => freezes queue
  -> wbt_init()
     -> rq_qos_add()               => again freezes queue
        -> static_branch_inc()     => acquires cpu_hotplug_lock

__del_gendisk()                    => freezes queue
  -> rq_qos_exit()
     -> static_branch_dec()        => acquires cpu_hotplug_lock

ioc_qos_write()
  -> blkg_conf_open_bdev_frozen()  => freezes queue
  -> blk_iocost_init()
     -> rq_qos_del()

Ideally we should also decrement the static key in rq_qos_del(), but
that was missed. So the second patch in the series handles this case,
albeit using atomic bitops.

> is it better to fix it from wbt, by calling rq_qos_add() first and
> set rwb->enable_state to WBT_STATE_OFF_DEFAULT in wbt_init(), later
> change this to WBT_STATE_ON_DEFAULT while queue is freezed.
>
Hmm, as shown above, wbt_init() is not the only path: we have multiple
code paths from which we call the rq_qos_xxx() APIs with the queue
frozen, so handling only the wbt path wouldn't be enough.

Thanks,
--Nilay
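
FWIW, with the queue-flag approach the fast-path gate would look roughly
like the below (an untested sketch of the idea behind this series; the
actual patches may differ in detail):

static inline void rq_qos_throttle(struct request_queue *q, struct bio *bio)
{
	/*
	 * Gate on an atomic queue flag instead of the static key, so
	 * enabling/disabling rq-qos never needs cpu_hotplug_lock or
	 * jump_label_mutex while the queue is frozen.
	 */
	if (test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags) && q->rq_qos) {
		bio_set_flag(bio, BIO_QOS_THROTTLED);
		__rq_qos_throttle(q->rq_qos, bio);
	}
}

rq_qos_add() would then set the flag with blk_queue_flag_set(), and
rq_qos_del()/rq_qos_exit() would clear it once the last policy is torn
down, without ever touching the jump-label machinery.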