On 4/24/25 8:51 PM, Ming Lei wrote:
> scheduler's ->exit() is called with queue frozen and elevator lock is held, and
> wbt_enable_default() can't be called with queue frozen, otherwise the
> following lockdep warning is triggered:
>
> #6 (&q->rq_qos_mutex){+.+.}-{4:4}:
> #5 (&eq->sysfs_lock){+.+.}-{4:4}:
> #4 (&q->elevator_lock){+.+.}-{4:4}:
> #3 (&q->q_usage_counter(io)#3){++++}-{0:0}:
> #2 (fs_reclaim){+.+.}-{0:0}:
> #1 (&sb->s_type->i_mutex_key#3){+.+.}-{4:4}:
> #0 (&q->debugfs_mutex){+.+.}-{4:4}:
>
> Fix the issue by moving wbt_enable_default() out of bfq's exit(), and
> call it from elevator_change_done().
>
> Meantime add disk->rqos_state_mutex for covering wbt state change, which
> matches the purpose more than ->elevator_lock.
>
> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>

While testing this patch on my machine using blktests, I stumbled upon the
lockdep splat shown below (I could consistently recreate it):

run blktests block/005 at 2025-04-28 06:57:51

======================================================
WARNING: possible circular locking dependency detected
6.15.0-rc2+ #174 Not tainted
------------------------------------------------------
check/8088 is trying to acquire lock:
c0000000a0c03538 (&disk->rqos_state_mutex){+.+.}-{4:4}, at: wbt_disable_default+0x9c/0x118

but task is already holding lock:
c00000005b8f6c38 (&q->elevator_lock){+.+.}-{4:4}, at: elevator_change+0x94/0x214

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&q->elevator_lock){+.+.}-{4:4}:
       __mutex_lock+0x128/0xdd8
       elevator_change+0x94/0x214
       elv_iosched_store+0x14c/0x1f4
       queue_attr_store+0x194/0x1d0
       sysfs_kf_write+0xbc/0x110
       kernfs_fop_write_iter+0x264/0x384
       vfs_write+0x5b0/0x77c
       ksys_write+0xa0/0x180
       system_call_exception+0x1b0/0x4f0
       system_call_vectored_common+0x15c/0x2ec

-> #2 (&q->q_usage_counter(io)#23){++++}-{0:0}:
       blk_alloc_queue+0x46c/0x4bc
       blk_mq_alloc_queue+0xc0/0x160
       __blk_mq_alloc_disk+0x34/0x128
       nvme_alloc_ns+0x140/0x1804 [nvme_core]
       nvme_scan_ns+0x42c/0x564 [nvme_core]
       async_run_entry_fn+0x9c/0x30c
       process_one_work+0x514/0xd38
       worker_thread+0x390/0x6dc
       kthread+0x230/0x278
       start_kernel_thread+0x14/0x18

-> #1 (fs_reclaim){+.+.}-{0:0}:
       fs_reclaim_acquire+0x114/0x150
       __kmalloc_cache_noprof+0x70/0x5c0
       wbt_init+0x64/0x2fc
       wbt_enable_default+0x140/0x15c
       elevator_change_done+0x314/0x3a8
       elv_iosched_store+0x14c/0x1f4
       queue_attr_store+0x194/0x1d0
       sysfs_kf_write+0xbc/0x110
       kernfs_fop_write_iter+0x264/0x384
       vfs_write+0x5b0/0x77c
       ksys_write+0xa0/0x180
       system_call_exception+0x1b0/0x4f0
       system_call_vectored_common+0x15c/0x2ec

-> #0 (&disk->rqos_state_mutex){+.+.}-{4:4}:
       __lock_acquire+0x1b5c/0x29f8
       lock_acquire+0x23c/0x3f8
       __mutex_lock+0x128/0xdd8
       wbt_disable_default+0x9c/0x118
       bfq_init_queue+0x7b0/0x8c0
       blk_mq_init_sched+0x29c/0x3a8
       __elevator_change+0x3a4/0x8a4
       elevator_change+0x1a4/0x214
       elv_iosched_store+0x14c/0x1f4
       queue_attr_store+0x194/0x1d0
       sysfs_kf_write+0xbc/0x110
       kernfs_fop_write_iter+0x264/0x384
       vfs_write+0x5b0/0x77c
       ksys_write+0xa0/0x180
       system_call_exception+0x1b0/0x4f0
       system_call_vectored_common+0x15c/0x2ec

other info that might help us debug this:

Chain exists of:
  &disk->rqos_state_mutex --> &q->q_usage_counter(io)#23 --> &q->elevator_lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&q->elevator_lock);
                               lock(&q->q_usage_counter(io)#23);
                               lock(&q->elevator_lock);
  lock(&disk->rqos_state_mutex);

 *** DEADLOCK ***

7 locks held by check/8088:
 #0: c0000000873f2400 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0xa0/0x180
 #1: c00000008c10c088 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x1e0/0x384
 #2: c000000085239248 (kn->active#57){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x1f8/0x384
 #3: c0000000f801c190 (&set->update_nr_hwq_sema){.+.+}-{4:4}, at: elv_iosched_store+0x13c/0x1f4
 #4: c00000005b8f6718 (&q->q_usage_counter(io)#23){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x28/0x40
 #5: c00000005b8f6750 (&q->q_usage_counter(queue)#21){+.+.}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x28/0x40
 #6: c00000005b8f6c38 (&q->elevator_lock){+.+.}-{4:4}, at: elevator_change+0x94/0x214

stack backtrace:
CPU: 26 UID: 0 PID: 8088 Comm: check Kdump: loaded Not tainted 6.15.0-rc2+ #174 VOLUNTARY
Hardware name: IBM,9043-MRX POWER10 (architected) 0x800200 0xf000006 of:IBM,FW1060.00 (NM1060_028) hv:phyp pSeries
Call Trace:
[c0000000d7497240] [c0000000017b9888] dump_stack_lvl+0x100/0x184 (unreliable)
[c0000000d7497270] [c0000000002b546c] print_circular_bug+0x448/0x604
[c0000000d7497320] [c0000000002b5874] check_noncircular+0x24c/0x26c
[c0000000d74973f0] [c0000000002bbb78] __lock_acquire+0x1b5c/0x29f8
[c0000000d7497520] [c0000000002b915c] lock_acquire+0x23c/0x3f8
[c0000000d7497620] [c00000000181277c] __mutex_lock+0x128/0xdd8
[c0000000d7497780] [c000000000c73bf8] wbt_disable_default+0x9c/0x118
[c0000000d74977c0] [c000000000c4c2c0] bfq_init_queue+0x7b0/0x8c0
[c0000000d7497890] [c000000000bff634] blk_mq_init_sched+0x29c/0x3a8
[c0000000d7497910] [c000000000bc2a18] __elevator_change+0x3a4/0x8a4
[c0000000d74979b0] [c000000000bc30bc] elevator_change+0x1a4/0x214
[c0000000d7497a00] [c000000000bc427c] elv_iosched_store+0x14c/0x1f4
[c0000000d7497ae0] [c000000000bd07ec] queue_attr_store+0x194/0x1d0
[c0000000d7497c00] [c000000000a40f00] sysfs_kf_write+0xbc/0x110
[c0000000d7497c50] [c000000000a3cc4c] kernfs_fop_write_iter+0x264/0x384
[c0000000d7497cb0] [c0000000008bb9bc] vfs_write+0x5b0/0x77c
[c0000000d7497d90] [c0000000008bbf88] ksys_write+0xa0/0x180
[c0000000d7497df0] [c000000000039f70] system_call_exception+0x1b0/0x4f0
[c0000000d7497e50] [c00000000000cedc] system_call_vectored_common+0x15c/0x2ec
--- interrupt: 3000 at 0x7fffa413b034
NIP:  00007fffa413b034 LR: 00007fffa413b034 CTR: 0000000000000000
REGS: c0000000d7497e80 TRAP: 3000   Not tainted  (6.15.0-rc2+)
MSR:  800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 44422408  XER: 00000000
IRQMASK: 0
GPR00: 0000000000000004 00007ffffd011260 000000010dfa7e00 0000000000000001
GPR04: 000000011c30b720 0000000000000004 0000000000000010 0000000000000001
GPR08: 0000000000000003 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00007fffa43fab60 000000011c3adbc0 000000010dfa87b8
GPR16: 000000010dfa94d8 0000000020000000 0000000000000000 000000010deb9070
GPR20: 000000010df4beb8 00007ffffd011404 000000010df4f8a0 000000010dfa89bc
GPR24: 000000010dfa8a50 0000000000000000 000000011c30b720 0000000000000004
GPR28: 0000000000000004 00007fffa42418e0 000000011c30b720 0000000000000004
NIP [00007fffa413b034] 0x7fffa413b034
LR [00007fffa413b034] 0x7fffa413b034
--- interrupt: 3000
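
To restate the cycle as I read the chains: the elevator-switch path now takes
disk->rqos_state_mutex while holding q->elevator_lock (bfq_init_queue() ->
wbt_disable_default()), while the enable path allocates in wbt_init() under
disk->rqos_state_mutex, and via fs_reclaim -> q_usage_counter(io) that reaches
back to q->elevator_lock. Below is a minimal userspace sketch of that inverted
ordering, with pthread mutexes standing in for the kernel locks and the
transitive fs_reclaim/q_usage_counter(io) leg modelled as a direct acquisition;
nothing in it is actual kernel code, the names just mirror the report:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t elevator_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t rqos_state_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for elevator_change() -> bfq_init_queue() -> wbt_disable_default():
 * rqos_state_mutex is acquired while elevator_lock is already held. */
static void *switch_to_bfq(void *unused)
{
	pthread_mutex_lock(&elevator_lock);
	pthread_mutex_lock(&rqos_state_mutex);
	pthread_mutex_unlock(&rqos_state_mutex);
	pthread_mutex_unlock(&elevator_lock);
	return NULL;
}

/* Stand-in for wbt_enable_default(): the wbt_init() allocation under
 * rqos_state_mutex drags in fs_reclaim -> q_usage_counter(io), which in
 * turn depends on elevator_lock; collapsed here to taking elevator_lock
 * directly. */
static void *enable_wbt(void *unused)
{
	pthread_mutex_lock(&rqos_state_mutex);
	pthread_mutex_lock(&elevator_lock);
	pthread_mutex_unlock(&elevator_lock);
	pthread_mutex_unlock(&rqos_state_mutex);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, switch_to_bfq, NULL);
	pthread_create(&b, NULL, enable_wbt, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	/* If both threads interleave after taking their first lock, neither
	 * join returns and the program hangs -- the ABBA cycle above. */
	printf("lucky timing, no deadlock this run\n");
	return 0;
}

(Build with cc -pthread. The toy only hangs on unlucky timing, but lockdep
flags the kernel ordering on first observation, which matches how reliably
block/005 reproduces the splat here.)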