Re: [PATCH V2 2/2] ublk: fix race between io_uring_cmd_complete_in_task and ublk_cancel_cmd

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Apr 25, 2025 / 09:37, Ming Lei wrote:
> ublk_cancel_cmd() calls io_uring_cmd_done() to complete uring_cmd, but
> we may have scheduled task work via io_uring_cmd_complete_in_task() for
> dispatching request, then kernel crash can be triggered.
> 
> Fix it by not trying to canceling the command if ublk block request is
> started.

I found that the blktests test case ublk/002 often hangs with the recent
v6.15-rcX kernel tags with the INFO at iou-wrk-X [1]. The hang is recreated
in stable manner when I repeat the test case a few times.

I bisected and this patch as the commit f40139fde527 triggers the hang.
When I reverted the commit from the kernel v6.15-rc6, the hang disappeared.
(I repeated ublk/002 20 times, and observed no hang.)

The hang was observed with test systems with Fedora 42. I do not observe
the hang with Fedora 41, but not sure why. liburing version difference
could be the reason (v2.6 for Fedora 41, v2.9 for Fedora 42).

Actions for fix will be appreciated.

[1]

[ 4497.777695] [ T130863] run blktests ublk/002 at 2025-05-07 14:48:32
[ 4499.983130] [  T67084] blk_print_req_error: 58 callbacks suppressed
[ 4499.983136] [  T67084] I/O error, dev ublkb0, sector 106830432 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 0
[ 4499.985230] [  T67084] I/O error, dev ublkb0, sector 165364248 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 0
[ 4499.998194] [  T57173] I/O error, dev ublkb0, sector 251279584 op 0x0:(READ) flags 0x800 phys_seg 1 prio class 0
[ 4671.208372] [     T46] INFO: task iou-wrk-131061:131062 blocked for more than 122 seconds.
[ 4671.209417] [     T46]       Not tainted 6.15.0-rc5 #317
[ 4671.210182] [     T46] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4671.211194] [     T46] task:iou-wrk-131061  state:D stack:0     pid:131062 tgid:131061 ppid:131060 task_flags:0x404150 flags:0x00004000
[ 4671.212573] [     T46] Call Trace:
[ 4671.213364] [     T46]  <TASK>
[ 4671.214047] [     T46]  __schedule+0xf8c/0x5c80
[ 4671.214920] [     T46]  ? rcu_is_watching+0x11/0xb0
[ 4671.215831] [     T46]  ? __mutex_lock+0x2d9/0x1b50
[ 4671.216760] [     T46]  ? rcu_is_watching+0x11/0xb0
[ 4671.217718] [     T46]  ? lock_release+0x244/0x300
[ 4671.218621] [     T46]  ? lock_release+0x244/0x300
[ 4671.219521] [     T46]  ? __pfx___schedule+0x10/0x10
[ 4671.220433] [     T46]  ? do_raw_spin_lock+0x124/0x260
[ 4671.221425] [     T46]  ? rcu_is_watching+0x11/0xb0
[ 4671.222292] [     T46]  ? rcu_is_watching+0x11/0xb0
[ 4671.223121] [     T46]  ? lock_release+0x244/0x300
[ 4671.223984] [     T46]  ? lock_release+0x244/0x300
[ 4671.224844] [     T46]  schedule+0xda/0x390
[ 4671.225667] [     T46]  blk_mq_freeze_queue_wait+0x121/0x170
[ 4671.226653] [     T46]  ? __pfx_blk_mq_freeze_queue_wait+0x10/0x10
[ 4671.227656] [     T46]  ? __pfx_autoremove_wake_function+0x10/0x10
[ 4671.228656] [     T46]  ? trace_hardirqs_on+0x14/0x150
[ 4671.229564] [     T46]  ? _raw_spin_unlock_irq+0x34/0x50
[ 4671.230484] [     T46]  del_gendisk+0x4f9/0xa20
[ 4671.231320] [     T46]  ? ublk_stop_dev+0x24/0x200 [ublk_drv]
[ 4671.232379] [     T46]  ? __pfx_del_gendisk+0x10/0x10
[ 4671.233197] [     T46]  ? __pfx___mutex_lock+0x10/0x10
[ 4671.234094] [     T46]  ublk_stop_dev_unlocked.part.0+0xb6/0x580 [ublk_drv]
[ 4671.235153] [     T46]  ? __pfx_ublk_stop_dev_unlocked.part.0+0x10/0x10 [ublk_drv]
[ 4671.236303] [     T46]  ? kvm_sched_clock_read+0xd/0x20
[ 4671.237156] [     T46]  ublk_stop_dev+0x62/0x200 [ublk_drv]
[ 4671.238081] [     T46]  ? security_capable+0x79/0x110
[ 4671.238964] [     T46]  ublk_ctrl_uring_cmd+0xe87/0x2a70 [ublk_drv]
[ 4671.239967] [     T46]  ? __pfx_avc_has_perm+0x10/0x10
[ 4671.240858] [     T46]  ? lock_release+0x244/0x300
[ 4671.241759] [     T46]  ? rcu_is_watching+0x11/0xb0
[ 4671.242677] [     T46]  ? __pfx_ublk_ctrl_uring_cmd+0x10/0x10 [ublk_drv]
[ 4671.243751] [     T46]  ? finish_task_switch.isra.0+0x202/0x850
[ 4671.244728] [     T46]  ? selinux_uring_cmd+0x1c3/0x280
[ 4671.245623] [     T46]  ? __schedule+0x2f21/0x5c80
[ 4671.246504] [     T46]  ? __pfx_selinux_uring_cmd+0x10/0x10
[ 4671.247438] [     T46]  ? rcu_is_watching+0x11/0xb0
[ 4671.248228] [     T46]  ? __pfx_do_raw_spin_lock+0x10/0x10
[ 4671.249109] [     T46]  io_uring_cmd+0x221/0x3c0
[ 4671.249952] [     T46]  io_issue_sqe+0x6ab/0x1800
[ 4671.250767] [     T46]  ? trace_hardirqs_on+0x14/0x150
[ 4671.251634] [     T46]  ? __pfx_io_issue_sqe+0x10/0x10
[ 4671.252502] [     T46]  ? __timer_delete_sync+0x16c/0x270
[ 4671.253392] [     T46]  ? __pfx___timer_delete_sync+0x10/0x10
[ 4671.254402] [     T46]  ? rcu_is_watching+0x11/0xb0
[ 4671.255162] [     T46]  io_wq_submit_work+0x3ea/0xf70
[ 4671.255990] [     T46]  io_worker_handle_work+0x615/0x1280
[ 4671.256854] [     T46]  io_wq_worker+0x2ee/0xbd0
[ 4671.257670] [     T46]  ? __pfx_io_wq_worker+0x10/0x10
[ 4671.258491] [     T46]  ? rcu_is_watching+0x11/0xb0
[ 4671.259285] [     T46]  ? lock_acquire+0x2af/0x310
[ 4671.260035] [     T46]  ? rcu_is_watching+0x11/0xb0
[ 4671.260800] [     T46]  ? do_raw_spin_lock+0x124/0x260
[ 4671.261604] [     T46]  ? lock_acquire+0x2af/0x310
[ 4671.262376] [     T46]  ? __pfx_do_raw_spin_lock+0x10/0x10
[ 4671.263186] [     T46]  ? lock_release+0x244/0x300
[ 4671.263947] [     T46]  ? rcu_is_watching+0x11/0xb0
[ 4671.264718] [     T46]  ? __pfx_io_wq_worker+0x10/0x10
[ 4671.265529] [     T46]  ret_from_fork+0x30/0x70
[ 4671.266277] [     T46]  ? __pfx_io_wq_worker+0x10/0x10
[ 4671.267070] [     T46]  ret_from_fork_asm+0x1a/0x30
[ 4671.267855] [     T46]  </TASK>
[ 4671.268495] [     T46] INFO: lockdep is turned off.
[ 4794.088765] [     T46] INFO: task iou-wrk-131061:131062 blocked for more than 245 seconds.
[ 4794.090310] [     T46]       Not tainted 6.15.0-rc5 #317
[ 4794.091453] [     T46] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4794.093379] [     T46] task:iou-wrk-131061  state:D stack:0     pid:131062 tgid:131061 ppid:131060 task_flags:0x404150 flags:0x00004000
[ 4794.095353] [     T46] Call Trace:
[ 4794.096142] [     T46]  <TASK>
[ 4794.096929] [     T46]  __schedule+0xf8c/0x5c80
[ 4794.097878] [     T46]  ? rcu_is_watching+0x11/0xb0
[ 4794.098869] [     T46]  ? __mutex_lock+0x2d9/0x1b50
[ 4794.099866] [     T46]  ? rcu_is_watching+0x11/0xb0
[ 4794.100870] [     T46]  ? lock_release+0x244/0x300
[ 4794.101857] [     T46]  ? lock_release+0x244/0x300
[ 4794.102797] [     T46]  ? __pfx___schedule+0x10/0x10
[ 4794.103726] [     T46]  ? do_raw_spin_lock+0x124/0x260
[ 4794.104752] [     T46]  ? rcu_is_watching+0x11/0xb0
[ 4794.105744] [     T46]  ? rcu_is_watching+0x11/0xb0
[ 4794.106678] [     T46]  ? lock_release+0x244/0x300
[ 4794.107575] [     T46]  ? lock_release+0x244/0x300
[ 4794.108528] [     T46]  schedule+0xda/0x390
[ 4794.109391] [     T46]  blk_mq_freeze_queue_wait+0x121/0x170
[ 4794.110437] [     T46]  ? __pfx_blk_mq_freeze_queue_wait+0x10/0x10
[ 4794.111527] [     T46]  ? __pfx_autoremove_wake_function+0x10/0x10
[ 4794.112662] [     T46]  ? trace_hardirqs_on+0x14/0x150
[ 4794.113626] [     T46]  ? _raw_spin_unlock_irq+0x34/0x50
[ 4794.114644] [     T46]  del_gendisk+0x4f9/0xa20
[ 4794.115518] [     T46]  ? ublk_stop_dev+0x24/0x200 [ublk_drv]
[ 4794.116548] [     T46]  ? __pfx_del_gendisk+0x10/0x10
[ 4794.117536] [     T46]  ? __pfx___mutex_lock+0x10/0x10
[ 4794.118521] [     T46]  ublk_stop_dev_unlocked.part.0+0xb6/0x580 [ublk_drv]
[ 4794.119749] [     T46]  ? __pfx_ublk_stop_dev_unlocked.part.0+0x10/0x10 [ublk_drv]
[ 4794.121060] [     T46]  ? kvm_sched_clock_read+0xd/0x20
[ 4794.122081] [     T46]  ublk_stop_dev+0x62/0x200 [ublk_drv]
[ 4794.123102] [     T46]  ? security_capable+0x79/0x110
[ 4794.124139] [     T46]  ublk_ctrl_uring_cmd+0xe87/0x2a70 [ublk_drv]
[ 4794.125225] [     T46]  ? __pfx_avc_has_perm+0x10/0x10
[ 4794.126054] [     T46]  ? lock_release+0x244/0x300
[ 4794.126850] [     T46]  ? rcu_is_watching+0x11/0xb0
[ 4794.127579] [     T46]  ? __pfx_ublk_ctrl_uring_cmd+0x10/0x10 [ublk_drv]
[ 4794.128512] [     T46]  ? finish_task_switch.isra.0+0x202/0x850
[ 4794.129379] [     T46]  ? selinux_uring_cmd+0x1c3/0x280
[ 4794.130178] [     T46]  ? __schedule+0x2f21/0x5c80
[ 4794.130926] [     T46]  ? __pfx_selinux_uring_cmd+0x10/0x10
[ 4794.131701] [     T46]  ? rcu_is_watching+0x11/0xb0
[ 4794.132357] [     T46]  ? __pfx_do_raw_spin_lock+0x10/0x10
[ 4794.133159] [     T46]  io_uring_cmd+0x221/0x3c0
[ 4794.133931] [     T46]  io_issue_sqe+0x6ab/0x1800
[ 4794.134653] [     T46]  ? trace_hardirqs_on+0x14/0x150
[ 4794.135417] [     T46]  ? __pfx_io_issue_sqe+0x10/0x10
[ 4794.136191] [     T46]  ? __timer_delete_sync+0x16c/0x270
[ 4794.136984] [     T46]  ? __pfx___timer_delete_sync+0x10/0x10
[ 4794.137833] [     T46]  ? rcu_is_watching+0x11/0xb0
[ 4794.138527] [     T46]  io_wq_submit_work+0x3ea/0xf70
[ 4794.139268] [     T46]  io_worker_handle_work+0x615/0x1280
[ 4794.140062] [     T46]  io_wq_worker+0x2ee/0xbd0
[ 4794.140806] [     T46]  ? __pfx_io_wq_worker+0x10/0x10
[ 4794.141478] [     T46]  ? rcu_is_watching+0x11/0xb0
[ 4794.142189] [     T46]  ? lock_acquire+0x2af/0x310
[ 4794.142930] [     T46]  ? rcu_is_watching+0x11/0xb0
[ 4794.143663] [     T46]  ? do_raw_spin_lock+0x124/0x260
[ 4794.144363] [     T46]  ? lock_acquire+0x2af/0x310
[ 4794.145087] [     T46]  ? __pfx_do_raw_spin_lock+0x10/0x10
[ 4794.145897] [     T46]  ? lock_release+0x244/0x300
[ 4794.146617] [     T46]  ? rcu_is_watching+0x11/0xb0
[ 4794.147340] [     T46]  ? __pfx_io_wq_worker+0x10/0x10
[ 4794.148070] [     T46]  ret_from_fork+0x30/0x70
[ 4794.148817] [     T46]  ? __pfx_io_wq_worker+0x10/0x10
[ 4794.149576] [     T46]  ret_from_fork_asm+0x1a/0x30
[ 4794.150307] [     T46]  </TASK>
[ 4794.150959] [     T46] INFO: lockdep is turned off.
...




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux