On Sun, Jun 01, 2025 at 03:00:59PM +0800, Li Chen wrote:
> From: Li Chen <chenl311@xxxxxxxxxxxxxxx>
>
> If `xfs_freeze -u` goes D-state (because of freeze-reclaim deadlock)
> the test never finishes and the harness stalls.
> Run thaw in background, wait 10 s, and when it’s still alive:
>
>   * emit a warning plus the fixing commit
>     ab23a7768739 “xfs: per-cpu deferred inode inactivation queues”
>   * `umount -l` the scratch FS so the rest of xfstests can proceed
>   * skip any `wait` that would block on the hung tasks.
>
> Fixed kernels behave as before; broken ones no longer wedge the run.
>
> The hung task call trace would be as below:
> [ 20.535519] Not tainted 5.14.0-rc4+ #27
> [ 20.537855] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 20.539420] task:738 state:D stack:14544 pid: 7124 ppid: 753 flags:0x00004002
> [ 20.540892] Call Trace:
> [ 20.541424] __schedule+0x22d/0x6c0
> [ 20.542128] schedule+0x3f/0xa0
> [ 20.542751] percpu_rwsem_wait+0x100/0x130
> [ 20.543516] ? percpu_free_rwsem+0x30/0x30
> [ 20.544259] __percpu_down_read+0x44/0x50
> [ 20.545002] xfs_trans_alloc+0x19a/0x1f0
> [ 20.545747] xfs_free_eofblocks+0x47/0x100
> [ 20.546519] xfs_inode_mark_reclaimable+0x115/0x160
> [ 20.547398] destroy_inode+0x36/0x70
> [ 20.548077] prune_icache_sb+0x79/0xb0
> [ 20.548789] super_cache_scan+0x159/0x1e0
> [ 20.549536] shrink_slab.constprop.0+0x1b1/0x370
> [ 20.550363] drop_slab_node+0x1d/0x40
> [ 20.551041] drop_slab+0x30/0x70
> [ 20.551600] drop_caches_sysctl_handler+0x6b/0x80
> [ 20.552311] proc_sys_call_handler+0x12b/0x250
> [ 20.552931] new_sync_write+0x117/0x1b0
> [ 20.553462] vfs_write+0x1bd/0x250
> [ 20.553914] ksys_write+0x5a/0xd0
> [ 20.554381] do_syscall_64+0x3b/0x90
> [ 20.554854] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 20.555481] RIP: 0033:0x7f90928d3300
> [ 20.555946] RSP: 002b:00007ffc2b50b998 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> [ 20.556853] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f90928d3300
> [ 20.557686] RDX: 0000000000000002 RSI: 000055a5d6c47750 RDI: 0000000000000001
> [ 20.558524] RBP: 000055a5d6c47750 R08: 0000000000000007 R09: 0000000000000073
> [ 20.559335] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
> [ 20.560154] R13: 00007f90929ae760 R14: 0000000000000002 R15: 00007f90929a99e0
>
> localhost login: [ 30.773559] INFO: task 738:7124 blocked for more than 20 seconds.
> [ 30.775236] Not tainted 5.14.0-rc4+ #27
> [ 30.777449] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 30.779729] task:738 state:D stack:14544 pid: 7124 ppid: 753 flags:0x00004002
> [ 30.781267] Call Trace:
> [ 30.781850] __schedule+0x22d/0x6c0
> [ 30.782618] schedule+0x3f/0xa0
> [ 30.783297] percpu_rwsem_wait+0x100/0x130
> [ 30.784110] ? percpu_free_rwsem+0x30/0x30
> [ 30.785085] __percpu_down_read+0x44/0x50
> [ 30.786071] xfs_trans_alloc+0x19a/0x1f0
> [ 30.786877] xfs_free_eofblocks+0x47/0x100
> [ 30.787727] xfs_inode_mark_reclaimable+0x115/0x160
> [ 30.788708] destroy_inode+0x36/0x70
> [ 30.789395] prune_icache_sb+0x79/0xb0
> [ 30.790056] super_cache_scan+0x159/0x1e0
> [ 30.790712] shrink_slab.constprop.0+0x1b1/0x370
> [ 30.791381] drop_slab_node+0x1d/0x40
> [ 30.791924] drop_slab+0x30/0x70
> [ 30.792469] drop_caches_sysctl_handler+0x6b/0x80
> [ 30.793328] proc_sys_call_handler+0x12b/0x250
> [ 30.793948] new_sync_write+0x117/0x1b0
> [ 30.794471] vfs_write+0x1bd/0x250
> [ 30.794941] ksys_write+0x5a/0xd0
> [ 30.795414] do_syscall_64+0x3b/0x90
> [ 30.795928] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 30.796595] RIP: 0033:0x7f90928d3300
> [ 30.797090] RSP: 002b:00007ffc2b50b998 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
> [ 30.798033] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f90928d3300
> [ 30.798852] RDX: 0000000000000002 RSI: 000055a5d6c47750 RDI: 0000000000000001
> [ 30.799703] RBP: 000055a5d6c47750 R08: 0000000000000007 R09: 0000000000000073
> [ 30.800833] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
> [ 30.801764] R13: 00007f90929ae760 R14: 0000000000000002 R15: 00007f90929a99e0
> [ 30.802628] INFO: task xfs_io:7130 blocked for more than 10 seconds.
> [ 30.803421] Not tainted 5.14.0-rc4+ #27
> [ 30.803985] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 30.804979] task:xfs_io state:D stack:13712 pid: 7130 ppid: 7127 flags:0x00000002
> [ 30.806013] Call Trace:
> [ 30.806399] __schedule+0x22d/0x6c0
> [ 30.806867] schedule+0x3f/0xa0
> [ 30.807334] rwsem_down_write_slowpath+0x1d8/0x510
> [ 30.808018] thaw_super+0xd/0x20
> [ 30.808748] __x64_sys_ioctl+0x5d/0xb0
> [ 30.809292] do_syscall_64+0x3b/0x90
> [ 30.809797] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 30.810454] RIP: 0033:0x7ff1b48c5d1b
> [ 30.810943] RSP: 002b:00007fff0bf88ac0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 30.811874] RAX: ffffffffffffffda RBX: 000055b93ae5fc40 RCX: 00007ff1b48c5d1b
> [ 30.812743] RDX: 00007fff0bf88b2c RSI: ffffffffc0045878 RDI: 0000000000000003
> [ 30.813583] RBP: 000055b93ae60fe0 R08: 0000000000000000 R09: 0000000000000000
> [ 30.814497] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
> [ 30.815413] R13: 000055b93a3a94e9 R14: 0000000000000000 R15: 000055b93ae61150
> ---
>  tests/generic/738 | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/tests/generic/738 b/tests/generic/738
> index 6f1ea7f8..9a90eefa 100755
> --- a/tests/generic/738
> +++ b/tests/generic/738
> @@ -11,8 +11,24 @@ _begin_fstest auto quick freeze
>
>  _cleanup()
>  {
> -	xfs_freeze -u $SCRATCH_MNT 2>/dev/null
> -	wait
> +	# Thaw may dead-lock on unfixed XFS kernels. Run it in background,
> +	# wait a tiny bit, then decide whether it is stuck.
> +	xfs_freeze -u $SCRATCH_MNT 2>/dev/null &
> +	_thaw_pid=$!
> +
> +	sleep 8
> +
> +	if [ -e "/proc/$_thaw_pid" ]; then
> +		# still running → stuck in D-state
> +		if [ "$FSTYP" = "xfs" ]; then
> +			echo "generic/738: known XFS freeze-reclaim deadlock; " \
> +			     "fixed by kernel commit ab23a7768739 " \
> +			     '"xfs: per-cpu deferred inode inactivation queues"' \

If you want to mark a known fix, you can add the line below to this case:

  _fixed_by_kernel_commit ab23a7768739 \
	"xfs: per-cpu deferred inode inactivation queues"

But for this patch, I don't think we should do this for a bug. If it blocks
your testing on some downstream system, you can skip this test.

CC the xfs list if you need more review points for this xfs bug.

Thanks,
Zorro

> +				| tee -a "$seqres.full"
> +		fi
> +		umount -l "$SCRATCH_MNT" 2>/dev/null
> +	fi
> +
>  	cd /
>  	rm -r -f $tmp.*
>  }
> --
> 2.49.0
>
>
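
For reference, the "run thaw in background, then decide whether it is stuck"
step from the quoted patch could also be written as a bounded poll rather than
a fixed sleep, so a healthy thaw returns immediately. This is only a rough
sketch: wait_pid_timeout is a hypothetical helper, not an existing fstests
function, and the 10-second limit in the usage comment is taken from the patch
description.

```shell
#!/bin/bash
# Hypothetical helper (not part of fstests): wait up to $2 seconds for the
# process with PID $1 to go away. Returns 0 if it exited within the limit,
# 1 if it is still alive at the deadline (for example, stuck in D-state).
wait_pid_timeout() {
	local pid=$1 limit=$2 waited=0

	# kill -0 sends no signal; it only reports whether the PID still exists.
	while kill -0 "$pid" 2>/dev/null; do
		if [ "$waited" -ge "$limit" ]; then
			return 1
		fi
		sleep 1
		waited=$((waited + 1))
	done
	return 0
}

# Assumed usage inside a _cleanup() like the one in the quoted patch:
#
#	xfs_freeze -u $SCRATCH_MNT 2>/dev/null &
#	if ! wait_pid_timeout $! 10; then
#		umount -l "$SCRATCH_MNT" 2>/dev/null
#	fi
```

Unlike the fixed sleep, this costs about one second on a kernel where the thaw
completes right away, and it never calls a blocking `wait` on the hung task.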