Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter

On 3/24/2025 9:55 PM, K Prateek Nayak wrote:
Hello Mateusz,

On 3/24/2025 9:33 PM, Mateusz Guzik wrote:
On Mon, Mar 24, 2025 at 3:52 PM K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
So far, with tracing, this is where I am:

o Mainline + Oleg's optimization reverted:

      ...
      kworker/43:1-1723    [043] .....   115.309065: p9_read_work: Data read wait 55
      kworker/43:1-1723    [043] .....   115.309066: p9_read_work: Data read 55
      kworker/43:1-1723    [043] .....   115.309067: p9_read_work: Data read wait 7
      kworker/43:1-1723    [043] .....   115.309068: p9_read_work: Data read 7
             repro-4138    [043] .....   115.309084: netfs_wake_write_collector: Wake collector
             repro-4138    [043] .....   115.309085: netfs_wake_write_collector: Queuing collector work
             repro-4138    [043] .....   115.309088: netfs_unbuffered_write: netfs_unbuffered_write
             repro-4138    [043] .....   115.309088: netfs_end_issue_write: netfs_end_issue_write
             repro-4138    [043] .....   115.309089: netfs_end_issue_write: Write collector need poke 0
             repro-4138    [043] .....   115.309091: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS!
   kworker/u1030:1-1951    [168] .....   115.309096: netfs_wake_write_collector: Wake collector
   kworker/u1030:1-1951    [168] .....   115.309097: netfs_wake_write_collector: Queuing collector work
   kworker/u1030:1-1951    [168] .....   115.309102: netfs_write_collection_worker: Write collect clearing and waking up!
      ... (syzbot reproducer continues)

o Mainline:

     kworker/185:1-1767    [185] .....   109.485961: p9_read_work: Data read wait 7
     kworker/185:1-1767    [185] .....   109.485962: p9_read_work: Data read 7
     kworker/185:1-1767    [185] .....   109.485962: p9_read_work: Data read wait 55
     kworker/185:1-1767    [185] .....   109.485963: p9_read_work: Data read 55
             repro-4038    [185] .....   114.225717: netfs_wake_write_collector: Wake collector
             repro-4038    [185] .....   114.225723: netfs_wake_write_collector: Queuing collector work
             repro-4038    [185] .....   114.225727: netfs_unbuffered_write: netfs_unbuffered_write
             repro-4038    [185] .....   114.225727: netfs_end_issue_write: netfs_end_issue_write
             repro-4038    [185] .....   114.225728: netfs_end_issue_write: Write collector need poke 0
             repro-4038    [185] .....   114.225728: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS!
     ... (syzbot reproducer hangs)

There is a third "kworker/u1030" component that never gets woken up
when Oleg's optimization is in place (i.e., on plain mainline), for
reasons currently unknown to me. I'll keep digging.
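
As I read fs/netfs, the wait/wake handshake these trace points are
probing boils down to roughly the following (a simplified sketch from
my reading of the code, not the verbatim implementation):

    /*
     * Writer side: netfs_unbuffered_write_iter_locked() parks here,
     * which is the "Waiting on NETFS_RREQ_IN_PROGRESS!" trace line.
     */
    wait_on_bit(&wreq->flags, NETFS_RREQ_IN_PROGRESS,
                TASK_UNINTERRUPTIBLE);

    /*
     * Collector side: once all subrequests have completed, the write
     * collection worker is supposed to release the writer -- the
     * "Write collect clearing and waking up!" trace line.
     */
    clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &wreq->flags);
    wake_up_bit(&wreq->flags, NETFS_RREQ_IN_PROGRESS);

In the good trace the second half runs on kworker/u1030 and the
reproducer continues; in the hung case it never runs, so the writer
sleeps on the bit forever.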


Thanks for the update.

It is unclear to me if you checked, so I'm going to have to ask just
in case: when there is a hang, is there *anyone* stuck in pipe code
(and if so, where)?

You can get the kernel to print stacks for all threads with sysrq:
echo t > /proc/sysrq-trigger

This dumps an insane amount of stuff on my console. Let me see if there
is any reference to pipe anywhere in there.

The only pipe_read() and pipe_write() references I found were in the
following two traces:

[ 1043.618621] task:containerd      state:S stack:0     pid:3567  tgid:3475  ppid:1      task_flags:0x400040 flags:0x00000002
[ 1043.629673] Call Trace:
[ 1043.632133]  <TASK>
[ 1043.634252]  __schedule+0x436/0x1620
[ 1043.637832]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1043.642638]  ? syscall_exit_to_user_mode+0x51/0x1a0
[ 1043.647538]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1043.652347]  ? __smp_call_single_queue+0xc7/0x150
[ 1043.657072]  schedule+0x28/0x110
[ 1043.660320]  futex_wait_queue+0x69/0xa0
[ 1043.664176]  __futex_wait+0x13c/0x1b0
[ 1043.667863]  ? __pfx_futex_wake_mark+0x10/0x10
[ 1043.672325]  futex_wait+0x68/0x110
[ 1043.675747]  ? __x64_sys_futex+0x77/0x1d0
[ 1043.679775]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1043.684585]  ? pipe_write+0x42c/0x630                   <----------- here
[ 1043.688269]  ? copy_fpstate_to_sigframe+0x330/0x3d0
[ 1043.693167]  do_futex+0x13c/0x1d0
[ 1043.696501]  __x64_sys_futex+0x77/0x1d0
[ 1043.700359]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1043.705170]  ? vfs_write+0x376/0x420
[ 1043.708767]  do_syscall_64+0x6f/0x110
[ 1043.712449]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1043.717259]  ? ksys_write+0x90/0xe0
[ 1043.720768]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1043.725579]  ? syscall_exit_to_user_mode+0x51/0x1a0
[ 1043.730475]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1043.735288]  ? do_syscall_64+0x7b/0x110
[ 1043.739144]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1043.743953]  ? irqentry_exit_to_user_mode+0x2e/0x160
[ 1043.748935]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

and

[ 1048.980518] task:dockerd         state:S stack:0     pid:3591  tgid:3582  ppid:1      task_flags:0x400040 flags:0x00000002
[ 1048.991569] Call Trace:
[ 1048.994024]  <TASK>
[ 1048.996143]  __schedule+0x436/0x1620
[ 1048.999741]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.004554]  ? get_nohz_timer_target+0x2a/0x180
[ 1049.009101]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.013911]  ? timerqueue_add+0x6a/0xc0
[ 1049.017768]  schedule+0x28/0x110
[ 1049.021017]  schedule_hrtimeout_range_clock+0x78/0xd0
[ 1049.026088]  ? __pfx_hrtimer_wakeup+0x10/0x10
[ 1049.030470]  do_epoll_wait+0x666/0x7d0
[ 1049.034245]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.039053]  ? __pfx_ep_autoremove_wake_function+0x10/0x10
[ 1049.044556]  do_compat_epoll_pwait.part.0+0xc/0x70
[ 1049.049368]  __x64_sys_epoll_pwait+0x83/0x140
[ 1049.053746]  do_syscall_64+0x6f/0x110
[ 1049.057425]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.062235]  ? do_syscall_64+0x7b/0x110
[ 1049.066079]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.070887]  ? do_futex+0xc2/0x1d0
[ 1049.074309]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.079120]  ? __x64_sys_futex+0x77/0x1d0
[ 1049.083151]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.087958]  ? current_time+0x31/0x130
[ 1049.091731]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.096539]  ? inode_update_timestamps+0xc8/0x110
[ 1049.101264]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.106074]  ? generic_update_time+0x51/0x60
[ 1049.110365]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.115173]  ? touch_atime+0xb5/0x100
[ 1049.118857]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.123671]  ? pipe_read+0x3fe/0x480                   <----------- here
[ 1049.127265]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.132073]  ? ep_done_scan+0xab/0xf0
[ 1049.135757]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.140567]  ? do_epoll_wait+0xe1/0x7d0
[ 1049.144425]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.149234]  ? do_compat_epoll_pwait.part.0+0xc/0x70
[ 1049.154218]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.159027]  ? __x64_sys_epoll_pwait+0x83/0x140
[ 1049.163577]  ? do_syscall_64+0x7b/0x110
[ 1049.167433]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.172243]  ? syscall_exit_to_user_mode+0x51/0x1a0
[ 1049.177141]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.181951]  ? do_syscall_64+0x7b/0x110
[ 1049.185811]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.190619]  ? ksys_read+0x90/0xe0
[ 1049.194025]  ? do_syscall_64+0x7b/0x110
[ 1049.197877]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.202682]  ? syscall_exit_to_user_mode+0x51/0x1a0
[ 1049.207579]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.212391]  ? do_syscall_64+0x7b/0x110
[ 1049.216243]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.221053]  ? do_syscall_64+0x7b/0x110
[ 1049.224900]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 1049.229703]  ? do_syscall_64+0x7b/0x110
[ 1049.233561]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
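
Note that in both traces the pipe frames are "?" entries on tasks that
are actually sleeping in futex_wait() and do_epoll_wait(), so they may
just be stale leftovers on the stack. For comparison, my understanding
from fs/pipe.c is that a task genuinely stuck in pipe code would be
parked in one of these wait loops (simplified sketch, not verbatim):

    /* pipe_read(): sleep until the pipe has data to consume */
    wait_event_interruptible_exclusive(pipe->rd_wait,
                                       pipe_readable(pipe));

    /* pipe_write(): sleep until the pipe has buffer space */
    wait_event_interruptible_exclusive(pipe->wr_wait,
                                       pipe_writable(pipe));

Neither shows up at the top of any stack here, so nothing appears to be
blocked in the pipe paths themselves.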

--
Thanks and Regards,
Prateek
