On Mon, 2025-03-24 at 17:52 -0400, Jeff Layton wrote:
> Hi Trond,
>
> We hit a new failure mode with the containerized NFS client teardown
> patches.
>
> The symptoms are that we see the state recovery thread stuck, and then
> we see a flusher thread stuck like this:
>
> #0 context_switch (kernel/sched/core.c:5434:2)
> #1 __schedule (kernel/sched/core.c:6799:8)
> #2 __schedule_loop (kernel/sched/core.c:6874:3)
> #3 schedule (kernel/sched/core.c:6889:2)
> #4 nfs_wait_bit_killable (fs/nfs/inode.c:77:2)
> #5 __wait_on_bit (kernel/sched/wait_bit.c:49:10)
> #6 out_of_line_wait_on_bit (kernel/sched/wait_bit.c:64:9)
> #7 wait_on_bit_action (./include/linux/wait_bit.h:156:9)
> #8 nfs4_wait_clnt_recover (fs/nfs/nfs4state.c:1329:8)
> #9 nfs4_client_recover_expired_lease (fs/nfs/nfs4state.c:1347:9)
> #10 pnfs_update_layout (fs/nfs/pnfs.c:2133:17)
> #11 ff_layout_pg_get_mirror_count_write (fs/nfs/flexfilelayout/flexfilelayout.c:949:4)
> #12 nfs_pageio_setup_mirroring (fs/nfs/pagelist.c:1114:18)
> #13 nfs_pageio_add_request (fs/nfs/pagelist.c:1406:2)
> #14 nfs_page_async_flush (fs/nfs/write.c:631:7)
> #15 nfs_do_writepage (fs/nfs/write.c:657:9)
> #16 nfs_writepages_callback (fs/nfs/write.c:684:8)
> #17 write_cache_pages (mm/page-writeback.c:2569:11)
> #18 nfs_writepages (fs/nfs/write.c:723:9)
> #19 do_writepages (mm/page-writeback.c:2612:10)
> #20 __writeback_single_inode (fs/fs-writeback.c:1650:8)
> #21 writeback_sb_inodes (fs/fs-writeback.c:1942:3)
> #22 __writeback_inodes_wb (fs/fs-writeback.c:2013:12)
> #23 wb_writeback (fs/fs-writeback.c:2120:15)
> #24 wb_check_old_data_flush (fs/fs-writeback.c:2224:10)
> #25 wb_do_writeback (fs/fs-writeback.c:2277:11)
> #26 wb_workfn (fs/fs-writeback.c:2305:20)
> #27 process_one_work (kernel/workqueue.c:3267:2)
> #28 process_scheduled_works (kernel/workqueue.c:3348:3)
> #29 worker_thread (kernel/workqueue.c:3429:4)
> #30 kthread (kernel/kthread.c:388:9)
> #31 ret_from_fork (arch/x86/kernel/process.c:147:3)
> #32 ret_from_fork_asm+0x11/0x16 (arch/x86/entry/entry_64.S:244)
>
> Turning up some tracepoints, I see this:
>
> kworker/u1028:1-997456 [159] ..... 9027.330396: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE status=0 action=call_reserve [sunrpc]
> kworker/u1028:1-997456 [159] ..... 9027.330397: xprt_reserve: task:0000174b@00000000 xid=0x6bfc099c
> kworker/u1028:1-997456 [159] ..... 9027.330397: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE status=0 action=call_reserveresult [sunrpc]
> kworker/u1028:1-997456 [159] ..... 9027.330397: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE status=0 action=call_refresh [sunrpc]
> kworker/u1028:1-997456 [159] ..... 9027.330399: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE status=0 action=call_refreshresult [sunrpc]
> kworker/u1028:1-997456 [159] ..... 9027.330399: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE status=0 action=call_allocate [sunrpc]
> kworker/u1028:1-997456 [159] ..... 9027.330400: rpc_buf_alloc: task:0000174b@00000000 callsize=368 recvsize=80 status=0
> kworker/u1028:1-997456 [159] ..... 9027.330400: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE status=0 action=call_encode [sunrpc]
> kworker/u1028:1-997456 [159] ..... 9027.330402: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 action=call_connect [sunrpc]
> kworker/u1028:1-997456 [159] ..... 9027.330403: xprt_reserve_xprt: task:0000174b@00000000 snd_task:0000174b
> kworker/u1028:1-997456 [159] ..... 9027.330404: xprt_connect: peer=[2a03:83e4:4002:0:12a::2a]:20492 state=LOCKED|BOUND
> kworker/u1028:1-997456 [159] ..... 9027.330405: rpc_task_sleep: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 timeout=0 queue=xprt_pending
> kworker/u1028:1-997456 [159] ..... 9027.330442: rpc_socket_connect: error=-22 socket:[15084763] srcaddr=[::]:993 dstaddr=[2a03:83e4:4002:0:12a::2a]:0 state=1 (UNCONNECTED) sk_state=7 (CLOSE)
> kworker/u1028:1-997456 [159] ..... 9027.330443: rpc_task_wakeup: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=QUEUED|ACTIVE|NEED_XMIT|NEED_RECV status=-22 timeout=60000 queue=xprt_pending
> kworker/u1028:1-997456 [159] ..... 9027.330444: xprt_disconnect_force: peer=[2a03:83e4:4002:0:12a::2a]:20492 state=LOCKED|CONNECTING|BOUND|SND_IS_COOKIE
> kworker/u1028:1-997456 [159] ..... 9027.330444: xprt_release_xprt: task:ffffffff@ffffffff snd_task:ffffffff
> kworker/u1028:1-997456 [159] ..... 9027.330445: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-22 action=call_connect_status [sunrpc]
> kworker/u1028:1-997456 [159] ..... 9027.330446: rpc_connect_status: task:0000174b@00000000 status=-22
> kworker/u1028:1-997456 [159] ..... 9027.330446: rpc_call_rpcerror: task:0000174b@00000000 tk_status=-22 rpc_status=-22
> kworker/u1028:1-997456 [159] ..... 9027.330446: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-22 action=rpc_exit_task [sunrpc]
> kworker/u1028:1-997456 [159] ..... 9027.330447: rpc_task_end: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-22 action=rpc_exit_task [sunrpc]
>
> There is no traffic on the wire. So, the problem seems to be that in
> this situation we're calling call_connect, and getting back -EINVAL
> from the network layer, which doesn't trigger the -ENETUNREACH
> handling.
>
> Is this a bug from the network layer, or do we also need to handle
> -EINVAL in call_connect_status?
>
> ...or do we need a different fix?
>
> Thoughts?

We need a way to pass down to the client calling
nfs4_client_recover_expired_lease() that the client is terminally gone
away, and that it should stop looping. I'm thinking that we probably
need to mark the client as errored out by using a call to
nfs_mark_client_ready() to set clp->cl_cons_state in error.

However I do find it odd that we're seeing an EINVAL error instead of
ENETUNREACH. It looks from your traces above as if that is coming from
the call to xs_tcp_finish_connecting(), which almost has to mean that
it is being returned by kernel_connect().

Hmm. Maybe it is this: "dstaddr=[2a03:83e4:4002:0:12a::2a]:0"? A zero
port value might explain it...

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
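For what it's worth, the zero-port theory is easy to sanity-check from userspace. The sketch below is only a hedged illustration, not the in-kernel kernel_connect() path from the trace: it attempts a TCP connect to destination port 0 (using IPv4 loopback for simplicity, where the trace shows IPv6) and reports the errno, which on a given kernel and address family need not be the same -EINVAL seen above.

```python
import errno
import socket

# Hedged userspace analogue of the failing connect in the trace above:
# a TCP connect to destination port 0, which is never a valid target.
# The errno observed here may differ from the kernel path's -EINVAL.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(5)
try:
    s.connect(("127.0.0.1", 0))  # port 0 in dstaddr, as in the trace
    print("connect unexpectedly succeeded")
except OSError as e:
    name = errno.errorcode.get(e.errno, "?") if e.errno else "?"
    print("connect to port 0 failed: errno=%s (%s)" % (e.errno, name))
finally:
    s.close()
```

Whatever the exact errno, a destination port of 0 in the transport's dstaddr would fit the observed symptoms: an immediate local connect failure with no traffic on the wire.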