On Mon, 2025-03-24 at 17:52 -0400, Jeff Layton wrote:
> Hi Trond,
>
> We hit a new failure mode with the containerized NFS client teardown
> patches.
>
> The symptoms are that we see the state recovery thread stuck, and then
> we see a flusher thread stuck like this:
>
> #0 context_switch (kernel/sched/core.c:5434:2)
> #1 __schedule (kernel/sched/core.c:6799:8)
> #2 __schedule_loop (kernel/sched/core.c:6874:3)
> #3 schedule (kernel/sched/core.c:6889:2)
> #4 nfs_wait_bit_killable (fs/nfs/inode.c:77:2)
> #5 __wait_on_bit (kernel/sched/wait_bit.c:49:10)
> #6 out_of_line_wait_on_bit (kernel/sched/wait_bit.c:64:9)
> #7 wait_on_bit_action (./include/linux/wait_bit.h:156:9)
> #8 nfs4_wait_clnt_recover (fs/nfs/nfs4state.c:1329:8)
> #9 nfs4_client_recover_expired_lease (fs/nfs/nfs4state.c:1347:9)
> #10 pnfs_update_layout (fs/nfs/pnfs.c:2133:17)
> #11 ff_layout_pg_get_mirror_count_write (fs/nfs/flexfilelayout/flexfilelayout.c:949:4)
> #12 nfs_pageio_setup_mirroring (fs/nfs/pagelist.c:1114:18)
> #13 nfs_pageio_add_request (fs/nfs/pagelist.c:1406:2)
> #14 nfs_page_async_flush (fs/nfs/write.c:631:7)
> #15 nfs_do_writepage (fs/nfs/write.c:657:9)
> #16 nfs_writepages_callback (fs/nfs/write.c:684:8)
> #17 write_cache_pages (mm/page-writeback.c:2569:11)
> #18 nfs_writepages (fs/nfs/write.c:723:9)
> #19 do_writepages (mm/page-writeback.c:2612:10)
> #20 __writeback_single_inode (fs/fs-writeback.c:1650:8)
> #21 writeback_sb_inodes (fs/fs-writeback.c:1942:3)
> #22 __writeback_inodes_wb (fs/fs-writeback.c:2013:12)
> #23 wb_writeback (fs/fs-writeback.c:2120:15)
> #24 wb_check_old_data_flush (fs/fs-writeback.c:2224:10)
> #25 wb_do_writeback (fs/fs-writeback.c:2277:11)
> #26 wb_workfn (fs/fs-writeback.c:2305:20)
> #27 process_one_work (kernel/workqueue.c:3267:2)
> #28 process_scheduled_works (kernel/workqueue.c:3348:3)
> #29 worker_thread (kernel/workqueue.c:3429:4)
> #30 kthread (kernel/kthread.c:388:9)
> #31 ret_from_fork (arch/x86/kernel/process.c:147:3)
> #32 ret_from_fork_asm+0x11/0x16 (arch/x86/entry/entry_64.S:244)
>
> Turning up some tracepoints, I see this:
>
> kworker/u1028:1-997456 [159] ..... 9027.330396: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE status=0 action=call_reserve [sunrpc]
> kworker/u1028:1-997456 [159] ..... 9027.330397: xprt_reserve: task:0000174b@00000000 xid=0x6bfc099c
> kworker/u1028:1-997456 [159] ..... 9027.330397: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE status=0 action=call_reserveresult [sunrpc]
> kworker/u1028:1-997456 [159] ..... 9027.330397: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE status=0 action=call_refresh [sunrpc]
> kworker/u1028:1-997456 [159] ..... 9027.330399: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE status=0 action=call_refreshresult [sunrpc]
> kworker/u1028:1-997456 [159] ..... 9027.330399: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE status=0 action=call_allocate [sunrpc]
> kworker/u1028:1-997456 [159] ..... 9027.330400: rpc_buf_alloc: task:0000174b@00000000 callsize=368 recvsize=80 status=0
> kworker/u1028:1-997456 [159] ..... 9027.330400: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE status=0 action=call_encode [sunrpc]
> kworker/u1028:1-997456 [159] ..... 9027.330402: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 action=call_connect [sunrpc]
> kworker/u1028:1-997456 [159] ..... 9027.330403: xprt_reserve_xprt: task:0000174b@00000000 snd_task:0000174b
> kworker/u1028:1-997456 [159] ..... 9027.330404: xprt_connect: peer=[2a03:83e4:4002:0:12a::2a]:20492 state=LOCKED|BOUND
> kworker/u1028:1-997456 [159] ..... 9027.330405: rpc_task_sleep: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 timeout=0 queue=xprt_pending
> kworker/u1028:1-997456 [159] ..... 9027.330442: rpc_socket_connect: error=-22 socket:[15084763] srcaddr=[::]:993 dstaddr=[2a03:83e4:4002:0:12a::2a]:0 state=1 (UNCONNECTED) sk_state=7 (CLOSE)
> kworker/u1028:1-997456 [159] ..... 9027.330443: rpc_task_wakeup: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=QUEUED|ACTIVE|NEED_XMIT|NEED_RECV status=-22 timeout=60000 queue=xprt_pending
> kworker/u1028:1-997456 [159] ..... 9027.330444: xprt_disconnect_force: peer=[2a03:83e4:4002:0:12a::2a]:20492 state=LOCKED|CONNECTING|BOUND|SND_IS_COOKIE
> kworker/u1028:1-997456 [159] ..... 9027.330444: xprt_release_xprt: task:ffffffff@ffffffff snd_task:ffffffff
> kworker/u1028:1-997456 [159] ..... 9027.330445: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-22 action=call_connect_status [sunrpc]
> kworker/u1028:1-997456 [159] ..... 9027.330446: rpc_connect_status: task:0000174b@00000000 status=-22
> kworker/u1028:1-997456 [159] ..... 9027.330446: rpc_call_rpcerror: task:0000174b@00000000 tk_status=-22 rpc_status=-22
> kworker/u1028:1-997456 [159] ..... 9027.330446: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-22 action=rpc_exit_task [sunrpc]
> kworker/u1028:1-997456 [159] ..... 9027.330447: rpc_task_end: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-22 action=rpc_exit_task [sunrpc]
>
> There is no traffic on the wire. So, the problem seems to be that in
> this situation we're calling call_connect, and getting back -EINVAL
> from the network layer, which doesn't trigger the -ENETUNREACH
> handling.
>
> Is this a bug from the network layer, or do we also need to handle
> -EINVAL in call_connect_status?
>
> ...or do we need a different fix?
>
> Thoughts?

We need a way to pass down to the client calling
nfs4_client_recover_expired_lease() that the client is terminally gone
away, and that it should stop looping. I'm thinking that we probably
need to mark the client as errored out by using a call to
nfs_mark_client_ready() to set clp->cl_cons_state in error.

However I do find it odd that we're seeing an EINVAL error instead of
ENETUNREACH. It looks from your traces above as if that is coming from
the call to xs_tcp_finish_connecting(), which almost has to mean that
it is being returned by kernel_connect().

Hmm. Maybe it is this: "dstaddr=[2a03:83e4:4002:0:12a::2a]:0"? A zero
port value might explain it...

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
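For what it's worth, the zero-port theory is easy to sanity-check from userspace. The sketch below is only a hedged illustration, not the in-kernel kernel_connect() path from the trace: it attempts a TCP connect to destination port 0 (using IPv4 loopback for simplicity, where the trace shows IPv6) and reports the errno, which on a given kernel and address family need not be the same -EINVAL seen above.

```python
import errno
import socket

# Hedged userspace analogue of the failing connect in the trace above:
# a TCP connect to destination port 0, which is never a valid target.
# The errno observed here may differ from the kernel path's -EINVAL.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.settimeout(5)
try:
    s.connect(("127.0.0.1", 0))  # port 0 in dstaddr, as in the trace
    print("connect unexpectedly succeeded")
except OSError as e:
    name = errno.errorcode.get(e.errno, "?") if e.errno else "?"
    print("connect to port 0 failed: errno=%s (%s)" % (e.errno, name))
finally:
    s.close()
```

Whatever the exact errno, a destination port of 0 in the transport's dstaddr would fit the observed symptoms: an immediate local connect failure with no traffic on the wire.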