Hi Trond,

We hit a new failure mode with the containerized NFS client teardown
patches. The symptom is that the state recovery thread gets stuck, and
then a flusher thread ends up stuck like this:

#0 context_switch (kernel/sched/core.c:5434:2)
#1 __schedule (kernel/sched/core.c:6799:8)
#2 __schedule_loop (kernel/sched/core.c:6874:3)
#3 schedule (kernel/sched/core.c:6889:2)
#4 nfs_wait_bit_killable (fs/nfs/inode.c:77:2)
#5 __wait_on_bit (kernel/sched/wait_bit.c:49:10)
#6 out_of_line_wait_on_bit (kernel/sched/wait_bit.c:64:9)
#7 wait_on_bit_action (./include/linux/wait_bit.h:156:9)
#8 nfs4_wait_clnt_recover (fs/nfs/nfs4state.c:1329:8)
#9 nfs4_client_recover_expired_lease (fs/nfs/nfs4state.c:1347:9)
#10 pnfs_update_layout (fs/nfs/pnfs.c:2133:17)
#11 ff_layout_pg_get_mirror_count_write (fs/nfs/flexfilelayout/flexfilelayout.c:949:4)
#12 nfs_pageio_setup_mirroring (fs/nfs/pagelist.c:1114:18)
#13 nfs_pageio_add_request (fs/nfs/pagelist.c:1406:2)
#14 nfs_page_async_flush (fs/nfs/write.c:631:7)
#15 nfs_do_writepage (fs/nfs/write.c:657:9)
#16 nfs_writepages_callback (fs/nfs/write.c:684:8)
#17 write_cache_pages (mm/page-writeback.c:2569:11)
#18 nfs_writepages (fs/nfs/write.c:723:9)
#19 do_writepages (mm/page-writeback.c:2612:10)
#20 __writeback_single_inode (fs/fs-writeback.c:1650:8)
#21 writeback_sb_inodes (fs/fs-writeback.c:1942:3)
#22 __writeback_inodes_wb (fs/fs-writeback.c:2013:12)
#23 wb_writeback (fs/fs-writeback.c:2120:15)
#24 wb_check_old_data_flush (fs/fs-writeback.c:2224:10)
#25 wb_do_writeback (fs/fs-writeback.c:2277:11)
#26 wb_workfn (fs/fs-writeback.c:2305:20)
#27 process_one_work (kernel/workqueue.c:3267:2)
#28 process_scheduled_works (kernel/workqueue.c:3348:3)
#29 worker_thread (kernel/workqueue.c:3429:4)
#30 kthread (kernel/kthread.c:388:9)
#31 ret_from_fork (arch/x86/kernel/process.c:147:3)
#32 ret_from_fork_asm+0x11/0x16 (arch/x86/entry/entry_64.S:244)

Turning up some tracepoints, I see this:

kworker/u1028:1-997456 [159] ..... 9027.330396: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE status=0 action=call_reserve [sunrpc]
kworker/u1028:1-997456 [159] ..... 9027.330397: xprt_reserve: task:0000174b@00000000 xid=0x6bfc099c
kworker/u1028:1-997456 [159] ..... 9027.330397: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE status=0 action=call_reserveresult [sunrpc]
kworker/u1028:1-997456 [159] ..... 9027.330397: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE status=0 action=call_refresh [sunrpc]
kworker/u1028:1-997456 [159] ..... 9027.330399: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE status=0 action=call_refreshresult [sunrpc]
kworker/u1028:1-997456 [159] ..... 9027.330399: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE status=0 action=call_allocate [sunrpc]
kworker/u1028:1-997456 [159] ..... 9027.330400: rpc_buf_alloc: task:0000174b@00000000 callsize=368 recvsize=80 status=0
kworker/u1028:1-997456 [159] ..... 9027.330400: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE status=0 action=call_encode [sunrpc]
kworker/u1028:1-997456 [159] ..... 9027.330402: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 action=call_connect [sunrpc]
kworker/u1028:1-997456 [159] ..... 9027.330403: xprt_reserve_xprt: task:0000174b@00000000 snd_task:0000174b
kworker/u1028:1-997456 [159] ..... 9027.330404: xprt_connect: peer=[2a03:83e4:4002:0:12a::2a]:20492 state=LOCKED|BOUND
kworker/u1028:1-997456 [159] ..... 9027.330405: rpc_task_sleep: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=0 timeout=0 queue=xprt_pending
kworker/u1028:1-997456 [159] ..... 9027.330442: rpc_socket_connect: error=-22 socket:[15084763] srcaddr=[::]:993 dstaddr=[2a03:83e4:4002:0:12a::2a]:0 state=1 (UNCONNECTED) sk_state=7 (CLOSE)
kworker/u1028:1-997456 [159] ..... 9027.330443: rpc_task_wakeup: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=QUEUED|ACTIVE|NEED_XMIT|NEED_RECV status=-22 timeout=60000 queue=xprt_pending
kworker/u1028:1-997456 [159] ..... 9027.330444: xprt_disconnect_force: peer=[2a03:83e4:4002:0:12a::2a]:20492 state=LOCKED|CONNECTING|BOUND|SND_IS_COOKIE
kworker/u1028:1-997456 [159] ..... 9027.330444: xprt_release_xprt: task:ffffffff@ffffffff snd_task:ffffffff
kworker/u1028:1-997456 [159] ..... 9027.330445: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-22 action=call_connect_status [sunrpc]
kworker/u1028:1-997456 [159] ..... 9027.330446: rpc_connect_status: task:0000174b@00000000 status=-22
kworker/u1028:1-997456 [159] ..... 9027.330446: rpc_call_rpcerror: task:0000174b@00000000 tk_status=-22 rpc_status=-22
kworker/u1028:1-997456 [159] ..... 9027.330446: rpc_task_run_action: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-22 action=rpc_exit_task [sunrpc]
kworker/u1028:1-997456 [159] ..... 9027.330447: rpc_task_end: task:0000174b@00000000 flags=ASYNC|MOVEABLE|NETUNREACH_FATAL|DYNAMIC|SOFT|TIMEOUT|NORTO runstate=RUNNING|ACTIVE|NEED_XMIT|NEED_RECV status=-22 action=rpc_exit_task [sunrpc]

There is no traffic on the wire.

So the problem in this situation seems to be that we call call_connect
and get back -EINVAL from the network layer, which doesn't trigger the
-ENETUNREACH handling. Is this a bug in the network layer, or do we
also need to handle -EINVAL in call_connect_status? ...or do we need a
different fix? Thoughts?
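If the answer is that the RPC client should cope with this, the sort of
thing I'm imagining is below. This is just a hand-typed sketch of the
errno switch in call_connect_status() (the surrounding context is from
memory, not a diff against any tree), and whether -EINVAL really
belongs with the unreachable errors is exactly the open question:

	switch (status) {
	case -EINVAL:
		/*
		 * Assumption: a connect attempt from a netns that is
		 * being torn down can surface as -EINVAL. Remap it so
		 * that it takes the same path as an unreachable
		 * network, and so honors RPC_TASK_NETUNREACH_FATAL.
		 */
		status = -ENETUNREACH;
		fallthrough;
	case -ENETDOWN:
	case -ENETUNREACH:
		if (task->tk_flags & RPC_TASK_NETUNREACH_FATAL)
			break;	/* becomes fatal via rpc_call_rpcerror() */
		/* ...existing retry/backoff handling otherwise... */
		break;
	}

That would at least hand callers an error that the existing
-ENETUNREACH handling recognizes instead of a raw -EINVAL. Of course,
if the real bug is the network layer returning -EINVAL here in the
first place, then the above just papers over it.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>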