On Mon, 2025-08-04 at 13:13 -0700, Dai Ngo wrote: > > On 8/4/25 12:21 PM, Trond Myklebust wrote: > > On Mon, 2025-08-04 at 12:08 -0700, Dai Ngo wrote: > > > Currently, when an RPC connection times out during the connect > > > phase, > > > the task is retried by placing it back on the pending queue and > > > waiting > > > again. In some cases, the timeout occurs because TCP is unable to > > > send > > > the SYN packet. This situation most often arises on bare metal > > > systems > > > at boot time, when the NFS mount is attempted while the network > > > link > > > appears to be up but is not yet stable. > > > > > > This patch addresses the issue by updating call_connect_status to > > > destroy > > > the transport on ETIMEDOUT error before retrying the connection. > > > This > > > ensures that subsequent connection attempts use a fresh > > > transport, > > > reducing the likelihood of repeated failures due to lingering > > > network > > > issues. > > > > > > Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx> > > > --- > > > net/sunrpc/clnt.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > > > index 21426c3049d3..701b742750c5 100644 > > > --- a/net/sunrpc/clnt.c > > > +++ b/net/sunrpc/clnt.c > > > @@ -2215,6 +2215,7 @@ call_connect_status(struct rpc_task *task) > > > case -EHOSTUNREACH: > > > case -EPIPE: > > > case -EPROTO: > > > + case -ETIMEDOUT: > > > xprt_conditional_disconnect(task->tk_rqstp- > > > >rq_xprt, > > > task->tk_rqstp- > > > > rq_connect_cookie); > > > if (RPC_IS_SOFTCONN(task)) > > > @@ -2225,7 +2226,6 @@ call_connect_status(struct rpc_task *task) > > > case -EADDRINUSE: > > > case -ENOTCONN: > > > case -EAGAIN: > > > - case -ETIMEDOUT: > > > if (!(task->tk_flags & RPC_TASK_NO_ROUND_ROBIN) > > > && > > > (task->tk_flags & RPC_TASK_MOVEABLE) && > > > test_bit(XPRT_REMOVE, &xprt->state)) { > > Why is this needed? The ETIMEDOUT is supposed to be a task level > > error, > > not a connection level thing. > > If TCP was not able to sent the SYN out on due to the transient error > with the link status and the task just turns around a wait again, > since > TCP does not retry the SYN, eventually systemd times out and stops > the > mount. > > > Below is the snippet from the system log with rpcdebug enabled: > > > Jun 20 10:23:01 qa-i360-odi03 kernel: i40e 0000:98:00.0 eth1: NIC > Link is Up, 10 Gbps Full Duplex, Flow Control: None > Jun 20 10:23:09 qa-i360-odi03 NetworkManager[1511]: <info> > [1750414989.6033] manager: startup complete > > ... <NFS mount starts> > Jun 20 10:23:09 qa-i360-odi03 systemd[1]: Mounting /odi... > ... > Jun 20 10:23:09 qa-i360-odi03 kernel: RPC: 1 added to queue > 0000000093f481cd "xprt_pending" > Jun 20 10:23:09 qa-i360-odi03 kernel: RPC: 1 setting alarm for > 60000 ms > > ... <link status down & up> > Jun 20 10:23:10 qa-i360-odi03 kernel: tg3 0000:04:00.0 em1: Link is > up at 1000 Mbps, full duplex > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info> > [1750414990.6359] device (em1): state change: disconnected -> prepare > (reason 'none', sys-iface-state: 'managed') > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info> > [1750414990.6361] device (em1): state change: prepare -> config > (reason 'none', sys-iface-state: 'managed') > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info> > [1750414990.6364] device (em1): state change: config -> ip-config > (reason 'none', sys-iface-state: 'managed') > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info> > [1750414990.6383] device (em1): Activation: successful, device > activated. > > ... <connect timed out> > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 timeout > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 __rpc_wake_up_task > (now 4294742016) > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 disabling timer > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 removed from queue > 0000000093f481cd "xprt_pending" > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 call_connect_status > (status -110) > > ... <wait again> > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 sleep_on(queue > "xprt_pending" time 4294742016) > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 added to queue > 0000000093f481cd "xprt_pending" > > ... <systemd timed out> > Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mounting timed > out. Terminating. > > Jun 20 10:24:39 qa-i360-odi03 kernel: RPC: 1 got signal > Jun 20 10:24:39 qa-i360-odi03 kernel: RPC: 1 __rpc_wake_up_task > (now 4294770229) > Jun 20 10:24:39 qa-i360-odi03 kernel: RPC: 1 disabling timer > Jun 20 10:24:39 qa-i360-odi03 kernel: RPC: 1 removed from queue > 0000000093f481cd "xprt_pending" > > Jun 20 10:24:39 qa-i360-odi03 kernel: <-- nfs4_try_mount() = -512 > [error] > Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mount process > exited, code=killed status=15 > Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Failed with > result 'timeout'. > > This patch forces TCP to send the SYN on ETIMEDOUT error. > Something is very wrong here... If this patch is correct, and the call to xprt_conditional_disconnect() does indeed cause the socket to close, then something must have bumped xprt->connect_cookie. The only things that can do that are the state changes recorded in xs_tcp_state_change(), xs_sock_reset_connection_flags(), or xprt_autoclose(). If it was xs_tcp_state_change() that bumped xprt->connect_cookie, then either we're in TCP_ESTABLISHED (in which case triggering a close on ETIMEDOUT is just wrong), or we're in the TCP_CLOSE state, in which case autoclose should already be scheduled. If xs_sock_reset_connection_flags() got called, then the socket is in the process of being closed. Ditto if xprt_autoclose() got called. So why do we need the call to xprt_conditional_disconnect()? -- Trond Myklebust Linux NFS client maintainer, Hammerspace trondmy@xxxxxxxxxx, trond.myklebust@xxxxxxxxxxxxxxx