On Tue, 2025-08-05 at 10:07 -0700, Dai Ngo wrote: > > On 8/5/25 9:41 AM, Trond Myklebust wrote: > > On Tue, 2025-08-05 at 08:46 -0700, Dai Ngo wrote: > > > On 8/4/25 4:55 PM, Trond Myklebust wrote: > > > > On Mon, 2025-08-04 at 13:13 -0700, Dai Ngo wrote: > > > > > On 8/4/25 12:21 PM, Trond Myklebust wrote: > > > > > > On Mon, 2025-08-04 at 12:08 -0700, Dai Ngo wrote: > > > > > > > Currently, when an RPC connection times out during the > > > > > > > connect > > > > > > > phase, > > > > > > > the task is retried by placing it back on the pending > > > > > > > queue > > > > > > > and > > > > > > > waiting > > > > > > > again. In some cases, the timeout occurs because TCP is > > > > > > > unable to > > > > > > > send > > > > > > > the SYN packet. This situation most often arises on bare > > > > > > > metal > > > > > > > systems > > > > > > > at boot time, when the NFS mount is attempted while the > > > > > > > network > > > > > > > link > > > > > > > appears to be up but is not yet stable. > > > > > > > > > > > > > > This patch addresses the issue by updating > > > > > > > call_connect_status to > > > > > > > destroy > > > > > > > the transport on ETIMEDOUT error before retrying the > > > > > > > connection. > > > > > > > This > > > > > > > ensures that subsequent connection attempts use a fresh > > > > > > > transport, > > > > > > > reducing the likelihood of repeated failures due to > > > > > > > lingering > > > > > > > network > > > > > > > issues. > > > > > > > > > > > > > > Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx> > > > > > > > --- > > > > > > > net/sunrpc/clnt.c | 2 +- > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > > > > > > > index 21426c3049d3..701b742750c5 100644 > > > > > > > --- a/net/sunrpc/clnt.c > > > > > > > +++ b/net/sunrpc/clnt.c > > > > > > > @@ -2215,6 +2215,7 @@ call_connect_status(struct rpc_task > > > > > > > *task) > > > > > > > case -EHOSTUNREACH: > > > > > > > case -EPIPE: > > > > > > > case -EPROTO: > > > > > > > + case -ETIMEDOUT: > > > > > > > xprt_conditional_disconnect(task- > > > > > > > >tk_rqstp- > > > > > > > > rq_xprt, > > > > > > > task- > > > > > > > >tk_rqstp- > > > > > > > > rq_connect_cookie); > > > > > > > if (RPC_IS_SOFTCONN(task)) > > > > > > > @@ -2225,7 +2226,6 @@ call_connect_status(struct rpc_task > > > > > > > *task) > > > > > > > case -EADDRINUSE: > > > > > > > case -ENOTCONN: > > > > > > > case -EAGAIN: > > > > > > > - case -ETIMEDOUT: > > > > > > > if (!(task->tk_flags & > > > > > > > RPC_TASK_NO_ROUND_ROBIN) > > > > > > > && > > > > > > > (task->tk_flags & RPC_TASK_MOVEABLE) > > > > > > > && > > > > > > > test_bit(XPRT_REMOVE, &xprt->state)) > > > > > > > { > > > > > > Why is this needed? The ETIMEDOUT is supposed to be a task > > > > > > level > > > > > > error, > > > > > > not a connection level thing. > > > > > If TCP was not able to sent the SYN out on due to the > > > > > transient > > > > > error > > > > > with the link status and the task just turns around a wait > > > > > again, > > > > > since > > > > > TCP does not retry the SYN, eventually systemd times out and > > > > > stops > > > > > the > > > > > mount. > > > > > > > > > > > > > > > Below is the snippet from the system log with rpcdebug > > > > > enabled: > > > > > > > > > > > > > > > Jun 20 10:23:01 qa-i360-odi03 kernel: i40e 0000:98:00.0 eth1: > > > > > NIC > > > > > Link is Up, 10 Gbps Full Duplex, Flow Control: None > > > > > Jun 20 10:23:09 qa-i360-odi03 NetworkManager[1511]: <info> > > > > > [1750414989.6033] manager: startup complete > > > > > > > > > > ... <NFS mount starts> > > > > > Jun 20 10:23:09 qa-i360-odi03 systemd[1]: Mounting /odi... > > > > > ... > > > > > Jun 20 10:23:09 qa-i360-odi03 kernel: RPC: 1 added to > > > > > queue > > > > > 0000000093f481cd "xprt_pending" > > > > > Jun 20 10:23:09 qa-i360-odi03 kernel: RPC: 1 setting > > > > > alarm > > > > > for > > > > > 60000 ms > > > > > > > > > > ... <link status down & up> > > > > > Jun 20 10:23:10 qa-i360-odi03 kernel: tg3 0000:04:00.0 em1: > > > > > Link > > > > > is > > > > > up at 1000 Mbps, full duplex > > > > > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info> > > > > > [1750414990.6359] device (em1): state change: disconnected -> > > > > > prepare > > > > > (reason 'none', sys-iface-state: 'managed') > > > > > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info> > > > > > [1750414990.6361] device (em1): state change: prepare -> > > > > > config > > > > > (reason 'none', sys-iface-state: 'managed') > > > > > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info> > > > > > [1750414990.6364] device (em1): state change: config -> ip- > > > > > config > > > > > (reason 'none', sys-iface-state: 'managed') > > > > > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info> > > > > > [1750414990.6383] device (em1): Activation: successful, > > > > > device > > > > > activated. > > > > > > > > > > ... <connect timed out> > > > > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 timeout > > > > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 > > > > > __rpc_wake_up_task > > > > > (now 4294742016) > > > > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 disabling > > > > > timer > > > > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 removed from > > > > > queue > > > > > 0000000093f481cd "xprt_pending" > > > > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 > > > > > call_connect_status > > > > > (status -110) > > > > > > > > > > ... <wait again> > > > > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 > > > > > sleep_on(queue > > > > > "xprt_pending" time 4294742016) > > > > > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 added to > > > > > queue > > > > > 0000000093f481cd "xprt_pending" > > > > > > > > > > ... <systemd timed out> > > > > > Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mounting > > > > > timed > > > > > out. Terminating. > > > > > > > > > > Jun 20 10:24:39 qa-i360-odi03 kernel: RPC: 1 got signal > > > > > Jun 20 10:24:39 qa-i360-odi03 kernel: RPC: 1 > > > > > __rpc_wake_up_task > > > > > (now 4294770229) > > > > > Jun 20 10:24:39 qa-i360-odi03 kernel: RPC: 1 disabling > > > > > timer > > > > > Jun 20 10:24:39 qa-i360-odi03 kernel: RPC: 1 removed from > > > > > queue > > > > > 0000000093f481cd "xprt_pending" > > > > > > > > > > Jun 20 10:24:39 qa-i360-odi03 kernel: <-- nfs4_try_mount() = > > > > > -512 > > > > > [error] > > > > > Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mount > > > > > process > > > > > exited, code=killed status=15 > > > > > Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Failed > > > > > with > > > > > result 'timeout'. > > > > > > > > > > This patch forces TCP to send the SYN on ETIMEDOUT error. > > > > > > > > > Something is very wrong here... > > > > > > > > If this patch is correct, and the call to > > > > xprt_conditional_disconnect() > > > > does indeed cause the socket to close, then something must have > > > > bumped > > > > xprt->connect_cookie. > > > I'm a little confused here, xprt_conditional_disconnect only > > > closes > > > the > > > socket if the connect cookie is still the same: > > > > > > if (cookie != xprt->connect_cookie) > > > goto out; > > > > > > So in this case the xprt->connect_cookie has not been bumped. > > You're right... Sorry... > > > > So does this mean that the socket is still in the SYN-SENT state? > > It is > > normally supposed to remain in that state for 60 seconds, and > > resend > > SYN using an exponential back off. > > Apparently this was not done by TCP in this scenario: > > ... mount starts > Jun 20 10:23:09 qa-i360-odi03 systemd[1]: Mounting /odi... > Jun 20 10:23:09 qa-i360-odi03 kernel: RPC: 1 added to queue > 0000000093f481cd "xprt_pending" > > Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info> > [1750414990.6383] device (em1): Activation: successful, device > activated. > > ... RPC task timed out on connect after 62 secs. Note that at this > time the network link status was already up for more than 60 secs > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 timeout > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 __rpc_wake_up_task > (now 4294742016) > > ... RPC task goes back and wait again. > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 sleep_on(queue > "xprt_pending" time 4294742016) > Jun 20 10:24:11 qa-i360-odi03 kernel: RPC: 1 added to queue > 0000000093f481cd "xprt_pending" > > ... nothing happened and systemd timed out after 90 secs and kill the > mount > Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mounting timed > out. Terminating. > Jun 20 10:24:39 qa-i360-odi03 kernel: <-- nfs4_try_mount() = -512 > [error] > Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mount process > exited, code=killed status=15 > Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Failed with > result 'timeout'. > > I don't know why TCP does not retry the SYN after the networkk link > is > up again but I though instead of trying to figure out and fix the > problem > at TCP layer, we can make the RPC connect process a bit more robust > by using > a new socket for retry. > We should never be using individual RPC message timeouts as a measure for whether or not the TCP connection is down. For the initial connection attempt, we use the TCP_SYNCNT socket option to decide when to give up. Once the connection is established, we set TCP_USER_TIMEOUT as well as socket keepalive to ensure that the connection is still alive. I'm willing to consider further socket-specific timers, but not if the reason is to work around bugs (assuming that is indeed the case here). -- Trond Myklebust Linux NFS client maintainer, Hammerspace trondmy@xxxxxxxxxx, trond.myklebust@xxxxxxxxxxxxxxx