Re: [PATCH 1/1] SUNRPC: call_connect_status needs to destroy transport on ETIMEDOUT before retry

Trond Myklebust <trondmy@xxxxxxxxxx> · Mon, 04 Aug 2025 16:55:34 -0700

On Mon, 2025-08-04 at 13:13 -0700, Dai Ngo wrote:
> 
> On 8/4/25 12:21 PM, Trond Myklebust wrote:
> > On Mon, 2025-08-04 at 12:08 -0700, Dai Ngo wrote:
> > > Currently, when an RPC connection times out during the connect
> > > phase,
> > > the task is retried by placing it back on the pending queue and
> > > waiting
> > > again. In some cases, the timeout occurs because TCP is unable to
> > > send
> > > the SYN packet. This situation most often arises on bare metal
> > > systems
> > > at boot time, when the NFS mount is attempted while the network
> > > link
> > > appears to be up but is not yet stable.
> > > 
> > > This patch addresses the issue by updating call_connect_status to
> > > destroy
> > > the transport on ETIMEDOUT error before retrying the connection.
> > > This
> > > ensures that subsequent connection attempts use a fresh
> > > transport,
> > > reducing the likelihood of repeated failures due to lingering
> > > network
> > > issues.
> > > 
> > > Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx>
> > > ---
> > >   net/sunrpc/clnt.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> > > index 21426c3049d3..701b742750c5 100644
> > > --- a/net/sunrpc/clnt.c
> > > +++ b/net/sunrpc/clnt.c
> > > @@ -2215,6 +2215,7 @@ call_connect_status(struct rpc_task *task)
> > >   	case -EHOSTUNREACH:
> > >   	case -EPIPE:
> > >   	case -EPROTO:
> > > +	case -ETIMEDOUT:
> > >   		xprt_conditional_disconnect(task->tk_rqstp-
> > > >rq_xprt,
> > >   					    task->tk_rqstp-
> > > > rq_connect_cookie);
> > >   		if (RPC_IS_SOFTCONN(task))
> > > @@ -2225,7 +2226,6 @@ call_connect_status(struct rpc_task *task)
> > >   	case -EADDRINUSE:
> > >   	case -ENOTCONN:
> > >   	case -EAGAIN:
> > > -	case -ETIMEDOUT:
> > >   		if (!(task->tk_flags & RPC_TASK_NO_ROUND_ROBIN)
> > > &&
> > >   		    (task->tk_flags & RPC_TASK_MOVEABLE) &&
> > >   		    test_bit(XPRT_REMOVE, &xprt->state)) {
> > Why is this needed? The ETIMEDOUT is supposed to be a task level
> > error,
> > not a connection level thing.
> 
> If TCP was not able to sent the SYN out on due to the transient error
> with the link status and the task just turns around a wait again,
> since
> TCP does not retry the SYN, eventually systemd times out and stops
> the
> mount.
> 
> 
> Below is the snippet from the system log with rpcdebug enabled:
> 
> 
> Jun 20 10:23:01 qa-i360-odi03 kernel: i40e 0000:98:00.0 eth1: NIC
> Link is Up, 10 Gbps Full Duplex, Flow Control: None
> Jun 20 10:23:09 qa-i360-odi03 NetworkManager[1511]: <info> 
> [1750414989.6033] manager: startup complete
> 
> ... <NFS mount starts>
> Jun 20 10:23:09 qa-i360-odi03 systemd[1]: Mounting /odi...
> ...
> Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 added to queue
> 0000000093f481cd "xprt_pending"
> Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 setting alarm for
> 60000 ms
> 
> ... <link status down & up>
> Jun 20 10:23:10 qa-i360-odi03 kernel: tg3 0000:04:00.0 em1: Link is
> up at 1000 Mbps, full duplex
> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info> 
> [1750414990.6359] device (em1): state change: disconnected -> prepare
> (reason 'none', sys-iface-state: 'managed')
> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info> 
> [1750414990.6361] device (em1): state change: prepare -> config
> (reason 'none', sys-iface-state: 'managed')
> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info> 
> [1750414990.6364] device (em1): state change: config -> ip-config
> (reason 'none', sys-iface-state: 'managed')
> Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info> 
> [1750414990.6383] device (em1): Activation: successful, device
> activated.
> 
> ... <connect timed out>
> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 timeout
> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 __rpc_wake_up_task
> (now 4294742016)
> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 disabling timer
> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 removed from queue
> 0000000093f481cd "xprt_pending"
> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 call_connect_status
> (status -110)
> 
> ... <wait again>
> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 sleep_on(queue
> "xprt_pending" time 4294742016)
> Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 added to queue
> 0000000093f481cd "xprt_pending"
> 
> ... <systemd timed out>
> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mounting timed
> out. Terminating.
> 
> Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 got signal
> Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 __rpc_wake_up_task
> (now 4294770229)
> Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 disabling timer
> Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 removed from queue
> 0000000093f481cd "xprt_pending"
> 
> Jun 20 10:24:39 qa-i360-odi03 kernel: <-- nfs4_try_mount() = -512
> [error]
> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mount process
> exited, code=killed status=15
> Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Failed with
> result 'timeout'.
> 
> This patch forces TCP to send the SYN on ETIMEDOUT error.
> 

Something is very wrong here...

If this patch is correct, and the call to xprt_conditional_disconnect()
does indeed cause the socket to close, then something must have bumped
xprt->connect_cookie. The only things that can do that are the state
changes recorded in xs_tcp_state_change(),
xs_sock_reset_connection_flags(), or xprt_autoclose().

If it was xs_tcp_state_change() that bumped xprt->connect_cookie, then
either we're in TCP_ESTABLISHED (in which case triggering a close on
ETIMEDOUT is just wrong), or we're in the TCP_CLOSE state, in which
case autoclose should already be scheduled.
If xs_sock_reset_connection_flags() got called, then the socket is in
the process of being closed.
Ditto if xprt_autoclose() got called.

So why do we need the call to xprt_conditional_disconnect()?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@xxxxxxxxxx, trond.myklebust@xxxxxxxxxxxxxxx