Re: [PATCH 1/1] SUNRPC: call_connect_status needs to destroy transport on ETIMEDOUT before retry

Dai Ngo <dai.ngo@xxxxxxxxxx> · Tue, 5 Aug 2025 10:07:53 -0700

On 8/5/25 9:41 AM, Trond Myklebust wrote:
On Tue, 2025-08-05 at 08:46 -0700, Dai Ngo wrote:
On 8/4/25 4:55 PM, Trond Myklebust wrote:
On Mon, 2025-08-04 at 13:13 -0700, Dai Ngo wrote:
On 8/4/25 12:21 PM, Trond Myklebust wrote:
On Mon, 2025-08-04 at 12:08 -0700, Dai Ngo wrote:
Currently, when an RPC connection times out during the
connect
phase,
the task is retried by placing it back on the pending queue
and
waiting
again. In some cases, the timeout occurs because TCP is
unable to
send
the SYN packet. This situation most often arises on bare
metal
systems
at boot time, when the NFS mount is attempted while the
network
link
appears to be up but is not yet stable.

This patch addresses the issue by updating
call_connect_status to
destroy
the transport on ETIMEDOUT error before retrying the
connection.
This
ensures that subsequent connection attempts use a fresh
transport,
reducing the likelihood of repeated failures due to lingering
network
issues.

Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx>
---
    net/sunrpc/clnt.c | 2 +-
    1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 21426c3049d3..701b742750c5 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2215,6 +2215,7 @@ call_connect_status(struct rpc_task
*task)
    	case -EHOSTUNREACH:
    	case -EPIPE:
    	case -EPROTO:
+	case -ETIMEDOUT:
    		xprt_conditional_disconnect(task->tk_rqstp-
rq_xprt,
    					    task->tk_rqstp-
rq_connect_cookie);
    		if (RPC_IS_SOFTCONN(task))
@@ -2225,7 +2226,6 @@ call_connect_status(struct rpc_task
*task)
    	case -EADDRINUSE:
    	case -ENOTCONN:
    	case -EAGAIN:
-	case -ETIMEDOUT:
    		if (!(task->tk_flags &
RPC_TASK_NO_ROUND_ROBIN)
&&
    		    (task->tk_flags & RPC_TASK_MOVEABLE) &&
    		    test_bit(XPRT_REMOVE, &xprt->state)) {
Why is this needed? The ETIMEDOUT is supposed to be a task
level
error,
not a connection level thing.
If TCP was not able to sent the SYN out on due to the transient
error
with the link status and the task just turns around a wait again,
since
TCP does not retry the SYN, eventually systemd times out and
stops
the
mount.


Below is the snippet from the system log with rpcdebug enabled:


Jun 20 10:23:01 qa-i360-odi03 kernel: i40e 0000:98:00.0 eth1: NIC
Link is Up, 10 Gbps Full Duplex, Flow Control: None
Jun 20 10:23:09 qa-i360-odi03 NetworkManager[1511]: <info>
[1750414989.6033] manager: startup complete

... <NFS mount starts>
Jun 20 10:23:09 qa-i360-odi03 systemd[1]: Mounting /odi...
...
Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 added to queue
0000000093f481cd "xprt_pending"
Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 setting alarm
for
60000 ms

... <link status down & up>
Jun 20 10:23:10 qa-i360-odi03 kernel: tg3 0000:04:00.0 em1: Link
is
up at 1000 Mbps, full duplex
Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
[1750414990.6359] device (em1): state change: disconnected ->
prepare
(reason 'none', sys-iface-state: 'managed')
Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
[1750414990.6361] device (em1): state change: prepare -> config
(reason 'none', sys-iface-state: 'managed')
Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
[1750414990.6364] device (em1): state change: config -> ip-config
(reason 'none', sys-iface-state: 'managed')
Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>
[1750414990.6383] device (em1): Activation: successful, device
activated.

... <connect timed out>
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 timeout
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1
__rpc_wake_up_task
(now 4294742016)
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 disabling timer
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 removed from
queue
0000000093f481cd "xprt_pending"
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1
call_connect_status
(status -110)

... <wait again>
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 sleep_on(queue
"xprt_pending" time 4294742016)
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 added to queue
0000000093f481cd "xprt_pending"

... <systemd timed out>
Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mounting
timed
out. Terminating.

Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 got signal
Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1
__rpc_wake_up_task
(now 4294770229)
Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 disabling timer
Jun 20 10:24:39 qa-i360-odi03 kernel: RPC:     1 removed from
queue
0000000093f481cd "xprt_pending"

Jun 20 10:24:39 qa-i360-odi03 kernel: <-- nfs4_try_mount() = -512
[error]
Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mount
process
exited, code=killed status=15
Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Failed with
result 'timeout'.

This patch forces TCP to send the SYN on ETIMEDOUT error.

Something is very wrong here...

If this patch is correct, and the call to
xprt_conditional_disconnect()
does indeed cause the socket to close, then something must have
bumped
xprt->connect_cookie.
I'm a little confused here, xprt_conditional_disconnect only closes
the
socket if the connect cookie is still the same:

          if (cookie != xprt->connect_cookie)
                  goto out;

So in this case the xprt->connect_cookie has not been bumped.
You're right... Sorry...

So does this mean that the socket is still in the SYN-SENT state? It is
normally supposed to remain in that state for 60 seconds, and resend
SYN using an exponential back off.

Apparently this was not done by TCP in this scenario:

... mount starts
Jun 20 10:23:09 qa-i360-odi03 systemd[1]: Mounting /odi...
Jun 20 10:23:09 qa-i360-odi03 kernel: RPC:     1 added to queue 0000000093f481cd "xprt_pending"

Jun 20 10:23:10 qa-i360-odi03 NetworkManager[1511]: <info>  [1750414990.6383] device (em1): Activation: successful, device activated.

... RPC task timed out on connect after 62 secs.  Note that at this time the network link status was already up for more than 60 secs
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 timeout
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 __rpc_wake_up_task (now 4294742016)

... RPC task goes back and wait again.
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 sleep_on(queue "xprt_pending" time 4294742016)
Jun 20 10:24:11 qa-i360-odi03 kernel: RPC:     1 added to queue 0000000093f481cd "xprt_pending"

... nothing happened and systemd timed out after 90 secs and kill the mount
Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mounting timed out. Terminating.
Jun 20 10:24:39 qa-i360-odi03 kernel: <-- nfs4_try_mount() = -512 [error]
Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Mount process exited, code=killed status=15
Jun 20 10:24:39 qa-i360-odi03 systemd[1]: odi.mount: Failed with result 'timeout'.

I don't know why TCP does not retry the SYN after the networkk link is
up again but I though instead of trying to figure out and fix the problem
at TCP layer, we can make the RPC connect process a bit more robust by using
a new socket for retry.

-Dai

What you are doing should not be
necessary if the socket is in that state.

-Dai

   The only things that can do that are the state
changes recorded in xs_tcp_state_change(),
xs_sock_reset_connection_flags(), or xprt_autoclose().

If it was xs_tcp_state_change() that bumped xprt->connect_cookie,
then
either we're in TCP_ESTABLISHED (in which case triggering a close
on
ETIMEDOUT is just wrong), or we're in the TCP_CLOSE state, in which
case autoclose should already be scheduled.
If xs_sock_reset_connection_flags() got called, then the socket is
in
the process of being closed.
Ditto if xprt_autoclose() got called.

So why do we need the call to xprt_conditional_disconnect()?