Re: [PATCH v3 6/6] NFSv4: Treat ENETUNREACH errors as fatal for state recovery

Benjamin Coddington <bcodding@xxxxxxxxxx> · Wed, 26 Mar 2025 06:39:59 -0400

On 25 Mar 2025, at 18:35, trondmy@xxxxxxxxxx wrote:

> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>
> If a containerised process is killed and causes an ENETUNREACH or
> ENETDOWN error to be propagated to the state manager, then mark the
> nfs_client as being dead so that we don't loop in functions that are
> expecting recovery to succeed.
>
> Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> ---
>  fs/nfs/nfs4state.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 272d2ebdae0f..7612e977e80b 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -2739,7 +2739,15 @@ static void nfs4_state_manager(struct nfs_client *clp)
>  	pr_warn_ratelimited("NFS: state manager%s%s failed on NFSv4 server %s"
>  			" with error %d\n", section_sep, section,
>  			clp->cl_hostname, -status);
> -	ssleep(1);
> +	switch (status) {
> +	case -ENETDOWN:
> +	case -ENETUNREACH:
> +		nfs_mark_client_ready(clp, -EIO);
> +		break;
> +	default:
> +		ssleep(1);
> +		break;
> +	}
>  out_drain:
>  	memalloc_nofs_restore(memflags);
>  	nfs4_end_drain_session(clp);
> -- 
> 2.49.0

Doesn't this have the same bug as the sysfs shutdown - in that a mount with
fatal_neterrors=ENETDOWN:ENETUNREACH can take down the state manager for a
mount without it?  I think the same consideration applies as shutdown so
far: in practical use, you're not going to care.

Another thought - its pretty subtle that the only way those errors
might/should reach us here is if that mount option is in play.

Reviewed-by: Benjamin Coddington <bcodding@xxxxxxxxxx>

Ben