Re: [PATCH v2 4/4] NFSv4: Treat ENETUNREACH errors as fatal for state recovery

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Tue, 25 Mar 2025 18:50:48 +0000

On Tue, 2025-03-25 at 14:04 -0400, Jeff Layton wrote:
> On Tue, 2025-03-25 at 12:17 -0400, trondmy@xxxxxxxxxx wrote:
> > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > 
> > If a containerised process is killed and causes an ENETUNREACH or
> > ENETDOWN error to be propagated to the state manager, then mark the
> > nfs_client as being dead so that we don't loop in functions that are
> > expecting recovery to succeed.
> > 
> > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > ---
> >  fs/nfs/nfs4state.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > index 738eb2789266..14ba3f96e6fc 100644
> > --- a/fs/nfs/nfs4state.c
> > +++ b/fs/nfs/nfs4state.c
> > @@ -2739,7 +2739,15 @@ static void nfs4_state_manager(struct nfs_client *clp)
> >  	pr_warn_ratelimited("NFS: state manager%s%s failed on NFSv4 server %s"
> >  			" with error %d\n", section_sep, section,
> >  			clp->cl_hostname, -status);
> > -	ssleep(1);
> > +	switch (status) {
> > +	case -ENETDOWN:
> > +	case -ENETUNREACH:
> > +		nfs_mark_client_ready(clp, -EIO);
> > +		break;
> 
> Should this be conditional on clnt->cl_netunreach_fatal being true?

I should hope not. We shouldn't ever be seeing these errors here if it
is false.

>  
> > +	default:
> > +		ssleep(1);
> > +		break;
> > +	}
> >  out_drain:
> >  	memalloc_nofs_restore(memflags);
> >  	nfs4_end_drain_session(clp);
> 
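
For readers following along, the error handling in the hunk above can be modelled in userspace roughly as follows. This is only a sketch, not the kernel code: struct sketch_client, sketch_mark_client_ready() and sketch_handle_recovery_error() are hypothetical stand-ins (for struct nfs_client, nfs_mark_client_ready() and the new switch statement respectively), and the assumption that a negative cl_cons_state means "dead" is a simplification made for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct nfs_client; only the field used here. */
struct sketch_client {
	int cl_cons_state;	/* assumption: 0 = usable, negative errno = dead */
};

/* Hypothetical stand-in for nfs_mark_client_ready(clp, state). */
static void sketch_mark_client_ready(struct sketch_client *clp, int state)
{
	clp->cl_cons_state = state;
}

/*
 * Mirrors the switch added by the patch. Returns true if the caller
 * should back off and let recovery be retried, false if the client
 * was marked dead and retrying is pointless.
 */
static bool sketch_handle_recovery_error(struct sketch_client *clp, int status)
{
	switch (status) {
	case -ENETDOWN:
	case -ENETUNREACH:
		/* Fatal for state recovery: fail the client with -EIO. */
		sketch_mark_client_ready(clp, -EIO);
		return false;
	default:
		/* The kernel code calls ssleep(1) at this point. */
		return true;
	}
}
```

Calling sketch_handle_recovery_error() with -ENETUNREACH leaves cl_cons_state at -EIO and returns false, while any other status leaves the client untouched and asks the caller to wait and retry. Note that the sketch, like the patch, has no check of a netunreach-fatal flag at this level.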

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx