Re: [PATCH v2 4/4] NFSv4: Treat ENETUNREACH errors as fatal for state recovery

Jeff Layton <jlayton@xxxxxxxxxx> · Tue, 25 Mar 2025 15:26:06 -0400

On Tue, 2025-03-25 at 18:50 +0000, Trond Myklebust wrote:
> On Tue, 2025-03-25 at 14:04 -0400, Jeff Layton wrote:
> > On Tue, 2025-03-25 at 12:17 -0400, trondmy@xxxxxxxxxx wrote:
> > > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > 
> > > If a containerised process is killed and causes an ENETUNREACH or
> > > ENETDOWN error to be propagated to the state manager, then mark the
> > > nfs_client as being dead so that we don't loop in functions that
> > > are
> > > expecting recovery to succeed.
> > > 
> > > Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> > > ---
> > >  fs/nfs/nfs4state.c | 10 +++++++++-
> > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > > index 738eb2789266..14ba3f96e6fc 100644
> > > --- a/fs/nfs/nfs4state.c
> > > +++ b/fs/nfs/nfs4state.c
> > > @@ -2739,7 +2739,15 @@ static void nfs4_state_manager(struct
> > > nfs_client *clp)
> > >  	pr_warn_ratelimited("NFS: state manager%s%s failed on
> > > NFSv4 server %s"
> > >  			" with error %d\n", section_sep, section,
> > >  			clp->cl_hostname, -status);
> > > -	ssleep(1);
> > > +	switch (status) {
> > > +	case -ENETDOWN:
> > > +	case -ENETUNREACH:
> > > +		nfs_mark_client_ready(clp, -EIO);
> > > +		break;
> > 
> > Should this be conditional on clnt->cl_netunreach_fatal being true?
> 
> I should hope not. We shouldn't ever be seeing these errors here if it
> is false.
> 

Oh right, the only way that these errors bubble up from sunrpc layer is
if this option is enabled. You can add:

Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>