On Fri, 2025-03-21 at 10:36 -0400, Jeff Layton wrote:
> On Thu, 2025-03-20 at 16:40 -0400, trondmy@xxxxxxxxxx wrote:
> > From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> >
> > When a NFS client is started from inside a container, it is often not
> > possible to ensure a safe shutdown and flush of the data before the
> > container orchestrator steps in to tear down the network. Typically,
> > what can happen is that the orchestrator triggers a lazy umount of the
> > mounted filesystems, then proceeds to delete virtual network device
> > links, bridges, NAT configurations, etc.
> >
> > Once that happens, it may be impossible to reach into the container to
> > perform any further shutdown actions on the NFS client.
> >
> > This patchset proposes to allow the client to deal with these situations
> > by treating the two errors ENETDOWN and ENETUNREACH as being fatal.
> > The intention is to then allow the I/O queue to drain, and any remaining
> > RPC calls to error out, so that the lazy umounts can complete the
> > shutdown process.
> >
> > In order to do so, a new mount option "fatal_errors" is introduced,
> > which can take the values "default", "none" and "enetdown:enetunreach".
> > The value "none" forces the existing behaviour, whereby hard mounts are
> > unaffected by the ENETDOWN and ENETUNREACH errors.
> > The value "enetdown:enetunreach" forces ENETDOWN and ENETUNREACH errors
> > to always be fatal.
> > If the user does not specify the "fatal_errors" option, or uses the
> > value "default", then ENETDOWN and ENETUNREACH will be fatal if the
> > mount was started from inside a network namespace that is not
> > "init_net", and otherwise not.
> >
> > The expectation is that users will normally not need to set this option,
> > unless they are running inside a container, and want to prevent ENETDOWN
> > and ENETUNREACH from being fatal by setting "-ofatal_errors=none".
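To make the three-way semantics of the proposed option concrete, here is a minimal userspace model in C. It is a sketch under stated assumptions, not the patch code: the enum and function names are hypothetical, errors are modelled as positive errno values, and the "in_init_net" flag stands in for however the client actually records which network namespace the mount was started from.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace model of the proposed "fatal_errors" semantics.
 * None of these names are taken from the actual patches.
 */
enum fatal_errors_mode {
	FATAL_ERRORS_DEFAULT,	/* option absent, or "default" */
	FATAL_ERRORS_NONE,	/* "none": existing hard-mount behaviour */
	FATAL_ERRORS_NETERR,	/* "enetdown:enetunreach": always fatal */
};

/*
 * Map an option value (NULL if the option was not given) to a mode.
 * Returns false for an unrecognised value so the mount can be rejected.
 */
static bool parse_fatal_errors(const char *val, enum fatal_errors_mode *mode)
{
	assert(mode != NULL);
	if (val == NULL || strcmp(val, "default") == 0)
		*mode = FATAL_ERRORS_DEFAULT;
	else if (strcmp(val, "none") == 0)
		*mode = FATAL_ERRORS_NONE;
	else if (strcmp(val, "enetdown:enetunreach") == 0)
		*mode = FATAL_ERRORS_NETERR;
	else
		return false;
	return true;
}

/*
 * Decide, once per mount, whether network errors should be fatal.
 * in_init_net: true if the mount was started from the initial network
 * namespace ("init_net"), false if it came from inside a container.
 */
static bool neterrors_are_fatal(enum fatal_errors_mode mode, bool in_init_net)
{
	switch (mode) {
	case FATAL_ERRORS_NONE:
		return false;
	case FATAL_ERRORS_NETERR:
		return true;
	case FATAL_ERRORS_DEFAULT:
	default:
		/* Default: fatal only outside init_net. */
		return !in_init_net;
	}
}

/*
 * Should an RPC that failed with errno value err be errored out instead
 * of being retried, even on a hard mount?
 */
static bool rpc_error_is_fatal(int err, bool neterrors_fatal)
{
	return neterrors_fatal && (err == ENETDOWN || err == ENETUNREACH);
}
```

With no option given, a mount made from inside a container gets neterrors_are_fatal(FATAL_ERRORS_DEFAULT, false) == true, so an ENETUNREACH fails the RPC instead of being retried indefinitely; the same mount with "none" keeps today's retry behaviour. The option strings matched here are the ones quoted in the cover letter above.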
> >
> > ---
> > v2:
> > - Fix NFSv4 client cl_flag initialisation
> > - Add RPC task flag trace decoding
> >
> > Trond Myklebust (4):
> >   NFS: Add a mount option to make ENETUNREACH errors fatal
> >   NFS: Treat ENETUNREACH errors as fatal in containers
> >   pNFS/flexfiles: Treat ENETUNREACH errors as fatal in containers
> >   pNFS/flexfiles: Report ENETDOWN as a connection error
> >
> >  fs/nfs/client.c                        |  5 ++++
> >  fs/nfs/flexfilelayout/flexfilelayout.c | 24 ++++++++++++++--
> >  fs/nfs/fs_context.c                    | 38 ++++++++++++++++++++++++++
> >  fs/nfs/nfs3client.c                    |  2 ++
> >  fs/nfs/nfs4client.c                    |  7 +++++
> >  fs/nfs/nfs4proc.c                      |  3 ++
> >  fs/nfs/super.c                         |  2 ++
> >  include/linux/nfs4.h                   |  1 +
> >  include/linux/nfs_fs_sb.h              |  2 ++
> >  include/linux/sunrpc/clnt.h            |  5 +++-
> >  include/linux/sunrpc/sched.h           |  1 +
> >  include/trace/events/sunrpc.h          |  1 +
> >  net/sunrpc/clnt.c                      | 30 ++++++++++++++------
> >  13 files changed, 110 insertions(+), 11 deletions(-)
> >
>
> With the bug in patch #3 fixed, you can add:
>
> Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
> Tested-by: Jeff Layton <jlayton@xxxxxxxxxx>

Thanks for both the bugfix and the testing!

I'll send out a v3. In addition to the above fix, I want to change the
name of the mount option to be "fatal_neterror", and then capitalise the
ENETDOWN:ENETUNREACH value, so that it is more obvious that it refers to
the POSIX errors.
At some point, we may want to add support for further such errors, hence
the fussiness.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx