Re: [PATCH RFC v2 0/4] Containerised NFS clients and teardown

Jeff Layton <jlayton@xxxxxxxxxx> · Fri, 21 Mar 2025 10:36:32 -0400

On Thu, 2025-03-20 at 16:40 -0400, trondmy@xxxxxxxxxx wrote:
> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> 
> When an NFS client is started from inside a container, it is often not
> possible to ensure a safe shutdown and flush of the data before the
> container orchestrator steps in to tear down the network. Typically,
> what can happen is that the orchestrator triggers a lazy umount of the
> mounted filesystems, then proceeds to delete virtual network device
> links, bridges, NAT configurations, etc.
> 
> Once that happens, it may be impossible to reach into the container to
> perform any further shutdown actions on the NFS client.
> 
> This patchset proposes to allow the client to deal with these situations
> by treating the two errors ENETDOWN and ENETUNREACH as being fatal.
> The intention is to then allow the I/O queue to drain, and any remaining
> RPC calls to error out, so that the lazy umounts can complete the
> shutdown process.
> 
> In order to do so, a new mount option "fatal_errors" is introduced,
> which can take the values "default", "none" and "enetdown:enetunreach".
> The value "none" forces the existing behaviour, whereby hard mounts are
> unaffected by the ENETDOWN and ENETUNREACH errors.
> The value "enetdown:enetunreach" forces ENETDOWN and ENETUNREACH errors
> to always be fatal.
> If the user does not specify the "fatal_errors" option, or uses the
> value "default", then ENETDOWN and ENETUNREACH will be fatal if the
> mount was started from inside a network namespace that is not
> "init_net", and otherwise not.
> 
> The expectation is that users will normally not need to set this option,
> unless they are running inside a container, and want to prevent ENETDOWN
> and ENETUNREACH from being fatal by setting "-ofatal_errors=none".
> 
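
For illustration only, the option semantics described above can be
sketched as a small userspace C program fragment. This is not code from
the patches: the enum, both function names and the in_init_net flag are
hypothetical, and the real series presumably wires this into the mount
context and RPC client flags rather than a standalone helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names, chosen for this sketch only. */
enum fatal_errors_policy {
	FATAL_ERRORS_DEFAULT,              /* option absent, or "default" */
	FATAL_ERRORS_NONE,                 /* "none" */
	FATAL_ERRORS_ENETDOWN_ENETUNREACH, /* "enetdown:enetunreach" */
	FATAL_ERRORS_INVALID,              /* any other string */
};

/* Map the fatal_errors= value to a policy; NULL means "not specified". */
static enum fatal_errors_policy parse_fatal_errors(const char *value)
{
	if (value == NULL || strcmp(value, "default") == 0)
		return FATAL_ERRORS_DEFAULT;
	if (strcmp(value, "none") == 0)
		return FATAL_ERRORS_NONE;
	if (strcmp(value, "enetdown:enetunreach") == 0)
		return FATAL_ERRORS_ENETDOWN_ENETUNREACH;
	return FATAL_ERRORS_INVALID;
}

/*
 * Decide whether ENETDOWN/ENETUNREACH should be fatal for a mount.
 * in_init_net is an assumed input: true when the mount was started
 * from the initial network namespace ("init_net").
 */
static bool net_errors_are_fatal(enum fatal_errors_policy policy,
				 bool in_init_net)
{
	assert(policy != FATAL_ERRORS_INVALID);

	switch (policy) {
	case FATAL_ERRORS_NONE:
		return false;        /* existing hard-mount behaviour */
	case FATAL_ERRORS_ENETDOWN_ENETUNREACH:
		return true;         /* always fatal */
	default:
		return !in_init_net; /* fatal only outside init_net */
	}
}
```

Under these assumptions, a mount made inside a container without the
option gets fatal network errors, and "-ofatal_errors=none" restores
the existing behaviour.
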
> ---
> v2:
> - Fix NFSv4 client cl_flag initialisation
> - Add RPC task flag trace decoding
> 
> Trond Myklebust (4):
>   NFS: Add a mount option to make ENETUNREACH errors fatal
>   NFS: Treat ENETUNREACH errors as fatal in containers
>   pNFS/flexfiles: Treat ENETUNREACH errors as fatal in containers
>   pNFS/flexfiles: Report ENETDOWN as a connection error
> 
>  fs/nfs/client.c                        |  5 ++++
>  fs/nfs/flexfilelayout/flexfilelayout.c | 24 ++++++++++++++--
>  fs/nfs/fs_context.c                    | 38 ++++++++++++++++++++++++++
>  fs/nfs/nfs3client.c                    |  2 ++
>  fs/nfs/nfs4client.c                    |  7 +++++
>  fs/nfs/nfs4proc.c                      |  3 ++
>  fs/nfs/super.c                         |  2 ++
>  include/linux/nfs4.h                   |  1 +
>  include/linux/nfs_fs_sb.h              |  2 ++
>  include/linux/sunrpc/clnt.h            |  5 +++-
>  include/linux/sunrpc/sched.h           |  1 +
>  include/trace/events/sunrpc.h          |  1 +
>  net/sunrpc/clnt.c                      | 30 ++++++++++++++------
>  13 files changed, 110 insertions(+), 11 deletions(-)
> 

With the bug in patch #3 fixed, you can add:

Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
Tested-by: Jeff Layton <jlayton@xxxxxxxxxx>