On Thu, 2025-03-20 at 13:44 -0400, trondmy@xxxxxxxxxx wrote:
> From: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> 
> When a NFS client is started from inside a container, it is often not
> possible to ensure a safe shutdown and flush of the data before the
> container orchestrator steps in to tear down the network. Typically,
> what can happen is that the orchestrator triggers a lazy umount of the
> mounted filesystems, then proceeds to delete virtual network device
> links, bridges, NAT configurations, etc.
> 
> Once that happens, it may be impossible to reach into the container to
> perform any further shutdown actions on the NFS client.
> 
> This patchset proposes to allow the client to deal with these situations
> by treating the two errors ENETDOWN and ENETUNREACH as being fatal.
> The intention is to then allow the I/O queue to drain, and any remaining
> RPC calls to error out, so that the lazy umounts can complete the
> shutdown process.
> 
> In order to do so, a new mount option "fatal_errors" is introduced,
> which can take the values "default", "none" and "enetdown:enetunreach".
> The value "none" forces the existing behaviour, whereby hard mounts are
> unaffected by the ENETDOWN and ENETUNREACH errors.
> The value "enetdown:enetunreach" forces ENETDOWN and ENETUNREACH errors
> to always be fatal.
> If the user does not specify the "fatal_errors" option, or uses the
> value "default", then ENETDOWN and ENETUNREACH will be fatal if the
> mount was started from inside a network namespace that is not
> "init_net", and otherwise not.
> 
> The expectation is that users will normally not need to set this option,
> unless they are running inside a container, and want to prevent ENETDOWN
> and ENETUNREACH from being fatal by setting "-ofatal_errors=none".
> 
> Trond Myklebust (4):
>   NFS: Add a mount option to make ENETUNREACH errors fatal
>   NFS: Treat ENETUNREACH errors as fatal in containers
>   pNFS/flexfiles: Treat ENETUNREACH errors as fatal in containers
>   pNFS/flexfiles: Report ENETDOWN as a connection error
> 
>  fs/nfs/client.c                        |  5 ++++
>  fs/nfs/flexfilelayout/flexfilelayout.c | 24 ++++++++++++++--
>  fs/nfs/fs_context.c                    | 38 ++++++++++++++++++++++++++
>  fs/nfs/nfs3client.c                    |  2 ++
>  fs/nfs/nfs4client.c                    |  5 ++++
>  fs/nfs/nfs4proc.c                      |  3 ++
>  fs/nfs/super.c                         |  2 ++
>  include/linux/nfs4.h                   |  1 +
>  include/linux/nfs_fs_sb.h              |  2 ++
>  include/linux/sunrpc/clnt.h            |  5 +++-
>  include/linux/sunrpc/sched.h           |  1 +
>  net/sunrpc/clnt.c                      | 30 ++++++++++++++------
>  12 files changed, 107 insertions(+), 11 deletions(-)
> 

I like the concept, but unfortunately it doesn't help with the reproducer
I have. The rpc_tasks remain stuck. Here's the contents of the rpc_tasks
file:

252 c825 0 0x3 0xd2147cd2 2147 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
251 c825 0 0x3 0xd3147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
241 c825 0 0x3 0xd4147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
531 c825 0 0x3 0xd5147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
640 c825 0 0x3 0xd6147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 READ a:call_bind [sunrpc] q:delayq
634 c825 0 0x3 0xd7147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
564 c825 0 0x3 0xd8147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
567 c825 0 0x3 0xd9147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
258 c825 0 0x3 0xda147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
259 c825 0 0x3 0xdb147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
1159 c825 0 0x3 0xdc147cd2 2146 nfs_commit_ops [nfs] nfsv4 COMMIT a:call_bind [sunrpc] q:delayq
246 c825 0 0x3 0xdd147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
536 c825 0 0x3 0xde147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
645 c825 0 0x3 0xdf147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 READ a:call_bind [sunrpc] q:delayq
637 c825 0 0x3 0xe0147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
572 c825 0 0x3 0xe1147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
568 c825 0 0x3 0xe2147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
263 c825 0 0x3 0xe3147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
1163 c825 0 0x3 0xe4147cd2 2146 nfs_commit_ops [nfs] nfsv4 COMMIT a:call_bind [sunrpc] q:delayq
262 c825 0 0x3 0xe5147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
1162 c825 0 0x3 0xe6147cd2 2146 nfs_commit_ops [nfs] nfsv4 COMMIT a:call_bind [sunrpc] q:delayq
250 c825 0 0x3 0xe7147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
537 c825 0 0x3 0xe8147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
646 c825 0 0x3 0xe9147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 READ a:call_bind [sunrpc] q:delayq
642 c825 0 0x3 0xea147cd2 2146 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
1165 c825 0 0x3 0xeb147cd2 2146 nfs_commit_ops [nfs] nfsv4 COMMIT a:call_bind [sunrpc] q:delayq
579 c825 0 0x3 0xec147cd2 2145 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
574 c825 0 0x3 0xed147cd2 2145 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
269 c825 0 0x3 0xee147cd2 2145 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq
265 c825 0 0x3 0xef147cd2 2145 nfs_pgio_common_ops [nfs] nfsv4 WRITE a:call_bind [sunrpc] q:delayq

I turned up a bunch of tracepoints, and collected some output for a while
waiting for the tasks to die. It's attached.
I see some ENETUNREACH (-101) errors in there, but the rpc_tasks didn't
die off. It looks sort of like the rpc_task flag didn't get set properly?
I'll plan to take a closer look tomorrow unless you figure it out.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
Attachment:
nfs-nonet-trace.txt.gz
Description: application/gzip