leaked pNFS DS nfs_client references

Jeff Layton <jlayton@xxxxxxxxxx> · Mon, 21 Apr 2025 15:46:44 -0400

Hi Trond/Anna:

We (at Meta) have been hunting a number of problems surrounding leaked
network namespaces with containerized workloads. We recently deployed a
v6.9-based kernel on the clients that has all the known containerized
NFS fixes from upstream.

Usually, when we've found problems with leaked netns's it has been
because there were still outstanding RPCs associated with the rpc_clnt.
Today, we found a host that seems to have some leaked nfs_client
structures, but there is no associated RPC activity.

In this case, we had 2 leaked net namespaces. We discovered them by
looking under /sys/kernel/debug/sunrpc/rpc_xprt for xprts associated
with netns's that no longer have any userland tasks attached.

Some drgn (pardon my terrible Python):

>>> for net in for_each_net():
...     if (net.ns.inum == 4026558887 or net.ns.inum == 4026558805):
...         print("netns:", net.ns.inum)
...         nfs_net = cast("struct nfs_net *", net.gen.ptr[prog["nfs_net_id"]])
...         print("Volume list empty:", list_empty(nfs_net.nfs_volume_list.address_of_()))
...         for clnt in list_for_each_entry("struct nfs_client", nfs_net.nfs_client_list.address_of_(), "cl_share_link"):
...             rpcclnt = clnt.cl_rpcclient
...             print(clnt.cl_count.refs.counter, clnt.cl_hostname, rpcclnt.cl_vers, "tasks: ", list_count_nodes(rpcclnt.cl_tasks.address_of_()))
... 
netns: (unsigned int)4026558805
Volume list empty: True
(int)1 (char *)0xffff8a12e988a500 = "f00::3117:a4f1:a940:94af" (u32)3 tasks:  0
(int)1 (char *)0xffff8881a0f694c0 = "f00::bfaa:cec2:8ee2:295" (u32)3 tasks:  0
(int)1 (char *)0xffff889e81a74e40 = "f00::8f23:f52d:9d79:a7b0" (u32)3 tasks:  0
(int)1 (char *)0xffff8a027d8e0780 = "f00::d209:97ba:1c6:3282" (u32)3 tasks:  0
netns: (unsigned int)4026558887
Volume list empty: True
(int)1 (char *)0xffff8a14d5b0e2c0 = "f00::3f52:fea6:4ccb:96dd" (u32)3 tasks:  0
(int)1 (char *)0xffff8881e6626cc0 = "f00::705:c924:ddc1:51e4" (u32)3 tasks:  0
(int)1 (char *)0xffff8a149cdb6680 = "f00::3117:a4f1:a940:94af" (u32)3 tasks:  0
(int)1 (char *)0xffff8896ada2f800 = "f00::d56c:cd93:1f0c:99c7" (u32)3 tasks:  0
(int)1 (char *)0xffff8a159251f240 = "f00::614d:87c1:a73f:1f09" (u32)3 tasks:  0
(int)1 (char *)0xffff888e699f4940 = "f00::1285:b785:f114:d38b" (u32)3 tasks:  0
(int)1 (char *)0xffff88812ae41500 = "f00::fb1c:bc4a:3d9a:c2a6" (u32)3 tasks:  0
(int)1 (char *)0xffff8a137dbc4e00 = "f00::bd2f:5851:b552:5bce" (u32)3 tasks:  0

There are 12 leaked nfs_clients in 2 netns's (4 in one, 8 in the
other). There are no longer any struct nfs_servers associated with
either netns. Each leaked client has a single outstanding reference.
They're all connections to different DS's (except for one address that
shows up in both netns's, but I suspect that's just coincidence).
They're all NFSv3, which indicates that they are pNFS DS clients. None
of them have any running RPCs.
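
As a rough cross-check of that tally, the drgn output can be fed through
a few lines of plain Python. This is only a sketch: SAMPLE is an
abridged copy of the output above, and summarize() and the two regexes
are names made up for illustration, not part of drgn or the kernel.

```python
import re
from collections import defaultdict

# Abridged copy of the drgn output above (two clients per netns).
SAMPLE = """\
netns: (unsigned int)4026558805
Volume list empty: True
(int)1 (char *)0xffff8a12e988a500 = "f00::3117:a4f1:a940:94af" (u32)3 tasks:  0
(int)1 (char *)0xffff8881a0f694c0 = "f00::bfaa:cec2:8ee2:295" (u32)3 tasks:  0
netns: (unsigned int)4026558887
Volume list empty: True
(int)1 (char *)0xffff8a149cdb6680 = "f00::3117:a4f1:a940:94af" (u32)3 tasks:  0
(int)1 (char *)0xffff8896ada2f800 = "f00::d56c:cd93:1f0c:99c7" (u32)3 tasks:  0
"""

# Hypothetical patterns matching the two line shapes printed above.
NETNS_RE = re.compile(r"^netns: \(unsigned int\)(\d+)$")
CLIENT_RE = re.compile(
    r'^\(int\)(\d+) \(char \*\)0x[0-9a-f]+ = "([^"]+)" \(u32\)(\d+) tasks:\s+(\d+)$'
)


def summarize(text):
    """Return ({netns inum: [hostnames]}, [hostnames seen in >1 netns])."""
    per_netns = defaultdict(list)
    current = None
    for line in text.splitlines():
        line = line.strip()
        m = NETNS_RE.match(line)
        if m:
            current = int(m.group(1))
            continue
        m = CLIENT_RE.match(line)
        if m and current is not None:
            per_netns[current].append(m.group(2))
    # Map each hostname to the set of netns's it was seen in.
    seen = defaultdict(set)
    for inum, hosts in per_netns.items():
        for host in hosts:
            seen[host].add(inum)
    shared = sorted(h for h, nets in seen.items() if len(nets) > 1)
    return dict(per_netns), shared


if __name__ == "__main__":
    per_netns, shared = summarize(SAMPLE)
    for inum, hosts in per_netns.items():
        print(inum, len(hosts), "clients")
    print("in more than one netns:", shared)
```

Pasting the full output in place of SAMPLE gives the per-netns counts
and the DS address that appears in both namespaces.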

I took a look at the nfs_client refcount handling in the pNFS code but
didn't see any obvious bugs.

One thing we could consider is adding a refcount tracker for these
objects. That would tell us pretty quickly what took the leftover
references in the first place, assuming this is reproducible.
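
To illustrate the idea (and only the idea), here is a toy Python model
of a tracked refcount: every get records the caller's stack, every put
discards it, and whatever is left over names the leaker. All names here
(TrackedRefcount, leaky_user, balanced_user) are hypothetical; a real
implementation would presumably build on the kernel's existing
ref_tracker infrastructure rather than anything like this.

```python
import traceback


class TrackedRefcount:
    """Toy refcount that remembers the call stack of each holder."""

    def __init__(self):
        self._next_token = 0
        self._holders = {}  # token -> list of formatted stack frames

    def get(self):
        """Take a reference; return a token to hand back to put()."""
        token = self._next_token
        self._next_token += 1
        # Drop the final frame (this method) so the stack ends at the caller.
        self._holders[token] = traceback.format_stack()[:-1]
        return token

    def put(self, token):
        """Drop the reference identified by token."""
        del self._holders[token]

    @property
    def count(self):
        return len(self._holders)

    def outstanding(self):
        """Return one formatted stack per reference that was never put."""
        return ["".join(stack) for stack in self._holders.values()]


def balanced_user(ref):
    token = ref.get()
    ref.put(token)


def leaky_user(ref):
    ref.get()  # reference is taken and never released


if __name__ == "__main__":
    ref = TrackedRefcount()
    balanced_user(ref)
    leaky_user(ref)
    print("outstanding references:", ref.count)
    for stack in ref.outstanding():
        print(stack)
```

With one such record per nfs_client reference, a client stuck at a
refcount of 1 would point straight at the code path that took it.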

This kernel is based on v6.9, so it's possible we missed a fix that we
need. I didn't see anything obvious in recent git fixes though.

Any thoughts?
-- 
Jeff Layton <jlayton@xxxxxxxxxx>