On Sat, 2025-05-24 at 10:33 -0400, Mike Snitzer wrote:
> On Sat, May 24, 2025 at 08:05:19AM -0400, Jeff Layton wrote:
> > On Fri, 2025-05-23 at 23:53 -0400, Mike Snitzer wrote:
> > > On Fri, May 23, 2025 at 07:09:27PM -0400, Mike Snitzer wrote:
> > > > On Fri, May 23, 2025 at 06:40:45PM -0400, Jeff Layton wrote:
> > > > > On Fri, 2025-05-23 at 18:19 -0400, Mike Snitzer wrote:
> > > > > > On Fri, May 23, 2025 at 02:40:17PM -0400, Jeff Layton wrote:
> > > > > > > On Fri, 2025-05-23 at 14:29 -0400, Mike Snitzer wrote:
> > > > > > > > I don't know if $SUBJECT ever worked... but with latest 6.15 or
> > > > > > > > nfsd-testing if I just use pool_mode=global then all is fine.
> > > > > > > >
> > > > > > > > If pool_mode=pernode then mounting the container's NFSv3 export fails.
> > > > > > > >
> > > > > > > > I haven't started to dig into code yet but pool_mode=pernode works
> > > > > > > > perfectly fine if NFSD isn't running in a container.
> > > > > >
> > > > > Oops, I went and looked and nfsd isn't running in a container on these
> > > > > boxes. There are some other containerized apps running on the box, but
> > > > > nfsd isn't running in a container.
> > > > OK.
> > > > > > I'm using nfs-utils-2.8.2. I don't see any nfsd threads running if I
> > > > > > use "options sunrpc pool_mode=pernode".
> > > > > >
> > > > >
> > > > > I'll have a look soon, but if you figure it out in the meantime, let us
> > > > > know.
> > > > Will do.
> > > >
> > > > Just the latest info I have, with sunrpc's pool_mode=pernode dd hangs
> > > > with this stack trace:
> > > Turns out this pool_mode=pernode issue is a regression caused by the
> > > very recent nfs-utils 2.8.2 (I rebuilt EL10's nfs-utils package,
> > > because why not upgrade to the latest!?).
> > >
> > > If I use EL9.5's latest nfs-utils-2.5.4-37.el8.x86_64 then sunrpc's
> > > pool_mode=pernode works fine.
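(Aside, for anyone reproducing this: the pool mode being discussed is the sunrpc module's pool_mode parameter, which newer nfs-utils can also query over netlink via nfsdctl. A rough sketch of setting and inspecting it; the modprobe.d filename here is arbitrary:)

```shell
# Sketch: configure the sunrpc pool mode before nfsd starts.
# Takes effect when the sunrpc module loads.
echo "options sunrpc pool_mode=pernode" > /etc/modprobe.d/sunrpc.conf

# Once the module is loaded, the current mode is visible as a
# module parameter:
cat /sys/module/sunrpc/parameters/pool_mode

# With nfs-utils 2.8.x, nfsdctl reports the mode (and pool count)
# via the nfsd netlink interface:
nfsdctl pool-mode
```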
> > >
> > > And this issue doesn't have anything to do with running in a container
> > > (it seemed to be container related purely because I happened to be
> > > seeing the issue with an EL9.5 container that had the EL10-based
> > > nfs-utils 2.8.2 installed).
> > >
> > > Steved, unfortunately I'm not sure what the problem is with the newer
> > > nfs-utils and setting "options sunrpc pool_mode=pernode"
> > >
> >
> > I tried to reproduce this using fedora-41 VMs (no f42 available for
> > virt-builder yet), but everything worked. I don't have any actual NUMA
> > hw here though, so maybe that matters?
> >
> > Can you run this on the nfs server and send back the output? I'm
> > wondering if this setting might not track the module option properly on
> > that host for some reason:
> >
> > # nfsdctl pool-mode
>
> (from EL9.5 container with nfs-utils 2.8.2)
> # nfsdctl pool-mode
> pool-mode: pernode
> npools: 2
>
> (on host)
> # numactl -H
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3 4 5 6 7
> node 0 size: 11665 MB
> node 0 free: 9892 MB
> node 1 cpus: 8 9 10 11 12 13 14 15
> node 1 size: 6042 MB
> node 1 free: 5127 MB
> node distances:
> node   0   1
>   0:  10  20
>   1:  20  10
>
> (and yeahh I was aware the newer nfs-utils uses the netlink interface,
> will be interesting to pin down what the issue is with
> pool-mode=pernode)

Ok, I can reproduce this on a true NUMA machine. The first thing that's
interesting is that it seems to be intermittent. Occasionally I can mount
and operate on the socket, but socket requests hang most of the time.

I turned up all of the nfsd and sunrpc tracepoints. After attempting a
mount that hung, I see only a single tracepoint fire:

    <idle>-0  [038] ..s..  5942.572721: svc_xprt_enqueue: server=[::]:2049 client=(einval) flags=CONN|CHNGBUF|LISTENER|CACHE_AUTH|CONG_CTRL

Based on the flags, svc_xprt_ready should have returned true. That should
make the xprt be enqueued and an idle thread be awoken.
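(For reference, "turned up all of the nfsd and sunrpc tracepoints" amounts to roughly the following; the tracefs mount point is assumed, and may be under /sys/kernel/debug/tracing on older setups:)

```shell
# Sketch: enable every tracepoint in the sunrpc and nfsd event groups.
echo 1 > /sys/kernel/tracing/events/sunrpc/enable
echo 1 > /sys/kernel/tracing/events/nfsd/enable

# Reproduce the hang (e.g. attempt the mount), then read back the trace:
cat /sys/kernel/tracing/trace
```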
It looks like that last bit may not be happening for some reason. At this
point, I'll probably have to add some debugging.

I'll keep poking at it -- stay tuned.
-- 
Jeff Layton <jlayton@xxxxxxxxxx>