On Fri, 2025-06-13 at 10:56 -0400, Chuck Lever wrote:
> On 6/13/25 7:33 AM, Benjamin Coddington wrote:
> > On 12 Jun 2025, at 12:42, Chuck Lever wrote:
> > 
> > > On 6/12/25 12:15 PM, Jeff Layton wrote:
> > > > On Thu, 2025-06-12 at 12:05 -0400, Chuck Lever wrote:
> > > > > On 6/12/25 11:57 AM, Jeff Layton wrote:
> > > > > > On Tue, 2025-05-27 at 20:12 -0400, Jeff Layton wrote:
> > > > > > > The old nfsdfs interface for starting a server with multiple pools
> > > > > > > handles the special case of a single-entry array passed down from
> > > > > > > userland by distributing the threads over every NUMA node.
> > > > > > > 
> > > > > > > The netlink control interface, however, constructs an array of length
> > > > > > > nfsd_nrpools() and fills any unprovided slots with 0's. This behavior
> > > > > > > defeats the special casing that the old interface relies on.
> > > > > > > 
> > > > > > > Change nfsd_nl_threads_set_doit() to pass down the array from userland
> > > > > > > as-is.
> > > > > > > 
> > > > > > > Fixes: 7f5c330b2620 ("nfsd: allow passing in array of thread counts via netlink")
> > > > > > > Reported-by: Mike Snitzer <snitzer@xxxxxxxxxx>
> > > > > > > Closes: https://lore.kernel.org/linux-nfs/aDC-ftnzhJAlwqwh@xxxxxxxxxx/
> > > > > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > > > > > ---
> > > > > > >  fs/nfsd/nfsctl.c | 5 ++---
> > > > > > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > > > > > > index ac265d6fde35df4e02b955050f5b0ef22e6e519c..22101e08c3e80350668e94c395058bc228b08e64 100644
> > > > > > > --- a/fs/nfsd/nfsctl.c
> > > > > > > +++ b/fs/nfsd/nfsctl.c
> > > > > > > @@ -1611,7 +1611,7 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
> > > > > > >   */
> > > > > > >  int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
> > > > > > >  {
> > > > > > > -        int *nthreads, count = 0, nrpools, i, ret = -EOPNOTSUPP, rem;
> > > > > > > +        int *nthreads, nrpools = 0, i, ret = -EOPNOTSUPP, rem;
> > > > > > >          struct net *net = genl_info_net(info);
> > > > > > >          struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > > > > > >          const struct nlattr *attr;
> > > > > > > @@ -1623,12 +1623,11 @@ int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
> > > > > > >          /* count number of SERVER_THREADS values */
> > > > > > >          nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
> > > > > > >                  if (nla_type(attr) == NFSD_A_SERVER_THREADS)
> > > > > > > -                        count++;
> > > > > > > +                        nrpools++;
> > > > > > >          }
> > > > > > > 
> > > > > > >          mutex_lock(&nfsd_mutex);
> > > > > > > 
> > > > > > > -        nrpools = max(count, nfsd_nrpools(net));
> > > > > > >          nthreads = kcalloc(nrpools, sizeof(int), GFP_KERNEL);
> > > > > > >          if (!nthreads) {
> > > > > > >                  ret = -ENOMEM;
> > > > > > 
> > > > > > I noticed that this didn't go in during the recent merge window.
> > > > > > 
> > > > > > This patch fixes a rather nasty regression when you try to start the
> > > > > > server on a NUMA-capable box.
> > > > > 
> > > > > The NFSD netlink interface is not broadly used yet, is it?
> > > > 
> > > > It is. RHEL 10 shipped with it, for instance, and it's been in Fedora
> > > > for a while.
> > > 
> > > RHEL 10 is shiny and new, and Fedora is bleeding edge. It's not likely
> > > either of these are deployed in production environments yet. Just sayin'
> > > that in this case, the Bayesian filter leans towards waiting a full dev
> > > cycle.
> > 
> > We don't consider it acceptable to allow known defects to persist in our
> > products just because they are bleeding edge.
> 
> I'm not letting this issue persist. Proper testing takes time.
> 
> The patch description and discussion around this change did not include
> any information about its pervasiveness and only a little about its
> severity. I used my best judgement and followed my usual rules, which
> are:
> 
> 1. Crashers, data corrupters, and security bugs with public bug reports
>    and confirmed fix effectiveness go in as quickly as we can test.
>    Note well that we have to balance the risk of introducing regressions
>    in this case, since going in quickly means the fix lacks significant
>    test experience.
> 
> 2. Patches sit in nfsd-testing for at least two weeks; better if they
>    are there for four. I have CI running daily on that branch, and
>    sometimes it takes a while for a problem to surface and be noticed.
> 
> 3. Patches should sit in nfsd-next or nfsd-fixes for at least as long
>    as it takes for them to matriculate into linux-next and fs-next.
> 
> 4. If the patch fixes an issue that was introduced in the most recent
>    merge window, it goes in -fixes.
> 
> 5. If the patch fixes an issue that is already in released kernels
>    (and we are at rule 5 because the patch does not fix an immediate
>    issue), then it goes in -next.
> 
> These evidence-oriented guidelines are in place to ensure that we don't
> panic and rush commits into the kernel without careful review and
> testing. There have been plenty of times when a fix that was pushed
> urgently was not complete or even made things worse. It's a long
> pipeline on purpose.
> 
> The issues with this patch were:
> 
> - It was posted very late in the dev cycle for v6.16. (Jeff's urgent
>   fixes always seem to happen during -rc7 ;-)
> 
> - The Fixes: tag refers to a commit that was several releases ago, and
>   I am not aware of specific reports of anyone hitting a similar issue.
> 
> - IME, the adoption of enterprise distributions is slow. RHEL 10 is
>   still only on its GA release. Therefore my estimation is that the
>   number of potentially impacted customers will be small for some time,
>   enough time for us to test Jeff's fix appropriately.
> 
> - The issue did not appear to me to be severe, but maybe I didn't read
>   the patch description carefully enough.
> 
> - Although I respect, admire, and greatly appreciate the effort Jeff
>   made to nail this one, that does not mean it is a pervasive problem.
>   Jeff is quite capable of applying his own work to the kernels he and
>   his employer care about.
> 

The rules all make sense to me. In this case, though, I felt the
potential harm from not taking this patch outweighed the risk. NUMA
hardware is more prevalent than you might think.

> > > > > Since this one came in late during the 6.16 dev cycle and the Fixes: tag
> > > > > references a commit that is already in released kernels, I put it in the
> > > > > "next merge window" pile. On its own it doesn't look urgent to me.
> > > > 
> > > > I'd really like to see this go in soon and to stable. If you want me to
> > > > respin the changelog, I can. It's not a crash, but it manifests as lost
> > > > RPCs that just hang. It took me quite a while to figure out what was
> > > > going on, and I'd prefer that we not put users through that.
> > > 
> > > If someone can confirm that it is effective, I'll add it to nfsd-fixes.
> > 
> > I'm sure it is if Jeff spent time on it.
> > 
> > We're going to try to get this into RHEL 10 ASAP, because dropped RPCs
> > manifest as datacenter-wide problems that are very hard to diagnose.
> 
> It sounds like Red Hat also does not have clear evidence that links this
> patch to a specific failure experienced by your customers. This affirms
> my understanding that this fix is defensive rather than urgent.
> 
> As a rule, defensive fixes go in during merge windows.
> 
> > It's a real pain that we won't have an upstream commit assigned for it.
> 
> It's not reasonable for any upstream maintainer not employed by Red Hat
> to know about or cleave to Red Hat's internal processes. But if an
> issue is on Red Hat's radar, then you are welcome to make its priority
> known to me so I can schedule fixes appropriately.
> 
> All that said, I've promoted the fix to nfsd-fixes, since it's narrow
> and has several weeks of test experience now.

Many thanks!
-- 
Jeff Layton <jlayton@xxxxxxxxxx>