On Fri, 2025-06-13 at 10:56 -0400, Chuck Lever wrote:
> On 6/13/25 7:33 AM, Benjamin Coddington wrote:
> > On 12 Jun 2025, at 12:42, Chuck Lever wrote:
> > 
> > > On 6/12/25 12:15 PM, Jeff Layton wrote:
> > > > On Thu, 2025-06-12 at 12:05 -0400, Chuck Lever wrote:
> > > > > On 6/12/25 11:57 AM, Jeff Layton wrote:
> > > > > > On Tue, 2025-05-27 at 20:12 -0400, Jeff Layton wrote:
> > > > > > > The old nfsdfs interface for starting a server with multiple pools
> > > > > > > handles the special case of a single-entry array passed down from
> > > > > > > userland by distributing the threads over every NUMA node.
> > > > > > > 
> > > > > > > The netlink control interface, however, constructs an array of length
> > > > > > > nfsd_nrpools() and fills any unprovided slots with 0's. This behavior
> > > > > > > defeats the special casing that the old interface relies on.
> > > > > > > 
> > > > > > > Change nfsd_nl_threads_set_doit() to pass down the array from userland
> > > > > > > as-is.
> > > > > > > 
> > > > > > > Fixes: 7f5c330b2620 ("nfsd: allow passing in array of thread counts via netlink")
> > > > > > > Reported-by: Mike Snitzer <snitzer@xxxxxxxxxx>
> > > > > > > Closes: https://lore.kernel.org/linux-nfs/aDC-ftnzhJAlwqwh@xxxxxxxxxx/
> > > > > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > > > > > ---
> > > > > > >  fs/nfsd/nfsctl.c | 5 ++---
> > > > > > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > > > > > > index ac265d6fde35df4e02b955050f5b0ef22e6e519c..22101e08c3e80350668e94c395058bc228b08e64 100644
> > > > > > > --- a/fs/nfsd/nfsctl.c
> > > > > > > +++ b/fs/nfsd/nfsctl.c
> > > > > > > @@ -1611,7 +1611,7 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
> > > > > > >   */
> > > > > > >  int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
> > > > > > >  {
> > > > > > > -        int *nthreads, count = 0, nrpools, i, ret = -EOPNOTSUPP, rem;
> > > > > > > +        int *nthreads, nrpools = 0, i, ret = -EOPNOTSUPP, rem;
> > > > > > >          struct net *net = genl_info_net(info);
> > > > > > >          struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > > > > > >          const struct nlattr *attr;
> > > > > > > @@ -1623,12 +1623,11 @@ int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
> > > > > > >          /* count number of SERVER_THREADS values */
> > > > > > >          nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
> > > > > > >                  if (nla_type(attr) == NFSD_A_SERVER_THREADS)
> > > > > > > -                        count++;
> > > > > > > +                        nrpools++;
> > > > > > >          }
> > > > > > > 
> > > > > > >          mutex_lock(&nfsd_mutex);
> > > > > > > 
> > > > > > > -        nrpools = max(count, nfsd_nrpools(net));
> > > > > > >          nthreads = kcalloc(nrpools, sizeof(int), GFP_KERNEL);
> > > > > > >          if (!nthreads) {
> > > > > > >                  ret = -ENOMEM;
> > > > > > 
> > > > > > I noticed that this didn't go in during the recent merge window.
> > > > > > 
> > > > > > This patch fixes a rather nasty regression when you try to start the
> > > > > > server on a NUMA-capable box.
> > > > > 
> > > > > The NFSD netlink interface is not broadly used yet, is it?
> > > > 
> > > > It is. RHEL 10 shipped with it, for instance, and it's been in Fedora
> > > > for a while.
> > > 
> > > RHEL 10 is shiny and new, and Fedora is bleeding edge. It's not likely
> > > either of these are deployed in production environments yet. Just sayin'
> > > that in this case, the Bayesian filter leans towards waiting a full dev
> > > cycle.
> > 
> > We don't consider it acceptable to allow known defects to persist in our
> > products just because they are bleeding edge.
> 
> I'm not letting this issue persist. Proper testing takes time.
> 
> The patch description and discussion around this change did not include
> any information about its pervasiveness and only a little about its
> severity. I used my best judgement and followed my usual rules, which
> are:
> 
> 1. Crashers, data corrupters, and security bugs with public bug reports
>    and confirmed fix effectiveness go in as quickly as we can test.
>    Note well that we have to balance the risk of introducing regressions
>    in this case, since going in quickly means the fix lacks significant
>    test experience.
> 
> 2. Patches sit in nfsd-testing for at least two weeks; better if they
>    are there for four. I have CI running daily on that branch, and
>    sometimes it takes a while for a problem to surface and be noticed.
> 
> 3. Patches should sit in nfsd-next or nfsd-fixes for at least as long
>    as it takes for them to matriculate into linux-next and fs-next.
> 
> 4. If the patch fixes an issue that was introduced in the most recent
>    merge window, it goes in -fixes.
> 
> 5. If the patch fixes an issue that is already in released kernels
>    (and we are at rule 5 because the patch does not fix an immediate
>    issue), then it goes in -next.
> 
> These evidence-oriented guidelines are in place to ensure that we don't
> panic and rush commits into the kernel without careful review and
> testing. There have been plenty of times when a fix that was pushed
> urgently was not complete or even made things worse. It's a long
> pipeline on purpose.
> 
> The issues with this patch were:
> 
> - It was posted very late in the dev cycle for v6.16. (Jeff's urgent
>   fixes always seem to happen during -rc7 ;-)
> 
> - The Fixes: tag refers to a commit that was several releases ago, and
>   I am not aware of specific reports of anyone hitting a similar issue.
> 
> - IME, the adoption of enterprise distributions is slow. RHEL 10 is
>   still only on its GA release. Therefore my estimation is that the
>   number of potentially impacted customers will be small for some time,
>   enough time for us to test Jeff's fix appropriately.
> 
> - The issue did not appear to me to be severe, but maybe I didn't read
>   the patch description carefully enough.
> 
> - Although I respect, admire, and greatly appreciate the effort Jeff
>   made to nail this one, that does not mean it is a pervasive problem.
>   Jeff is quite capable of applying his own work to the kernels he and
>   his employer care about.
> 

The rules all make sense to me. In this case, though, I felt the
potential harm from not taking this patch outweighed the risk. NUMA
hardware is more prevalent than you might think.

> > > > > Since this one came in late during the 6.16 dev cycle and the Fixes: tag
> > > > > references a commit that is already in released kernels, I put it in the
> > > > > "next merge window" pile. On its own it doesn't look urgent to me.
> > > > 
> > > > I'd really like to see this go in soon and to stable. If you want me to
> > > > respin the changelog, I can. It's not a crash, but it manifests as lost
> > > > RPCs that just hang. It took me quite a while to figure out what was
> > > > going on, and I'd prefer that we not put users through that.
> > > 
> > > If someone can confirm that it is effective, I'll add it to nfsd-fixes.
> > 
> > I'm sure it is if Jeff spent time on it.
> > 
> > We're going to try to get this into RHEL 10 ASAP, because dropped RPCs
> > manifest as datacenter-wide problems that are very hard to diagnose.
> 
> It sounds like Red Hat also does not have clear evidence that links this
> patch to a specific failure experienced by your customers. This affirms
> my understanding that this fix is defensive rather than urgent.
> 
> As a rule, defensive fixes go in during merge windows.
> 
> > It's a real pain that we won't have an upstream commit assigned for it.
> 
> It's not reasonable for any upstream maintainer not employed by Red Hat
> to know about or cleave to Red Hat's internal processes. But if an
> issue is on Red Hat's radar, then you are welcome to make its priority
> known to me so I can schedule fixes appropriately.
> 
> All that said, I've promoted the fix to nfsd-fixes, since it's narrow
> and has several weeks of test experience now.

Many thanks!
-- 
Jeff Layton <jlayton@xxxxxxxxxx>