[for-6.16-final PATCH 0/9] NFSD/NFS/LOCALIO: stable fixes and revert 6.16 LOCALIO changes

[Apologies for so many words...]

Hi,

I wanted to get this on all the NFS and NFSD maintainers' radar ASAP.

I realize the timing of this is not great due to how late we are in
the 6.16 release cycle (v6.16-rc7).  But I feel it prudent to make it
clear that the LOCALIO changes that went upstream during the 6.16 merge
window are unstable under load.  So this week we'll need to make a
call on how to handle this for v6.16 final.

And just FYI: I unfortunately don't have time this week to assist with
developing/testing a smaller fix to solve this situation.  The window
for extensive testing (by myself and others at Hammerspace) was late
last week.  At this point, given we are short on time, reverting is
the sane thing to do.

Also, the 6.16-rc7 release's LOCALIO changes put it on something of an
island relative to the more enterprise production kernels I am helping
to maintain (the RHEL10 kernel and Oracle's OCI kernel, which is
actually an Ubuntu kernel, both have NFS LOCALIO that is 6.14 based).

All that said:

The past few weeks I had to assist with an HPC benchmarking effort
that generates heavy load using the "MLperf" benchmark suite. Testing
was done on 10 enterprise grade NVMe storage systems (each with 48
CPUs, and 8 NVMe devices) that depend on LOCALIO to "just work
_well_" to achieve a favorable score.  Unfortunately LOCALIO didn't,
so I got to reverting. I started with this partial revert patch, but it
wasn't enough (it just made the problem harder to hit). Labeling that
earlier revert proposal as "RFC" rather than "URGENT" was a mistake:
https://lore.kernel.org/linux-nfs/aG0pJXVtApZ9C5vy@xxxxxxxxxx/
(which is very similar to patch 2 in this series)

It wasn't until I did a full revert of 6.16's LOCALIO changes that
LOCALIO stopped leaking resources (nfsd_file in particular).  Those
leaks prevented proper NFSD shutdown and made it impossible to unload
nfsd.ko (which I had to do a lot of while developing other NFS and
NFSD changes that were unrelated to LOCALIO).

Neil, I value the work you did to try to address the lingering
complaints about RCU related compiler errors in LOCALIO (but when you
posted your changes months ago I didn't have time to review, and then
they went upstream; so I assumed they were ready and made sure to
include them in Hammerspace's more recent kernels so that I could gain
"production" confidence in the changes even though I still hadn't had
time to review them properly.. ugh).  Glad "we" did this heavy load
testing because otherwise we'd be oblivious to the LOCALIO changes
merged for 6.16 causing a regression. (I'm sending this late on my
Sunday evening in the hope that you being in Australia lets us avoid
losing a day of communication on this situation).

Patch 2 gets into how simple it is to trigger the nfsd_file leaks:
run fio, then shut down NFSD and remove the nfsd.ko module.

Regards,
Mike

Mike Snitzer (9):
  Revert "NFSD: Clean up kdoc for nfsd_open_local_fh()"
  Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer"
  Revert "nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()"
  Revert "nfs_localio: duplicate nfs_close_local_fh()"
  Revert "nfs_localio: simplify interface to nfsd for getting nfsd_file"
  Revert "nfs_localio: always hold nfsd net ref with nfsd_file ref"
  Revert "nfs_localio: use cmpxchg() to install new nfs_file_localio"
  nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local()
  nfs/localio: add localio_async_probe modparm

 fs/nfs/localio.c           | 64 ++++++++++++++++--------
 fs/nfs_common/nfslocalio.c | 99 +++++++++++++-------------------------
 fs/nfsd/filecache.c        | 34 ++-----------
 fs/nfsd/filecache.h        |  3 +-
 fs/nfsd/localio.c          | 44 ++---------------
 include/linux/nfslocalio.h | 26 +++++-----
 6 files changed, 100 insertions(+), 170 deletions(-)

-- 
2.44.0




