Memory reclaim and high nfsd usage

Rik Theys <rik.theys@xxxxxxxxx> · Mon, 31 Mar 2025 21:05:54 +0200

Hi,

Our fileserver is currently running 6.12.13 with the following 3
patches (from nfsd-testing) applied to it:

- fix-decoding-in-nfs4_xdr_dec_cb_getattr
- skip-sending-CB_RECALL_ANY
- fix-cb_getattr_status-fix

Frequently the load on the system goes up and top shows a lot of
kswapd and kcompactd threads next to nfsd threads. During these periods
(which can last for hours), users complain about very slow NFS access.
We have approx 260 systems connecting to this server and the number of
nfs client states (from the states files in the clients directory) is
around 200000.
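
For reference, a minimal sketch of how such a per-client state count could be gathered. It assumes the clients directory is /proc/fs/nfsd/clients and that every non-empty line in a states file describes one state. Both are assumptions and may need adjusting for a given kernel. The function name is purely illustrative.

```python
from pathlib import Path


def count_client_states(clients_dir="/proc/fs/nfsd/clients"):
    """Return {client_id: number_of_states} for every client directory.

    Assumption: each non-empty line of <clients_dir>/<id>/states is one state.
    """
    counts = {}
    for states in Path(clients_dir).glob("*/states"):
        with states.open() as fh:
            # Count non-empty lines only; blank lines carry no state.
            counts[states.parent.name] = sum(1 for line in fh if line.strip())
    return counts
```

Summing the values gives the server-wide total, and sorting by value shows which clients hold the most states.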

When I look at our monitoring logs, the system has frequent direct
reclaim stalls (allocstall_movable, and some allocstall_normal) and
pgscan_kswapd goes up to ~10000000. The kswapd_low_wmark_hit_quickly
is about 50. So it seems the system is out of memory and is constantly
trying to free pages? If I understand it correctly the system hits a
threshold which makes it scan for pages to free, frees some pages and
when it stops it very quickly hits the low watermark again?
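
The counters above come from /proc/vmstat and are cumulative since boot, so only the difference between two samples is meaningful. A minimal sketch of computing those differences follows. The helper names and the choice of watched counters are illustrative, not taken from any existing tool.

```python
# Counters relevant to reclaim activity; all are cumulative since boot.
WATCHED = (
    "allocstall_movable",
    "allocstall_normal",
    "pgscan_kswapd",
    "pgscan_direct",
    "pgsteal_kswapd",
    "kswapd_low_wmark_hit_quickly",
)


def parse_vmstat(text):
    """Parse the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def counter_deltas(before, after, names=WATCHED):
    """Return after - before for each watched counter present in both samples."""
    return {n: after[n] - before[n] for n in names if n in before and n in after}
```

Reading /proc/vmstat twice a fixed interval apart, passing each read through parse_vmstat() and then calling counter_deltas() gives per-interval rates. A small pgsteal_kswapd delta relative to pgscan_kswapd would suggest that kswapd scans many pages but manages to free few of them.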

But the system has over 150G of memory dedicated to cache, and
slab_reclaim is only about 16G. Why is the system not dropping more
caches to free memory instead of constantly looking to free memory? Is
there a tunable that we can set so the system will prefer to drop
caches and increase memory usage for other nfsd related things? Any
tips on how to debug where the memory pressure is coming from, or why
the system decides to keep the pages used for cache instead of freeing
some of those?

I ran perf record for 10s and the top 4 events seem to be:

1. 54% is swapper in intel_idle_ibrs
2. 12% is swapper in intel_idle
3. 7.43% is nfsd in native_queued_spin_lock_slowpath
4. 5% is kswapd0 in __list_del_entry_valid_or_report

Are there any known memory management changes related to NFS that have
been introduced that could explain this behavior? What steps can I
take to debug the root cause of this? Looking at iftop there isn't
much going on regarding throughput. The top 3 NFS4 server operations
are sequence (9563/s), putfh (9032/s) and getattr (7150/s).

Regards,
Rik