Hi,
On 4/18/25 3:31 PM, Daniel Kobras wrote:
Hi Rik!
On 01.04.25 at 14:15, Rik Theys wrote:
On 4/1/25 2:05 PM, Daniel Kobras wrote:
On 15.12.24 at 13:38, Rik Theys wrote:
Suddenly, a number of clients start to send an abnormal amount of
NFS traffic to the server that saturates their link and never seems
to stop. Running iotop on the clients shows
kworker-{rpciod,nfsiod,xprtiod} processes generating the write traffic. On
the server side, the system seems to process the traffic as the
disks are processing the write requests.
This behavior continues even after stopping all user processes on
the clients and unmounting the NFS mount on the client. Is this
normal? I was under the impression that once the NFS mount is
unmounted no further traffic to the server should be visible?
I'm currently looking at an issue that resembles your description
above (excess traffic to the server for data that was already
written and committed), and part of the packet capture also looks
roughly similar to what you've sent in a followup. Before I dig any
deeper: Did you manage to pinpoint or resolve the problem in the
meantime?
Our server is currently running the 6.12 LTS kernel and we haven't
had this specific issue any more. But we were never able to reproduce
it, so unfortunately I can't say for sure if it's fixed, or what
fixed it :-/.
Thanks for the update! Indeed, in the meantime the affected
environment here stopped showing the reported behavior as well after a
few days, and I don't have a clear indication of what might have been
the fix, either.
When the issue still occurred, it could (once) be provoked by dd'ing
4GB of /dev/zero to a test file on an NFSv4.2 mount. The network trace
shows that the file is completely written at wire speed. But after a
five second pause, the client then starts sending the same file again
in smaller chunks of a few hundred MB at five second intervals. So it
appears that the file's pages are background-flushed to storage again,
even though they've already been written out. On the NFS layer, none
of the passes look conspicuous to me: WRITE and COMMIT operations all
get NFS4_OK'ed by the server.
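For anyone who wants to retry the dd-style test while watching a packet capture, the write can be sketched in Python roughly as follows. This is only a minimal sketch under assumptions: the function name write_zeros and the path /mnt/nfs/testfile are hypothetical, and the explicit fsync is an addition that a plain dd run would not perform unless asked to.

```python
import os

def write_zeros(path, total_bytes, chunk_bytes=1 << 20):
    """Write total_bytes of zeros to path, fsync, and return the bytes written."""
    block = bytes(chunk_bytes)  # a reusable buffer of zero bytes
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(block[:n])
            written += n
        f.flush()
        # Assumption: force the data out before returning, so that any
        # traffic seen afterwards is not the initial writeback.
        os.fsync(f.fileno())
    return written

# Hypothetical usage on the NFSv4.2 mount (4 GiB, as in the test above):
# write_zeros("/mnt/nfs/testfile", 4 * 1024**3)
```

If WRITE traffic for the same file offsets shows up in the capture after the call has returned, that would match the re-send pattern described above.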
Which kernel version(s) are your server and clients running?
The systems in the affected environment run Debian-packaged kernels.
The servers are on Debian's 6.1.0-32 which corresponds to upstream's
6.1.129. The issue was seen on clients running the same kernel
version, but also on older systems running Debian's 5.10.0-33,
corresponding to 5.10.226 upstream. I've skimmed the list of patches
that went into either of these kernel versions, but nothing stood out
as clearly related.
Our server and clients are currently showing the same behavior again:
clients are sending abnormal amounts of write traffic to the NFS server
and the server is actually processing it as the writes end up on the
disk (which fills up our replication journals). iotop shows that the
kworker-{rpciod,nfsiod,xprtiod} are responsible for this traffic. A
reboot of the server does not solve the issue. Rebooting individual
clients that are participating in this does not help either. After a few
minutes of user traffic they show the same behavior again. We also see
this on multiple clients at the same time.
The NFS operations being sent are mostly PUTFH, SEQUENCE and
GETATTR.
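One way to put numbers on which operations dominate on a given client is to compare two snapshots of the per-op counters in /proc/self/mountstats. The following is a minimal sketch, assuming the usual "device ... mounted on ..." and "per-op statistics" layout of that file; the helper names and the mount point /mnt/nfs are hypothetical. Note that PUTFH is a component of a compound request and probably does not appear as its own entry there.

```python
def parse_per_op_counts(text, mountpoint):
    """Return {op_name: operation_count} for one mount in mountstats text."""
    counts = {}
    in_mount = False
    in_ops = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "device":
            # "device <source> mounted on <mountpoint> with fstype <type> ..."
            in_mount = len(fields) > 4 and fields[4] == mountpoint
            in_ops = False
        elif in_mount and line.strip() == "per-op statistics":
            in_ops = True
        elif in_mount and in_ops and fields[0].endswith(":") and len(fields) > 1:
            # The first number after the op name is the count of operations.
            counts[fields[0][:-1]] = int(fields[1])
    return counts

def diff_counts(before, after):
    """List (op, increase) pairs with a positive increase, largest first."""
    delta = {op: n - before.get(op, 0) for op, n in after.items()}
    grown = [(op, n) for op, n in delta.items() if n > 0]
    return sorted(grown, key=lambda kv: kv[1], reverse=True)

# Hypothetical usage: read /proc/self/mountstats twice, some seconds apart,
# and pass both texts through parse_per_op_counts(text, "/mnt/nfs").
```

Running this on an affected and an unaffected client over the same interval might show whether the extra load is really WRITE or mostly SEQUENCE and GETATTR.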
The server is running upstream 6.12.25 and the clients are running Rocky
8 (4.18.0-553.51.1.el8_10) and 9 (5.14.0-503.38.1.el9_5).
What are some of the steps we can take to debug the root cause of this?
Any idea on how to stop this traffic flood?
Regards,
Rik
--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440 - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>