Re: non-stop kworker NFS/RPC write traffic even after unmount

Hi,

On 5/16/25 8:17 AM, Rik Theys wrote:
Hi,

On 5/16/25 7:51 AM, Rik Theys wrote:
Hi,

On 4/18/25 3:31 PM, Daniel Kobras wrote:
Hi Rik!

Am 01.04.25 um 14:15 schrieb Rik Theys:
On 4/1/25 2:05 PM, Daniel Kobras wrote:
Am 15.12.24 um 13:38 schrieb Rik Theys:
Suddenly, a number of clients start to send an abnormal amount of NFS traffic to the server that saturates their link and never seems to stop. Running iotop on the clients shows kworker-{rpciod,nfsiod,xprtiod} processes generating the write traffic. On the server side, the system seems to process the traffic, as the disks are processing the write requests.

This behavior continues even after stopping all user processes on the clients and unmounting the NFS mount on the client. Is this normal? I was under the impression that once the NFS mount is unmounted, no further traffic to the server should be visible.

I'm currently looking at an issue that resembles your description above (excess traffic to the server for data that was already written and committed), and part of the packet capture also looks roughly similar to what you've sent in a followup. Before I dig any deeper: Did you manage to pinpoint or resolve the problem in the meantime?

Our server is currently running the 6.12 LTS kernel and we haven't had this specific issue any more. But we were never able to reproduce it, so unfortunately I can't say for sure if it's fixed, or what fixed it :-/.

Thanks for the update! Indeed, in the meantime the affected environment here stopped showing the reported behavior as well after a few days, and I don't have a clear indication what might have been the fix, either.

When the issue still occurred, it could (once) be provoked by dd'ing 4GB of /dev/zero to a test file on an NFSv4.2 mount. The network trace shows that the file is completely written at wire speed. But after a five second pause, the client then starts sending the same file again in smaller chunks of a few hundred MB at five second intervals. So it appears that the file's pages are background-flushed to storage again, even though they've already been written out. On the NFS layer, none of the passes look conspicuous to me: WRITE and COMMIT operations all get NFS4_OK'ed by the server.
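The provoking write described above can be sketched roughly as follows; the mount point and file name are placeholders, and the default COUNT is kept small here (set COUNT=4096 for the full 4 GB reproduction):

```shell
# MNT is a placeholder; point it at the affected NFSv4.2 mount.
MNT=${MNT:-/tmp/nfs-test}
mkdir -p "$MNT"
# COUNT=4096 with bs=1M reproduces the full 4 GB write.
COUNT=${COUNT:-4}
dd if=/dev/zero of="$MNT/testfile" bs=1M count="$COUNT" conv=fsync
# On a real NFS mount, now watch for renewed WRITE traffic after ~5 s
# (e.g. with dstat or a packet capture), as described above.
```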

Which kernel version(s) are your server and clients running?

The systems in the affected environment run Debian-packaged kernels. The servers are on Debian's 6.1.0-32, which corresponds to upstream's 6.1.129. The issue was seen on clients running the same kernel version, but also on older systems running Debian's 5.10.0-33, corresponding to 5.10.226 upstream. I've skimmed the list of patches that went into either of these kernel versions, but nothing stood out as clearly related.

Our server and clients are currently showing the same behavior again: clients are sending abnormal amounts of write traffic to the NFS server and the server is actually processing it as the writes end up on the disk (which fills up our replication journals). iotop shows that the kworker-{rpciod,nfsiod,xprtiod} are responsible for this traffic. A reboot of the server does not solve the issue. Also rebooting individual clients that are participating in this does not help. After a few minutes of user traffic they show the same behavior again. We also see this on multiple clients at the same time.

The NFS operations that are being sent are mostly PUTFH, SEQUENCE, and GETATTR.

The server is running upstream 6.12.25 and the clients are running Rocky 8 (4.18.0-553.51.1.el8_10) and 9 (5.14.0-503.38.1.el9_5).

What are some of the steps we can take to debug the root cause of this? Any idea on how to stop this traffic flood?
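One generic client-side starting point is the per-mount RPC counters, which show whether WRITE counts keep climbing with no user I/O; the awk field positions below assume the usual /proc/self/mountstats layout, and the rpcdebug flags are from the nfs-utils man page (root required, very noisy):

```shell
# Print per-mount WRITE op counters; the mount point is field 5 of the
# "device ... mounted on /path with fstype nfs..." header lines.
awk '/fstype nfs/ {m=$5} /WRITE:/ {print m, $0}' /proc/self/mountstats
# With root, client-side debug logging can be enabled (clear with -c):
#   rpcdebug -m nfs -s pagecache
#   rpcdebug -m rpc -s xprt call
```

Re-running the awk line a few seconds apart, with all user processes stopped, shows directly whether the kworkers are still issuing WRITEs on a given mount.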

I took a tcpdump on one of the clients that was doing this. The pcap was stored on the local disk of the server. When I tried to copy the pcap to our management server over scp, it hung at 95%. The target disk on the management server is also an NFS mount of the affected server. The scp had copied 565MB, and our management server has now also started to flood the server with non-stop traffic (basically saturating its link).

The management server is running Debian's 6.1.135 kernel.

It seems that once a client has triggered some bad state in the server, other clients that write a large file to the server also start to participate in this behavior. Rebooting the server does not seem to help as the same state is triggered almost immediately again by some client.

Now that the server is in this state, I can very easily reproduce this on a client. I've installed the 6.14.6 kernel on a Rocky 9 client.

1. On a different machine, create a 3 MB file of zeroes using "dd if=/dev/zero of=3M bs=3M count=1"

2. Reboot the Rocky 9 client and log in as root. Verify that there are no active NFS mounts to the server. Start dstat and watch the output.

3. From the machine where you created the 3M file, scp the 3M file to the Rocky 9 client in a location that is an NFS mount of the server. In this case it's my home directory which is automounted.

The file copies normally, but the amount of data transferred from the client to the server appears to be larger than the file's 3 MB size.

But then, in ~30 s intervals, the client sends another 3 MB to the server, which seems to be the same data; 30 s later, again. Even if you unmount the share in between these 30 s intervals, the client will still send it! When the NFS share is unmounted, the TCP connection seems to remain open (as can also be seen in the client info file on the server).
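The three steps above can be sketched as follows; the client host name and home directory are placeholders for the actual environment:

```shell
# 1. On another machine, create a 3 MB file of zeroes:
dd if=/dev/zero of=/tmp/3M bs=3M count=1
# 2. On the freshly rebooted client, confirm no NFS mounts are active
#    and start watching traffic:
#      grep nfs /proc/mounts; dstat
# 3. Copy the file into the automounted NFS home directory
#    (host and path are placeholders):
#      scp /tmp/3M root@rocky9-client:/home/rtheys/
# The client then resends ~3 MB to the server roughly every 30 s,
# even after the share is unmounted.
```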

You can find a pcap of this at https://homes.esat.kuleuven.be/~rtheys/orval-3M.pcap

At 11:00:51, the login triggers the NFS mount, because sshd tries to read my authorized_keys file.

At 11:01:05, I hit return on the password prompt and the copy starts.

At 11:01:06, the copy ends and the NFS server sends a reply to the COMPOUND (SEQUENCE, PUTFH, CLOSE) call. The only warning I see in Wireshark is that the server sends a StateID on the CLOSE operation, which the Wireshark dissector flags as deprecated. The StateID is ffffffff000000000000000000000000, so maybe a fixed value. What follows are some DELEGRETURN calls.

At 11:01:37, the client sends the data again.

At 11:02:07, the client sends the data again.

At 11:02:38, the client sends the data again.

At 11:02:47, I unmount the nfs share on the client.

At 11:03:09, the client sends the data again.
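One way to measure those resend intervals directly from the capture is to list the timestamps of the NFSv4 WRITE operations (opcode 38 per RFC 7530); the display-filter and field names below follow the Wireshark NFS dissector and may need adjusting for other Wireshark versions:

```shell
# List relative timestamps and RPC XIDs of NFSv4 WRITE calls, so the
# ~30 s gaps between resend bursts stand out. Guarded so it is a no-op
# when tshark or the capture file is absent.
if command -v tshark >/dev/null && [ -r orval-3M.pcap ]; then
    tshark -r orval-3M.pcap -Y 'nfs.opcode == 38' \
           -T fields -e frame.time_relative -e rpc.xid
fi
```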


Regards,

Rik

--
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440  - B-3001 Leuven-Heverlee
+32(0)16/32.11.07
----------------------------------------------------------------
<<Any errors in spelling, tact or fact are transmission errors>>




