On 2025-04-11 15:29, Benjamin Coddington wrote:
> On 10 Apr 2025, at 8:55, Holger Hoffstätte wrote:
>> ...
>> Does this behaviour seem familiar to anybody?
>>
>> I realize this is "not a bug" (all data is safe and sound etc.) but
>> somehow it seems that various layers are not working together as one
>> might expect. It's possible that my expectation is wrong. :)
>
> My first impression is that the writes are being processed out-of-order on
> the server, so XFS is using a range of allocation sizes while growing the
> file extents.
>
>> Thanks for reading!
Yes, that's true, but it's not the cause of the reordering - the server just
does what it's told. The reordering has to come from the client in
async mode.
Compare the following packet traces, first with a sync mount on the client:
holger>tshark -r client-sync.pcap | grep Offset | head -n 20
148 6.349609085 192.168.100.221 → 192.168.100.222 NFS 1826 V4 Call WRITE StateID: 0x7cf4 Offset: 0 Len: 262144
223 6.351592937 192.168.100.221 → 192.168.100.222 NFS 38026 V4 Call WRITE StateID: 0x7cf4 Offset: 262144 Len: 262144
329 6.354313478 192.168.100.221 → 192.168.100.222 NFS 24994 V4 Call WRITE StateID: 0x7cf4 Offset: 524288 Len: 262144
452 6.357495813 192.168.100.221 → 192.168.100.222 NFS 13410 V4 Call WRITE StateID: 0x7cf4 Offset: 786432 Len: 262144
534 6.359567456 192.168.100.221 → 192.168.100.222 NFS 38026 V4 Call WRITE StateID: 0x7cf4 Offset: 1048576 Len: 262144
638 6.362138118 192.168.100.221 → 192.168.100.222 NFS 26442 V4 Call WRITE StateID: 0x7cf4 Offset: 1310720 Len: 262144
762 6.365241741 192.168.100.221 → 192.168.100.222 NFS 13410 V4 Call WRITE StateID: 0x7cf4 Offset: 1572864 Len: 262144
866 6.367883641 192.168.100.221 → 192.168.100.222 NFS 1826 V4 Call WRITE StateID: 0x7cf4 Offset: 1835008 Len: 262144
962 6.370450322 192.168.100.221 → 192.168.100.222 NFS 3274 V4 Call WRITE StateID: 0x7cf4 Offset: 2097152 Len: 262144
1035 6.372527063 192.168.100.221 → 192.168.100.222 NFS 13410 V4 Call WRITE StateID: 0x7cf4 Offset: 2359296 Len: 262144
1132 6.375268418 192.168.100.221 → 192.168.100.222 NFS 13410 V4 Call WRITE StateID: 0x7cf4 Offset: 2621440 Len: 262144
1229 6.377913391 192.168.100.221 → 192.168.100.222 NFS 13410 V4 Call WRITE StateID: 0x7cf4 Offset: 2883584 Len: 262144
1326 6.380523792 192.168.100.221 → 192.168.100.222 NFS 13410 V4 Call WRITE StateID: 0x7cf4 Offset: 3145728 Len: 262144
1423 6.383116733 192.168.100.221 → 192.168.100.222 NFS 13410 V4 Call WRITE StateID: 0x7cf4 Offset: 3407872 Len: 262144
1519 6.385729789 192.168.100.221 → 192.168.100.222 NFS 13410 V4 Call WRITE StateID: 0x7cf4 Offset: 3670016 Len: 262144
1616 6.388442438 192.168.100.221 → 192.168.100.222 NFS 13410 V4 Call WRITE StateID: 0x7cf4 Offset: 3932160 Len: 262144
1713 6.391023297 192.168.100.221 → 192.168.100.222 NFS 13410 V4 Call WRITE StateID: 0x7cf4 Offset: 4194304 Len: 262144
1809 6.393621476 192.168.100.221 → 192.168.100.222 NFS 13410 V4 Call WRITE StateID: 0x7cf4 Offset: 4456448 Len: 262144
1906 6.396189134 192.168.100.221 → 192.168.100.222 NFS 13410 V4 Call WRITE StateID: 0x7cf4 Offset: 4718592 Len: 262144
2004 6.398795974 192.168.100.221 → 192.168.100.222 NFS 13410 V4 Call WRITE StateID: 0x7cf4 Offset: 4980736 Len: 262144
The offset increases strictly in order, one 256 KB write after the next.
Now with the regular async mount:
holger>tshark -r client-async.pcap | grep Offset | head -n 20
349 1.939181198 192.168.100.221 → 192.168.100.222 NFS 62330 V4 Call WRITE StateID: 0x9abe Offset: 0 Len: 1048576
713 1.948199991 192.168.100.221 → 192.168.100.222 NFS 62330 V4 Call WRITE StateID: 0x9abe Offset: 1048576 Len: 1048576
1078 1.957190498 192.168.100.221 → 192.168.100.222 NFS 62330 V4 Call WRITE StateID: 0x9abe Offset: 2097152 Len: 1048576
1438 1.966163614 192.168.100.221 → 192.168.100.222 NFS 62330 V4 Call WRITE StateID: 0x9abe Offset: 3145728 Len: 1048576
1781 1.974626188 192.168.100.221 → 192.168.100.222 NFS 62330 V4 Call WRITE StateID: 0x9abe Offset: 528482304 Len: 1048576
2140 1.983642048 192.168.100.221 → 192.168.100.222 NFS 62330 V4 Call WRITE StateID: 0x9abe Offset: 4194304 Len: 1048576
2499 1.992622009 192.168.100.221 → 192.168.100.222 NFS 62330 V4 Call WRITE StateID: 0x9abe Offset: 529530880 Len: 1048576
2862 2.001627951 192.168.100.221 → 192.168.100.222 NFS 62330 V4 Call WRITE StateID: 0x9abe Offset: 5242880 Len: 1048576
3224 2.010608959 192.168.100.221 → 192.168.100.222 NFS 62330 V4 Call WRITE StateID: 0x9abe Offset: 530579456 Len: 1048576
3584 2.019595764 192.168.100.221 → 192.168.100.222 NFS 62330 V4 Call WRITE StateID: 0x9abe Offset: 7340032 Len: 1048576
3951 2.028614348 192.168.100.221 → 192.168.100.222 NFS 62330 V4 Call WRITE StateID: 0x9abe Offset: 531628032 Len: 1048576
4290 2.037056179 192.168.100.221 → 192.168.100.222 NFS 62330 V4 Call WRITE StateID: 0x9abe Offset: 6291456 Len: 1048576
4656 2.046079372 192.168.100.221 → 192.168.100.222 NFS 62330 V4 Call WRITE StateID: 0x9abe Offset: 532676608 Len: 1048576
5029 2.055044526 192.168.100.221 → 192.168.100.222 NFS 62330 V4 Call WRITE StateID: 0x9abe Offset: 8388608 Len: 1048576
5399 2.064067160 192.168.100.221 → 192.168.100.222 NFS 62330 V4 Call WRITE StateID: 0x9abe Offset: 533725184 Len: 1048576
5772 2.073040626 192.168.100.221 → 192.168.100.222 NFS 62330 V4 Call WRITE StateID: 0x9abe Offset: 9437184 Len: 1048576
6142 2.082045031 192.168.100.221 → 192.168.100.222 NFS 62330 V4 Call WRITE StateID: 0x9abe Offset: 534773760 Len: 1048576
6497 2.090504113 192.168.100.221 → 192.168.100.222 NFS 62330 V4 Call WRITE StateID: 0x9abe Offset: 10485760 Len: 1048576
6868 2.099502093 192.168.100.221 → 192.168.100.222 NFS 62330 V4 Call WRITE StateID: 0x9abe Offset: 535822336 Len: 1048576
7229 2.108506848 192.168.100.221 → 192.168.100.222 NFS 62330 V4 Call WRITE StateID: 0x9abe Offset: 11534336 Len: 1048576
The offsets are all over the place, with a write at exactly 504 MB right after
the first three.
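For what it's worth, here's a quick way to spot the reordering mechanically
instead of eyeballing it. check_order is just my name for a throwaway helper;
it assumes each line carries an "Offset: N" field, as in the tshark output
above, and complains whenever the offset goes backwards:

```shell
# Hypothetical helper (my naming): reads tshark-style lines on stdin and
# prints a warning whenever the "Offset: N" field goes backwards.
check_order() {
  awk '{ for (i = 1; i <= NF; i++) if ($i == "Offset:") off = $(i + 1) + 0;
         if (off < prev) print "reordered: " off " after " prev;
         prev = off }'
}

# Usage against the trace above:
# tshark -r client-async.pcap | grep Offset | check_order
```

On the sync trace this prints nothing; on the async trace it fires on every
jump back from the ~504 MB range.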
So far I tried:
- disabling the write congestion handling (client side)
- increasing the dirty writeback thresholds to unreasonable levels
  (>30 seconds, several GB)

Except for bloating writeback, none of this made a difference.
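Concretely, the knobs I was twiddling look roughly like this. The values are
illustrative - only the dirty_background_bytes setting is the exact value from
the run above; the others varied between attempts:

```shell
# Illustrative writeback tuning - only vm.dirty_background_bytes=536870912
# is the exact value used in the trace above; the rest varied between runs.
sysctl -w vm.dirty_background_bytes=536870912   # 512 MB before background writeback starts
sysctl -w vm.dirty_bytes=4294967296             # allow several GB of dirty data
sysctl -w vm.dirty_expire_centisecs=3000        # pages count as "old" only after 30 s
sysctl -w vm.dirty_writeback_centisecs=3000     # flusher wakeup interval: 30 s
```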
I've been poking around in nfs/write.c and nfs/pagelist.c but unfortunately
haven't gotten very far. Looking at the async case above it *looks* as if
multiple writebacks are initiated and interleaved, or at least their page
ranges are. That's with vm.dirty_background_bytes=536870912 (512 MB) - it
could be a coincidence.

If it helps I can make the client packet traces available for download;
I didn't see anything interesting or unusual in them. The server traces
match the client traces 1:1 with respect to Offset handling, so they are
not really interesting.
cheers
Holger