Dear NFS fellows,

As part of our research we have adapted IOR[1], a benchmark suite well known in the HPC community, to support libnfs[2]. After running a number of tests, our observation is that multiple userspace clients achieve higher throughput than the in-kernel client (or server).

In the tests below, the NFS server runs on RHEL9 with kernel 5.14.0-503.23.1.el9_5.x86_64 and exports /mnt. The client is a single 80-core host running RHEL9 with kernel 5.14.0-427.26.1.el9_4.x86_64. The results are in operations per second, so higher numbers are better. We used NFSv3 to eliminate NFSv4's open/close overhead on zero-byte files.

TEST 1: libnfs
```
$ mpirun -n 128 --map-by :OVERSUBSCRIBE ./mdtest -a LIBNFS --libnfs.url='nfs://lab008/mnt/?uid=0&gid=0&version=3' -w 0 -I 128 -i 10 -z 0 -b 0 -F -d /test
-- started at 04/03/2025 14:39:30 --

mdtest-4.1.0+dev was launched with 128 total task(s) on 1 node(s)
Command line used: ./mdtest '-a' 'LIBNFS' '--libnfs.url=nfs://lab008/mnt/version=3' '-w' '0' '-I' '128' '-i' '10' '-z' '0' '-b' '0' '-F' '-d' '/test'
Nodemap: 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
Path    : /test
FS      : 38.2 GiB   Used FS: 41.3%   Inodes: 2.4 Mi   Used Inodes: 5.8%

128 tasks, 16384 files

SUMMARY rate (in ops/sec): (of 10 iterations)
   Operation            Max          Min         Mean      Std Dev
   ---------            ---          ---         ----      -------
   File creation     7147.432     6789.531     6996.044      132.149
   File stat        97175.603    57844.142    91063.340    12000.718
   File read        97004.685    48234.620    89099.077    14715.699
   File removal     25172.919    23405.880    24424.384      577.264
   Tree creation     2375.031      555.537     1982.139      561.013
   Tree removal        99.443       95.475       97.632        1.266
-- finished at 04/03/2025 14:40:05 --
```

TEST 2: in-kernel client
```
$ mpirun -n 128 --map-by :OVERSUBSCRIBE ./mdtest -w 0 -I 128 -i 10 -z 0 -b 0 -F -d /mnt/test
-- started at 04/03/2025 14:36:09 --

mdtest-4.1.0+dev was launched with 128 total task(s) on 1 node(s)
Nodemap: 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
Path    : /mnt/test
FS      : 38.2 GiB   Used FS: 41.3%   Inodes: 2.4 Mi   Used Inodes: 5.8%

128 tasks, 16384 files

SUMMARY rate (in ops/sec): (of 10 iterations)
   Operation            Max          Min         Mean      Std Dev
   ---------            ---          ---         ----      -------
   File creation     2301.914     2046.406     2203.859       88.793
   File stat       101396.240    77386.014    91270.677     6229.657
   File read        43631.081    36858.229    40800.066     2534.255
   File removal      3102.328     2647.649     2840.170      153.959
   Tree creation     2142.137      253.739     1710.416      620.293
   Tree removal        42.922       25.670       36.604        4.820
-- finished at 04/03/2025 14:38:28 --
```

Obviously, the kernel client shares a single TCP connection across all tasks. So this is either (a) expected behavior, (b) client-side thread starvation, or (c) server-side thread starvation. The last option is unlikely, as we first observed the behavior with the dCache NFS server implementation before falling back to the Linux kernel nfsd.

Best regards,
   Tigran.

[1]: https://github.com/hpc/ior
[2]: https://github.com/sahlberg/libnfs

-----------------------------
DESY-IT, Scientific Computing
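
P.S. One way to probe hypothesis (a): the kernel client can open several TCP connections to the same server with the nconnect mount option, which these kernels support. A minimal sketch, assuming TEST 2 used a plain NFSv3 mount of the same export (the nconnect=8 value is arbitrary):

```
# plain mount as presumably used in TEST 2: one TCP connection shared by all tasks
mount -t nfs -o vers=3 lab008:/mnt /mnt

# remount with several TCP connections; if the shared connection is the
# bottleneck, the creation/removal rates should move towards the libnfs numbers
umount /mnt
mount -t nfs -o vers=3,nconnect=8 lab008:/mnt /mnt
```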