Best regards,
   Tigran.

----- Original Message -----
> From: "NeilBrown" <neil@xxxxxxxxxx>
> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@xxxxxxx>
> Cc: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>
> Sent: Friday, 4 April, 2025 05:14:46
> Subject: Re: NFS client low performance in concurrent environment.

> On Fri, 04 Apr 2025, Mkrtchyan, Tigran wrote:
>> Dear NFS fellows,
>>
>> As part of research, we have adapted IOR[1], a benchmark well known in the
>> HPC community, to support libnfs[2]. After running a bunch of tests, our
>> observation is that multiple clients in userspace have a higher throughput
>> than the in-kernel client (or server).
>>
>> In the test below, the NFS server runs on RHEL9 with kernel
>> 5.14.0-503.23.1.el9_5.x86_64, exporting /mnt. The results are in operations
>> per second; thus, higher numbers are better.
>>
>> The client is a single 80-core host, running RHEL9 with kernel
>> 5.14.0-427.26.1.el9_4.x86_64.
>> We used NFSv3 in the test to eliminate NFSv4's open/close overhead on
>> zero-byte files.
>>
>>
>> TEST 1: libnfs
>> ```
>> $ mpirun -n 128 --map-by :OVERSUBSCRIBE ./mdtest -a LIBNFS --libnfs.url='nfs://lab008/mnt/?uid=0&gid=0&version=3' -w 0 -I 128 -i 10 -z 0 -b 0 -F -d /test
>> -- started at 04/03/2025 14:39:30 --
>>
>> mdtest-4.1.0+dev was launched with 128 total task(s) on 1 node(s)
>> Command line used: ./mdtest '-a' 'LIBNFS' '--libnfs.url=nfs://lab008/mnt/version=3' '-w' '0' '-I' '128' '-i' '10' '-z' '0' '-b' '0' '-F' '-d' '/test'
>> Nodemap: 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
>> Path   : /test
>> FS     : 38.2 GiB   Used FS: 41.3%   Inodes: 2.4 Mi   Used Inodes: 5.8%
>> 128 tasks, 16384 files
>>
>> SUMMARY rate (in ops/sec): (of 10 iterations)
>> Operation                 Max          Min         Mean      Std Dev
>> ---------                 ---          ---         ----      -------
>> File creation        7147.432     6789.531     6996.044      132.149
>> File stat           97175.603    57844.142    91063.340    12000.718
>> File read           97004.685    48234.620    89099.077    14715.699
>> File removal        25172.919    23405.880    24424.384      577.264
>> Tree creation        2375.031      555.537     1982.139      561.013
>> Tree removal           99.443       95.475       97.632        1.266
>> -- finished at 04/03/2025 14:40:05 --
>> ```
>>
>>
>> TEST 2: in-kernel client
>> ```
>> $ mpirun -n 128 --map-by :OVERSUBSCRIBE ./mdtest -w 0 -I 128 -i 10 -z 0 -b 0 -F -d /mnt/test
>> -- started at 04/03/2025 14:36:09 --
>>
>> mdtest-4.1.0+dev was launched with 128 total task(s) on 1 node(s)
>> Nodemap: 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
>> Path   : /mnt/test
>> FS     : 38.2 GiB   Used FS: 41.3%   Inodes: 2.4 Mi   Used Inodes: 5.8%
>> 128 tasks, 16384 files
>>
>> SUMMARY rate (in ops/sec): (of 10 iterations)
>> Operation                 Max          Min         Mean      Std Dev
>> ---------                 ---          ---         ----      -------
>> File creation        2301.914     2046.406     2203.859       88.793
>> File stat          101396.240    77386.014    91270.677     6229.657
>> File read           43631.081    36858.229    40800.066     2534.255
>> File removal         3102.328     2647.649     2840.170      153.959
>> Tree creation        2142.137      253.739     1710.416      620.293
>> Tree removal           42.922       25.670       36.604        4.820
>> -- finished at 04/03/2025 14:38:28 --
>> ```
>>
>>
>> Obviously, the kernel client shares the TCP connection. So this is either
>> (a) expected behavior, (b) client thread starvation, or (c) server thread
>> starvation. The last option is unlikely, as we first observed the behavior
>> with the dCache NFS server implementation before falling back to the Linux
>> kernel nfsd.
>
> If you think "kernel client share the TCP connection" then it would be
> worth adding the "nconnect=8" option to see if that makes a difference.
>
> If all these file operations are happening in the one directory then the
> problem is probably contention on the directory lock. The Linux VFS
> holds an exclusive lock on the directory while creating or removing any
> files in that directory. If you can shard the operations over multiple
> directories you can ease the contention.
>
> I am working on removing the dependency on the directory lock, but I
> don't have a patch for you to try, unless you are happy to work on a
> three-year-old kernel.
> There is a patch set here:
> https://lore.kernel.org/all/166147828344.25420.13834885828450967910.stgit@noble.brown/
> which should work on a kernel of that time.
>
> NeilBrown
>
>>
>> Best regards,
>> Tigran.
>>
>>
>> [1]: https://github.com/hpc/ior
>> [2]: https://github.com/sahlberg/libnfs
>>
>> -----------------------------
>> DESY-IT, Scientific Computing
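
For illustration only, here is a minimal sketch of the "shard the operations over multiple directories" idea from the quoted reply. It is not part of mdtest, IOR, or libnfs. The helper names (shard_dir, create_sharded), the "shard.NNN" directory naming, and the default of 16 shards are all assumptions made for this example. Each file name is hashed to one of a fixed number of subdirectories, so creates and removes are spread over several parent directories rather than contending on one. It runs in a single process; in the benchmark above the same mapping would have to be applied by each MPI task.

```python
import os
import tempfile
import zlib


def shard_dir(root: str, name: str, shards: int) -> str:
    """Return the subdirectory of `root` that the file `name` maps to.

    Hypothetical helper: crc32 is used only because it is stable across
    processes, so every task computes the same mapping for a given name.
    """
    index = zlib.crc32(name.encode("utf-8")) % shards
    return os.path.join(root, f"shard.{index:03d}")


def create_sharded(root: str, names: list[str], shards: int = 16) -> dict[str, int]:
    """Create zero-byte files spread over up to `shards` subdirectories.

    Returns a mapping from shard directory to the number of files created in it.
    """
    counts: dict[str, int] = {}
    for name in names:
        directory = shard_dir(root, name, shards)
        os.makedirs(directory, exist_ok=True)
        # Mode "x" creates the file and fails if it already exists;
        # nothing is written, so the file stays at zero bytes.
        with open(os.path.join(directory, name), "x"):
            pass
        counts[directory] = counts.get(directory, 0) + 1
    return counts


if __name__ == "__main__":
    # A temporary local directory stands in for the NFS mount point here.
    with tempfile.TemporaryDirectory() as root:
        counts = create_sharded(root, [f"file.{i}" for i in range(1024)])
        for directory, n in sorted(counts.items()):
            print(os.path.basename(directory), n)
```

Whether this helps in practice depends on whether the directory lock really is the bottleneck in the measurements above; the sketch only shows the file-to-directory mapping, not a measurement.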