I was able to reproduce the low throughput with the fio commands below. Both examples read 200GB from multiple files; --offset=98% is there to read only a small portion of each file, as our files are 33GB each. In 'case 1', data is read from a single file until EOF, then fio switches to the next one. In 'case 2', all files are opened in advance and data is read round-robin across all of them.

case 1: read files sequentially

fio --name test --opendir=/pnfs/data --rw=randread:8 --bssplit=4k/25:512k --offset=98% --io_size=200G --file_service_type=sequential

case 2: open all files and select round-robin which file to read from

fio --name test --opendir=/pnfs/data --rw=randread:8 --bssplit=4k/25:512k --offset=98% --io_size=200G --file_service_type=roundrobin

Case 1 takes a couple of minutes (2-3). Case 2 takes two (2) hours.

Tigran.

----- Original Message -----
> From: "Tigran Mkrtchyan" <tigran.mkrtchyan@xxxxxxx>
> To: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>
> Cc: "trondmy" <trondmy@xxxxxxxxxx>, "Olga Kornievskaia" <aglo@xxxxxxxxx>
> Sent: Friday, 28 February, 2025 19:13:42
> Subject: Unexpected low pNFS IO performance with parallel workload
>
> Dear NFS fellows,
>
> During HPC workloads we noticed that the Linux NFS4.2/pNFS client demonstrates
> unexpectedly low performance. The application opens 55 files in parallel and
> reads the data with multiple threads. The server issues flexfile layouts with
> tightly coupled NFSv4.1 DSes.
>
> Observations:
>
> - despite the 1MB rsize/wsize returned by the layout, the client never issues
>   reads bigger than 512k (often much smaller)
> - the client always uses slot 0 on the DS
> - reads happen sequentially, i.e. only one in-flight READ request
> - subsequent reads often just read the next 512k block
> - if instead of the parallel application a simple dd is run, then multiple
>   slots and 1MB READs are sent
>
> $ dd if=/pnfs/xxxx/00054.h5 of=/dev/null
> 45753381+1 records in
> 45753381+1 records out
> 23425731171 bytes (23 GB, 22 GiB) copied, 69.702 s, 336 MB/s
>
> The client has 80 cores on 2 sockets, 512GB of RAM and runs RHEL 9.4.
>
> $ uname -r
> 5.14.0-427.26.1.el9_4.x86_64
>
> $ free -g
>                total        used        free      shared  buff/cache   available
> Mem:             503          84         392           0          29         419
>
> $ lscpu | head
> Architecture:            x86_64
> CPU op-mode(s):          32-bit, 64-bit
> Address sizes:           46 bits physical, 48 bits virtual
> Byte Order:              Little Endian
> CPU(s):                  80
> On-line CPU(s) list:     0-79
> Vendor ID:               GenuineIntel
> BIOS Vendor ID:          Intel(R) Corporation
> Model name:              Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
> BIOS Model name:         Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
>
> The client and all DSes are equipped with 10Gbit/s NICs.
>
> Any ideas where to look?
>
> Best regards,
> Tigran.
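As a toy illustration (not fio's actual implementation) of why the two --file_service_type modes above can behave so differently: 'sequential' produces long monotonic streaks of consecutive blocks within one file, which sequential-read heuristics such as client readahead can ramp up on, while 'roundrobin' switches files on every request, so each file only ever sees an access pattern that looks random. The function names and block counts below are made up for the sketch:

```python
# Toy model of the two fio file-selection modes. Not fio code;
# just contrasts the block orderings each mode generates.

def sequential_order(n_files, blocks_per_file):
    """Case 1: drain one file completely before moving to the next."""
    return [(f, b) for f in range(n_files) for b in range(blocks_per_file)]

def roundrobin_order(n_files, blocks_per_file):
    """Case 2: read one block from each open file in turn."""
    return [(f, b) for b in range(blocks_per_file) for f in range(n_files)]

def longest_run_per_file(order):
    """Longest streak of consecutive blocks within a single file --
    a rough proxy for how far readahead heuristics can ramp up."""
    best = cur = 1
    for (f0, b0), (f1, b1) in zip(order, order[1:]):
        if f1 == f0 and b1 == b0 + 1:
            cur += 1
            best = max(best, cur)
        else:
            cur = 1
    return best

# 55 files, 8 blocks each, matching the randread:8 / 55-file workload shape.
print(longest_run_per_file(sequential_order(55, 8)))  # 8: full-file streaks
print(longest_run_per_file(roundrobin_order(55, 8)))  # 1: every request switches files
```

In the round-robin ordering no two consecutive requests ever touch the same file, so from the perspective of any one file the stream never looks sequential.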