Has there been any progress, or has a solution been found?

Lionel

On Mon, 3 Mar 2025 at 18:45, Mkrtchyan, Tigran <tigran.mkrtchyan@xxxxxxx> wrote:
>
> I was able to reproduce the low throughput with the fio commands below. Each example reads 200GB from multiple files.
> The --offset=98% option is there to read only a small portion of each file, as our files are 33GB each. In case 1, data is
> read from a single file, and on reaching EOF fio switches to the next one. In case 2, all files are opened in advance and
> data is read round-robin across all of them.
>
> case 1: read files sequentially
> fio --name test --opendir=/pnfs/data --rw=randread:8 --bssplit=4k/25:512k --offset=98% --io_size=200G --file_service_type=sequential
>
> case 2: open all files and select round-robin which one to read from
> fio --name test --opendir=/pnfs/data --rw=randread:8 --bssplit=4k/25:512k --offset=98% --io_size=200G --file_service_type=roundrobin
>
> Case 1 takes a couple of minutes (2-3).
> Case 2 takes two (2) hours.
>
> Tigran.
>
> ----- Original Message -----
> > From: "Tigran Mkrtchyan" <tigran.mkrtchyan@xxxxxxx>
> > To: "linux-nfs" <linux-nfs@xxxxxxxxxxxxxxx>
> > Cc: "trondmy" <trondmy@xxxxxxxxxx>, "Olga Kornievskaia" <aglo@xxxxxxxxx>
> > Sent: Friday, 28 February, 2025 19:13:42
> > Subject: Unexpected low pNFS IO performance with parallel workload
> >
> > Dear NFS fellows,
> >
> > During HPC workloads we notice that the Linux NFS4.2/pNFS client demonstrates
> > unexpectedly low performance.
> > The application opens 55 files in parallel and reads the data with multiple threads.
> > The server issues flexfile layouts with tightly coupled NFSv4.1 DSes.
> >
> > Observations:
> >
> > - despite the 1MB rsize/wsize returned by the layout, the client never issues reads
> >   bigger than 512k (often much smaller)
> > - the client always uses slot 0 on the DS, and
> > - reads happen sequentially, i.e. only one in-flight READ request
> > - subsequent reads often just read the next 512k block
> > - if, instead of the parallel application, a simple dd is run, then multiple slots
> >   and 1MB READs are sent
> >
> > $ dd if=/pnfs/xxxx/00054.h5 of=/dev/null
> > 45753381+1 records in
> > 45753381+1 records out
> > 23425731171 bytes (23 GB, 22 GiB) copied, 69.702 s, 336 MB/s
> >
> > The client has 80 cores on 2 sockets, 512GB of RAM, and runs RHEL 9.4.
> >
> > $ uname -r
> > 5.14.0-427.26.1.el9_4.x86_64
> >
> > $ free -g
> >                total        used        free      shared  buff/cache   available
> > Mem:             503          84         392           0          29         419
> >
> > $ lscpu | head
> > Architecture:            x86_64
> > CPU op-mode(s):          32-bit, 64-bit
> > Address sizes:           46 bits physical, 48 bits virtual
> > Byte Order:              Little Endian
> > CPU(s):                  80
> > On-line CPU(s) list:     0-79
> > Vendor ID:               GenuineIntel
> > BIOS Vendor ID:          Intel(R) Corporation
> > Model name:              Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
> > BIOS Model name:         Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
> >
> > The client and all DSes are equipped with 10GB/s NICs.
> >
> > Any ideas where to look?
> >
> > Best regards,
> > Tigran.

--
Lionel
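
For illustration, the difference between fio's sequential and roundrobin file_service_type modes in the reproducer above comes down to the order of the read() calls each mode issues. A minimal Python sketch of the two patterns (an approximation, not part of the original thread; file paths and block size are placeholders, not the actual /pnfs data):

```python
import os

def read_files(paths, block_size=512 * 1024, round_robin=False):
    """Read every file to EOF and return total bytes read.

    round_robin=False: drain one file completely before opening the next
    block stream (like fio's file_service_type=sequential).
    round_robin=True: interleave one block per file per pass (like
    file_service_type=roundrobin), so per-file offsets still advance
    monotonically, but the client hops between files on every block.
    """
    fds = [os.open(p, os.O_RDONLY) for p in paths]
    offsets = [0] * len(fds)
    done = [False] * len(fds)
    total = 0
    while not all(done):
        for i, fd in enumerate(fds):
            if done[i]:
                continue
            data = os.pread(fd, block_size, offsets[i])
            if not data:
                done[i] = True
                continue
            offsets[i] += len(data)
            total += len(data)
            if not round_robin:
                # Sequential mode: keep reading this file until EOF
                # before touching the next one.
                while data := os.pread(fd, block_size, offsets[i]):
                    offsets[i] += len(data)
                    total += len(data)
                done[i] = True
    for fd in fds:
        os.close(fd)
    return total
```

Both modes read the same bytes; only the interleaving differs, which is why comparing them isolates how the client handles many concurrently active files versus one file at a time.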