Re: [RFC PATCH 1/2] NFSD: fix misaligned DIO READ to not use a start_extra_page, exposes rpcrdma bug?

On 8/30/25 1:38 PM, Mike Snitzer wrote:
> From: Mike Snitzer <snitzer@xxxxxxxxxxxxxxx>
> 
> Chuck Lever advised that allocating a single start_extra_page, to
> avoid RDMA corruption on client, definitely shouldn't be needed:
> 
>     "There's nothing I can think of in the RDMA or RPC/RDMA protocols that
>     mandates that the first page offset must always be zero. Moving data
>     at one address on the server to an entirely different address and
>     alignment on the client is exactly what RDMA is supposed to do.
> 
>     It sounds like an implementation omission because the server's upper
>     layers have never needed it before now. If TCP already handles it, I'm
>     guessing it's going to be straightforward to fix."
> 
> So rather than papering over what seems to be an rpcrdma bug, remove
> the allocation and use of the extra start_extra_page.
> 
> With this patch applied on top of the v8 patchset [0], I get the data
> mismatch errors shown at the end [3] when using the NFS RDMA client
> with the reproducer documented in the associated patch header since
> v2 [1]:
> 
>     "Must allocate and use a bounce-buffer page (called 'start_extra_page')
>     if/when expanding the misaligned READ requires reading an extra partial
>     page at the start of the READ so that it's DIO-aligned. Otherwise that
>     extra page at the start will make its way back to the NFS client and
>     corruption will occur. As found, and then this fix of using an extra
>     page verified, using the 'dt' utility:
>       dt of=/mnt/share1/dt_a.test passes=1 bs=47008 count=2 \
>          iotype=sequential pattern=iot onerr=abort oncerr=abort
>     see: https://github.com/RobinTMiller/dt.git "
> 
> I really did try to call attention to this misaligned DIO READ
> alloc_page hack to make RDMA work, see [2], but I didn't frame it as
> RDMA specific and definitely should've been clearer on that important
> detail:
> 
>     "Also, I think it's worth calling out this
>     nfsd_complete_misaligned_read_dio function for its remapping/shifting
>     of the READ payload reflected in rqstp->rq_bvec[]."
> 
> Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
> 
> [0]: https://lore.kernel.org/linux-nfs/20250826185718.5593-1-snitzer@xxxxxxxxxx/
> [1]: https://lore.kernel.org/linux-nfs/20250708160619.64800-9-snitzer@xxxxxxxxxx/
> [2]: https://lore.kernel.org/linux-nfs/aG2MDVyyCbjTpgOv@xxxxxxxxxx/
> [3]: partial output of dt utility that shows NFS client READ data mismatch:
> ++ COUNT=3
> ++ IOSIZE=47008
> ++ dt of=/mnt/hs_test/dt_thisisa.test passes=1 bs=47008 count=3 iotype=sequential pattern=iot onerr=abort oncerr=abort
> dt (j:1 t:1):
> dt (j:1 t:1): Write Statistics:
> dt (j:1 t:1):       Job Information Reported: Job 1, Thread 1
> dt (j:1 t:1):       Last IOT seed value used: 0x01010101
> dt (j:1 t:1):        Total records processed: 3 @ 47008 bytes/record (45.906 Kbytes)
> dt (j:1 t:1):        Total bytes transferred: 141024 (137.719 Kbytes, 0.134 Mbytes)
> dt (j:1 t:1):         Average transfer rates: 1004137 bytes/sec, 980.602 Kbytes/sec, 0.958 Mbytes/sec
> dt (j:1 t:1):        Number I/O's per second: 21.361
> dt (j:1 t:1):         Number seconds per I/O: 0.0468 (46.81ms)
> dt (j:1 t:1):         Total passes completed: 0/1
> dt (j:1 t:1):          Total errors detected: 0/1
> dt (j:1 t:1):             Total elapsed time: 00m00.14s
> dt (j:1 t:1):              Total system time: 00m00.00s
> dt (j:1 t:1):                Total user time: 00m00.00s
> dt (j:1 t:1):                  Starting time: Sat Aug 30 16:14:08 2025
> dt (j:1 t:1):                    Ending time: Sat Aug 30 16:14:08 2025
> dt (j:1 t:1): Warning: The bytes written 141024, is less than the data limit 1880320000 requested!
> dt (j:1 t:1): ERROR: Error number 1 occurred on Sat Aug 30 16:14:08 2025
> dt (j:1 t:1):
> dt (j:1 t:1):                   Error Number: 1
> dt (j:1 t:1):          Time of Current Error: Sat Aug 30 16:14:08 2025
> dt (j:1 t:1):           Read Pass Start Time: Sat Aug 30 16:14:08 2025
> dt (j:1 t:1):          Write Pass Start Time: Sat Aug 30 16:14:08 2025
> dt (j:1 t:1):                    Pass Number: 1
> dt (j:1 t:1):              Pass Elapsed Time: 00m00.10s
> dt (j:1 t:1):              Test Elapsed Time: 00m00.24s
> dt (j:1 t:1):                      File Name: /mnt/hs_test/dt_thisisa.test
> dt (j:1 t:1):                     File Inode: 1199 (0x4af)
> dt (j:1 t:1):                Directory Inode: 1 (0x1)
> dt (j:1 t:1):                      File Size: 1880320000 (0x70136800)
> dt (j:1 t:1):                      Operation: miscompare
> dt (j:1 t:1):                  Record Number: 2
> dt (j:1 t:1):                   Request Size: 47008 (0xb7a0)
> dt (j:1 t:1):                   Block Length: 91 (0x5b)
> dt (j:1 t:1):                       I/O Mode: read
> dt (j:1 t:1):                       I/O Type: sequential
> dt (j:1 t:1):                      File Type: output
> dt (j:1 t:1):                     Direct I/O: disabled (caching data)
> dt (j:1 t:1):                    Device Size: 512 (0x200)
> dt (j:1 t:1):           Starting File Offset: 47008 (0xb7a0)
> dt (j:1 t:1):                   Starting LBA: 91 (0x5b)
> dt (j:1 t:1):             Ending File Offset: 94016 (0x16f40)
> dt (j:1 t:1):                     Ending LBA: 182 (0xb6)
> dt (j:1 t:1):              Error File Offset: 47008 (0xb7a0)
> dt (j:1 t:1):           Error Offset Modulos: %8 = 0, %512 = 416, %4096 = 1952
> dt (j:1 t:1):    Starting Relative Error LBA: 91 (0x5b)
> dt (j:1 t:1):   Relative 4096 byte Error LBA: 11 (0xb)
> dt (j:1 t:1):        Corruption Buffer Index: 0 (byte index into read buffer)
> dt (j:1 t:1):         Corruption Block Index: 0 (byte index in miscompare block)
> dt (j:1 t:1):
> dt (j:1 t:1):
> dt (j:1 t:1): Data compare error at byte 0 in record number 2
> dt (j:1 t:1): Relative block number where the error occurred is 91, offset 47008
> dt (j:1 t:1): Block expected = 91 (0x0000005b), block found = 1919311731 (0x72665f73), count = 47008
> dt (j:1 t:1): The correct data starts at memory address 0x000000003c589000 (marked by asterisk '*')
> dt (j:1 t:1): Dumping Pattern Buffer (base = 0x3c589000, mismatch offset = 0, limit = 512 bytes):
> dt (j:1 t:1):                   / Buffer
> dt (j:1 t:1):    Memory Address / Index
> dt (j:1 t:1): 0x000000003c589000/     0 |*5b 00 00 00 5c 01 01 01 5d 02 02 02 5e 03 03 03 "[   \   ]   ^   "
> dt (j:1 t:1): 0x000000003c589010/    16 | 5f 04 04 04 60 05 05 05 61 06 06 06 62 07 07 07 "_   `   a   b   "
> dt (j:1 t:1): 0x000000003c589020/    32 | 63 08 08 08 64 09 09 09 65 0a 0a 0a 66 0b 0b 0b "c   d   e   f   "
> dt (j:1 t:1): 0x000000003c589030/    48 | 67 0c 0c 0c 68 0d 0d 0d 69 0e 0e 0e 6a 0f 0f 0f "g   h   i   j   "
> dt (j:1 t:1): 0x000000003c589040/    64 | 6b 10 10 10 6c 11 11 11 6d 12 12 12 6e 13 13 13 "k   l   m   n   "
> dt (j:1 t:1): 0x000000003c589050/    80 | 6f 14 14 14 70 15 15 15 71 16 16 16 72 17 17 17 "o   p   q   r   "
> dt (j:1 t:1): 0x000000003c589060/    96 | 73 18 18 18 74 19 19 19 75 1a 1a 1a 76 1b 1b 1b "s   t   u   v   "
> dt (j:1 t:1): 0x000000003c589070/   112 | 77 1c 1c 1c 78 1d 1d 1d 79 1e 1e 1e 7a 1f 1f 1f "w   x   y   z   "
> dt (j:1 t:1): 0x000000003c589080/   128 | 7b 20 20 20 7c 21 21 21 7d 22 22 22 7e 23 23 23 "{   |!!!}"""~###"
> dt (j:1 t:1): 0x000000003c589090/   144 | 7f 24 24 24 80 25 25 25 81 26 26 26 82 27 27 27 " $$$ %%% &&& '''"
> dt (j:1 t:1): 0x000000003c5890a0/   160 | 83 28 28 28 84 29 29 29 85 2a 2a 2a 86 2b 2b 2b " ((( ))) *** +++"
> dt (j:1 t:1): 0x000000003c5890b0/   176 | 87 2c 2c 2c 88 2d 2d 2d 89 2e 2e 2e 8a 2f 2f 2f " ,,, --- ... ///"
> dt (j:1 t:1): 0x000000003c5890c0/   192 | 8b 30 30 30 8c 31 31 31 8d 32 32 32 8e 33 33 33 " 000 111 222 333"
> dt (j:1 t:1): 0x000000003c5890d0/   208 | 8f 34 34 34 90 35 35 35 91 36 36 36 92 37 37 37 " 444 555 666 777"
> dt (j:1 t:1): 0x000000003c5890e0/   224 | 93 38 38 38 94 39 39 39 95 3a 3a 3a 96 3b 3b 3b " 888 999 ::: ;;;"
> dt (j:1 t:1): 0x000000003c5890f0/   240 | 97 3c 3c 3c 98 3d 3d 3d 99 3e 3e 3e 9a 3f 3f 3f " <<< === >>> ???"
> dt (j:1 t:1): 0x000000003c589100/   256 | 9b 40 40 40 9c 41 41 41 9d 42 42 42 9e 43 43 43 " @@@ AAA BBB CCC"
> dt (j:1 t:1): 0x000000003c589110/   272 | 9f 44 44 44 a0 45 45 45 a1 46 46 46 a2 47 47 47 " DDD EEE FFF GGG"
> dt (j:1 t:1): 0x000000003c589120/   288 | a3 48 48 48 a4 49 49 49 a5 4a 4a 4a a6 4b 4b 4b " HHH III JJJ KKK"
> dt (j:1 t:1): 0x000000003c589130/   304 | a7 4c 4c 4c a8 4d 4d 4d a9 4e 4e 4e aa 4f 4f 4f " LLL MMM NNN OOO"
> dt (j:1 t:1): 0x000000003c589140/   320 | ab 50 50 50 ac 51 51 51 ad 52 52 52 ae 53 53 53 " PPP QQQ RRR SSS"
> dt (j:1 t:1): 0x000000003c589150/   336 | af 54 54 54 b0 55 55 55 b1 56 56 56 b2 57 57 57 " TTT UUU VVV WWW"
> dt (j:1 t:1): 0x000000003c589160/   352 | b3 58 58 58 b4 59 59 59 b5 5a 5a 5a b6 5b 5b 5b " XXX YYY ZZZ [[["
> dt (j:1 t:1): 0x000000003c589170/   368 | b7 5c 5c 5c b8 5d 5d 5d b9 5e 5e 5e ba 5f 5f 5f " \\\ ]]] ^^^ ___"
> dt (j:1 t:1): 0x000000003c589180/   384 | bb 60 60 60 bc 61 61 61 bd 62 62 62 be 63 63 63 " ``` aaa bbb ccc"
> dt (j:1 t:1): 0x000000003c589190/   400 | bf 64 64 64 c0 65 65 65 c1 66 66 66 c2 67 67 67 " ddd eee fff ggg"
> dt (j:1 t:1): 0x000000003c5891a0/   416 | c3 68 68 68 c4 69 69 69 c5 6a 6a 6a c6 6b 6b 6b " hhh iii jjj kkk"
> dt (j:1 t:1): 0x000000003c5891b0/   432 | c7 6c 6c 6c c8 6d 6d 6d c9 6e 6e 6e ca 6f 6f 6f " lll mmm nnn ooo"
> dt (j:1 t:1): 0x000000003c5891c0/   448 | cb 70 70 70 cc 71 71 71 cd 72 72 72 ce 73 73 73 " ppp qqq rrr sss"
> dt (j:1 t:1): 0x000000003c5891d0/   464 | cf 74 74 74 d0 75 75 75 d1 76 76 76 d2 77 77 77 " ttt uuu vvv www"
> dt (j:1 t:1): 0x000000003c5891e0/   480 | d3 78 78 78 d4 79 79 79 d5 7a 7a 7a d6 7b 7b 7b " xxx yyy zzz {{{"
> dt (j:1 t:1): 0x000000003c5891f0/   496 | d7 7c 7c 7c d8 7d 7d 7d d9 7e 7e 7e da 7f 7f 7f " ||| }}} ~~~    "
> dt (j:1 t:1):
> dt (j:1 t:1): The incorrect data starts at memory address 0x000000003c596000 (for Robin's debug! :)
> dt (j:1 t:1): The incorrect data starts at file offset 000000000000047008 (marked by asterisk '*')
> dt (j:1 t:1): Dumping Data File offsets (base = 47008, mismatch offset = 0, limit = 512 bytes):
> dt (j:1 t:1):                   / Block
> dt (j:1 t:1):       File Offset / Index
> dt (j:1 t:1): 000000000000047008/     0 |*73 5f 66 72 65 65 5f 63 6f 6d 6d 69 74 5f 61 72 "s_free_commit_ar"
> dt (j:1 t:1): 000000000000047024/    16 | 72 61 79 00 54 43 50 5f 54 49 4d 45 5f 57 41 49 "ray TCP_TIME_WAI"
> dt (j:1 t:1): 000000000000047040/    32 | 54 00 42 50 46 5f 50 52 4f 47 5f 54 59 50 45 5f "T BPF_PROG_TYPE_"
> dt (j:1 t:1): 000000000000047056/    48 | 43 47 52 4f 55 50 5f 53 59 53 43 54 4c 00 4c 41 "CGROUP_SYSCTL LA"
> dt (j:1 t:1): 000000000000047072/    64 | 59 4f 55 54 5f 46 4c 45 58 5f 46 49 4c 45 53 00 "YOUT_FLEX_FILES "
> dt (j:1 t:1): 000000000000047088/    80 | 4e 46 53 45 52 52 5f 4a 55 4b 45 42 4f 58 00 72 "NFSERR_JUKEBOX r"
> dt (j:1 t:1): 000000000000047104/    96 | 78 5f 63 70 75 5f 72 6d 61 70 00 6d 69 67 72 61 "x_cpu_rmap migra"
> dt (j:1 t:1): 000000000000047120/   112 | 74 69 6f 6e 5f 64 69 73 61 62 6c 65 64 00 5f 5f "tion_disabled __"
> dt (j:1 t:1): 000000000000047136/   128 | 64 61 74 61 00 6e 64 6f 5f 64 65 6c 5f 73 6c 61 "data ndo_del_sla"
> dt (j:1 t:1): 000000000000047152/   144 | 76 65 00 6e 66 73 5f 63 6f 6d 6d 69 74 5f 64 61 "ve nfs_commit_da"
> dt (j:1 t:1): 000000000000047168/   160 | 74 61 00 65 78 74 5f 6d 75 74 65 78 00 63 6f 6e "ta ext_mutex con"
> dt (j:1 t:1): 000000000000047184/   176 | 6e 65 63 74 5f 63 6f 6f 6b 69 65 00 54 43 50 5f "nect_cookie TCP_"
> dt (j:1 t:1): 000000000000047200/   192 | 43 4c 4f 53 45 5f 57 41 49 54 00 6d 65 6d 63 6d "CLOSE_WAIT memcm"
> dt (j:1 t:1): 000000000000047216/   208 | 70 00 52 50 4d 5f 52 45 51 5f 53 55 53 50 45 4e "p RPM_REQ_SUSPEN"
> dt (j:1 t:1): 000000000000047232/   224 | 44 00 63 72 6d 61 74 63 68 00 63 61 6e 63 65 6c "D crmatch cancel"
> dt (j:1 t:1): 000000000000047248/   240 | 5f 66 6f 72 6b 00 70 67 70 72 6f 74 5f 74 00 74 "_fork pgprot_t t"
> dt (j:1 t:1): 000000000000047264/   256 | 72 61 63 65 70 6f 69 6e 74 5f 70 74 72 5f 74 00 "racepoint_ptr_t "
> dt (j:1 t:1): 000000000000047280/   272 | 66 6f 72 5f 72 65 63 6c 61 69 6d 00 4e 46 53 45 "for_reclaim NFSE"
> dt (j:1 t:1): 000000000000047296/   288 | 52 52 5f 42 41 44 43 48 41 52 00 5f 73 6b 62 5f "RR_BADCHAR _skb_"
> dt (j:1 t:1): 000000000000047312/   304 | 72 65 66 64 73 74 00 70 68 79 73 69 63 61 6c 5f "refdst physical_"
> dt (j:1 t:1): 000000000000047328/   320 | 6c 6f 63 61 74 69 6f 6e 00 6e 75 6d 5f 72 65 71 "location num_req"
> dt (j:1 t:1): 000000000000047344/   336 | 73 00 5f 5f 53 43 54 5f 5f 74 70 5f 66 75 6e 63 "s __SCT__tp_func"
> dt (j:1 t:1): 000000000000047360/   352 | 5f 70 6e 66 73 5f 6d 64 73 5f 66 61 6c 6c 62 61 "_pnfs_mds_fallba"
> dt (j:1 t:1): 000000000000047376/   368 | 63 6b 5f 77 72 69 74 65 5f 64 6f 6e 65 00 74 61 "ck_write_done ta"
> dt (j:1 t:1): 000000000000047392/   384 | 73 6b 5f 63 6c 65 61 6e 75 70 00 65 78 70 61 6e "sk_cleanup expan"
> dt (j:1 t:1): 000000000000047408/   400 | 64 5f 72 65 61 64 61 68 65 61 64 00 6c 6f 63 6b "d_readahead lock"
> dt (j:1 t:1): 000000000000047424/   416 | 5f 6d 61 6e 61 67 65 72 5f 6f 70 65 72 61 74 69 "_manager_operati"
> dt (j:1 t:1): 000000000000047440/   432 | 6f 6e 73 00 73 72 63 5f 72 65 67 00 63 72 64 65 "ons src_reg crde"
> dt (j:1 t:1): 000000000000047456/   448 | 73 74 72 6f 79 00 63 68 69 6c 64 72 65 6e 5f 6c "stroy children_l"
> dt (j:1 t:1): 000000000000047472/   464 | 6f 77 5f 75 73 61 67 65 00 6e 75 6d 5f 76 66 00 "ow_usage num_vf "
> dt (j:1 t:1): 000000000000047488/   480 | 73 63 72 61 74 63 68 00 50 49 44 54 59 50 45 5f "scratch PIDTYPE_"
> dt (j:1 t:1): 000000000000047504/   496 | 4d 41 58 00 70 72 65 70 61 72 65 5f 77 72 69 74 "MAX prepare_writ"
> dt (j:1 t:1):
> dt (j:1 t:1):
> dt (j:1 t:1): Analyzing IOT Record Data: (Note: Block #'s are relative to start of record!)
> dt (j:1 t:1):
> dt (j:1 t:1):                 IOT block size: 512
> dt (j:1 t:1):         Total number of blocks: 91 (47008 bytes)
> dt (j:1 t:1):         Current IOT seed value: 0x01010101 (pass 1)
> dt (j:1 t:1):      Range of corrupted blocks: 0 - 90
> dt (j:1 t:1):     Length of corrupted blocks: 91 (46592 bytes)
> dt (j:1 t:1):   Corrupted blocks file offset: 47008 (LBA 91)
> dt (j:1 t:1):     Number of corrupted blocks: 91
> dt (j:1 t:1):    Number of good blocks found: 0
> dt (j:1 t:1):    Number of zero blocks found: 0
> dt (j:1 t:1):
> dt (j:1 t:1):                       Record #: 2
> dt (j:1 t:1):         Starting Record Offset: 47008
> dt (j:1 t:1):                 Transfer Count: 47008 (0xb7a0)
> dt (j:1 t:1):           Ending Record Offset: 94016
> dt (j:1 t:1):    Relative Record Block Range: 91 - 182
> dt (j:1 t:1):            Read Buffer Address: 0x3c596000
> dt (j:1 t:1):           Pattern Base Address: 0x3c589000
> dt (j:1 t:1):                           Note: Incorrect data is marked with asterisk '*'
> dt (j:1 t:1):
> dt (j:1 t:1):                   Record Block: 0 (BAD data)
> dt (j:1 t:1):            Record Block Offset: 47008 (LBA 91)
> dt (j:1 t:1):            Record Buffer Index: 0 (0x0)
> dt (j:1 t:1):          Expected Block Number: 91 (0x0000005b)
> dt (j:1 t:1):          Received Block Number: 1919311731 (0x72665f73)
> dt (j:1 t:1):          Received Block Offset: 982687606272
> dt (j:1 t:1):
> dt (j:1 t:1): Byte Expected: address 0x3c589000          Received: address 0x3c596000
> dt (j:1 t:1): 0000 0000005b 0101015c 0202025d 0303035e * 72665f73 635f6565 696d6d6f 72615f74
> dt (j:1 t:1): 0010 0404045f 05050560 06060661 07070762 * 00796172 5f504354 454d4954 4941575f
> dt (j:1 t:1): 0020 08080863 09090964 0a0a0a65 0b0b0b66 * 50420054 52505f46 545f474f 5f455059
> dt (j:1 t:1): 0030 0c0c0c67 0d0d0d68 0e0e0e69 0f0f0f6a * 4f524743 535f5055 54435359 414c004c
> dt (j:1 t:1): 0040 1010106b 1111116c 1212126d 1313136e * 54554f59 454c465f 49465f58 0053454c
> dt (j:1 t:1): 0050 1414146f 15151570 16161671 17171772 * 4553464e 4a5f5252 42454b55 7200584f
> dt (j:1 t:1): 0060 18181873 19191974 1a1a1a75 1b1b1b76 * 70635f78 6d725f75 6d007061 61726769
> dt (j:1 t:1): 0070 1c1c1c77 1d1d1d78 1e1e1e79 1f1f1f7a * 6e6f6974 7369645f 656c6261 5f5f0064
> dt (j:1 t:1): 0080 2020207b 2121217c 2222227d 2323237e * 61746164 6f646e00 6c65645f 616c735f
> dt (j:1 t:1): 0090 2424247f 25252580 26262681 27272782 * 6e006576 635f7366 696d6d6f 61645f74
> dt (j:1 t:1): 00a0 28282883 29292984 2a2a2a85 2b2b2b86 * 65006174 6d5f7478 78657475 6e6f6300
> dt (j:1 t:1): 00b0 2c2c2c87 2d2d2d88 2e2e2e89 2f2f2f8a * 7463656e 6f6f635f 0065696b 5f504354
> dt (j:1 t:1): 00c0 3030308b 3131318c 3232328d 3333338e * 534f4c43 41575f45 6d005449 6d636d65
> dt (j:1 t:1): 00d0 3434348f 35353590 36363691 37373792 * 50520070 45525f4d 55535f51 4e455053
> dt (j:1 t:1): 00e0 38383893 39393994 3a3a3a95 3b3b3b96 * 72630044 6374616d 61630068 6c65636e
> dt (j:1 t:1): 00f0 3c3c3c97 3d3d3d98 3e3e3e99 3f3f3f9a * 726f665f 6770006b 746f7270 7400745f
> dt (j:1 t:1): 0100 4040409b 4141419c 4242429d 4343439e * 65636172 6e696f70 74705f74 00745f72
> dt (j:1 t:1): 0110 4444449f 454545a0 464646a1 474747a2 * 5f726f66 6c636572 006d6961 4553464e
> dt (j:1 t:1): 0120 484848a3 494949a4 4a4a4aa5 4b4b4ba6 * 425f5252 48434441 5f005241 5f626b73
> dt (j:1 t:1): 0130 4c4c4ca7 4d4d4da8 4e4e4ea9 4f4f4faa * 64666572 70007473 69737968 5f6c6163
> dt (j:1 t:1): 0140 505050ab 515151ac 525252ad 535353ae * 61636f6c 6e6f6974 6d756e00 7165725f
> dt (j:1 t:1): 0150 545454af 555555b0 565656b1 575757b2 * 5f5f0073 5f544353 5f70745f 636e7566
> dt (j:1 t:1): 0160 585858b3 595959b4 5a5a5ab5 5b5b5bb6 * 666e705f 646d5f73 61665f73 61626c6c
> dt (j:1 t:1): 0170 5c5c5cb7 5d5d5db8 5e5e5eb9 5f5f5fba * 775f6b63 65746972 6e6f645f 61740065
> dt (j:1 t:1): 0180 606060bb 616161bc 626262bd 636363be * 635f6b73 6e61656c 65007075 6e617078
> dt (j:1 t:1): 0190 646464bf 656565c0 666666c1 676767c2 * 65725f64 68616461 00646165 6b636f6c
> dt (j:1 t:1): 01a0 686868c3 696969c4 6a6a6ac5 6b6b6bc6 * 6e616d5f 72656761 65706f5f 69746172
> dt (j:1 t:1): 01b0 6c6c6cc7 6d6d6dc8 6e6e6ec9 6f6f6fca * 00736e6f 5f637273 00676572 65647263
> dt (j:1 t:1): 01c0 707070cb 717171cc 727272cd 737373ce * 6f727473 68630079 72646c69 6c5f6e65
> dt (j:1 t:1): 01d0 747474cf 757575d0 767676d1 777777d2 * 755f776f 65676173 6d756e00 0066765f
> dt (j:1 t:1): 01e0 787878d3 797979d4 7a7a7ad5 7b7b7bd6 * 61726373 00686374 54444950 5f455059
> dt (j:1 t:1): 01f0 7c7c7cd7 7d7d7dd8 7e7e7ed9 7f7f7fda * 0058414d 70657270 5f657261 74697277
> ...
> dt (j:1 t:1): Reread data does NOT match previous data or expected data!
> dt (j:1 t:1): Writing reread data to file dt_thisisa.test-REREAD3-j1t1, from buffer 0x7f12bc004000, 47008 bytes...
> dt (j:1 t:1): Command line to re-read the corrupted data:
> dt (j:1 t:1):     # dt if=/mnt/hs_test/dt_thisisa.test bs=47008 count=1 offset=47008 pattern=iot disable=retryDC,savecorrupted,trigdefaults
> dt (j:1 t:1):
> dt (j:1 t:1): Command line to re-read the data:
> dt (j:1 t:1):     # dt if=/mnt/hs_test/dt_thisisa.test bs=47008 dsize=512 iotype=sequential iodir=forward limit=94016 records=1 pattern=iot disable=retryDC,savecorrupted,trigdefaults
> dt (j:1 t:1):
> dt (j:1 t:1):
> dt (j:1 t:1): Read Statistics:
> dt (j:1 t:1):       Job Information Reported: Job 1, Thread 1
> dt (j:1 t:1):       Last IOT seed value used: 0x01010101
> dt (j:1 t:1):        Total records processed: 2 @ 47008 bytes/record (45.906 Kbytes)
> dt (j:1 t:1):        Total bytes transferred: 94016 (91.812 Kbytes, 0.090 Mbytes)
> dt (j:1 t:1):         Average transfer rates: 656857 bytes/sec, 641.462 Kbytes/sec, 0.626 Mbytes/sec
> dt (j:1 t:1):        Number I/O's per second: 13.973
> dt (j:1 t:1):         Number seconds per I/O: 0.0716 (71.56ms)
> dt (j:1 t:1):         Total passes completed: 1/1
> dt (j:1 t:1):          Total errors detected: 1/1
> dt (j:1 t:1):             Total elapsed time: 00m00.15s
> dt (j:1 t:1):              Total system time: 00m00.00s
> dt (j:1 t:1):                Total user time: 00m00.00s
> dt (j:1 t:1):                  Starting time: Sat Aug 30 16:14:08 2025
> dt (j:1 t:1):                    Ending time: Sat Aug 30 16:14:08 2025
> dt (j:1 t:1):
> dt (j:1 t:1): Operating System Information:
> dt (j:1 t:1):                      Host name: plsm121c-06.perf.hammer.space (192.168.1.106)
> dt (j:1 t:1):                      User name: root
> dt (j:1 t:1):                     Process ID: 31703
> dt (j:1 t:1):                 OS information: Linux 6.12.24.17.hs.snitm+ #34 SMP PREEMPT_DYNAMIC Fri Aug 15 22:03:10 UTC 2025 x86_64
> dt (j:1 t:1):
> dt (j:1 t:1): File System Information:
> dt (j:1 t:1):            Mounted from device: 192.168.0.105:/hs_test
> dt (j:1 t:1):           Mounted on directory: /mnt/hs_test
> dt (j:1 t:1):                Filesystem type: nfs4
> dt (j:1 t:1):             Filesystem options: rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,fatal_neterrors=none,proto=tcp,nconnect=16,port=20491,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.106,local_lock=none,addr=192.168.0.105
> dt (j:1 t:1):          Filesystem block size: 1048576
> dt (j:1 t:1):          Filesystem free space: 60990430380032 (58165007.000 Mbytes, 56801.765 Gbytes, 55.470 Tbytes)
> dt (j:1 t:1):         Filesystem total space: 60992310476800 (58166800.000 Mbytes, 56803.516 Gbytes, 55.472 Tbytes)
> dt (j:1 t:1):
> dt (j:1 t:1): Total Statistics:
> dt (j:1 t:1):        Output device/file name: /mnt/hs_test/dt_thisisa.test (device type=regular)
> dt (j:1 t:1):        Type of I/O's performed: sequential (forward)
> dt (j:1 t:1):       Job Information Reported: Job 1, Thread 1
> dt (j:1 t:1):       Data pattern string used: 'IOT Pattern' (blocking is 512 bytes)
> dt (j:1 t:1):       Last IOT seed value used: 0x01010101
> dt (j:1 t:1):             Total records read: 2
> dt (j:1 t:1):               Total bytes read: 94016 (91.812 Kbytes, 0.090 Mbytes, 0.000 Gbytes)
> dt (j:1 t:1):          Total records written: 3
> dt (j:1 t:1):            Total bytes written: 141024 (137.719 Kbytes, 0.134 Mbytes, 0.000 Gbytes)
> dt (j:1 t:1):        Total records processed: 5 @ 47008 bytes/record (45.906 Kbytes)
> dt (j:1 t:1):        Total bytes transferred: 235040 (229.531 Kbytes, 0.224 Mbytes)
> dt (j:1 t:1):         Average transfer rates: 828023 bytes/sec, 808.616 Kbytes/sec, 0.790 Mbytes/sec
> dt (j:1 t:1):        Number I/O's per second: 17.615
> dt (j:1 t:1):         Number seconds per I/O: 0.0568 (56.77ms)
> dt (j:1 t:1):         Total passes completed: 1/1
> dt (j:1 t:1):          Total errors detected: 1/1
> dt (j:1 t:1):             Total elapsed time: 00m00.29s
> dt (j:1 t:1):              Total system time: 00m00.00s
> dt (j:1 t:1):                Total user time: 00m00.00s
> dt (j:1 t:1):                  Starting time: Sat Aug 30 16:14:08 2025
> dt (j:1 t:1):                    Ending time: Sat Aug 30 16:14:08 2025
> dt (j:1 t:1):
> dt (j:1 t:1): Command line to re-read the data:
> dt (j:1 t:1):     # dt if=/mnt/hs_test/dt_thisisa.test bs=47008 dsize=512 iotype=sequential iodir=forward limit=141024 records=3 pattern=iot
> dt (j:1 t:1):
> dt (j:1 t:1): Command Line:
> dt (j:1 t:1):
> dt (j:1 t:1):     # dt of=/mnt/hs_test/dt_thisisa.test passes=1 bs=47008 count=3 iotype=sequential pattern=iot onerr=abort oncerr=abort
> dt (j:1 t:1):
> dt (j:1 t:1):         --> Date: September 21st, 2023, Version: 25.05, Author: Robin T. Miller <--
> dt (j:1 t:1):
> dt (j:1 t:1): onerr=abort, so stopping all threads for job 1...
> dt (j:0 t:0): Job 1 is being stopped (1 thread)
> dt (j:0 t:0): Program is exiting with status -1...
> ---
>  fs/nfsd/vfs.c | 25 ++++++-------------------
>  1 file changed, 6 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index f8975ee262b5c..762d745b1b15d 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -1079,13 +1079,11 @@ struct nfsd_read_dio {
>  	loff_t end;
>  	unsigned long start_extra;
>  	unsigned long end_extra;
> -	struct page *start_extra_page;
>  };
>  
>  static void init_nfsd_read_dio(struct nfsd_read_dio *read_dio)
>  {
>  	memset(read_dio, 0, sizeof(*read_dio));
> -	read_dio->start_extra_page = NULL;
>  }
>  
>  #define NFSD_READ_DIO_MIN_KB (32 << 10)
> @@ -1121,9 +1119,8 @@ static bool nfsd_analyze_read_dio(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  
>  	/*
>  	 * Any misaligned READ less than NFSD_READ_DIO_MIN_KB won't be expanded
> -	 * to be DIO-aligned (this heuristic avoids excess work, like allocating
> -	 * start_extra_page, for smaller IO that can generally already perform
> -	 * well using buffered IO).
> +	 * to be DIO-aligned (this heuristic avoids excess work, for smaller IO
> +	 * that can generally already perform well using buffered IO).
>  	 */
>  	if ((read_dio->start_extra || read_dio->end_extra) &&
>  	    (len < NFSD_READ_DIO_MIN_KB)) {
> @@ -1131,15 +1128,6 @@ static bool nfsd_analyze_read_dio(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  		return false;
>  	}
>  
> -	if (read_dio->start_extra) {
> -		read_dio->start_extra_page = alloc_page(GFP_KERNEL);
> -		if (WARN_ONCE(read_dio->start_extra_page == NULL,
> -			      "%s: Unable to allocate start_extra_page\n", __func__)) {
> -			init_nfsd_read_dio(read_dio);
> -			return false;
> -		}
> -	}
> -
>  	/* Show original offset and count, and how it was expanded for DIO */
>  	middle_end = read_dio->end - read_dio->end_extra;
>  	trace_nfsd_analyze_read_dio(rqstp, fhp, offset, len,
> @@ -1162,11 +1150,10 @@ static ssize_t nfsd_complete_misaligned_read_dio(struct svc_rqst *rqstp,
>  	if (!read_dio->start_extra && !read_dio->end_extra)
>  		return host_err;
>  
> -	/* If nfsd_analyze_read_dio() allocated a start_extra_page it must
> -	 * be removed from rqstp->rq_bvec[] to avoid returning unwanted data.
> +	/* If nfsd_analyze_read_dio() found start_extra (front-pad) page needed it
> +	 * must be removed from rqstp->rq_bvec[] to avoid returning unwanted data.
>  	 */
> -	if (read_dio->start_extra_page) {
> -		__free_page(read_dio->start_extra_page);
> +	if (read_dio->start_extra) {
>  		*rq_bvec_numpages -= 1;
>  		v = *rq_bvec_numpages;
>  		memmove(rqstp->rq_bvec, rqstp->rq_bvec + 1,
> @@ -1276,7 +1263,7 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  			if (read_dio.start_extra) {
>  				len = read_dio.start_extra;
>  				bvec_set_page(&rqstp->rq_bvec[v],
> -					      read_dio.start_extra_page,
> +					      *(rqstp->rq_next_page++),
>  					      len, PAGE_SIZE - len);
>  				total -= len;
>  				++v;

Thank you, Mike. This will help me reproduce the problem. Saves me a
bunch of time!

-- 
Chuck Lever
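
For readers following the arithmetic: the start_extra and end_extra
fields in the quoted diff describe how far a misaligned READ is widened
so that it can be issued as direct I/O. The sketch below is a userspace
illustration only, not the NFSD code. The struct dio_expand type, the
dio_expand_read() helper and the align parameter are hypothetical names
introduced here, and the thread does not state which alignment the
server used, so 512 and 4096 are only example values.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical illustration type. The field names mirror
 * struct nfsd_read_dio in the quoted diff, but this is not kernel code.
 */
struct dio_expand {
	uint64_t start;       /* expanded start, rounded down to align */
	uint64_t end;         /* expanded end, rounded up to align */
	uint64_t start_extra; /* front padding: bytes read before offset */
	uint64_t end_extra;   /* end padding: bytes read past offset + len */
};

/*
 * Widen the byte range [offset, offset + len) to alignment boundaries.
 * Assumption: align is a nonzero power of two.
 */
struct dio_expand dio_expand_read(uint64_t offset, uint64_t len,
				  uint64_t align)
{
	struct dio_expand e;
	uint64_t mask = align - 1;
	uint64_t orig_end = offset + len;

	assert(align != 0 && (align & mask) == 0);

	e.start = offset & ~mask;            /* round down */
	e.end = (orig_end + mask) & ~mask;   /* round up */
	e.start_extra = offset - e.start;
	e.end_extra = e.end - orig_end;
	return e;
}
```

For the failing record in the dt log (offset 47008, length 47008),
dio_expand_read(47008, 47008, 512) gives start_extra = 416 and
end_extra = 192, and an alignment of 4096 gives start_extra = 1952.
Those front-padding values agree with the "%512 = 416, %4096 = 1952"
error offset modulos that dt reports.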



