On Fri, Jul 04, 2025 at 08:05:36PM +0530, Ojaswin Mujoo wrote:
> On Thu, Jul 03, 2025 at 05:26:49PM +0100, John Garry wrote:
> > On 03/07/2025 07:42, Ojaswin Mujoo wrote:
> > > > > Why serialization matters here: because even if the device reorders our read
> > > > > before the atomic write, we still fetch an older $blocksize chunk which
> > > > > should have the data and the header.
> > > >
> > > > Yes, we fetch $blocksize, but it may be as multiple read requests,
> > > > so we can have:
> > > >
> > > > partial read $blocksize request, atomic write request, partial read
> > > > $blocksize request
> > >
> > > Ahh, now I get the picture. Since software atomic writes might have a
> > > value way higher than the max_sectors that the bio may split at, this
> > > causes the reads to split, which is where the issue happens, as you
> > > mentioned.
> >
> > This issue also exists for atomic writes based on REQ_ATOMIC.
>
> Yep, got it now, thanks.
>
> > > > > The data crc should match the
> > > > > header because it would have gone atomically. What am I missing?
> > > > >
> > > > > (Or do you mean that max_sectors_kb is not big enough to support
> > > > > reads after software atomic writes and that's the issue?)
> > > > >
> > > > > > Can you cap at atomic write unit max opt? That will mean that we get
> > > > > > serialization from atomic write HW support.
> > > > > >
> > > > > > BTW, to repeat what I said before, it can also fail for atomic write bios
> > > > > > using HW support, as reads may be split.
> > > > >
> > > > > This one I can understand: the fio read is split, causing a short read,
> > > > > and it seems like fio is not requeueing the short read properly for the
> > > > > verify step(?), so it tries to verify against the short read itself.
> > > >
> > > > This is not an fio problem, as above.
> > >
> > > Yep, got it, it's the block layer splitting the reads.
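To make the splitting condition above concrete, here is a small shell sketch of the check involved: a read larger than the queue's max_sectors_kb limit gets split into multiple requests, so an atomic write can land between the pieces. The function name `splits_read` and the 1280 KB value are illustrative only; real values come from /sys/block/<dev>/queue/max_sectors_kb.

```shell
# Sketch: does a read of a given size exceed the per-request cap and
# therefore get split by the block layer?
# $1 = read size in bytes, $2 = queue max_sectors_kb value (in KB)
splits_read() {
	local iosize_bytes="$1" max_sectors_kb="$2"
	if [ "$iosize_bytes" -gt $((max_sectors_kb * 1024)) ]; then
		echo "split"	# issued as multiple read requests
	else
		echo "single"	# fits in one request
	fi
}

splits_read $((64 * 1024 * 1024)) 1280	# a 64MB read vs a 1280KB cap
splits_read $((16 * 1024)) 1280		# a 16KB read fits easily
```

Running this prints "split" then "single": a $blocksize read larger than the cap is exactly the case where "partial read, atomic write, partial read" interleaving can happen.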
> > > So as a solution, do you think it is okay to cap $blocksize to
> > > max_sectors_kb, or would you suggest it would be better to just stick to
> > > awu max opt?
> >
> > I tend to think that it is better to cap at awu max opt. If zero, then don't
> > do the test.
>
> Hmm, yep, this makes sense, but it seems like we don't expose it yet in
> xfs_io. Will need to add that support as well.

Okay, so thinking about it a bit more, I'm not sure if capping to
awu_max_opt is the right thing to do. There is no guarantee in the kernel
that awu_max_opt or even awu_max would be smaller than lim->max_sectors,
so why derive our IO size from that? Further, handling awu_max vs
awu_max_opt and their different interpretations for XFS vs ext4 etc.
adds unnecessary complexity.

The simplest way seems to be to just limit $blocksize to something like
$(_min 16KB awu_max) and hope that 16KB is small enough to not be split
during reads. We anyway have other tests, like generic/1230, that can be
used to test larger atomic write sizes.

Thoughts?

> > What awu max opt is reported for ext4 (for awu max > FS block size)? Is
> > it min(bigalloc size, bdev awu max)?
>
> awu_max_opt is returned as 0 for ext4, since 0 indicates there is no
> such slow fallback. From commit
>
>     5d894321c49e fs: add atomic write unit max opt to statx
>
>     ...
>     When zero, it means that there is no such performance boundary.
>     ...
>
> Regards,
> ojaswin
>
> > Thanks,
> > John
> >
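The proposed $(_min 16KB awu_max) cap can be sketched as below. The `_min` helper is written out for illustration (fstests may provide its own), and the awu_max value is a made-up example, not one read from a real device.

```shell
# Sketch of the proposed cap: blocksize = min(16KB, awu_max).
# _min: print the smaller of two integer arguments.
_min() {
	if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}

awu_max=$((64 * 1024))		# hypothetical device awu_max of 64KB
blocksize=$(_min $((16 * 1024)) "$awu_max")
echo "$blocksize"		# capped at 16384 for this example
```

With a 64KB awu_max this yields 16384, i.e. the test I/O size stays at 16KB, small enough that reads are unlikely to cross the max_sectors limit and be split; a device with awu_max below 16KB would cap blocksize at awu_max instead.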