On Fri, Jul 04, 2025 at 08:05:36PM +0530, Ojaswin Mujoo wrote:
> On Thu, Jul 03, 2025 at 05:26:49PM +0100, John Garry wrote:
> > On 03/07/2025 07:42, Ojaswin Mujoo wrote:
> > > > > Why serialization matters here: because even if the device reorders our read
> > > > > before the atomic write, we still fetch an older $blocksize chunk which
> > > > > should have the data and the header.
> > > >
> > > > Yes, we fetch $blocksize, but it may be as multiple read requests,
> > > > so we can have:
> > > >
> > > > partial read $blocksize request, atomic write request, partial read
> > > > $blocksize request
> > >
> > > Ahh, now I get the picture. Since software atomic writes might have a
> > > value way higher than the max_sectors that the bio may split at, this
> > > causes the reads to split, which is where the issue happens, as you
> > > mentioned.
> >
> > This issue also exists for atomic writes based on REQ_ATOMIC.
>
> Yep, got it now, thanks.
>
> > > > > The data crc should match the
> > > > > header because it would have gone atomically. What am I missing?
> > > > >
> > > > > (Or do you mean that max_sectors_kb is not big enough to support
> > > > > reads after software atomic writes and that's the issue?)
> > > > >
> > > > > > Can you cap at atomic write unit max opt? That will mean that we get
> > > > > > serialization from atomic write HW support.
> > > > > >
> > > > > > BTW, to repeat what I said before, it can also fail for atomic write bios
> > > > > > using HW support, as reads may be split.
> > > > >
> > > > > This one I can understand: the fio read is split, causing a short read,
> > > > > and it seems like fio is not requeueing the short read properly for the
> > > > > verify step(?), so it tries to verify against the short read itself.
> > > >
> > > > This is not an fio problem, as above.
> > >
> > > Yep, got it, it's the block layer splitting the reads.
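To make the splitting condition above concrete, here is a small shell sketch of the check involved: a read larger than the queue's max_sectors_kb limit gets split into multiple requests, so an atomic write can land between the pieces. The function name `splits_read` and the 1280 KB value are illustrative only; real values come from /sys/block/<dev>/queue/max_sectors_kb.

```shell
# Sketch: does a read of a given size exceed the per-request cap and
# therefore get split by the block layer?
# $1 = read size in bytes, $2 = queue max_sectors_kb value (in KB)
splits_read() {
	local iosize_bytes="$1" max_sectors_kb="$2"
	if [ "$iosize_bytes" -gt $((max_sectors_kb * 1024)) ]; then
		echo "split"	# issued as multiple read requests
	else
		echo "single"	# fits in one request
	fi
}

splits_read $((64 * 1024 * 1024)) 1280	# a 64MB read vs a 1280KB cap
splits_read $((16 * 1024)) 1280		# a 16KB read fits easily
```

Running this prints "split" then "single": a $blocksize read larger than the cap is exactly the case where "partial read, atomic write, partial read" interleaving can happen.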
> > > So as a solution, do you think it is okay to cap $blocksize to
> > > max_sectors_kb, or would you suggest it would be better to just stick to
> > > awu max opt?
> >
> > I tend to think that it is better to cap at awu max opt. If zero, then don't
> > do the test.
>
> Hmm, yep, this makes sense, but it seems like we don't expose it yet in
> xfs_io. Will need to add that support as well.

Okay, so thinking about it a bit more, I'm not sure if capping to
awu_max_opt is the right thing to do. There is no guarantee in the kernel
that awu_max_opt or even awu_max would be smaller than lim->max_sectors,
so why derive our IO size from that? Further, handling awu_max vs
awu_max_opt and their different interpretations for XFS vs ext4 etc.
adds unnecessary complexity.

The simplest way seems to be to just limit $blocksize to something like
$(_min 16KB awu_max) and hope that 16KB is small enough to not be split
during reads. We anyway have other tests, like generic/1230, that can be
used to test larger atomic write sizes.

Thoughts?

> > What awu max opt is reported for ext4 (for awu max > FS block size)? Is
> > it min(bigalloc size, bdev awu max)?
>
> awu_max_opt is returned as 0 for ext4, since 0 indicates there is no
> such slow fallback. From commit
>
>     5d894321c49e fs: add atomic write unit max opt to statx
>
>     ...
>     When zero, it means that there is no such performance boundary.
>     ...
>
> Regards,
> ojaswin
>
> > Thanks,
> > John
> >
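The proposed $(_min 16KB awu_max) cap can be sketched as below. The `_min` helper is written out for illustration (fstests may provide its own), and the awu_max value is a made-up example, not one read from a real device.

```shell
# Sketch of the proposed cap: blocksize = min(16KB, awu_max).
# _min: print the smaller of two integer arguments.
_min() {
	if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
}

awu_max=$((64 * 1024))		# hypothetical device awu_max of 64KB
blocksize=$(_min $((16 * 1024)) "$awu_max")
echo "$blocksize"		# capped at 16384 for this example
```

With a 64KB awu_max this yields 16384, i.e. the test I/O size stays at 16KB, small enough that reads are unlikely to cross the max_sectors limit and be split; a device with awu_max below 16KB would cap blocksize at awu_max instead.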