Re: [RFC PATCH v2 4/8] lib/iov_iter: remove piecewise bvec length checking in iov_iter_aligned_bvec

On Thu, Jul 10, 2025 at 09:52:53AM -0400, Jeff Layton wrote:
> On Tue, 2025-07-08 at 12:06 -0400, Mike Snitzer wrote:
> > iov_iter_aligned_bvec() is strictly checking alignment of each element
> > of the bvec to arrive at whether the bvec is aligned relative to
> > dma_alignment and on-disk alignment.  Checking each element
> > individually results in disallowing a bvec that in aggregate is
> > perfectly aligned relative to the provided @len_mask.
> > 
> > Relax the on-disk alignment checking such that it is done on the full
> > extent described by the bvec but still do piecewise checking of the
> > dma_alignment for each bvec's bv_offset.
> > 
> > This allows for NFS's WRITE payload to be issued using O_DIRECT as
> > long as the bvec created with xdr_buf_to_bvec() is composed of pages
> > that respect the underlying device's dma_alignment (@addr_mask) and
> > the overall contiguous on-disk extent is aligned relative to the
> > logical_block_size (@len_mask).
> > 
> > Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
> > ---
> >  lib/iov_iter.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> > index bdb37d572e97..b2ae482b8a1d 100644
> > --- a/lib/iov_iter.c
> > +++ b/lib/iov_iter.c
> > @@ -819,13 +819,14 @@ static bool iov_iter_aligned_bvec(const struct iov_iter *i, unsigned addr_mask,
> >  	unsigned skip = i->iov_offset;
> >  	size_t size = i->count;
> >  
> > +	if (size & len_mask)
> > +		return false;
> > +
> >  	do {
> >  		size_t len = bvec->bv_len;
> >  
> >  		if (len > size)
> >  			len = size;
> > -		if (len & len_mask)
> > -			return false;
> >  		if ((unsigned long)(bvec->bv_offset + skip) & addr_mask)
> >  			return false;
> >  
> 
> cc'ing Keith too since he wrote this helper originally.

Thanks.

There's a comment in __bio_iov_iter_get_pages that says it expects each
vector to be a multiple of the block size. That makes it easier to
split when needed, and this patch would allow vectors that break the
current assumption when calculating the "trim" value.

But for nvme, you couldn't split such a bvec into a usable command
anyway. I think you'd have to introduce a different queue limit to check
against when validating iter alignment if you don't want to use the
logical block size. 
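
To make the difference between the two checks concrete, here is a minimal
userspace sketch. It is not the kernel code: struct seg, aligned_piecewise(),
aligned_aggregate() and check_example() are hypothetical names, the model
ignores i->iov_offset and i->count, and the 512-byte block size and 4-byte
dma alignment are assumed values chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct bio_vec (no page pointer). */
struct seg {
	unsigned int len;	/* models bv_len */
	unsigned int offset;	/* models bv_offset */
};

/* Models the pre-patch rule: each segment's length must satisfy len_mask. */
bool aligned_piecewise(const struct seg *s, size_t n,
		       unsigned int addr_mask, unsigned int len_mask)
{
	for (size_t k = 0; k < n; k++) {
		if (s[k].len & len_mask)
			return false;
		if (s[k].offset & addr_mask)
			return false;
	}
	return true;
}

/* Models the patched rule: only the total length is tested against len_mask. */
bool aligned_aggregate(const struct seg *s, size_t n,
		       unsigned int addr_mask, unsigned int len_mask)
{
	size_t total = 0;

	for (size_t k = 0; k < n; k++) {
		if (s[k].offset & addr_mask)
			return false;
		total += s[k].len;
	}
	return (total & len_mask) == 0;
}

/*
 * Assumed example: a 4096-byte payload that starts 100 bytes into its
 * first page, so it is described as 3996 + 100 bytes.
 */
void check_example(void)
{
	const struct seg payload[] = {
		{ .len = 3996, .offset = 100 },
		{ .len = 100,  .offset = 0 },
	};
	const unsigned int addr_mask = 3;	/* assumed 4-byte dma alignment */
	const unsigned int len_mask = 511;	/* assumed 512-byte logical blocks */

	/* 3996 is not a multiple of 512, so the per-segment rule rejects it. */
	assert(!aligned_piecewise(payload, 2, addr_mask, len_mask));
	/* 3996 + 100 = 4096 is a multiple of 512, so the aggregate rule accepts it. */
	assert(aligned_aggregate(payload, 2, addr_mask, len_mask));
}
```

In this model the first segment alone is not a whole number of blocks, which
is the kind of vector the paragraph above says cannot be split on a block
boundary.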



