Re: need SUNRPC TCP to receive into aligned pages [was: Re: [PATCH 1/6] NFSD: add the ability to enable use of RWF_DONTCACHE for all IO]

On Mon, Jun 16, 2025 at 09:37:01PM -0700, Christoph Hellwig wrote:
> On Mon, Jun 16, 2025 at 12:07:42PM -0400, Mike Snitzer wrote:
> > But that's OK... my test bdev is a bad example (archaic VMware vSphere
> > provided SCSI device): it doesn't reflect expected modern hardware.
> > 
> > But I just slapped together a test pmem blockdevice (memory backed,
> > using memmap=6G!18G) and it too has dma_alignment=511
> 
> That's the block layer default when not overridden by the driver, I guess
> pmem folks didn't care enough.  I suspect it should not have any
> alignment requirements at all.

Yeah, I hacked it with this just to quickly simulate NVMe's dma_alignment:

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 210fb77f51ba..0ab2826073f9 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -457,6 +457,7 @@ static int pmem_attach_disk(struct device *dev,
                .max_hw_sectors         = UINT_MAX,
                .features               = BLK_FEAT_WRITE_CACHE |
                                          BLK_FEAT_SYNCHRONOUS,
+               .dma_alignment          = 3,
        };
        int nid = dev_to_node(dev), fua;
        struct resource *res = &nsio->res;

> > I'd like NFSD to be able to know if its bvec is dma-aligned, before
> > issuing DIO writes to underlying XFS.  AFAIK I can do that simply by
> > checking the STATX_DIOALIGN provided dio_mem_align...
> 
> Exactly.

I'm finding that even with dma_alignment=3, the bvec that
nfsd_vfs_write()'s call to xdr_buf_to_bvec() produces from the NFS
WRITE payload still causes iov_iter_aligned_bvec() to return false.

The reason is that iov_iter_aligned_bvec() inspects each bio_vec in
the array in isolation (in its while() loop). So even though the NFS
WRITE payload as a whole is aligned on disk (e.g. offset=0 len=512K),
its first and last bvec members are _not_ aligned, because the 512K
payload starts 148 bytes into the first of the pages SUNRPC allocated
for it. So iov_iter_aligned_bvec() fails at this check:

  if (len & len_mask)
          return false;

with tracing I added:

  nfsd-14027   [001] .....  3734.668780: nfsd_vfs_write: iov_iter_aligned_bvec: addr_mask=3 len_mask=511
  nfsd-14027   [001] .....  3734.668781: nfsd_vfs_write: iov_iter_aligned_bvec: len=3948 & len_mask=511 failed
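
To make the arithmetic concrete, here is a minimal user-space sketch of
a per-segment check of this kind. It is a model, not the kernel code:
struct seg, split_payload() and segs_aligned() are hypothetical names,
and it assumes 4096-byte pages with one segment per page, which is what
the len=3948 in the trace suggests (4096 - 148 = 3948).

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumption: 4K pages */

/* Hypothetical stand-in for a bio_vec: offset within a page, plus length. */
struct seg {
	unsigned int offset;
	unsigned int len;
};

/*
 * Split a payload of 'total' bytes that starts 'first_off' bytes into
 * its first page into per-page segments. Returns the segment count.
 */
static size_t split_payload(unsigned int first_off, size_t total,
			    struct seg *segs, size_t max_segs)
{
	size_t n = 0;
	unsigned int off = first_off;

	while (total > 0 && n < max_segs) {
		size_t room = MODEL_PAGE_SIZE - off;
		size_t len = total < room ? total : room;

		segs[n].offset = off;
		segs[n].len = (unsigned int)len;
		n++;
		total -= len;
		off = 0;	/* later segments start on a page boundary */
	}
	return n;
}

/* Check every segment on its own; one bad segment fails the whole set. */
static bool segs_aligned(const struct seg *segs, size_t n,
			 unsigned int addr_mask, unsigned int len_mask)
{
	for (size_t i = 0; i < n; i++) {
		if (segs[i].len & len_mask)
			return false;
		if (segs[i].offset & addr_mask)
			return false;
	}
	return true;
}
```

With first_off=148 and total=512K this yields a 3948-byte first
segment, 127 full pages and a 148-byte tail. Offset 148 passes
addr_mask=3, but neither 3948 nor 148 is a multiple of 512, so the
len_mask=511 test rejects the set even though the lengths sum to
exactly 512K. With first_off=0 the same payload passes.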

Is this another case of the checks being too strict?  The bvec does
describe a contiguous 512K extent of on-disk LBA, just not if
inspected piece-wise.

BTW, XFS's directio code _will_ also check with
iov_iter_aligned_bvec() via iov_iter_is_aligned().

Mike



