Re: need SUNRPC TCP to receive into aligned pages [was: Re: [PATCH 1/6] NFSD: add the ability to enable use of RWF_DONTCACHE for all IO]

On Mon, Jun 16, 2025 at 05:29:32AM -0700, Christoph Hellwig wrote:
> On Fri, Jun 13, 2025 at 05:23:48AM -0400, Mike Snitzer wrote:
> > Which in practice has proven a hard requirement for O_DIRECT in my
> > testing
> 
> What fails if you don't page align the memory?
> 
> > But if you look at patch 5 in this series:
> > https://lore.kernel.org/linux-nfs/20250610205737.63343-6-snitzer@xxxxxxxxxx/
> > 
> > I added fs/nfsd/vfs.c:is_dio_aligned(), which is basically a tweaked
> > ditto of fs/btrfs/direct-io.c:check_direct_IO():
> 
> No idea why btrfs still has this, but it's not a general requirement
> from the block layer or other file system.  You just need to be
> aligned to the dma alignment in the queue limits, which for most NVMe,
> SCSI or ATA devices reports a dword alignment.  Some of the more
> obscure drivers might require more alignment, or just report it due to
> copy and paste.

Yeah, btrfs should probably be fixed and the rest of the filesystems audited.
 
> > What I found is that unless SUNRPC TCP stored the WRITE payload in a
> > page-aligned boundary then iov_iter_alignment() would fail.
> 
> iov_iter_alignment would fail, or your check based on it?  The latter
> will fail, but it doesn't check anything that matters :)
> 

The latter, the check based on iov_iter_alignment() failed.  I
understand your point.

Thankfully I can confirm that dword alignment is all that is needed on
modern hardware, just showing my work:

I retested: a 512K write payload that is aligned to the XFS bdev's
logical_block_size (512 bytes) still fails when I skip the
iov_iter_alignment() check at a high level.

It fails in fs/iomap/direct-io.c:iomap_dio_bio_iter() with this check:

        if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
            !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
                return -EINVAL;

Because:

static inline bool bdev_iter_is_aligned(struct block_device *bdev,
                                        struct iov_iter *iter)
{
        return iov_iter_is_aligned(iter, bdev_dma_alignment(bdev),
                                   bdev_logical_block_size(bdev) - 1);
}

and because bdev_dma_alignment for my particular test bdev is 511 :(
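
To make the effect of the two mask values concrete, here is a minimal
userspace sketch of the per-segment test. It is only a model: struct seg
and segs_are_aligned() are hypothetical names invented for illustration,
not kernel APIs, and the sketch assumes each segment is described by its
offset within a page and its length.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one bvec segment: offset into its page, and length. */
struct seg {
	size_t offset;
	size_t len;
};

/*
 * Userspace model of a mask-based alignment test: every segment must
 * satisfy (offset & addr_mask) == 0 and (len & len_mask) == 0.
 */
bool segs_are_aligned(const struct seg *segs, size_t nr,
		      size_t addr_mask, size_t len_mask)
{
	for (size_t i = 0; i < nr; i++) {
		if (segs[i].len & len_mask)
			return false;
		if (segs[i].offset & addr_mask)
			return false;
	}
	return true;
}
```

Under this model, a first segment at page offset 4 with length 3584
passes with an address mask of 3 (dword) but is rejected with an address
mask of 511, even though its length is a multiple of 512.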

But that's OK... my test bdev is a bad example (archaic VMware vSphere
provided SCSI device): it doesn't reflect expected modern hardware.

However, I just slapped together a test pmem block device (memory
backed, using memmap=6G!18G) and it too has dma_alignment=511.

I do have access to a KVM guest with a virtio_scsi root bdev that has
dma_alignment=3.

I also just confirmed that modern NVMe devices on another testbed also
have dma_alignment=3, whew...

I'd like NFSD to be able to know if its bvec is dma-aligned before
issuing DIO writes to the underlying XFS.  AFAIK I can do that simply by
checking the dio_mem_align value provided by STATX_DIOALIGN...
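
For reference, a userspace analogue of that check might look like the
sketch below. It is an assumption-laden illustration, not the NFSD
implementation: in the kernel the value would come from the getattr path
rather than the statx(2) syscall, and the helper names
mem_is_dio_aligned() and query_dio_mem_align() are hypothetical. It
relies on statx(2) with STATX_DIOALIGN and the stx_dio_mem_align field,
which need reasonably recent kernel and libc headers, so the query falls
back to ENOSYS when the macro is not defined. Note that dio_mem_align is
a byte count rather than a mask, so a dma_alignment of 511 would
presumably show up as 512.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>

/* True if buf satisfies the reported memory alignment; 0 means DIO is unsupported. */
bool mem_is_dio_aligned(const void *buf, uint32_t dio_mem_align)
{
	if (dio_mem_align == 0)
		return false;
	return ((uintptr_t)buf % dio_mem_align) == 0;
}

/* Fetch the direct I/O memory alignment for path; returns 0 on success, -1 on error. */
int query_dio_mem_align(const char *path, uint32_t *mem_align)
{
#ifdef STATX_DIOALIGN
	struct statx stx;

	if (statx(AT_FDCWD, path, 0, STATX_DIOALIGN, &stx) < 0)
		return -1;
	/* The filesystem may not fill in the DIO alignment fields. */
	if (!(stx.stx_mask & STATX_DIOALIGN)) {
		errno = EOPNOTSUPP;
		return -1;
	}
	*mem_align = stx.stx_dio_mem_align;
	return 0;
#else
	/* Headers too old to know about STATX_DIOALIGN. */
	(void)path;
	(void)mem_align;
	errno = ENOSYS;
	return -1;
#endif
}
```

A caller would query the alignment once per file and then test each
buffer address with mem_is_dio_aligned() before choosing the direct I/O
path, falling back to buffered I/O otherwise.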

Thanks,
Mike



