Re: How to handle P2P DMA with only {physaddr,len} in bio_vec?

Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:

> On Mon, Jun 23, 2025 at 11:50:58AM +0100, David Howells wrote:
> > What's the best way to manage this without having to go back to the page
> > struct for every DMA mapping we want to make?
> 
> There isn't a very easy way.  Also because if you actually need to do
> peer to peer transfers, you right now absolutely need the page to find
> the pgmap that has the information on how to perform the peer to peer
> transfer.

Are you expecting P2P to become particularly common?  I ask because page
struct lookups are going to become more expensive: we'll have to do type
checking, and Willy may eventually move them from a fixed array into a maple
tree.  If we can record the P2P flag in the bio_vec, it would help speed up
the "not P2P" case.

> > Do we need to have
> > iov_extract_user_pages() note this in the bio_vec?
> > 
> > 	struct bio_vec {
> > 		phys_addr_t	bv_base_addr;	/* 64-bits */
> > 		size_t		bv_len:56;	/* Maybe just u32 */
> > 		bool		p2pdma:1;	/* Region is involved in P2P */
> > 		unsigned int	spare:7;
> > 	};
> 
> Having a flag in the bio_vec might be a way to shortcut the P2P or not
> decision a bit.  The downside is that without the flag, the bio_vec
> in the brave new page-less world would actually just be:
> 
> 	struct bio_vec {
> 		phys_addr_t	bv_phys;
> 		u32		bv_len;
> 	} __packed;
> 
> i.e. adding any more information would actually increase the size from
> 12 bytes to 16 bytes for the usual 64-bit phys_addr_t setups, and thus
> undo all the memory savings that this move would provide.
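To make the 12-versus-16-byte arithmetic concrete, here is a small user-space
sketch.  It assumes GCC or Clang on a 64-bit target with a 64-bit phys_addr_t;
the struct names and the bvec_array_bytes() helper are hypothetical stand-ins,
not kernel definitions.  Packed, the two-field layout is 12 bytes; adding even
a one-byte flag to a naturally aligned version pads it out to 16.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's phys_addr_t on a 64-bit setup. */
typedef uint64_t phys_addr_t;

/* The page-less layout quoted above: 8 + 4 = 12 bytes when packed. */
struct bvec_pageless {
	phys_addr_t	bv_phys;
	uint32_t	bv_len;
} __attribute__((packed));

/* Same fields plus a separate flag, naturally aligned:
 * 8 + 4 + 1 = 13 bytes of data, rounded up to 16 for 8-byte alignment. */
struct bvec_with_flag {
	phys_addr_t	bv_phys;
	uint32_t	bv_len;
	bool		p2pdma;
};

_Static_assert(sizeof(struct bvec_pageless) == 12, "packed layout is 12 bytes");
_Static_assert(sizeof(struct bvec_with_flag) == 16, "extra flag pads to 16 bytes");

/* Hypothetical helper: bytes consumed by an array of nr segments. */
static inline size_t bvec_array_bytes(size_t nr, bool with_flag)
{
	return nr * (with_flag ? sizeof(struct bvec_with_flag)
			       : sizeof(struct bvec_pageless));
}
```

For a 256-entry array that works out to 3072 bytes without the flag and 4096
bytes with it.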

Do we actually need 32 bits for bv_len, especially given that MAX_RW_COUNT
caps a single I/O at a bit less than 2GiB?  Could we, say, do:

	struct bio_vec {
		phys_addr_t	bv_phys;
		u32		bv_len:31;
		u32		bv_use_p2p:1;
	} __packed;
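A user-space sketch of the bit-stealing layout, assuming GCC/Clang, a 64-bit
phys_addr_t and 4KiB pages.  The kernel defines MAX_RW_COUNT as
(INT_MAX & PAGE_MASK); MAX_RW_COUNT_4K below just hard-codes that for 4KiB
pages.  bvec_set() and bvec_needs_p2p_lookup() are hypothetical helpers showing
that the struct stays at 12 bytes and that the "not P2P" case can be decided
without touching the page struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;	/* assumption: 64-bit physical addresses */

/* Assumption: MAX_RW_COUNT = INT_MAX & PAGE_MASK with 4KiB pages. */
#define PAGE_SIZE_4K		4096u
#define MAX_RW_COUNT_4K		(0x7fffffffu & ~(PAGE_SIZE_4K - 1))	/* 0x7ffff000 */

struct bvec_p2p_bit {
	phys_addr_t	bv_phys;
	uint32_t	bv_len:31;	/* top bit of the old u32 is reused... */
	uint32_t	bv_use_p2p:1;	/* ...as the P2P flag */
} __attribute__((packed));

_Static_assert(sizeof(struct bvec_p2p_bit) == 12, "flag costs no extra space");
_Static_assert(MAX_RW_COUNT_4K < (1u << 31), "max I/O length fits in 31 bits");

/* Hypothetical constructor: rejects lengths that would collide with the flag. */
static inline void bvec_set(struct bvec_p2p_bit *bv, phys_addr_t phys,
			    uint32_t len, bool p2p)
{
	assert(len < (1u << 31));
	bv->bv_phys = phys;
	bv->bv_len = len;
	bv->bv_use_p2p = p2p;
}

/* Fast path: only flagged segments need the slower pgmap-style lookup. */
static inline bool bvec_needs_p2p_lookup(const struct bvec_p2p_bit *bv)
{
	return bv->bv_use_p2p;
}
```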

And rather than storing the how-to-do-P2P info in the page struct, does it
make sense to hold it separately, keyed on bv_phys?
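One way to picture holding the P2P information separately is a range table
keyed on physical address.  This is a minimal user-space sketch, assuming a
sorted array of non-overlapping [start, end) ranges searched by bisection; an
in-kernel version might use an interval-capable structure such as a maple tree
instead.  struct p2p_provider, struct p2p_range and p2p_lookup() are all
hypothetical names, and the provider fields are placeholders for whatever the
pgmap currently carries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;	/* assumption: 64-bit physical addresses */

/* Hypothetical per-provider "how to do P2P" info. */
struct p2p_provider {
	int		provider_id;
	uint64_t	bus_offset;
};

/* One registered device-memory range [start, end). */
struct p2p_range {
	phys_addr_t			start;
	phys_addr_t			end;
	const struct p2p_provider	*prov;
};

/* Binary search; ranges must be sorted by start and must not overlap. */
static const struct p2p_provider *
p2p_lookup(const struct p2p_range *ranges, size_t n, phys_addr_t phys)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (phys < ranges[mid].start)
			hi = mid;
		else if (phys >= ranges[mid].end)
			lo = mid + 1;
		else
			return ranges[mid].prov;
	}
	return NULL;	/* not registered device memory: ordinary DMA path */
}
```

Combined with a per-bio_vec flag, a lookup like this would only be needed for
segments already known to be P2P.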

Also, is it possible for the networking stack, say, to trivially map the P2P
memory in order to checksum it?  I presume bv_phys in that case would point to
a mapping of device memory?

Thanks,
David




