Re: How to handle P2P DMA with only {physaddr,len} in bio_vec?

Jason Gunthorpe <jgg@xxxxxxxxxx> · Tue, 24 Jun 2025 09:18:46 -0300

On Tue, Jun 24, 2025 at 10:02:05AM +0100, David Howells wrote:
> Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> 
> > On Mon, Jun 23, 2025 at 11:50:58AM +0100, David Howells wrote:
> > > What's the best way to manage this without having to go back to the page
> > > struct for every DMA mapping we want to make?
> > 
> > There isn't a very easy way.  Also because if you actually need to do
> > peer to peer transfers, you right now absolutely need the page to find
> > the pgmap that has the information on how to perform the peer to peer
> > transfer.
> 
> Are you expecting P2P to become particularly common?  

It is becoming commonplace in certain kinds of server systems. If
half the system's memory is behind PCI on a GPU or something then you
need P2P.

> Do we actually need 32 bits for bv_len, especially given that MAX_RW_COUNT is
> capped at a bit less than 2GiB?  Could we, say, do:
> 
>  	struct bio_vec {
>  		phys_addr_t	bv_phys;
>  		u32		bv_len:31;
> 		u32		bv_use_p2p:1;
>  	} __packed;
> 
> And rather than storing the how-to-do-P2P info in the page struct, does it
> make sense to hold it separately, keyed on bv_phys?

I thought we had agreed these sorts of 'mixed transfers' were not
desirable and we want things to be uniform at this lowest level.

So, I suggest the bio_vec should be entirely uniform, either it is all
CPU memory or it is all P2P from the same source. This is what the
block stack is doing by holding the P2P flag in the bio and splitting
the bios when they are constructed.
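
As a rough userspace illustration of that splitting, not the actual
block layer code: walk the segments and cut a new list whenever the
source changes. struct seg, uniform_run() and count_uniform_lists()
are made-up names, and p2p_src is just an opaque stand-in for
whatever identifies the P2P provider (NULL meaning CPU memory).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical input segment; p2p_src == NULL means ordinary CPU memory. */
struct seg {
	uint64_t	addr;
	uint32_t	len;
	const void	*p2p_src;
};

/* Length of the leading run of segments that share s[0]'s source. */
static size_t uniform_run(const struct seg *s, size_t n)
{
	size_t i;

	if (n == 0)
		return 0;
	for (i = 1; i < n; i++)
		if (s[i].p2p_src != s[0].p2p_src)
			break;
	return i;
}

/* Number of uniform lists needed to carry all n segments in order. */
static size_t count_uniform_lists(const struct seg *s, size_t n)
{
	size_t lists = 0;

	while (n) {
		size_t run = uniform_run(s, n);

		s += run;
		n -= run;
		lists++;
	}
	return lists;
}
```

Each run found this way would become its own container, so nothing
below the construction step ever sees a mixed list.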

My intention for a more general, less performant API was to copy
what bio is doing and have a list of bio_vecs, each bio_vec having the
same properties.

The struct enclosing the bio_vec (the bio, etc) would have the flag
saying whether it is P2P and some way to get the needed P2P source
metadata.

The bio_vec itself would just store physical addresses and lengths. No
need for complicated bit slicing.
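
Very roughly, and only as a userspace sketch of the shape I mean:
every name here (phys_vec, p2p_source, phys_vec_list,
phys_vec_list_bytes) is hypothetical and not an existing kernel
type, and phys_addr_t is typedef'd locally as a stand-in.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;	/* stand-in for the kernel type */

/* Hypothetical element: only a physical address and a length. */
struct phys_vec {
	phys_addr_t	addr;
	uint32_t	len;
};

/* Hypothetical stand-in for the P2P source metadata (what the pgmap gives today). */
struct p2p_source {
	int	provider_id;
};

/* Hypothetical container: the P2P properties are stored once per list. */
struct phys_vec_list {
	bool			is_p2p;
	const struct p2p_source	*src;	/* NULL when !is_p2p */
	size_t			nr;
	struct phys_vec		*vecs;
};

/* Total number of bytes described by the list. */
static uint64_t phys_vec_list_bytes(const struct phys_vec_list *l)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < l->nr; i++)
		total += l->vecs[i].len;
	return total;
}
```

The mapping code would look at is_p2p and src once per list, then
treat every element the same way.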

I think this is important because the new DMA API really doesn't want
to be changing modes on a per-item basis.

Jason