Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings

Jason Gunthorpe <jgg@xxxxxxxxxx> · Wed, 2 Jul 2025 20:32:44 -0300

On Wed, Jul 02, 2025 at 04:58:46PM -0400, Peter Xu wrote:
> > So you have to do it the other way and pass the pgoff to the vmap so
> > the vmap ends up with the same colouring as a user VMA holding the
> > same pages..
> 
> Not sure if I get that point, but.. it'll be hard to achieve at least.
> 
> The vmap() happens (submit/complete queues initializes) when io_uring
> instance is created.  The mmap() happens later, and it can also happen
> multiple times, so that all of the VAs got mmap()ed need to share the same
> colouring with the vmap()..  In this case it sounds reasonable to me to
> have the alignment done at mmap(), against the vmap() results.

The way this usually works is the memory is bound to a mmap "cookie"
- the pgoff - which userspace can use as many times as it likes.

Usually you know the thing being allocated will be mmap'd and what
its pgoff will be, because it is 1:1 with the cookie/pgoff.

Didn't try to guess what io_uring has done here, but, IMHO, it would
be weird if the pgoffs are not 1:1 with the vmaps.

Since you said the pgoff was constant and not exchanged user/kernel
then presumably the vmap just needs to use that constant pgoff for its
colouring.
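
To make "use that constant pgoff for its colouring" concrete, here is a
rough userspace sketch of the arithmetic only. The page shift, the
aliasing window size, and every sketch_* name are assumptions made up
for illustration; this is not the io_uring, parisc, or vmap code. The
idea is that two addresses computed from the same pgoff share the same
offset within the aliasing window, whatever hint each allocator starts
from.

```c
#include <stdint.h>

/* Hypothetical values for illustration: 4 KiB pages, 4 MiB aliasing window. */
#define SKETCH_PAGE_SHIFT   12u
#define SKETCH_COLOUR_SIZE  (UINT64_C(4) << 20)
#define SKETCH_COLOUR_MASK  (SKETCH_COLOUR_SIZE - 1)

/* Byte offset within the aliasing window implied by a page offset. */
static uint64_t sketch_colour_of_pgoff(uint64_t pgoff)
{
    return (pgoff << SKETCH_PAGE_SHIFT) & SKETCH_COLOUR_MASK;
}

/*
 * Round hint up to the next window boundary, then add the colour of
 * pgoff.  The result is >= hint and has the same colour as any other
 * address produced from the same pgoff.
 */
static uint64_t sketch_colour_align(uint64_t hint, uint64_t pgoff)
{
    uint64_t base = (hint + SKETCH_COLOUR_MASK) & ~SKETCH_COLOUR_MASK;

    return base + sketch_colour_of_pgoff(pgoff);
}
```

If the kernel-side mapping is placed with sketch_colour_align(kernel_hint,
pgoff) and every later user mapping with sketch_colour_align(user_hint,
pgoff), the two sides agree on colour without the mmap() path having to
look at where the vmap ended up.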

> > > The changes comparing to previous:
> > > 
> > >     (1) merged pgoff and *phys_pgoff parameters into one unsigned long, so
> > >     the hook can adjust the pgoff for the va allocator to be used.  The
> > >     adjustment will not be visible to future mmap() when VMA is created.
> > 
> > It seems functional, but the above is better, IMHO.
> 
> Do you mean we can start with no modification allowed on *pgoff?  I'd
> prefer having *pgoff modifiable from the start, as it'll not only work for
> io_uring / parisc above since the 1st day (so we don't need to introduce it
> on top, modifying existing users..), but it'll also be cleaner to be used
> in the current VFIO's use case.

I think a modifiable pgoff is really a weird concept... Especially if it
is only modified for the alignment calculation.

But if it is the only way, so be it.

Jason