Re: Revisiting WC mmap Support for VFIO PCI

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 16, 2025 at 07:11:50PM +0200, Mahmoud Nagy Adam wrote:

> - Dealing with legacy regions & drivers created regions, this could be
>   handled as suggested[1] from Jason using maple tree, which I'm
>   implementing to insert flags entry of the range to be mmapped, since
>   this would give us the flexibility to set the flags of any ranges.

I think we agreed that the mmap offset we use today is not ABI so the
maple tree should be filled in either during file open time or
probably simpler done during the regions info ioctl.

> - Scoping the mmap flags locally per request instead of defining it
>   globally on vfio_device/vfio_pci_core_device. This afaict from the
>   code could be handled if vfio_device_file struct is used with the
>   vfio_device_ops instead of the vfio_device, specifically the mmap &
>   ioctl since these the ops of interest here, so that we could access it
>   there and have a per fd maple tree to keep the flags in. This will
>   also keep the life time of the flags to the FD not to the device which
>   I think is better in this case.

It would be best to put the maple tree in the vfio_device_file.

The core code should just do the maple tree lookup and pass the struct
down to the driver, sort of like:

static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
{
	struct vfio_device_file *df = filep->private_data;
[..]

	struct vfio_mmap *mmap_obj = mtree_load(&df->mt_mmap, vma->vm_pgoff << PAGE_SHIFT);
	if (!mmap_obj)
		return -ENXIO;
	/* mmap must be contained to a single object */
	if (vma->vm_start < mmap_obj->vm_pgoff || vma->vm_end > (mmap_obj->vm_pgoff + mmap_obj->length))
		return -ENXIO;
	return device->ops->mmap_obj(device, vma, mmap_obj);

And the ioctls would need the df as well, but that is trivial.

Then all the pgoff touching in the driver around mmap goes away.

Drivers implementing the mmap ops should be structured to sub-struct
the vfio_mmap to store their own information:

struct vfio_pci_mmap {
       struct vfio_mmap core;
       unsigned int bar_index;
       phys_addr_t phys_start;
       unsigned int pgprot;
};

Or similar perhaps.

> Since I'm in the middle of investigating & implementing this topic, I
> would like to collect opinions on the approach so far, specially the
> last point. better ideas or objections with dealing with local flags
> using vfio_device_file or other points would be appreciated.

I strongly feel using a maple tree to translate pgoff's back into
kernel objects associated with the mmap is the right design for every
subsystem to follow.

Nicolin implemented this for iommufd:

https://lore.kernel.org/linux-iommu/9a888a326b12aa5fe940083eae1156304e210fe0.1752126748.git.nicolinc@xxxxxxxxxx/

Which shows the basic shape of the idea. It is quite easy to implement
on top of maple tree.

We are getting close to having problems in VFIO. The 40bits allocated
in VFIO is only about a 1TB of MMIO which is now a real-world size for
CXL devices. Making pgoff dynamic is the best solution to that
problem. It avoids running out of pgoff space as long as possible and
maximally retains compatibility with 32 bit systems.

Just sort of looking at it, I would suggest trying a sequence of
patches like:

 1) Make a new function op for VFIO_DEVICE_GET_REGION_INFO. It looks
    like there is quite some duplication here that can be improved on
    regardless.

 2) N patches converting drivers to use the new op.

 3) Add the basic maple tree stuff and a new mmap_obj op and new
    VFIO_DEVICE_GET_REGION_INFO that returns a 'struct vfio_mmap *'

 4) N patches moving to use the vfio_mmap ops

 5) Remove the unusued old APIs.


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux