On Fri, 2 May 2025 16:40:31 -0600 Alex Williamson <alex.williamson@xxxxxxxxxx> wrote: > The vfio-pci huge_fault handler doesn't make any attempt to insert a > mapping containing the faulting address, it only inserts mappings if the > faulting address and resulting pfn are aligned. This works in a lot of > cases, particularly in conjunction with QEMU where DMA mappings linearly > fault the mmap. However, there are configurations where we don't get > that linear faulting and pages are faulted on-demand. > > The scenario reported in the bug below is such a case, where the physical > address width of the CPU is greater than that of the IOMMU, resulting in a > VM where guest firmware has mapped device MMIO beyond the address width of > the IOMMU. In this configuration, the MMIO is faulted on demand and > tracing indicates that occasionally the faults generate a VM_FAULT_OOM. > Given the use case, this results in a "error: kvm run failed Bad address", > killing the VM. > > The host is not under memory pressure in this test, therefore it's > suspected that VM_FAULT_OOM is actually the result of a NULL return from > __pte_offset_map_lock() in the get_locked_pte() path from insert_pfn(). > This suggests a potential race inserting a pte concurrent to a pmd, and > maybe indicates some deficiency in the mm layer properly handling such a > case. > > Nevertheless, Peter noted the inconsistency of vfio-pci's huge_fault > handler where our mapping granularity depends on the alignment of the > faulting address relative to the order rather than aligning the faulting > address to the order to more consistently insert huge mappings. This > change not only uses the page tables more consistently and efficiently, but > as any fault to an aligned page results in the same mapping, the race > condition suspected in the VM_FAULT_OOM is avoided. > > Reported-by: Adolfo <adolfotregosa@xxxxxxxxx> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=220057 > Fixes: 09dfc8a5f2ce ("vfio/pci: Fallback huge faults for unaligned pfn") > Cc: stable@xxxxxxxxxxxxxxx > Tested-by: Adolfo <adolfotregosa@xxxxxxxxx> > Co-developed-by: Peter Xu <peterx@xxxxxxxxxx> > Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx> > --- > drivers/vfio/pci/vfio_pci_core.c | 12 ++++++------ > 1 file changed, 6 insertions(+), 6 deletions(-) > > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c > index 35f9046af315..6328c3a05bcd 100644 > --- a/drivers/vfio/pci/vfio_pci_core.c > +++ b/drivers/vfio/pci/vfio_pci_core.c > @@ -1646,14 +1646,14 @@ static vm_fault_t vfio_pci_mmap_huge_fault(struct vm_fault *vmf, > { > struct vm_area_struct *vma = vmf->vma; > struct vfio_pci_core_device *vdev = vma->vm_private_data; > - unsigned long pfn, pgoff = vmf->pgoff - vma->vm_pgoff; > + unsigned long addr = vmf->address & ~((PAGE_SIZE << order) - 1); > + unsigned long pgoff = (addr - vma->vm_start) >> PAGE_SHIFT; > + unsigned long pfn = vma_to_pfn(vma) + pgoff; > vm_fault_t ret = VM_FAULT_SIGBUS; > > - pfn = vma_to_pfn(vma) + pgoff; > - > - if (order && (pfn & ((1 << order) - 1) || > - vmf->address & ((PAGE_SIZE << order) - 1) || > - vmf->address + (PAGE_SIZE << order) > vma->vm_end)) { > + if (order && (addr < vma->vm_start || > + addr + (PAGE_SIZE << order) > vma->vm_end || > + pfn & ((1 << order) - 1))) { > ret = VM_FAULT_FALLBACK; > goto out; > } Applied to vfio for-linus branch for v6.15. Added Peter as a Sign-off to be compatible with the Co-developed tag, in agreement with Peter. Modified the bz Link: to Closes: for checkpatch. Thanks, Alex