[Bug 220057] Kernel regression. Linux VMs crashing (I did not test Windows guest VMs)

https://bugzilla.kernel.org/show_bug.cgi?id=220057

--- Comment #40 from Alex Williamson (alex.williamson@xxxxxxxxxx) ---
The vfio_pci_mmap_huge_fault logs with order >0 and ending in 0x800 are
normal; they indicate we can't create the huge page mapping due to
alignment requirements.  0x800 is VM_FAULT_FALLBACK (ie. fall back to a
smaller mapping).
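
For reference, the fallback path looks roughly like this (a simplified
sketch of the alignment check only, not the literal drivers/vfio/pci
code; the function name is made up):

#include <linux/mm.h>

/*
 * Simplified sketch: a huge fault handler has to fall back when the
 * faulting address isn't aligned to the requested order, or when the
 * huge page wouldn't fit inside the VMA.  Illustration only.
 */
static vm_fault_t sketch_huge_fault(struct vm_fault *vmf,
                                    unsigned int order)
{
        struct vm_area_struct *vma = vmf->vma;
        unsigned long size = PAGE_SIZE << order;

        if (order && ((vmf->address & (size - 1)) ||
                      vmf->address + size > vma->vm_end))
                return VM_FAULT_FALLBACK;  /* 0x800: retry smaller order */

        /* ... otherwise insert the pfn at this order ... */
        return VM_FAULT_NOPAGE;
}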

However, there are three instances of:

May 01 10:01:37 pve QEMU[1972]: error: kvm run failed Bad address
May 01 10:01:37 pve QEMU[1972]: error: kvm run failed Bad address
May 01 10:01:37 pve QEMU[1972]: error: kvm run failed Bad address

And three instances of:

May 01 10:01:37 pve kernel: vfio-pci 0000:01:00.0:
vfio_pci_mmap_huge_fault(,order = 0) BAR 1 page offset 0x3798: 0x1
May 01 10:01:37 pve kernel: vfio-pci 0000:01:00.0:
vfio_pci_mmap_huge_fault(,order = 0) BAR 1 page offset 0x3710: 0x1
May 01 10:01:37 pve kernel: vfio-pci 0000:01:00.0:
vfio_pci_mmap_huge_fault(,order = 0) BAR 1 page offset 0x3688: 0x1

0x1 is VM_FAULT_OOM, so at some point while trying to insert the pte we
likely got an -ENOMEM.
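
For context, an allocation failure during the insertion comes back as
VM_FAULT_OOM via the generic vmf_error() helper in include/linux/mm.h;
roughly (sketch only, the surrounding function is hypothetical):

#include <linux/mm.h>

/*
 * Sketch: how an -ENOMEM during pte/pfn insertion surfaces as the 0x1
 * (VM_FAULT_OOM) code logged above.  vmf_error() maps -ENOMEM to
 * VM_FAULT_OOM and other errors to VM_FAULT_SIGBUS.
 */
static vm_fault_t sketch_insert_pfn(struct vm_fault *vmf, unsigned long pfn)
{
        /* Hypothetical failure: a page table allocation returns -ENOMEM. */
        int err = -ENOMEM;

        if (err)
                return vmf_error(err);  /* -ENOMEM -> VM_FAULT_OOM (0x1) */

        return VM_FAULT_NOPAGE;
}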

The system has 128GB of RAM, 98GB of which is dedicated to 1G hugepages,
leaving only about 30GB for regular kernel allocations, including the
page tables backing these mappings.  This VM is configured for 32GB.
What happens if fewer hugepages are reserved?

Also note that if we were able to populate the MMIO mappings using huge
pages, which would happen if the VM BIOS had placed the mappings within
the DMA mappable range of the IOMMU (the VFIO_MAP_DMA failures indicate
it did not), we'd be using fewer page table entries than even the
previous code (ie. less memory).  The issue might simply come down to
the fact that previously we attempted to fault in the entire MMIO
mapping on the first fault, when memory was still available, whereas now
we fault on access with the expectation that huge pages will make us
fault less, and that isn't coming to fruition due to the bad VM
configuration.
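
To put rough numbers on the entry count (illustration only; the 256MB
BAR size below is made up, not taken from this report):

#include <stdio.h>

/* Illustrative arithmetic: page table entries needed to map a BAR with
 * 4KiB ptes versus 2MiB (PMD-level) huge entries. */
int main(void)
{
        unsigned long long bar  = 256ULL << 20;        /* 256 MiB BAR  */
        unsigned long long ptes = bar / (4ULL << 10);  /* 65536 x 4KiB */
        unsigned long long pmds = bar / (2ULL << 20);  /* 128 x 2MiB   */

        printf("4KiB ptes: %llu, 2MiB entries: %llu\n", ptes, pmds);
        return 0;
}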

I think we're going to need to figure out if/how Proxmox enables setting
guest-phys-bits=39, or else the host needs to free up some memory from
hugepages.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.



