Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type

On Tue, Jul 22, 2025, Xiaoyao Li wrote:
> On 7/22/2025 10:37 PM, Sean Christopherson wrote:
> > On Tue, Jul 22, 2025, Xiaoyao Li wrote:
> > > On 7/21/2025 8:22 PM, Xiaoyao Li wrote:
> > > > On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> > > > > +/*
> > > > > + * CoCo VMs with hardware support that use guest_memfd only for
> > > > > backing private
> > > > > + * memory, e.g., TDX, cannot use guest_memfd with userspace mapping
> > > > > enabled.
> > > > > + */
> > > > > +#define kvm_arch_supports_gmem_mmap(kvm)        \
> > > > > +    (IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP) &&    \
> > > > > +     (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM)
> > > > 
> > > > I want to share some findings from my POC of enabling gmem mmap in QEMU.
> > > > 
> > > > Actually, QEMU can use a gmem with mmap support as normal memory even
> > > > without passing the gmem fd via kvm_userspace_memory_region2.guest_memfd
> > > > on KVM_SET_USER_MEMORY_REGION2.
> > > > 
> > > > Since the gmem is mmapable, QEMU can pass the userspace address obtained
> > > > from mmap() on the gmem fd as kvm_userspace_memory_region(2).userspace_addr.
> > > > This works well for non-CoCo VMs on x86.
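> > > > 
> > > > Roughly, the setup looks like this (a minimal sketch, error handling
> > > > elided; GUEST_MEMFD_FLAG_MMAP is the mmap flag from this series, the
> > > > rest is existing KVM UAPI):
> > > > 
> > > >   /* Create a mmapable guest_memfd and map it into userspace. */
> > > >   struct kvm_create_guest_memfd gmem = {
> > > >       .size  = mem_size,
> > > >       .flags = GUEST_MEMFD_FLAG_MMAP,
> > > >   };
> > > >   int gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);
> > > > 
> > > >   void *hva = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
> > > >                    MAP_SHARED, gmem_fd, 0);
> > > > 
> > > >   /* Register the memory purely via the HVA, without the gmem fd. */
> > > >   struct kvm_userspace_memory_region2 region = {
> > > >       .slot            = 0,
> > > >       .guest_phys_addr = 0x0,
> > > >       .memory_size     = mem_size,
> > > >       .userspace_addr  = (__u64)hva,
> > > >       /* no KVM_MEM_GUEST_MEMFD flag: a plain HVA-backed slot */
> > > >   };
> > > >   ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &region);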
> > > 
> > > One more finding.
> > > 
> > > I tested with QEMU by creating normal (non-private) memory with a mmapable
> > > guest_memfd, and forcibly passing the gmem fd in struct
> > > kvm_userspace_memory_region2 when QEMU sets up the memory regions.
> > > 
> > > It hits the kvm_gmem_bind() error since QEMU tries to back different GPA
> > > regions with the same gmem.
> > > 
> > > So, the question is: do we want to allow multi-binding, i.e., binding
> > > multiple GPA ranges to the same gmem, for shared-only gmem?
> > 
> > Can you elaborate, maybe with code?  I don't think I fully understand the setup.
> 
> Well, I haven't fully sorted it out; just sharing what I have so far.
> 
> The problem hits when SMM is enabled (which it is by default).
> 
> - The trace of "-machine q35,smm=off":
> 
> kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x4 gpa=0x0 size=0x80000000 ua=0x7f5733fff000 guest_memfd=15 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#1 flags=0x4 gpa=0x100000000 size=0x80000000 ua=0x7f57b3fff000 guest_memfd=15 guest_memfd_offset=0x80000000 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#2 flags=0x2 gpa=0xffc00000 size=0x400000 ua=0x7f5840a00000 guest_memfd=-1 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x0 gpa=0x0 size=0x0 ua=0x7f5733fff000 guest_memfd=15 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x4 gpa=0x0 size=0xc0000 ua=0x7f5733fff000 guest_memfd=15 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#3 flags=0x2 gpa=0xc0000 size=0x20000 ua=0x7f5841000000 guest_memfd=-1 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#4 flags=0x2 gpa=0xe0000 size=0x20000 ua=0x7f5840de0000 guest_memfd=-1 guest_memfd_offset=0x3e0000 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#5 flags=0x4 gpa=0x100000 size=0x7ff00000 ua=0x7f57340ff000 guest_memfd=15 guest_memfd_offset=0x100000 ret=0
> 
> - The trace of "-machine q35"
> 
> kvm_set_user_memory AddrSpace#0 Slot#0 flags=0x4 gpa=0x0 size=0x80000000
> ua=0x7f8faffff000 guest_memfd=15 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#1 flags=0x4 gpa=0x100000000
> size=0x80000000 ua=0x7f902ffff000 guest_memfd=15
> guest_memfd_offset=0x80000000 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#2 flags=0x2 gpa=0xffc00000
> size=0x400000 ua=0x7f90bd000000 guest_memfd=-1 guest_memfd_offset=0x0 ret=0
> kvm_set_user_memory AddrSpace#0 Slot#3 flags=0x4 gpa=0xfeda0000 size=0x20000
> ua=0x7f8fb009f000 guest_memfd=15 guest_memfd_offset=0xa0000 ret=-22
> qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION2
> failed, slot=3, start=0xfeda0000, size=0x20000, flags=0x4, guest_memfd=15,
> guest_memfd_offset=0xa0000: Invalid argument
> kvm_set_phys_mem: error registering slot: Invalid argument
> 
> 
> Here QEMU tries to set up the memory region for [0xfeda0000, +0x20000],
> which is backed by the gmem (fd 15) allocated for normal RAM, at offset
> 0xa0000.
> 
> What I have tracked it down to in QEMU is mch_realize(), which sets up a
> memory region starting at 0xfeda0000.
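> 
> In ioctl terms, the failing sequence boils down to this (a sketch with
> values taken from the trace above, error handling elided):
> 
>   /* Bind the gmem to the low-RAM slot; succeeds (ret=0 in the trace). */
>   struct kvm_userspace_memory_region2 low = {
>       .slot               = 0,
>       .flags              = KVM_MEM_GUEST_MEMFD,
>       .guest_phys_addr    = 0x0,
>       .memory_size        = 0x80000000,
>       .userspace_addr     = (__u64)hva,
>       .guest_memfd        = gmem_fd,            /* fd 15 in the trace */
>       .guest_memfd_offset = 0x0,
>   };
>   ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &low);    /* ret=0 */
> 
>   /* Bind the same gmem range again for the high SMRAM alias; fails. */
>   struct kvm_userspace_memory_region2 smram = low;
>   smram.slot               = 3;
>   smram.guest_phys_addr    = 0xfeda0000;
>   smram.memory_size        = 0x20000;
>   smram.userspace_addr     = (__u64)hva + 0xa0000;
>   smram.guest_memfd_offset = 0xa0000;   /* already bound to Slot#0 */
>   ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &smram);  /* -EINVAL */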

Oh yay, SMM.  The problem lies in memory regions that are aliased into low memory
(IIRC, there's at least one other such scenario, but don't quote me on that).
For SMRAM, when the "high" SMRAM location (0xfeda0000) is enabled, the "legacy"
SMRAM location (0xa0000) gets remapped (aliased in QEMU's vernacular) to the
high location, resulting in two CPU physical addresses pointing at the same
underlying memory[*].  From KVM's perspective, that means two GPA ranges pointing
at the same HVA.

As for whether or not we want to support such madness...  I'd definitely say "not
now", and probably not ever.  Emulating SMM puts the VMM *firmly* in the TCB of
the guest, and so guest_memfd benefits like not having to map guest memory into
userspace pretty much go out the window.  For such a use case, I don't think it's
unreasonable to require QEMU (or any other VMM) to map the aliases via HVA only,
i.e. to not take full advantage of guest_memfd.
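
Concretely, something like the following (an untested sketch, not from the
series): keep the gmem binding for the canonical RAM slot, and register the
alias as a plain HVA-only slot pointing back into the same mmap()ed view of
the gmem fd:

  struct kvm_userspace_memory_region2 alias = {
      .slot            = 3,
      .flags           = 0,                    /* no KVM_MEM_GUEST_MEMFD */
      .guest_phys_addr = 0xfeda0000,           /* high SMRAM alias */
      .memory_size     = 0x20000,
      .userspace_addr  = (__u64)hva + 0xa0000, /* same pages, via the HVA */
  };
  ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION2, &alias);

Both GPA ranges then resolve to the same host pages through the HVA, and the
gmem itself is never double-bound.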

[*] https://opensecuritytraining.info/IntroBIOS_files/Day1_08_Advanced%20x86%20-%20BIOS%20and%20SMM%20Internals%20-%20SMRAM.pdf




