Re: [PATCH v15 14/21] KVM: x86: Enable guest_memfd mmap for default VM type

Xiaoyao Li <xiaoyao.li@xxxxxxxxx> · Mon, 21 Jul 2025 23:07:25 +0800

On 7/21/2025 10:42 PM, Sean Christopherson wrote:
On Mon, Jul 21, 2025, Vishal Annapurve wrote:
On Mon, Jul 21, 2025 at 5:22 AM Xiaoyao Li <xiaoyao.li@xxxxxxxxx> wrote:

On 7/18/2025 12:27 AM, Fuad Tabba wrote:
+/*
+ * CoCo VMs with hardware support that use guest_memfd only for backing private
+ * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
+ */
+#define kvm_arch_supports_gmem_mmap(kvm)             \
+     (IS_ENABLED(CONFIG_KVM_GMEM_SUPPORTS_MMAP) &&   \
+      (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM)

I want to share my findings from doing the POC to enable gmem mmap in QEMU.

Actually, QEMU can use gmem with mmap support as the normal memory even
without passing the gmem fd to kvm_userspace_memory_region2.guest_memfd
on KVM_SET_USER_MEMORY_REGION2.

Since the gmem is mmapable, QEMU can pass the userspace addr obtained from
mmap() on the gmem fd to kvm_userspace_memory_region(2).userspace_addr. It
works well for non-CoCo VMs on x86.

Then it seems feasible to use gmem with mmap for the shared memory of
TDX, and an additional gmem without mmap for the private memory, i.e.,
for struct kvm_userspace_memory_region2, @userspace_addr is passed
the uaddr returned from mmap() on gmem0 (with mmap), while @guest_memfd is
passed the fd of another gmem1 (without mmap).

However, it actually fails, because kvm_arch_supports_gmem_mmap()
returns false for TDX VMs, which means userspace cannot allocate gmem
with mmap just for shared memory for TDX.
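
The failure comes from the flag validation in kvm_gmem_create() combined
with the macro quoted above. A minimal standalone model of that check is
sketched below. It is not kernel code: the MODEL_* constants and model_*
functions are hypothetical stand-ins, and the -EINVAL return value is an
assumption about what the creation path reports.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in constants for illustration only; not taken from kernel headers. */
enum model_vm_type { MODEL_DEFAULT_VM, MODEL_TDX_VM };
#define MODEL_GMEM_FLAG_MMAP (1ULL << 0)

/* Mirrors the quoted macro: only the default VM type may mmap guest_memfd. */
static bool model_supports_gmem_mmap(enum model_vm_type type)
{
	return type == MODEL_DEFAULT_VM;
}

/* Mirrors the valid_flags check: any flag outside valid_flags is rejected. */
static int model_gmem_create(enum model_vm_type type, uint64_t flags)
{
	uint64_t valid_flags = 0;

	if (model_supports_gmem_mmap(type))
		valid_flags |= MODEL_GMEM_FLAG_MMAP;

	if (flags & ~valid_flags)
		return -EINVAL; /* assumed error code */

	return 0;
}
```

In this model, model_gmem_create(MODEL_TDX_VM, MODEL_GMEM_FLAG_MMAP) is
rejected even when the caller only intends to use the mapping as shared
memory, which is the behavior described above.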

Why do you want such a usecase to work?

I'm guessing Xiaoyao was asking an honest question in response to finding a
perceived flaw when trying to get this all working in QEMU.

I'm not sure if it is a flaw. It is just counterintuitive to me that such
a usecase is not supported.

If kvm allows mappable guest_memfd files for TDX VMs without
conversion support, userspace will be able to use those for backing

s/able/unable?

I think Vishal meant "able", because ...

private memory unless:
1) KVM checks at binding time if the guest_memfd passed during memslot
creation is not a mappable one and doesn't enforce "not mappable"
requirement for TDX VMs at creation time.

Xiaoyao's question is about "just for shared memory", so this is irrelevant for
the question at hand.

... if we allow gmem mmap for TDX, KVM needs to ensure that the mmap'able
gmem is only ever passed via userspace_addr. IOW, KVM needs to forbid
userspace from passing the mmap'able guest_memfd to
kvm_userspace_memory_region2.guest_memfd, because that would allow
userspace to access the private memory.
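
One way to picture the binding-time restriction is the standalone sketch
below. It is only a model under stated assumptions: the sketch_* types and
sketch_gmem_bind() are hypothetical names, not existing KVM code, and the
-EINVAL return value is likewise assumed.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in flag value for illustration only. */
#define SKETCH_GMEM_FLAG_MMAP (1ULL << 0)

/* Hypothetical model of a guest_memfd instance and its creation flags. */
struct sketch_gmem {
	uint64_t flags;
};

/* Hypothetical model of the memslot fields discussed in this thread. */
struct sketch_memslot {
	uint64_t userspace_addr;               /* may come from mmap() on a gmem fd */
	const struct sketch_gmem *guest_memfd; /* NULL when no gmem is bound */
};

/*
 * Refuse to bind an mmap'able gmem as the private backing of a VM that has
 * private memory. Using an mmap'able gmem only via userspace_addr is allowed.
 */
static int sketch_gmem_bind(bool vm_has_private_mem,
			    const struct sketch_memslot *slot)
{
	const struct sketch_gmem *gmem = slot->guest_memfd;

	if (!gmem)
		return 0;

	if (vm_has_private_mem && (gmem->flags & SKETCH_GMEM_FLAG_MMAP))
		return -EINVAL; /* assumed error code */

	return 0;
}
```

With such a check, a slot whose userspace_addr comes from gmem0 (with mmap)
and whose guest_memfd is gmem1 (without mmap) would bind, while a slot that
passes the mmap'able gmem as guest_memfd would be refused.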

2) KVM fetches shared faults through userspace page tables and not
guest_memfd directly.

This is also irrelevant.  KVM _already_ supports resolving shared faults through
userspace page tables.  That support won't go away as KVM will always need/want
to support mapping VM_IO and/or VM_PFNMAP memory into the guest (even for TDX).

I don't see value in trying to go out of our way to support such a usecase.

But if/when KVM gains support for tracking shared vs. private in guest_memfd
itself, i.e. when TDX _does_ support mmap() on guest_memfd, KVM won't have to go
out of its way to support using guest_memfd for the @userspace_addr backing store.
Unless I'm missing something, the only thing needed to "support" this scenario is:

As above, we need 1) mentioned by Vishal as well, to prevent userspace 
from passing mmapable guest_memfd to serve as private memory.

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index d01bd7a2c2bd..34403d2f1eeb 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -533,7 +533,7 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
         u64 flags = args->flags;
         u64 valid_flags = 0;
  
-       if (kvm_arch_supports_gmem_mmap(kvm))
+       // if (kvm_arch_supports_gmem_mmap(kvm))
                 valid_flags |= GUEST_MEMFD_FLAG_MMAP;
  
         if (flags & ~valid_flags)

I think the question we actually want to answer is: do we want to go out of our
way to *prevent* such a usecase.  E.g. is there any risk/danger that we need to
mitigate, and would the cost of the mitigation be acceptable?

I think the answer is "no", because preventing userspace from using guest_memfd
as shared-only memory would require resolving the VMA during hva_to_pfn() in order
to fully prevent such behavior, and I definitely don't want to take mmap_lock
around hva_to_pfn_fast().

I don't see any obvious danger lurking.  KVM's pre-guest_memfd memory management
scheme is all about effectively making KVM behave like "just another" userspace
agent.  E.g. if/when TDX/SNP support comes along, guest_memfd must not allow mapping
private memory into userspace regardless of what KVM supports for page faults.

So unless I'm missing something, for now we do nothing, and let this support come
along naturally once TDX supports mmap() on guest_memfd.