Re: [PATCH v16 15/22] KVM: x86/mmu: Extend guest_memfd's max mapping level to shared mappings

Ackerley Tng <ackerleytng@xxxxxxxxxx> · Fri, 25 Jul 2025 14:31:57 -0700

Sean Christopherson <seanjc@xxxxxxxxxx> writes:

> On Fri, Jul 25, 2025, Ackerley Tng wrote:
>> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
>> > Invoking host_pfn_mapping_level() isn't just undesirable, it's flat out wrong, as
>> > KVM will not verify slot->userspace_addr actually points at the (same) guest_memfd
>> > instance.
>> >
>> 
>> This is true too, that invoking host_pfn_mapping_level() could return
>> totally wrong information if slot->userspace_addr points somewhere else
>> completely.
>> 
>> What if slot->userspace_addr is set up to match the fd+offset in the
>> same guest_memfd, and kvm_gmem_max_mapping_level() returns 2M but it's
>> actually mapped into the host at 4K?
>> 
>> A little out of my depth here, but would mappings being recovered to the
>> 2M level be a problem?
>
> No, because again, by design, the host userspace mapping has _zero_ influence on
> the guest mapping.
>

I'm not trying to solve any particular problem here, mostly trying to
understand mapping levels better.

Before guest_memfd, why did kvm_mmu_max_mapping_level() need to call
host_pfn_mapping_level()?

Was it about THP folios?

>> For enforcement of shared/private-ness of memory, recovering the
>> mappings to the 2M level is okay since if some part had been private,
>> guest_memfd wouldn't have returned 2M.
>> 
>> As for alignment, if guest_memfd could return 2M to
>> kvm_gmem_max_mapping_level(), then userspace_addr would have been 2M
>> aligned, which would correctly permit mapping recovery to 2M, so that
>> sounds like it works too.
>> 
>> Maybe the right solution here is that since slot->userspace_addr need
>> not point at the same guest_memfd+offset configured in the memslot, when
>> guest_memfd responds to kvm_gmem_max_mapping_level(), it should check if
>> the requested GFN is mapped in host userspace, and if so, return the
>> smaller of the two mapping levels.
>
> NAK.
>
> I don't understand what problem you're trying to solve, at all.  Setting aside
> guest_memfd for the moment, GFN=>HVA mappings are 100% userspace controlled, via
> memslots.  If userspace is accessing guest memory, it is userspace's responsibility
> to ensure it's accessing the _right_ guest memory.
>
> That doesn't change in any way for guest_memfd.  It is still userspace's
> responsibility to ensure any accesses to guest memory through an HVA access the
> correct GFN.
>
> But for guest_memfd guest mappings, the HVA is irrelevant, period.  The only reason
> we aren't going to kill off slot->userspace_addr entirely is so that _KVM_ accesses
> to guest memory Just Work, without any meaningful changes to (a well-behaved)
> userspace.
>
> For CoCo VMs (including pKVM), guest_memfd needs to ensure it doesn't create a
> hugepage that contains mixed memory, e.g. must not create a 2MiB userspace mapping
> if the 2MiB range contains private memory.  But that is simply a sub-case of the
> general requirement that untrusted entities don't have access to private memory,
> and that KVM doesn't induce memory corruption due to mapping memory as both shared
> and private.
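
To make the mixed-memory constraint above concrete, here is a toy userspace
model of the check, not actual KVM or guest_memfd code. Everything in it is
an assumption for illustration: the toy_* names, the per-page is_private[]
array standing in for memory attributes, and the fixed 4KiB/2MiB geometry.
It only shows the idea that a 2MiB mapping is allowed when the whole aligned
2MiB range has the same shared/private state, and that the decision never
consults the host userspace mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical geometry: 4KiB base pages, 512 of them per 2MiB hugepage. */
#define TOY_PAGES_PER_2M 512u

/* Hypothetical levels, loosely mirroring 4K vs. 2M mapping levels. */
enum toy_level {
	TOY_LEVEL_4K = 1,
	TOY_LEVEL_2M = 2,
};

/*
 * Return the largest level at which page 'index' may be mapped.
 *
 * is_private[i] is true if page i is private, false if shared.  A 2MiB
 * mapping is allowed only if the naturally aligned 2MiB range containing
 * 'index' lies fully inside the file and every page in that range has the
 * same private/shared state.  Note there is no HVA input at all.
 */
static enum toy_level toy_max_mapping_level(const bool *is_private,
					    size_t nr_pages, size_t index)
{
	size_t base, i;

	assert(index < nr_pages);

	/* Round down to the start of the containing 2MiB range. */
	base = index & ~((size_t)TOY_PAGES_PER_2M - 1);

	/* A range that runs past the end of the file cannot be a hugepage. */
	if (nr_pages - base < TOY_PAGES_PER_2M)
		return TOY_LEVEL_4K;

	/* Any mix of shared and private pages forces 4KiB mappings. */
	for (i = base + 1; i < base + TOY_PAGES_PER_2M; i++) {
		if (is_private[i] != is_private[base])
			return TOY_LEVEL_4K;
	}

	return TOY_LEVEL_2M;
}
```

With a 1024-page all-shared file, every index gets TOY_LEVEL_2M. Marking
page 100 private drops the first 2MiB range to TOY_LEVEL_4K while the second
range stays at TOY_LEVEL_2M.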