On 03.05.25 00:00, Ackerley Tng wrote:
Sean Christopherson <seanjc@xxxxxxxxxx> writes:
On Fri, May 02, 2025, David Hildenbrand wrote:
On 30.04.25 20:58, Ackerley Tng wrote:
- if (is_private)
+ if (is_gmem)
return max_level;
I think this renaming isn't quite accurate.
After our discussion yesterday, does that still hold true?
No.
IIUC in __kvm_mmu_max_mapping_level(), we skip considering
host_pfn_mapping_level() if the gfn is private because private memory
will not be mapped to userspace, so there's no need to query userspace
page tables in host_pfn_mapping_level().
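For context, the relevant part of __kvm_mmu_max_mapping_level() looks
roughly like this (paraphrased from memory, so details may differ
between kernel versions):

	static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
					       const struct kvm_memory_slot *slot,
					       gfn_t gfn, int max_level,
					       bool is_private)
	{
		struct kvm_lpage_info *linfo;
		int host_level;

		/* lpage_info can only reduce the allowed mapping level. */
		max_level = min(max_level, max_huge_page_level);
		for ( ; max_level > PG_LEVEL_4K; max_level--) {
			linfo = lpage_info_slot(gfn, slot, max_level);
			if (!linfo->disallow_lpage)
				break;
		}

		/* Private memory is never mapped into host userspace ... */
		if (is_private)
			return max_level;

		if (max_level == PG_LEVEL_4K)
			return PG_LEVEL_4K;

		/* ... so only shared memory consults the userspace mapping. */
		host_level = host_pfn_mapping_level(kvm, gfn, slot);
		return min(host_level, max_level);
	}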
I think the reason was that, for private memory, we won't be walking the
user space page tables.
Once guest_memfd is also responsible for the shared part, why should this
still be private-only here, and why should we consider querying a user
space mapping that might not even exist?
+1, one of the big selling points for guest_memfd beyond CoCo is that it provides
guest-first memory. It is very explicitly an intended feature that the guest
mappings KVM creates can be a superset of the host userspace mappings. E.g. the
guest can use larger page sizes, have RW while the host has RO, etc.
Do you mean that __kvm_mmu_max_mapping_level() should, in addition to
the parameter renaming from is_private to is_gmem, do something like
	if (is_gmem)
		return kvm_gmem_get_max_mapping_level(slot, gfn);
I assume you mean, not looking at lpage_info at all?
I have a limited understanding of what lpage_info is or what it does. I
believe all it adds is a mechanism to *disable* large page mappings.
We want to disable large pages if (using a 2M region as an example):

(a) Mixed memory attributes: a PFN falls into a 2M region, and parts of
that region are shared vs. private (mixed memory attributes ->
KVM_LPAGE_MIXED_FLAG).
-> With gmem-shared we could still have mixed memory attributes, but no
PFN fracturing (PFNs don't depend on memory attributes).

(b) Page track: intercepting (mostly write) access to GFNs.
So, I wonder if we still have to take care of lpage_info, at least for
handling (b) correctly [I assume so]. Regarding (a), I am not sure what
happens once memory attributes are handled by gmem in the gmem-shared
case; IIRC, with AMD SEV we might still have to honor it? But gmem
itself could handle that.
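My rough reading of how arch/x86/kvm/mmu/mmu.c encodes those two cases
in lpage_info (sketch from memory, not verbatim kernel code):
disallow_lpage is a per-region count of reasons why a large mapping is
forbidden (case (b), e.g. write-tracked GFNs), with one high bit
reserved to flag mixed private/shared attributes (case (a)):

	#define KVM_LPAGE_MIXED_FLAG	BIT(31)

	/* Case (a): flag mixed private/shared attributes in the region. */
	static void hugepage_set_mixed(struct kvm_memory_slot *slot, gfn_t gfn,
				       int level)
	{
		lpage_info_slot(gfn, slot, level)->disallow_lpage |=
			KVM_LPAGE_MIXED_FLAG;
	}

	/* Case (b): count GFNs that must not be covered by a large mapping. */
	static void update_gfn_disallow_lpage_count(const struct kvm_memory_slot *slot,
						    gfn_t gfn, int count)
	{
		struct kvm_lpage_info *linfo;
		int i;

		for (i = PG_LEVEL_2M; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
			linfo = lpage_info_slot(gfn, slot, i);
			linfo->disallow_lpage += count; /* +1 track, -1 untrack */
		}
	}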
What we could definitely do here for now is:
	if (is_gmem)
		/* gmem only supports 4k pages for now. */
		return PG_LEVEL_4K;
And not worry about lpage_info for the time being, until we actually do
support larger pages.
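Longer term, once gmem does hand out larger folios, the
kvm_gmem_get_max_mapping_level() you mention above could be shaped
something like the sketch below. Purely hypothetical: no such helper
exists today, and the folio-order lookup is just a placeholder:

	/*
	 * Hypothetical sketch: convert the order of the gmem folio backing
	 * the gfn into a mapping level, similar to what
	 * kvm_max_level_for_order() does on the fault path today.
	 */
	static int kvm_gmem_get_max_mapping_level(const struct kvm_memory_slot *slot,
						  gfn_t gfn)
	{
		int order = 0;	/* placeholder: order of the backing gmem folio */

		if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
			return PG_LEVEL_1G;
		if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
			return PG_LEVEL_2M;
		return PG_LEVEL_4K;
	}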
and basically defer to gmem as long as gmem should be used for this gfn?
There is another call to __kvm_mmu_max_mapping_level() via
kvm_mmu_max_mapping_level(), starting from recover_huge_pages_range(),
and IIUC that path doesn't go through guest_memfd.

Hence, unlike the call to __kvm_mmu_max_mapping_level() from the KVM x86
MMU fault path, guest_memfd doesn't get a chance to provide its input in
the form of the max_order returned from kvm_gmem_get_pfn().
Right, we essentially say that "this is a private fault", likely
assuming that we already verified earlier that the memory is also private.
[I can see that happening when the function is called through
direct_page_fault()]
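For reference, the wrapper on that recovery path looks roughly like
this (again from memory, details may differ), i.e. "private" is derived
from the memory attributes, not from anything guest_memfd returned:

	int kvm_mmu_max_mapping_level(struct kvm *kvm,
				      const struct kvm_memory_slot *slot,
				      gfn_t gfn)
	{
		bool is_private = kvm_slot_can_be_private(slot) &&
				  kvm_mem_is_private(kvm, gfn);

		return __kvm_mmu_max_mapping_level(kvm, slot, gfn,
						   PG_LEVEL_NUM, is_private);
	}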
We could simply call kvm_mmu_max_mapping_level() from
kvm_mmu_hugepage_adjust(), I guess. (This could possibly be optimized
later.)
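To make that concrete, a rough sketch (field and function names from
memory, so treat the details as approximate): today
kvm_mmu_hugepage_adjust() does something like

	fault->req_level = __kvm_mmu_max_mapping_level(vcpu->kvm, slot,
						       fault->gfn,
						       fault->max_level,
						       fault->is_private);

and the idea would be to go through the wrapper instead, which
re-derives whether the gfn is gmem/private by itself:

	fault->req_level = min(fault->max_level,
			       kvm_mmu_max_mapping_level(vcpu->kvm, slot,
							 fault->gfn));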
--
Cheers,
David / dhildenb