David Hildenbrand <david@xxxxxxxxxx> writes:

> On 03.05.25 00:00, Ackerley Tng wrote:
>> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
>>
>>> On Fri, May 02, 2025, David Hildenbrand wrote:
>>>> On 30.04.25 20:58, Ackerley Tng wrote:
>>>>>> -	if (is_private)
>>>>>> +	if (is_gmem)
>>>>>>  		return max_level;
>>>>>
>>>>> I think this renaming isn't quite accurate.
>>>>
>>>> After our discussion yesterday, does that still hold true?
>>>
>>> No.
>>>
>>>>> IIUC in __kvm_mmu_max_mapping_level(), we skip considering
>>>>> host_pfn_mapping_level() if the gfn is private because private memory
>>>>> will not be mapped to userspace, so there's no need to query userspace
>>>>> page tables in host_pfn_mapping_level().
>>>>
>>>> I think the reason was that: for private we won't be walking the user space
>>>> page tables.
>>>>
>>>> Once guest_memfd is also responsible for the shared part, why should this
>>>> here still be private-only, and why should we consider querying a user space
>>>> mapping that might not even exist?
>>>
>>> +1, one of the big selling points for guest_memfd beyond CoCo is that it provides
>>> guest-first memory. It is very explicitly an intended feature that the guest
>>> mappings KVM creates can be a superset of the host userspace mappings. E.g. the
>>> guest can use larger page sizes, have RW while the host has RO, etc.
>>
>> Do you mean that __kvm_mmu_max_mapping_level() should, in addition to
>> the parameter renaming from is_private to is_gmem, do something like
>>
>>     if (is_gmem)
>>         return kvm_gmem_get_max_mapping_level(slot, gfn);
>
> I assume you mean, not looking at lpage_info at all?
>

My bad. I actually meant just to take input from guest_memfd and stop
there without checking host page tables, perhaps something like

    min(kvm_gmem_get_max_mapping_level(slot, gfn), max_level);

> I have limited understanding what lpage_info is or what it does. I
> believe all it adds is a mechanism to *disable* large page mappings.
>

This is my understanding too.

> We want to disable large pages if (using 2M region as example)
>
> (a) Mixed memory attributes. If a PFN falls into a 2M region, and parts
>     of that region are shared vs. private (mixed memory attributes ->
>     KVM_LPAGE_MIXED_FLAG)
>
>     -> With gmem-shared we could have mixed memory attributes, not a PFN
>        fracturing. (PFNs don't depend on memory attributes)
>
> (b) page track: intercepting (mostly write) access to GFNs
>

Could you explain more about the page track case?

>
> So, I wonder if we still have to take care of lpage_info, at least for
> handling (b) correctly [I assume so]. Regarding (a) I am not sure: once
> memory attributes are handled by gmem in the gmem-shared case. IIRC,
> with AMD SEV we might still have to honor it? But gmem itself could
> handle that.
>

For AMD SEV, I believe kvm_max_private_mapping_level() already takes
care of that, at least for the MMU faulting path [1], where guest_memfd
gives input using max_order, then the arch-specific callback contributes
input.

>
> What we could definitely do here for now is:
>
> if (is_gmem)
> 	/* gmem only supports 4k pages for now. */
> 	return PG_LEVEL_4K;
>
> And not worry about lpage_info for the time being, until we actually do
> support larger pages.
>

Perhaps this is better explained as an RFC in code. I'll put in a patch
as part of Fuad's series if Fuad doesn't mind.
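
For now, roughly what I have in mind, sketched on top of the current
__kvm_mmu_max_mapping_level(). This is untested and only meant to
illustrate the idea; kvm_gmem_get_max_mapping_level() is just a
placeholder name for however guest_memfd ends up reporting the max
mapping level it can support for a gfn:

static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
                                       const struct kvm_memory_slot *slot,
                                       gfn_t gfn, int max_level, bool is_gmem)
{
        struct kvm_lpage_info *linfo;
        int host_level;

        max_level = min(max_level, max_huge_page_level);

        /* Keep honoring lpage_info for now, e.g. for the page track case (b). */
        for ( ; max_level > PG_LEVEL_4K; max_level--) {
                linfo = lpage_info_slot(gfn, slot, max_level);
                if (!linfo->disallow_lpage)
                        break;
        }

        /*
         * For gmem-backed gfns, take guest_memfd's input and stop here:
         * don't walk host userspace page tables, since a userspace mapping
         * might not even exist.
         */
        if (is_gmem)
                return min(kvm_gmem_get_max_mapping_level(slot, gfn), max_level);

        if (max_level == PG_LEVEL_4K)
                return PG_LEVEL_4K;

        host_level = host_pfn_mapping_level(kvm, gfn, slot);
        return min(host_level, max_level);
}

(Until guest_memfd actually supports larger pages,
kvm_gmem_get_max_mapping_level() could of course just return
PG_LEVEL_4K, which matches your suggestion above.)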
>>
>> and basically defer to gmem as long as gmem should be used for this gfn?
>>
>> There is another call to __kvm_mmu_max_mapping_level() via
>> kvm_mmu_max_mapping_level() beginning from recover_huge_pages_range(),
>> and IIUC that doesn't go through guest_memfd.
>>
>> Hence, unlike the call to __kvm_mmu_max_mapping_level() from the KVM x86
>> MMU fault path, guest_memfd didn't get a chance to provide its input in
>> the form of returning max_order from kvm_gmem_get_pfn().
>
> Right, we essentially say that "this is a private fault", likely
> assuming that we already verified earlier that the memory is also private.
>
> [I can see that happening when the function is called through
> direct_page_fault()]
>
> We could simply call kvm_mmu_max_mapping_level() from
> kvm_mmu_hugepage_adjust() I guess. (could possibly be optimized later)
>
> --
> Cheers,
>
> David / dhildenb

[1] https://github.com/torvalds/linux/blob/01f95500a162fca88cefab9ed64ceded5afabc12/arch/x86/kvm/mmu/mmu.c#L4480
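
(Also appending a rough sketch of the kvm_mmu_hugepage_adjust() idea
above, since it's easier to discuss in code. Untested;
kvm_slot_has_gmem() and kvm_mem_from_gmem() are placeholder names for
whatever "this gfn is backed by guest_memfd" check we end up with --
IIUC today kvm_mmu_max_mapping_level() derives is_private from
kvm_slot_can_be_private() and kvm_mem_is_private().)

int kvm_mmu_max_mapping_level(struct kvm *kvm,
                              const struct kvm_memory_slot *slot, gfn_t gfn)
{
        /* Placeholder check for "this gfn is backed by guest_memfd". */
        bool is_gmem = kvm_slot_has_gmem(slot) && kvm_mem_from_gmem(kvm, gfn);

        return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_gmem);
}

and then kvm_mmu_hugepage_adjust() would go through the same helper as
recover_huge_pages_range(), instead of calling
__kvm_mmu_max_mapping_level() with fault->is_private directly:

        /*
         * Let the common helper (and through it, guest_memfd) decide, so
         * the fault path and the huge page recovery path agree; max_order
         * from kvm_gmem_get_pfn() is still respected via fault->max_level.
         */
        fault->req_level = min_t(u8, fault->max_level,
                                 kvm_mmu_max_mapping_level(vcpu->kvm, fault->slot,
                                                           fault->gfn));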