On Tue, 22 Jul 2025 at 06:36, Xiaoyao Li <xiaoyao.li@xxxxxxxxx> wrote:
>
> On 7/22/2025 7:17 AM, Sean Christopherson wrote:
> > On Fri, Jul 18, 2025, Xiaoyao Li wrote:
> >> On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> >>> From: Ackerley Tng <ackerleytng@xxxxxxxxxx>
> >>>
> >>> Refactor kvm_max_private_mapping_level() to accept a NULL kvm_page_fault
> >>> pointer and rename it to kvm_gmem_max_mapping_level().
> >>>
> >>> The max_mapping_level x86 operation (previously private_max_mapping_level)
> >>> is designed to potentially be called without an active page fault, for
> >>> instance, when kvm_mmu_max_mapping_level() is determining the maximum
> >>> mapping level for a gfn proactively.
> >>>
> >>> Allow NULL fault pointer: Modify kvm_max_private_mapping_level() to
> >>> safely handle a NULL fault argument. This aligns its interface with the
> >>> kvm_x86_ops.max_mapping_level operation it wraps, which can also be
> >>> called with NULL.
> >>
> >> are you sure of it?
> >>
> >> The patch 09 just added the check of fault->is_private for TDX and SEV.
> >
> > +1, this isn't quite right.  That's largely my fault (no pun intended) though, as
> > I suggested the basic gist of the NULL @fault handling, and it's a mess.  More at
> > the bottom.
> >
> >>> Rename function to kvm_gmem_max_mapping_level(): This reinforces that
> >>> the function's scope is for guest_memfd-backed memory, which can be
> >>> either private or non-private, removing any remaining "private"
> >>> connotation from its name.
> >>>
> >>> Optimize max_level checks: Introduce a check in the caller to skip
> >>> querying for max_mapping_level if the current max_level is already
> >>> PG_LEVEL_4K, as no further reduction is possible.
> >>>
> >>> Acked-by: David Hildenbrand <david@xxxxxxxxxx>
> >>> Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> >>> Signed-off-by: Ackerley Tng <ackerleytng@xxxxxxxxxx>
> >>> Signed-off-by: Fuad Tabba <tabba@xxxxxxxxxx>
> >>> ---
> >>>  arch/x86/kvm/mmu/mmu.c | 16 +++++++---------
> >>>  1 file changed, 7 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> >>> index bb925994cbc5..6bd28fda0fd3 100644
> >>> --- a/arch/x86/kvm/mmu/mmu.c
> >>> +++ b/arch/x86/kvm/mmu/mmu.c
> >>> @@ -4467,17 +4467,13 @@ static inline u8 kvm_max_level_for_order(int order)
> >>>  	return PG_LEVEL_4K;
> >>>  }
> >>>
> >>> -static u8 kvm_max_private_mapping_level(struct kvm *kvm,
> >>> -					struct kvm_page_fault *fault,
> >>> -					int gmem_order)
> >>> +static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, int order,
> >>> +				     struct kvm_page_fault *fault)
> >>>  {
> >>> -	u8 max_level = fault->max_level;
> >>>  	u8 req_max_level;
> >>> +	u8 max_level;
> >>>
> >>> -	if (max_level == PG_LEVEL_4K)
> >>> -		return PG_LEVEL_4K;
> >>> -
> >>> -	max_level = min(kvm_max_level_for_order(gmem_order), max_level);
> >>> +	max_level = kvm_max_level_for_order(order);
> >>>
> >>>  	if (max_level == PG_LEVEL_4K)
> >>>  		return PG_LEVEL_4K;
> >>>
> >>> @@ -4513,7 +4509,9 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
> >>>  	}
> >>>
> >>>  	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
> >>> -	fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault, max_order);
> >>> +	if (fault->max_level >= PG_LEVEL_4K)
> >>> +		fault->max_level = kvm_gmem_max_mapping_level(vcpu->kvm,
> >>> +							      max_order, fault);
> >>
> >> I cannot understand why this change is required. In what case will
> >> fault->max_level < PG_LEVEL_4K?
> >
> > Yeah, I don't get this code either.  I also don't think KVM should call
> > kvm_gmem_max_mapping_level() *here*.  That's mostly a problem with my suggested
> > NULL @fault handling.
> > Dealing with kvm_gmem_max_mapping_level() here leads to weirdness,
> > because kvm_gmem_max_mapping_level() also needs to be invoked for the
> > !fault path, and then we end up with multiple call sites and the potential for a
> > redundant call (gmem only, is private).
> >
> > Looking through surrounding patches, the ordering of things is also "off".
> > "Generalize private_max_mapping_level x86 op to max_mapping_level" should just
> > rename the helper; reacting to !is_private memory in TDX belongs in "Consult
> > guest_memfd when computing max_mapping_level", because that's where KVM plays
> > nice with non-private memory.
> >
> > But that patch is also doing too much, e.g. shuffling code around and short-circuiting
> > the non-fault case, which makes it confusing and hard to review.  Extending gmem
> > hugepage support to shared memory should be "just" this:
> >
> > @@ -3335,8 +3336,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
> >  	if (max_level == PG_LEVEL_4K)
> >  		return PG_LEVEL_4K;
> >
> > -	if (is_private)
> > -		host_level = kvm_max_private_mapping_level(kvm, fault, slot, gfn);
> > +	if (is_private || kvm_memslot_is_gmem_only(slot))
> > +		host_level = kvm_gmem_max_mapping_level(kvm, fault, slot, gfn,
> > +							is_private);
> >  	else
> >  		host_level = host_pfn_mapping_level(kvm, gfn, slot);
> >
> >  	return min(host_level, max_level);
> >
> > plus the plumbing and the small TDX change.  All the renames and code shuffling
> > should be done in prep patches.
> >
> > The attached patches are compile-tested only, but I think they get us where we
> > want to be, and without my confusing suggestion to try and punt on private mappings
> > in the hugepage recovery paths.  They should slot in at the right patch numbers
> > (relative to v15).
> >
> > Holler if the patches don't work, I'm happy to help sort things out so that v16
> > is ready to go.
>
> The attached patches work well, though I have some feedback.
>
> - In 0010-KVM-x86-mmu-Rename-.private_max_mapping_level-to-.gm.patch,
>   there is a doubled "gmem" in the names of the vmx/vt callback
>   implementations:
>
>     vt_gmem_gmem_max_mapping_level
>     tdx_gmem_gmem_max_mapping_level
>     vt_op_tdx_only(gmem_gmem_max_mapping_level)

Sean's patches do that, then he fixes it in a later patch. I'll fix this
at the source.

> - In 0013-KVM-x86-mmu-Extend-guest_memfd-s-max-mapping-level-t.patch,
>   kvm_x86_call(gmem_max_mapping_level)(...) returns 0 for the !private
>   case. That's not correct, though it works without issue currently.
>
>   Because current gmem doesn't support hugepages, the max_level obtained
>   from gmem is always PG_LEVEL_4K, and kvm_gmem_max_mapping_level()
>   returns early on
>
>     if (max_level == PG_LEVEL_4K)
>             return max_level;
>
>   But just look at the following case:
>
>     return min(max_level,
>                kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private));
>
>   For the non-TDX and non-SNP cases, it will eventually return 0, i.e.
>   PG_LEVEL_NONE.
>
>   So either 1) return PG_LEVEL_NUM/PG_LEVEL_1G for the cases where the
>   .gmem_max_mapping_level callback doesn't impose a specific restriction,
>
>   or 2)
>
>     tmp = kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private);
>     if (tmp)
>             return min(max_level, tmp);
>
>     return max_level;

Sean? What do you think?

Thanks!
/fuad