On Tue, 22 Jul 2025 at 00:17, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Fri, Jul 18, 2025, Xiaoyao Li wrote:
> > On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> > > From: Ackerley Tng <ackerleytng@xxxxxxxxxx>
> > >
> > > Refactor kvm_max_private_mapping_level() to accept a NULL kvm_page_fault
> > > pointer and rename it to kvm_gmem_max_mapping_level().
> > >
> > > The max_mapping_level x86 operation (previously private_max_mapping_level)
> > > is designed to potentially be called without an active page fault, for
> > > instance, when kvm_mmu_max_mapping_level() is determining the maximum
> > > mapping level for a gfn proactively.
> > >
> > > Allow NULL fault pointer: Modify kvm_max_private_mapping_level() to
> > > safely handle a NULL fault argument. This aligns its interface with the
> > > kvm_x86_ops.max_mapping_level operation it wraps, which can also be
> > > called with NULL.
> >
> > Are you sure about that?
> >
> > Patch 09 just added the fault->is_private check for TDX and SEV.
>
> +1, this isn't quite right. That's largely my fault (no pun intended) though, as
> I suggested the basic gist of the NULL @fault handling, and it's a mess. More at
> the bottom.
>
> > > Rename function to kvm_gmem_max_mapping_level(): This reinforces that
> > > the function's scope is for guest_memfd-backed memory, which can be
> > > either private or non-private, removing any remaining "private"
> > > connotation from its name.
> > >
> > > Optimize max_level checks: Introduce a check in the caller to skip
> > > querying for max_mapping_level if the current max_level is already
> > > PG_LEVEL_4K, as no further reduction is possible.
> > >
> > > Acked-by: David Hildenbrand <david@xxxxxxxxxx>
> > > Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > > Signed-off-by: Ackerley Tng <ackerleytng@xxxxxxxxxx>
> > > Signed-off-by: Fuad Tabba <tabba@xxxxxxxxxx>
> > > ---
> > >  arch/x86/kvm/mmu/mmu.c | 16 +++++++---------
> > >  1 file changed, 7 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index bb925994cbc5..6bd28fda0fd3 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -4467,17 +4467,13 @@ static inline u8 kvm_max_level_for_order(int order)
> > >          return PG_LEVEL_4K;
> > >  }
> > >
> > > -static u8 kvm_max_private_mapping_level(struct kvm *kvm,
> > > -                                        struct kvm_page_fault *fault,
> > > -                                        int gmem_order)
> > > +static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, int order,
> > > +                                     struct kvm_page_fault *fault)
> > >  {
> > > -        u8 max_level = fault->max_level;
> > >          u8 req_max_level;
> > > +        u8 max_level;
> > >
> > > -        if (max_level == PG_LEVEL_4K)
> > > -                return PG_LEVEL_4K;
> > > -
> > > -        max_level = min(kvm_max_level_for_order(gmem_order), max_level);
> > > +        max_level = kvm_max_level_for_order(order);
> > >
> > >          if (max_level == PG_LEVEL_4K)
> > >                  return PG_LEVEL_4K;
> > >
> > > @@ -4513,7 +4509,9 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
> > >          }
> > >
> > >          fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
> > > -        fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault, max_order);
> > > +        if (fault->max_level >= PG_LEVEL_4K)
> > > +                fault->max_level = kvm_gmem_max_mapping_level(vcpu->kvm,
> > > +                                                              max_order, fault);
> >
> > I cannot understand why this change is required. In what case will
> > fault->max_level < PG_LEVEL_4K?
>
> Yeah, I don't get this code either.  I also don't think KVM should call
> kvm_gmem_max_mapping_level() *here*.  That's mostly a problem with my suggested
> NULL @fault handling.
> Dealing with kvm_gmem_max_mapping_level() here leads to
> weirdness, because kvm_gmem_max_mapping_level() also needs to be invoked for the
> !fault path, and then we end up with multiple call sites and the potential for a
> redundant call (gmem only, is private).
>
> Looking through surrounding patches, the ordering of things is also "off".
> "Generalize private_max_mapping_level x86 op to max_mapping_level" should just
> rename the helper; reacting to !is_private memory in TDX belongs in "Consult
> guest_memfd when computing max_mapping_level", because that's where KVM plays
> nice with non-private memory.
>
> But that patch is also doing too much, e.g. shuffling code around and short-circuiting
> the non-fault case, which makes it confusing and hard to review.  Extending gmem
> hugepage support to shared memory should be "just" this:
>
> @@ -3335,8 +3336,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
>         if (max_level == PG_LEVEL_4K)
>                 return PG_LEVEL_4K;
>
> -       if (is_private)
> -               host_level = kvm_max_private_mapping_level(kvm, fault, slot, gfn);
> +       if (is_private || kvm_memslot_is_gmem_only(slot))
> +               host_level = kvm_gmem_max_mapping_level(kvm, fault, slot, gfn,
> +                                                       is_private);
>         else
>                 host_level = host_pfn_mapping_level(kvm, gfn, slot);
>
>         return min(host_level, max_level);
>
> plus the plumbing and the small TDX change.  All the renames and code shuffling
> should be done in prep patches.
>
> The attached patches are compile-tested only, but I think they get us where we
> want to be, and without my confusing suggestion to try and punt on private mappings
> in the hugepage recovery paths.  They should slot in at the right patch numbers
> (relative to v15).
>
> Holler if the patches don't work, I'm happy to help sort things out so that v16
> is ready to go.

These patches apply, build, and run.
I'll incorporate them, test them a bit more with allmodconfig and friends, along
with the other patch that you suggested, and respin v16 soon.

Cheers,
/fuad