On Tue, 22 Jul 2025 at 06:36, Xiaoyao Li <xiaoyao.li@xxxxxxxxx> wrote:
>
> On 7/22/2025 7:17 AM, Sean Christopherson wrote:
> > On Fri, Jul 18, 2025, Xiaoyao Li wrote:
> >> On 7/18/2025 12:27 AM, Fuad Tabba wrote:
> >>> From: Ackerley Tng <ackerleytng@xxxxxxxxxx>
> >>>
> >>> Refactor kvm_max_private_mapping_level() to accept a NULL kvm_page_fault
> >>> pointer and rename it to kvm_gmem_max_mapping_level().
> >>>
> >>> The max_mapping_level x86 operation (previously private_max_mapping_level)
> >>> is designed to potentially be called without an active page fault, for
> >>> instance, when kvm_mmu_max_mapping_level() is determining the maximum
> >>> mapping level for a gfn proactively.
> >>>
> >>> Allow NULL fault pointer: Modify kvm_max_private_mapping_level() to
> >>> safely handle a NULL fault argument. This aligns its interface with the
> >>> kvm_x86_ops.max_mapping_level operation it wraps, which can also be
> >>> called with NULL.
> >>
> >> are you sure of it?
> >>
> >> The patch 09 just added the check of fault->is_private for TDX and SEV.
> >
> > +1, this isn't quite right.  That's largely my fault (no pun intended) though, as
> > I suggested the basic gist of the NULL @fault handling, and it's a mess.  More at
> > the bottom.
> >
> >>> Rename function to kvm_gmem_max_mapping_level(): This reinforces that
> >>> the function's scope is for guest_memfd-backed memory, which can be
> >>> either private or non-private, removing any remaining "private"
> >>> connotation from its name.
> >>>
> >>> Optimize max_level checks: Introduce a check in the caller to skip
> >>> querying for max_mapping_level if the current max_level is already
> >>> PG_LEVEL_4K, as no further reduction is possible.
> >>>
> >>> Acked-by: David Hildenbrand <david@xxxxxxxxxx>
> >>> Suggested-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> >>> Signed-off-by: Ackerley Tng <ackerleytng@xxxxxxxxxx>
> >>> Signed-off-by: Fuad Tabba <tabba@xxxxxxxxxx>
> >>> ---
> >>>  arch/x86/kvm/mmu/mmu.c | 16 +++++++---------
> >>>  1 file changed, 7 insertions(+), 9 deletions(-)
> >>>
> >>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> >>> index bb925994cbc5..6bd28fda0fd3 100644
> >>> --- a/arch/x86/kvm/mmu/mmu.c
> >>> +++ b/arch/x86/kvm/mmu/mmu.c
> >>> @@ -4467,17 +4467,13 @@ static inline u8 kvm_max_level_for_order(int order)
> >>>  	return PG_LEVEL_4K;
> >>>  }
> >>>
> >>> -static u8 kvm_max_private_mapping_level(struct kvm *kvm,
> >>> -					struct kvm_page_fault *fault,
> >>> -					int gmem_order)
> >>> +static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, int order,
> >>> +				     struct kvm_page_fault *fault)
> >>>  {
> >>> -	u8 max_level = fault->max_level;
> >>>  	u8 req_max_level;
> >>> +	u8 max_level;
> >>>
> >>> -	if (max_level == PG_LEVEL_4K)
> >>> -		return PG_LEVEL_4K;
> >>> -
> >>> -	max_level = min(kvm_max_level_for_order(gmem_order), max_level);
> >>> +	max_level = kvm_max_level_for_order(order);
> >>>
> >>>  	if (max_level == PG_LEVEL_4K)
> >>>  		return PG_LEVEL_4K;
> >>>
> >>> @@ -4513,7 +4509,9 @@ static int kvm_mmu_faultin_pfn_private(struct kvm_vcpu *vcpu,
> >>>  	}
> >>>
> >>>  	fault->map_writable = !(fault->slot->flags & KVM_MEM_READONLY);
> >>> -	fault->max_level = kvm_max_private_mapping_level(vcpu->kvm, fault, max_order);
> >>> +	if (fault->max_level >= PG_LEVEL_4K)
> >>> +		fault->max_level = kvm_gmem_max_mapping_level(vcpu->kvm,
> >>> +							      max_order, fault);
> >>
> >> I cannot understand why this change is required. In what case will
> >> fault->max_level < PG_LEVEL_4K?
> >
> > Yeah, I don't get this code either.  I also don't think KVM should call
> > kvm_gmem_max_mapping_level() *here*.  That's mostly a problem with my suggested
> > NULL @fault handling.
> > Dealing with kvm_gmem_max_mapping_level() here leads to weirdness,
> > because kvm_gmem_max_mapping_level() also needs to be invoked for the
> > !fault path, and then we end up with multiple call sites and the potential for a
> > redundant call (gmem only, is private).
> >
> > Looking through surrounding patches, the ordering of things is also "off".
> > "Generalize private_max_mapping_level x86 op to max_mapping_level" should just
> > rename the helper; reacting to !is_private memory in TDX belongs in "Consult
> > guest_memfd when computing max_mapping_level", because that's where KVM plays
> > nice with non-private memory.
> >
> > But that patch is also doing too much, e.g. shuffling code around and short-circuiting
> > the non-fault case, which makes it confusing and hard to review.  Extending gmem
> > hugepage support to shared memory should be "just" this:
> >
> > @@ -3335,8 +3336,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
> >  	if (max_level == PG_LEVEL_4K)
> >  		return PG_LEVEL_4K;
> >
> > -	if (is_private)
> > -		host_level = kvm_max_private_mapping_level(kvm, fault, slot, gfn);
> > +	if (is_private || kvm_memslot_is_gmem_only(slot))
> > +		host_level = kvm_gmem_max_mapping_level(kvm, fault, slot, gfn,
> > +							is_private);
> >  	else
> >  		host_level = host_pfn_mapping_level(kvm, gfn, slot);
> >
> >  	return min(host_level, max_level);
> >
> > plus the plumbing and the small TDX change.  All the renames and code shuffling
> > should be done in prep patches.
> >
> > The attached patches are compile-tested only, but I think they get us where we
> > want to be, and without my confusing suggestion to try and punt on private mappings
> > in the hugepage recovery paths.  They should slot in at the right patch numbers
> > (relative to v15).
> >
> > Holler if the patches don't work, I'm happy to help sort things out so that v16
> > is ready to go.
>
> The attached patches work well, though I have some feedback.
>
> - In 0010-KVM-x86-mmu-Rename-.private_max_mapping_level-to-.gm.patch,
>   there is a doubled "gmem" in the names of the vmx/vt callback
>   implementations:
>
>     vt_gmem_gmem_max_mapping_level
>     tdx_gmem_gmem_max_mapping_level
>     vt_op_tdx_only(gmem_gmem_max_mapping_level)

Sean's patches do that, then he fixes it in a later patch. I'll fix this
at the source.

> - In 0013-KVM-x86-mmu-Extend-guest_memfd-s-max-mapping-level-t.patch,
>   kvm_x86_call(gmem_max_mapping_level)(...) returns 0 for the !private
>   case. That's not correct, though it works without issue currently.
>
>   Because current gmem doesn't support hugepages, the max_level obtained
>   from gmem is always PG_LEVEL_4K, and kvm_gmem_max_mapping_level()
>   returns early on
>
>     if (max_level == PG_LEVEL_4K)
>             return max_level;
>
>   But just look at the following case:
>
>     return min(max_level,
>                kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private));
>
>   For the non-TDX and non-SNP cases, it will eventually return 0, i.e.
>   PG_LEVEL_NONE.
>
>   So either 1) return PG_LEVEL_NUM/PG_LEVEL_1G for the cases where the
>   .gmem_max_mapping_level callback doesn't impose a specific restriction,
>
>   or 2)
>
>     tmp = kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private);
>     if (tmp)
>             return min(max_level, tmp);
>
>     return max_level;

Sean? What do you think?

Thanks!
/fuad