On Tue, 22 Jul 2025 at 15:32, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Tue, Jul 22, 2025, Fuad Tabba wrote:
> > On Tue, 22 Jul 2025 at 06:36, Xiaoyao Li <xiaoyao.li@xxxxxxxxx> wrote:
> > > - In 0010-KVM-x86-mmu-Rename-.private_max_mapping_level-to-.gm.patch,
> > > there is a double "gmem" in the name of vmx/vt's callback implementation:
> > >
> > >     vt_gmem_gmem_max_mapping_level
> > >     tdx_gmem_gmem_max_mapping_level
> > >     vt_op_tdx_only(gmem_gmem_max_mapping_level)
> >
> > Sean's patches do that, then he fixes it in a later patch. I'll fix
> > this at the source.
>
> Dagnabbit. I goofed a search+replace, caught it when re-reading things, and
> fixed up the wrong commit. Sorry :-(
>
> > > - In 0013-KVM-x86-mmu-Extend-guest_memfd-s-max-mapping-level-t.patch,
> > > kvm_x86_call(gmem_max_mapping_level)(...) returns 0 for the !private case.
> > > It's not correct, though it works without issue currently.
> > >
> > > Because current gmem doesn't support hugepages, the max_level gotten
> > > from gmem is always PG_LEVEL_4K and it returns early in
> > > kvm_gmem_max_mapping_level() on
> > >
> > >     if (max_level == PG_LEVEL_4K)
> > >         return max_level;
> > >
> > > But just look at the following case:
> > >
> > >     return min(max_level,
> > >                kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private));
> > >
> > > For the non-TDX and non-SNP cases, it will return 0, i.e.
> > > PG_LEVEL_NONE eventually.
> > >
> > > So either 1) return PG_LEVEL_NUM/PG_LEVEL_1G for the cases where the
> > > .gmem_max_mapping_level callback doesn't have a specific restriction,
> > >
> > > or 2)
> > >
> > >     tmp = kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private);
> > >     if (tmp)
> > >         return min(max_level, tmp);
> > >
> > >     return max_level;
> >
> > Sean? What do you think?
>
> #2, because KVM uses a "ret0" static call when TDX is disabled (and KVM should
> do the same when SEV is disabled, but the SEV #ifdefs are still a bit messy).
> Switching to any other value would require adding a VMX stub for the !TDX case.
>
> I think it makes sense to explicitly call that out as the "CoCo level", to help
> unfamiliar readers understand why vendor code has any say in the max
> mapping level.
>
> And I would say we adjust max_level instead of having an early return, e.g. to
> reduce the probability of future bugs due to adding code between the call to
> .gmem_max_mapping_level() and the final return.
>
> This as fixup?

Applied it to my tree. Builds and runs fine.

Thanks!
/fuad

> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index eead5dca6f72..a51013e0992a 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3279,9 +3279,9 @@ static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fau
>                                       const struct kvm_memory_slot *slot, gfn_t gfn,
>                                       bool is_private)
>  {
> +       u8 max_level, coco_level;
>         struct page *page;
>         kvm_pfn_t pfn;
> -       u8 max_level;
>
>         /* For faults, use the gmem information that was resolved earlier. */
>         if (fault) {
> @@ -3305,8 +3305,16 @@ static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fau
>         if (max_level == PG_LEVEL_4K)
>                 return max_level;
>
> -       return min(max_level,
> -                  kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private));
> +       /*
> +        * CoCo may influence the max mapping level, e.g. due to RMP or S-EPT
> +        * restrictions. A return of '0' means "no additional restrictions",
> +        * to allow for using an optional "ret0" static call.
> +        */
> +       coco_level = kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private);
> +       if (coco_level)
> +               max_level = min(max_level, coco_level);
> +
> +       return max_level;
>  }
>
>  int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
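[Editor's note: for anyone following the option #1 vs. option #2 discussion without
the tree handy, here is a minimal, stand-alone sketch of why the explicit check for
a zero return is needed. This is not kernel code: vendor_cb() is a hypothetical
stand-in for the "ret0" static call that kvm_x86_call(gmem_max_mapping_level)()
resolves to when TDX/SEV support is compiled out, and the PG_LEVEL_* values are
simplified constants.]

/* Stand-alone sketch, not kernel code. */
#include <stdio.h>

#define PG_LEVEL_NONE 0
#define PG_LEVEL_4K   1
#define PG_LEVEL_2M   2
#define PG_LEVEL_1G   3

/* Hypothetical stand-in for a "ret0" static call: no CoCo restriction. */
static int vendor_cb(void)
{
	return 0;
}

/* Pre-fixup behaviour: min(max_level, 0) collapses to PG_LEVEL_NONE. */
static int buggy_max_level(int max_level)
{
	int cb = vendor_cb();

	return max_level < cb ? max_level : cb;
}

/* Option #2: treat a return of 0 as "no additional restriction". */
static int fixed_max_level(int max_level)
{
	int coco_level = vendor_cb();

	if (coco_level)
		max_level = max_level < coco_level ? max_level : coco_level;

	return max_level;
}

int main(void)
{
	/* Prints "buggy: 0, fixed: 2", i.e. the old form loses the 2M level. */
	printf("buggy: %d, fixed: %d\n",
	       buggy_max_level(PG_LEVEL_2M), fixed_max_level(PG_LEVEL_2M));
	return 0;
}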