Re: [RFC PATCH 10/21] KVM: x86/mmu: Disallow page merging (huge page adjustment) for mirror root

Yan Zhao <yan.y.zhao@xxxxxxxxx> · Tue, 20 May 2025 18:11:58 +0800

On Tue, May 20, 2025 at 01:42:20AM +0800, Edgecombe, Rick P wrote:
> On Mon, 2025-05-19 at 11:57 +0800, Yan Zhao wrote:
> > Like below?
> > 
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index 1b2bacde009f..0e4a03f44036 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -1275,6 +1275,11 @@ static int tdp_mmu_link_sp(struct kvm *kvm, struct tdp_iter *iter,
> >         return 0;
> >  }
> > 
> > +static inline bool is_fault_disallow_huge_page_adust(struct kvm_page_fault *fault, bool is_mirror)
> > +{
> > +       return fault->nx_huge_page_workaround_enabled || is_mirror;
> > +}
> 
> Err, no. It doesn't seem worth it.
> 
> > +
> >  /*
> >   * Handle a TDP page fault (NPT/EPT violation/misconfiguration) by installing
> >   * page tables and SPTEs to translate the faulting guest physical address.
> > @@ -1297,7 +1302,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> >         for_each_tdp_pte(iter, kvm, root, fault->gfn, fault->gfn + 1) {
> >                 int r;
> > 
> > -               if (fault->nx_huge_page_workaround_enabled || is_mirror)
> > +               if (is_fault_disallow_huge_page_adust(fault, is_mirror))
> >                         disallowed_hugepage_adjust(fault, iter.old_spte, iter.level, is_mirror);
> > 
> >                 /*
> > 
> > 
> > 
> > > Also, why not check is_mirror_sp() in disallowed_hugepage_adjust() instead of
> > > passing in an is_mirror arg?
> > It's an optimization.
> 
> But disallowed_hugepage_adjust() is already checking the sp.
> 
> I think part of the thing that is bugging me is that
> nx_huge_page_workaround_enabled is not conceptually about whether the specific
> fault/level needs to disallow huge page adjustments, it's whether it needs to
> check if it does. Then disallowed_hugepage_adjust() does the actual specific
> checking. But for the mirror logic the check is the same for both. It's
> asymmetric with NX huge pages, and just sort of jammed in. It would be easier to
> follow if the kvm_tdp_mmu_map() conditional checked wither mirror TDP was
> "active", rather than the mirror role.
You are right. It looks clearer.

> > 
> > As is_mirror_sptep(iter->sptep) == is_mirror_sp(root), passing in is_mirror arg
> > can avoid checking mirror for each sp, which remains unchanged in a root.
> 
> Why not just this. It seems easier to comprehend to me. It does add a little bit
> of extra checking in the shared fault for TDX only. I think it's ok and better
> not to litter the generic MMU code.
> 
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index a284dce227a0..37ca77f2ee15 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3328,11 +3328,13 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu,
> struct kvm_page_fault *fault
>  
>  void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int
> cur_level)
>  {
> +       struct kvm_mmu_page * sp = spte_to_child_sp(spte);
if !is_shadow_present_pte(spte) or spte is an leaf entry, it's incorrect to
retrieve child sp. So, maybe

-           spte_to_child_sp(spte)->nx_huge_page_disallowed) {
+           (spte_to_child_sp(spte)->nx_huge_page_disallowed &&
+            is_mirror_sp(spte_to_child_sp(spte))) {

Other changes look good to me.

> +
>         if (cur_level > PG_LEVEL_4K &&
>             cur_level == fault->goal_level &&
>             is_shadow_present_pte(spte) &&
>             !is_large_pte(spte) &&
> -           spte_to_child_sp(spte)->nx_huge_page_disallowed) {
> +           (sp->nx_huge_page_disallowed || sp->role.is_mirror)) {
>                 /*
>                  * A small SPTE exists for this pfn, but FNAME(fetch),
>                  * direct_map(), or kvm_tdp_mmu_map() would like to create a
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 405874f4d088..1d22994576b5 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -1244,6 +1244,8 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct
> kvm_page_fault *fault)
>         struct tdp_iter iter;
>         struct kvm_mmu_page *sp;
>         int ret = RET_PF_RETRY;
> +       bool hugepage_adjust_disallowed = fault->nx_huge_page_workaround_enabled
> ||
> +                                         kvm_has_mirrored_tdp(kvm);
>  
>         kvm_mmu_hugepage_adjust(vcpu, fault);
>  
> @@ -1254,7 +1256,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct
> kvm_page_fault *fault)
>         for_each_tdp_pte(iter, kvm, root, fault->gfn, fault->gfn + 1) {
>                 int r;
>  
> -               if (fault->nx_huge_page_workaround_enabled)
> +               if (hugepage_adjust_disallowed)
>                         disallowed_hugepage_adjust(fault, iter.old_spte,
> iter.level);
>  
>                 /*
> 
>  
>                 /*
>