On Wed, Sep 03, 2025 at 02:57:07PM +0800, Binbin Wu wrote:
> 
> 
> On 8/7/2025 5:43 PM, Yan Zhao wrote:
> > Introduce kvm_split_cross_boundary_leafs() to split huge leaf entries that
> > cross the boundary of a specified range.
> >
> > Splitting huge leaf entries that cross the boundary is essential before
> > zapping the range in the mirror root. This ensures that the subsequent zap
> > operation does not affect any GFNs outside the specified range. This is
> > crucial for the mirror root, as the private page table requires the guest's
> > ACCEPT operation after a GFN faults back.
> >
> > The core of kvm_split_cross_boundary_leafs() leverages the main logic from
> > tdp_mmu_split_huge_pages_root(). It traverses the specified root and splits
> > huge leaf entries if they cross the range boundary. When splitting is
> > necessary, kvm->mmu_lock is temporarily released for memory allocation,
> > which means returning -ENOMEM is possible.
> >
> > Signed-off-by: Xiaoyao Li <xiaoyao.li@xxxxxxxxx>
> > Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> > Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>
> > ---
> > RFC v2:
> > - Rename the API to kvm_split_cross_boundary_leafs().
> > - Make the API to be usable for direct roots or under shared mmu_lock.
> > - Leverage the main logic from tdp_mmu_split_huge_pages_root(). (Rick)
> >
> > RFC v1:
> > - Split patch.
> > - introduced API kvm_split_boundary_leafs(), refined the logic and
> >   simplified the code.
> > ---
> >  arch/x86/kvm/mmu/mmu.c     | 27 +++++++++++++++
> >  arch/x86/kvm/mmu/tdp_mmu.c | 68 ++++++++++++++++++++++++++++++++++++--
> >  arch/x86/kvm/mmu/tdp_mmu.h |  3 ++
> >  include/linux/kvm_host.h   |  2 ++
> >  4 files changed, 97 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 9182192daa3a..13910ae05f76 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -1647,6 +1647,33 @@ static bool __kvm_rmap_zap_gfn_range(struct kvm *kvm,
> >  				     start, end - 1, can_yield, true, flush);
> >  }
> > +/*
> > + * Split large leafs crossing the boundary of the specified range
> > + *
> > + * Return value:
> > + * 0 : success, no flush is required;
> > + * 1 : success, flush is required;
> > + * <0: failure.
> > + */
> > +int kvm_split_cross_boundary_leafs(struct kvm *kvm, struct kvm_gfn_range *range,
> > +				   bool shared)
> > +{
> > +	bool ret = 0;
> > +
> > +	lockdep_assert_once(kvm->mmu_invalidate_in_progress ||
> > +			    lockdep_is_held(&kvm->slots_lock) ||
> > +			    srcu_read_lock_held(&kvm->srcu));
> > +
> > +	if (!range->may_block)
> > +		return -EOPNOTSUPP;
> > +
> > +	if (tdp_mmu_enabled)
> > +		ret = kvm_tdp_mmu_gfn_range_split_cross_boundary_leafs(kvm, range, shared);
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_split_cross_boundary_leafs);
> > +
> >  bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
> >  {
> >  	bool flush = false;
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index ce49cc850ed5..62a09a9655c3 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -1574,10 +1574,17 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter,
> >  	return ret;
> >  }
> > +static bool iter_cross_boundary(struct tdp_iter *iter, gfn_t start, gfn_t end)
> > +{
> > +	return !(iter->gfn >= start &&
> > +		 (iter->gfn + KVM_PAGES_PER_HPAGE(iter->level)) <= end);
> > +}
> > +
> >  static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
> >  					 struct kvm_mmu_page *root,
> >  					 gfn_t start, gfn_t end,
> > -					 int target_level, bool shared)
> > +					 int target_level, bool shared,
> > +					 bool only_cross_bounday, bool *flush)
> s/only_cross_bounday/only_cross_boundary
Will fix.

> >  {
> >  	struct kvm_mmu_page *sp = NULL;
> >  	struct tdp_iter iter;
> > @@ -1589,6 +1596,13 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
> >  	 * level into one lower level. For example, if we encounter a 1GB page
> >  	 * we split it into 512 2MB pages.
> >  	 *
> > +	 * When only_cross_bounday is true, just split huge pages above the
> > +	 * target level into one lower level if the huge pages cross the start
> > +	 * or end boundary.
> > +	 *
> > +	 * No need to update @flush for !only_cross_bounday cases, which rely
> > +	 * on the callers to do the TLB flush in the end.
> 
> I think API-wise it's a bit confusing, although it's a local API.
> Just looking at the API, without digging into the function implementation, my
> initial thought was that *flush would tell whether a TLB flush is needed or not.
> 
> Just update *flush unconditionally? Or move the comment into the function
> description to call it out?
> 
> I also thought of another option: combine the two inputs, i.e., if *flush is a
> valid pointer, it means it's for only_cross_boundary; otherwise, just pass
> NULL. But then I felt it was a bit risky to rely on the pointer to indicate the
> scenario.
I feel it's better not to combine flush and only_cross_boundary.
Will add a function description to tdp_mmu_split_huge_pages_root().
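
Something roughly like the below, as a first stab (wording not final; the
parameter name assumes the s/only_cross_bounday/only_cross_boundary fix above):

/*
 * Split huge SPTEs in @root within [@start, @end) down to @target_level.
 *
 * @only_cross_boundary: when true, only split huge SPTEs that cross @start
 *	or @end, rather than every huge SPTE in the range.
 * @flush: only updated when @only_cross_boundary is true. Set to true once
 *	a huge SPTE has been split (i.e. a TLB flush is pending), cleared
 *	after tdp_mmu_iter_cond_resched() has performed the pending flush.
 *	!only_cross_boundary callers remain responsible for the final TLB
 *	flush themselves, so @flush is left untouched for them.
 */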

> > +	 *
> >  	 * Since the TDP iterator uses a pre-order traversal, we are guaranteed
> >  	 * to visit an SPTE before ever visiting its children, which means we
> >  	 * will correctly recursively split huge pages that are more than one
> > @@ -1597,12 +1611,19 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
> >  	 */
> >  	for_each_tdp_pte_min_level(iter, kvm, root, target_level + 1, start, end) {
> >  retry:
> > -		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, shared))
> > +		if (tdp_mmu_iter_cond_resched(kvm, &iter, *flush, shared)) {
> > +			if (only_cross_bounday)
> > +				*flush = false;
> >  			continue;
> > +		}
> >  		if (!is_shadow_present_pte(iter.old_spte) || !is_large_pte(iter.old_spte))
> >  			continue;
> > +		if (only_cross_bounday &&
> > +		    !iter_cross_boundary(&iter, start, end))
> > +			continue;
> > +
> >  		if (!sp) {
> >  			rcu_read_unlock();
> > @@ -1637,6 +1658,8 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
> >  				goto retry;
> >  			sp = NULL;
> > +			if (only_cross_bounday)
> > +				*flush = true;
> >  	}
> >  	rcu_read_unlock();
> [...]
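
For completeness, a purely illustrative sketch of how a caller is expected to
consume the return value of kvm_split_cross_boundary_leafs() (the caller name
below is made up, not from this series):

static int zap_private_range_example(struct kvm *kvm,
				     struct kvm_gfn_range *range)
{
	int ret;

	/* Split any huge leaf straddling range->start/range->end first. */
	ret = kvm_split_cross_boundary_leafs(kvm, range, false);
	if (ret < 0)
		return ret;	/* e.g. -ENOMEM or -EOPNOTSUPP */

	/* ret == 1: at least one leaf was split, a TLB flush is required. */
	if (ret)
		kvm_flush_remote_tlbs(kvm);

	/* The subsequent zap cannot affect GFNs outside [start, end). */
	return 0;
}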