Re: [PATCH] KVM: x86/mmu: Prevent installing hugepages when mem attributes are changing

Yan Zhao <yan.y.zhao@xxxxxxxxx> · Tue, 29 Apr 2025 09:09:49 +0800

On Mon, Apr 28, 2025 at 07:50:21AM -0700, Sean Christopherson wrote:
> On Mon, Apr 28, 2025, Yan Zhao wrote:
> > On Fri, Apr 25, 2025 at 05:10:56PM -0700, Sean Christopherson wrote:
> > > @@ -7686,6 +7707,37 @@ bool kvm_arch_pre_set_memory_attributes(struct kvm *kvm,
> > >  	if (WARN_ON_ONCE(!kvm_arch_has_private_mem(kvm)))
> > >  		return false;
> > >  
> > > +	if (WARN_ON_ONCE(range->end <= range->start))
> > > +		return false;
> > > +
> > > +	/*
> > > +	 * If the head and tail pages of the range currently allow a hugepage,
> > > +	 * i.e. reside fully in the slot and don't have mixed attributes, then
> > > +	 * add each corresponding hugepage range to the ongoing invalidation,
> > > +	 * e.g. to prevent KVM from creating a hugepage in response to a fault
> > > +	 * for a gfn whose attributes aren't changing.  Note, only the range
> > > +	 * of gfns whose attributes are being modified needs to be explicitly
> > > +	 * unmapped, as that will unmap any existing hugepages.
> > > +	 */
> > > +	for (level = PG_LEVEL_2M; level <= KVM_MAX_HUGEPAGE_LEVEL; level++) {
> > > +		gfn_t start = gfn_round_for_level(range->start, level);
> > > +		gfn_t end = gfn_round_for_level(range->end - 1, level);
> > > +		gfn_t nr_pages = KVM_PAGES_PER_HPAGE(level);
> > > +
> > > +		if ((start != range->start || start + nr_pages > range->end) &&
> > > +		    start >= slot->base_gfn &&
> > > +		    start + nr_pages <= slot->base_gfn + slot->npages &&
> > > +		    !hugepage_test_mixed(slot, start, level))
> > Instead of checking mixed flag in disallow_lpage, could we check disallow_lpage
> > directly?
> > 
> > So, if mixed flag is not set but disallow_lpage is 1, there's no need to update
> > the invalidate range.
> > 
> > > +			kvm_mmu_invalidate_range_add(kvm, start, start + nr_pages);
> > > +
> > > +		if (end == start)
> > > +			continue;
> > > +
> > > +		if ((end + nr_pages) <= (slot->base_gfn + slot->npages) &&
> > > +		    !hugepage_test_mixed(slot, end, level))
> > if ((end + nr_pages > range->end) &&
> >     ((end + nr_pages) <= (slot->base_gfn + slot->npages)) &&
> >     !lpage_info_slot(gfn, slot, level)->disallow_lpage)
> > 
> > ?
> 
> No, disallow_lpage is used by write-tracking and shadow paging to prevent creating
> huge pages for a write-protected gfn.  mmu_lock is dropped after the pre_set_range
> call to kvm_handle_gfn_range(), and so disallow_lpage could go to zero if the last
> shadow page for the affected range is zapped.  In practice, KVM isn't going to be
That's a good point. I missed it.

> doing write-tracking or shadow paging for CoCo VMs, so there's no missed optimization
> on that front.
>
> And if disallow_lpage is non-zero due to a misaligned memslot base/size, then the
> start/end checks will skip this level anyways.

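If the gfn and userspace address are not aligned with respect to each other at
a certain level, disallow_lpage for that level is set to 1 for the entire slot,
even when the memslot base/size are themselves aligned, so the start/end checks
won't skip that level.  This is often the case at the 1G level.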
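
To illustrate the misalignment case, here is a minimal userspace sketch of the
check.  It is not the kernel code: the helper names are hypothetical, and it
assumes 4K base pages and the x86 numbering where level 2 is 2M and level 3
is 1G.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4K base pages */

/* Number of 4K pages per hugepage: level 2 = 2M (512), level 3 = 1G (262144). */
static uint64_t sketch_pages_per_hpage(int level)
{
	return 1ULL << ((level - 1) * 9);
}

/*
 * Hypothetical helper: true if the slot's base gfn and its userspace address
 * sit at different offsets within a hugepage of the given level, i.e. if no
 * hugepage at that level can map the slot.
 */
static bool slot_misaligned_at_level(uint64_t base_gfn, uint64_t userspace_addr,
				     int level)
{
	uint64_t ugfn = userspace_addr >> SKETCH_PAGE_SHIFT;

	return ((base_gfn ^ ugfn) & (sketch_pages_per_hpage(level) - 1)) != 0;
}
```

For example, a slot at guest physical 4G (base_gfn 0x100000) backed by a
userspace address of 0x7f0000200000 is fine at 2M but misaligned at 1G.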

But since kvm_vm_set_mem_attributes() holds mmu_lock for write most of the time
anyway, blocking faults over a larger range for another short period looks
harmless.
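
To put a number on "a larger range", here is a rough userspace sketch of the
worst-case widening at one level.  It is only an upper bound under stated
assumptions: the struct and helper names are made up, and it ignores the slot
bounds and mixed-attribute checks that the patch applies before adding the
head/tail hugepage ranges.

```c
#include <stdint.h>

typedef uint64_t gfn_t;

struct gfn_span {
	gfn_t start;	/* inclusive */
	gfn_t end;	/* exclusive */
};

/* Number of 4K pages per hugepage: level 2 = 2M (512), level 3 = 1G (262144). */
static gfn_t span_pages_per_hpage(int level)
{
	return 1ULL << ((level - 1) * 9);
}

/* Round a gfn down to the start of its hugepage at the given level. */
static gfn_t span_round_down(gfn_t gfn, int level)
{
	return gfn & ~(span_pages_per_hpage(level) - 1);
}

/*
 * Widest span that could be blocked at this level: from the start of the
 * hugepage containing the first gfn to the end of the hugepage containing
 * the last gfn.  Assumes end > start.
 */
static struct gfn_span widened_span(gfn_t start, gfn_t end, int level)
{
	struct gfn_span s = {
		.start = span_round_down(start, level),
		.end = span_round_down(end - 1, level) + span_pages_per_hpage(level),
	};

	return s;
}
```

E.g. changing attributes for gfns [0x201, 0x300) can block faults on at most
[0x200, 0x400) at the 2M level and [0x0, 0x40000) at the 1G level.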