Re: Untested fix for attributes vs. hugepage race

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Apr 23, 2025 at 11:17:54AM -0700, Ackerley Tng wrote:
> Michael Roth <michael.roth@xxxxxxx> writes:
> 
> > On Fri, Apr 18, 2025 at 08:13:17AM -0700, Sean Christopherson wrote:
> >> 
> >> That all looks good to me.  And to ensure we don't go off the rails due to bad
> >> inputs (which are supposed to be fully validated by common KVM), we could add a
> >> WARN to detect a non-exclusive range->end.
> >> 
> >> So this?
> >> 
> >> 	if (WARN_ON_ONCE(range->end <= range->start))
> >> 		return false;
> >> 
> >> 	/*
> >> 	 * If the head and tail pages of the range currently allow a hugepage,
> >> 	 * i.e. reside fully in the slot and don't have mixed attributes, then
> >> 	 * add each corresponding hugepage range to the ongoing invalidation,
> >> 	 * e.g. to prevent KVM from creating a hugepage in response to a fault
> >> 	 * for a gfn whose attributes aren't changing.  Note, only the range
> >> 	 * of gfns whose attributes are being modified needs to be explicitly
> >> 	 * unmapped, as that will unmap any existing hugepages.
> >> 	 */
> >> 	for (level = PG_LEVEL_2M; level <= KVM_MAX_HUGEPAGE_LEVEL; level++) {
> >> 		gfn_t start = gfn_round_for_level(range->start, level);
> >> 		gfn_t end = gfn_round_for_level(range->end - 1, level);
> >> 		gfn_t nr_pages = KVM_PAGES_PER_HPAGE(level);
> >> 
> >> 		if ((start != range->start || start + nr_pages > range->end) &&
> >> 		    start >= slot->base_gfn &&
> >> 		    start + nr_pages <= slot->base_gfn + slot->npages &&
> >> 		    !hugepage_test_mixed(slot, start, level))
> >> 			kvm_mmu_invalidate_range_add(kvm, start, start + nr_pages);
> >> 
> >> 		if (end == start)
> >> 			continue;
> >> 
> >> 		if ((end + nr_pages) <= (slot->base_gfn + slot->npages) &&
> >> 		    !hugepage_test_mixed(slot, end, level))
> >> 			kvm_mmu_invalidate_range_add(kvm, end, end + nr_pages);
> >> 	}
> >
> > Looks good! Re-tested with this version of the patch and it seems to address
> > the original issue.
> >
> > Tested-by: Michael Roth <michael.roth@xxxxxxx>
> >
> > Thanks,
> >
> > Mike
> 
> Would you be able to share how you set up a test to trigger this issue?
> Thanks in advance!

The original issue was triggered when running SNP guests using 2MB hugetlbfs
pages for the shared memory, and it would happen when
mem_encrypt_free_decrypted_mem() was called. In this case 20KB of 2MB BSS
decrypted region is used, and the issue was triggering an infinite loop
when the guest tries to re-encrypt the memory as part of freeing it back to
kernel:

  [   36.521488] Freeing unused decrypted memory: 2028K

I don't think it is specific to hugetlbfs, since the below hack can be used
to trigger the race fairly reliably even without hugetlbfs. I noticed that
pretty much every guest vCPU will fault on the same GPA after the initial
invalidation, and think it might be kvmclock that's using that GPA. So the
below just expands the race window to make sure one of those other vCPUs
triggers a fault after the shared->private conversion invalidates a 2MB
NPT entry.

I suspect hugetlbfs was originally a factor because it was using
pre-allocated pages so maybe it able to re-install the zapped entry more
quickly vs. the normal flow where the shared page would get hole-punched
by QEMU and so would need to get re-allocated.

-Mike

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index de2c11dae231..27fdf05c79eb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2457,6 +2457,8 @@ static bool kvm_pre_set_memory_attributes(struct kvm *kvm,
        return kvm_arch_pre_set_memory_attributes(kvm, range);
 }

+#include <linux/delay.h>
+
 /* Set @attributes for the gfn range [@start, @end). */
 static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
                                     unsigned long attributes)
@@ -2500,6 +2502,11 @@ static int kvm_vm_set_mem_attributes(struct kvm *kvm, gfn_t start, gfn_t end,
        }

        kvm_handle_gfn_range(kvm, &pre_set_range);
+       /* page offset 5 corresponds to the first PSC for the BSS section in question. */
+       if ((start & 0x1FF) == 5) {
+               pr_warn("%s: prolonging GFN invalidation start %llx end %llx attributes %lx", __func__, start, end, attributes);
+               msleep(2000);
+       }

        for (i = start; i < end; i++) {
                r = xa_err(xa_store(&kvm->mem_attr_array, i, entry,




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux