Re: [RFC PATCH] KVM: TDX: Decouple TDX init mem region from kvm_gmem_populate()

On Mon, Aug 4, 2025 at 6:20 PM Vishal Annapurve <vannapurve@xxxxxxxxxx> wrote:
>
> On Mon, Aug 4, 2025 at 5:22 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > > > > IIUC, the suggestion in the link is to abandon kvm_gmem_populate().
> > > > > > For TDX, it means adopting the approach in this RFC patch, right?
> > > > > Yes, IMO this RFC is following the right approach as posted.
> >
> > I don't think we want to abandon kvm_gmem_populate().  Unless I'm missing something,
> > SNP has the same AB-BA problem as TDX.  The copy_from_user() on @src can trigger
> > a page fault, and resolving the page fault may require taking mm->mmap_lock.
> >
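(The inversion being described, sketched roughly; the second path is my
assumption of where the reverse ordering comes from, e.g. a fault on a
user mapping of the same guest_memfd:)

  /* Path 1: kvm_gmem_populate() */
  filemap_invalidate_lock(file->f_mapping);     /* A: gmem invalidate_lock */
  copy_from_user(dst, src, len);                /* faults on @src ...      */
    -> exc_page_fault()
         -> mmap_read_lock(mm);                 /* ... B: mmap_lock        */

  /* Path 2: fault on a user mapping of the guest_memfd */
  mmap_read_lock(mm);                           /* B */
    -> vma->vm_ops->fault()
         -> filemap_invalidate_lock_shared(file->f_mapping);    /* A */
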
> > Fundamentally, TDX and SNP are doing the same thing: copying from source to guest
> > memory.  The only differences are in the mechanics of the copy+encrypt, everything
> > else is the same.  I.e. I don't expect that we'll find a magic solution that works
> > well for one and not the other.
> >
> > I also don't want to end up with wildly different ABI for SNP vs. everything else.
> > E.g. cond_resched() needs to be called if the to-be-initialized range is large,
> > which means dropping mmu_lock between pages, whereas kvm_gmem_populate() can
> > yield without dropping invalidate_lock, which means that the behavior of populating
> > guest_memfd memory will be quite different with respect to guest_memfd operations.
>
> I would think that TDX/CCA VMs [1] will run into similar behavior of
> needing to simulate stage-2 faults, i.e. KVM will end up taking and
> dropping mmu_lock for each page anyway, at least for these two
> platforms.
>
> [1] https://lore.kernel.org/kvm/20250611104844.245235-5-steven.price@xxxxxxx/
> (rmi_rtt_create())
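
For contrast, the two yield patterns look roughly like this (shape
only, names illustrative):

  /* Simulated stage-2 faults: mmu_lock is a non-sleepable rwlock on
   * x86, so it has to be dropped before every reschedule point. */
  for (i = 0; i < nr_pages; i++) {
          read_lock(&kvm->mmu_lock);
          /* map/measure one page */
          read_unlock(&kvm->mmu_lock);
          cond_resched();
  }

  /* kvm_gmem_populate(): invalidate_lock is a sleepable rwsem, so it
   * stays held across the whole range while still yielding. */
  filemap_invalidate_lock(file->f_mapping);
  for (i = 0; i < nr_pages; i++) {
          /* post-populate one page */
          cond_resched();
  }
  filemap_invalidate_unlock(file->f_mapping);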
>
> >
> > Pulling in the RFC text:
> >
> > : I think the only different scenario is SNP, where the host must write
> > : initial contents to guest memory.
> > :
> > : Will this work for all cases CCA/SNP/TDX during initial memory
> > : population from within KVM:
> > : 1) Simulate stage2 fault
> > : 2) Take a KVM mmu read lock
> >
> > Doing all of this under mmu_lock is pretty much a non-starter.

Looking closer at the CPU <-> PSP communication, which is not
implemented to work within an atomic context, I now agree that this
wouldn't work for SNP VMs.
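To spell out why (sketch; the exact command and call path don't
matter):

  read_lock(&kvm->mmu_lock);      /* rwlock: sleeping is illegal here */
  sev_issue_cmd(kvm, SEV_CMD_SNP_LAUNCH_UPDATE, &data, &error);
                                  /* takes sev_cmd_mutex and waits for
                                   * the PSP to respond, i.e. sleeps */
  read_unlock(&kvm->mmu_lock);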


> >
> > : 3) Check that the needed gpa is mapped in EPT/NPT entries
> >
> > No, KVM's page tables are not the source of truth.  S-EPT is a special snowflake,
> > and I'd like to avoid foisting the same requirements on NPT.
>
> I agree this would be a new requirement.
>
> >
> > : 4) For SNP, if src != null, make the target pfn to be shared, copy
> > : contents and then make the target pfn back to private.
> >
> > Copying from userspace under spinlock (rwlock) is illegal, as accessing userspace
> > memory might_fault() and thus might_sleep().
>
> I would think that a combination of get_user_pages() and
> kmap_local_pfn() will prevent this situation of might_fault().
>
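Something along those lines could look like this (illustrative sketch
only; error handling and the shared<->private conversion elided):

  struct page *src_page;
  void *src, *dst;

  /* Outside mmu_lock: pin the source page so the copy can't fault. */
  if (get_user_pages_fast(src_addr, 1, 0, &src_page) != 1)
          return -EFAULT;

  read_lock(&kvm->mmu_lock);
  src = kmap_local_page(src_page);
  dst = kmap_local_pfn(target_pfn);
  memcpy(dst, src, PAGE_SIZE);    /* no userspace access under the lock */
  kunmap_local(dst);
  kunmap_local(src);
  read_unlock(&kvm->mmu_lock);

  put_page(src_page);

That avoids the might_fault(), but it doesn't help with the sleeping
PSP command above, so SNP still can't do the copy+encrypt under
mmu_lock.
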
> >
> > : 5) For TDX, if src != null, pass the same address for source and
> > : target (likely this works for CCA too)
> > : 6) Invoke appropriate memory encryption operations
> > : 7) measure contents
> > : 8) release the KVM mmu read lock
> > :
> > : If this scheme works, ideally we should also not call RMP table
> > : population logic from guest_memfd, but from KVM NPT fault handling
> > : logic directly (a bit of cosmetic change).
> >
