On Mon, Aug 4, 2025 at 6:20 PM Vishal Annapurve <vannapurve@xxxxxxxxxx> wrote: > > On Mon, Aug 4, 2025 at 5:22 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > > > > > IIUC, the suggestion in the link is to abandon kvm_gmem_populate(). > > > > > > For TDX, it means adopting the approach in this RFC patch, right? > > > > > Yes, IMO this RFC is following the right approach as posted. > > > > I don't think we want to abandon kvm_gmem_populate(). Unless I'm missing something, > > SNP has the same AB-BA problem as TDX. The copy_from_user() on @src can trigger > > a page fault, and resolving the page fault may require taking mm->mmap_lock. > > > > Fundamentally, TDX and SNP are doing the same thing: copying from source to guest > > memory. The only differences are in the mechanics of the copy+encrypt, everything > > else is the same. I.e. I don't expect that we'll find a magic solution that works > > well for one and not the other. > > > > I also don't want to end up with wildly different ABI for SNP vs. everything else. > > E.g. cond_resched() needs to be called if the to-be-initialzied range is large, > > which means dropping mmu_lock between pages, whereas kvm_gmem_populate() can > > yield without dropping invalidate_lock, which means that the behavior of populating > > guest_memfd memory will be quite different with respect to guest_memfd operations. > > I would think that TDX/CCA VMs [1] will run into the similar behavior > of needing to simulate stage2 faults i.e. KVM will end up picking up > and dropping mmu_lock for each page anyways at least for these two > platforms. > > [1] https://lore.kernel.org/kvm/20250611104844.245235-5-steven.price@xxxxxxx/ > (rmi_rtt_create()) > > > > > Pulling in the RFC text: > > > > : I think the only different scenario is SNP, where the host must write > > : initial contents to guest memory. > > : > > : Will this work for all cases CCA/SNP/TDX during initial memory > > : population from within KVM: > > : 1) Simulate stage2 fault > > : 2) Take a KVM mmu read lock > > > > Doing all of this under mmu_lock is pretty much a non-starter. Looking closer at CPU <-> PSP communication which is not implemented to work within an atomic context, I agree now that this wouldn't work for SNP VMs. > > > > : 3) Check that the needed gpa is mapped in EPT/NPT entries > > > > No, KVM's page tables are not the source of truth. S-EPT is a special snowflake, > > and I'd like to avoid foisting the same requirements on NPT. > > I agree this would be a new requirement. > > > > > : 4) For SNP, if src != null, make the target pfn to be shared, copy > > : contents and then make the target pfn back to private. > > > > Copying from userspace under spinlock (rwlock) is illegal, as accessing userspace > > memory might_fault() and thus might_sleep(). > > I would think that a combination of get_user_pages() and > kmap_local_pfn() will prevent this situation of might_fault(). > > > > > : 5) For TDX, if src != null, pass the same address for source and > > : target (likely this works for CCA too) > > : 6) Invoke appropriate memory encryption operations > > : 7) measure contents > > : 8) release the KVM mmu read lock > > : > > : If this scheme works, ideally we should also not call RMP table > > : population logic from guest_memfd, but from KVM NPT fault handling > > : logic directly (a bit of cosmetic change). > >