On Sun, Jun 29, 2025 at 11:28:22AM -0700, Vishal Annapurve wrote:
> On Thu, Jun 19, 2025 at 1:59 AM Xiaoyao Li <xiaoyao.li@xxxxxxxxx> wrote:
> >
> > On 6/19/2025 4:13 PM, Yan Zhao wrote:
> > > On Wed, May 14, 2025 at 04:41:39PM -0700, Ackerley Tng wrote:
> > >> Hello,
> > >>
> > >> This patchset builds upon discussion at LPC 2024 and many guest_memfd
> > >> upstream calls to provide 1G page support for guest_memfd by taking
> > >> pages from HugeTLB.
> > >>
> > >> This patchset is based on Linux v6.15-rc6, and requires the mmap support
> > >> for guest_memfd patchset (Thanks Fuad!) [1].
> > >>
> > >> For ease of testing, this series is also available, stitched together,
> > >> at https://github.com/googleprodkernel/linux-cc/tree/gmem-1g-page-support-rfc-v2
> > >
> > > Just to record a found issue -- not one that must be fixed.
> > >
> > > In TDX, the initial memory region is added as private memory during TD's build
> > > time, with its initial content copied from source pages in shared memory.
> > > The copy operation requires simultaneous access to both shared source memory
> > > and private target memory.
> > >
> > > Therefore, userspace cannot store the initial content in shared memory at the
> > > mmap-ed VA of a guest_memfd that performs in-place conversion between shared and
> > > private memory. This is because the guest_memfd will first unmap a PFN in shared
> > > page tables and then check for any extra refcount held for the shared PFN before
> > > converting it to private.
> >
> > I have an idea.
> >
> > If I understand correctly, the KVM_GMEM_CONVERT_PRIVATE of in-place
> > conversion unmap the PFN in shared page tables while keeping the content
> > of the page unchanged, right?
>
> That's correct.
>
> >
> > So KVM_GMEM_CONVERT_PRIVATE can be used to initialize the private memory
> > actually for non-CoCo case actually, that userspace first mmap() it and
> > ensure it's shared and writes the initial content to it, after it
> > userspace convert it to private with KVM_GMEM_CONVERT_PRIVATE.
>
> I think you mean pKVM by non-coco VMs that care about private memory.
> Yes, initial memory regions can start as shared which userspace can
> populate and then convert the ranges to private.
>
> >
> > For CoCo case, like TDX, it can hook to KVM_GMEM_CONVERT_PRIVATE if it
> > wants the private memory to be initialized with initial content, and
> > just do in-place TDH.PAGE.ADD in the hook.
>
> I think this scheme will be cleaner:
> 1) Userspace marks the guest_memfd ranges corresponding to initial
> payload as shared.
> 2) Userspace mmaps and populates the ranges.
> 3) Userspace converts those guest_memfd ranges to private.
> 4) For both SNP and TDX, userspace continues to invoke corresponding
> initial payload preparation operations via existing KVM ioctls e.g.
> KVM_SEV_SNP_LAUNCH_UPDATE/KVM_TDX_INIT_MEM_REGION.
>    - SNP/TDX KVM logic fetches the right pfns for the target gfns
>      using the normal paths supported by KVM and passes those pfns directly
>      to the right trusted module to initialize the "encrypted" memory
>      contents.
>    - Avoiding any GUP or memcpy from source addresses.

One caveat:

When TDX populates the mirror root, kvm_gmem_get_pfn() is invoked.
kvm_gmem_prepare_folio() is then invoked in turn, and it zeroes the folio.

> i.e. for TDX VMs, KVM_TDX_INIT_MEM_REGION still does the in-place TDH.PAGE.ADD.

So by this point, the pages would no longer contain the original content?

> Since we need to support VMs that will/won't use in-place conversion,
> I think operations like KVM_TDX_INIT_MEM_REGION can introduce explicit
> flags to allow userspace to indicate whether to assume in-place
> conversion or not.
> Maybe
> kvm_tdx_init_mem_region.source_addr/kvm_sev_snp_launch_update.uaddr
> can be null in the scenarios where in-place conversion is used.
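
To make the ordering in steps 1) to 4) and the zeroing caveat concrete, here
is a toy userspace model in C. It is only a sketch and not kernel code: every
toy_* name, the tiny page size, and the "preserve" flag are hypothetical and
exist purely for illustration. The zero-on-first-use behaviour simply mirrors
what is described above for kvm_gmem_prepare_folio(), and "preserve" stands in
for the kind of explicit in-place-conversion flag suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16 /* deliberately tiny; purely illustrative */

enum toy_state { TOY_SHARED, TOY_PRIVATE };

/* Hypothetical stand-in for one guest_memfd page. */
struct toy_page {
	enum toy_state state;
	bool prepared; /* models "folio already prepared" */
	unsigned char data[TOY_PAGE_SIZE];
};

/* Step 1: the range starts out shared, unprepared and zero-filled. */
static void toy_init_shared(struct toy_page *p)
{
	memset(p, 0, sizeof(*p));
	p->state = TOY_SHARED;
}

/* Step 2: userspace writes the payload; only legal while shared. */
static int toy_populate(struct toy_page *p, const void *src, size_t len)
{
	if (p->state != TOY_SHARED || len > TOY_PAGE_SIZE)
		return -1;
	memcpy(p->data, src, len);
	return 0;
}

/* Step 3: in-place conversion flips the state and keeps the content. */
static void toy_convert_private(struct toy_page *p)
{
	p->state = TOY_PRIVATE;
}

/*
 * Step 4: models the pfn lookup done while building the initial image.
 * On first use the page is zeroed unless the caller passes the
 * hypothetical "preserve" flag.
 */
static const unsigned char *toy_get_pfn(struct toy_page *p, bool preserve)
{
	if (p->state != TOY_PRIVATE)
		return NULL;
	if (!p->prepared) {
		if (!preserve)
			memset(p->data, 0, TOY_PAGE_SIZE);
		p->prepared = true;
	}
	return p->data;
}

/* Runs steps 1-4 and reports whether the payload survived. */
static bool toy_payload_survives(bool preserve)
{
	static const unsigned char payload[] = "initial";
	struct toy_page page;
	const unsigned char *seen;
	int rc;

	toy_init_shared(&page);
	rc = toy_populate(&page, payload, sizeof(payload));
	assert(rc == 0);
	(void)rc;
	toy_convert_private(&page);
	seen = toy_get_pfn(&page, preserve);
	return seen != NULL && memcmp(seen, payload, sizeof(payload)) == 0;
}
```

In this model, toy_payload_survives(false) reports that the payload is lost,
which is the caveat above, while toy_payload_survives(true) keeps it. How the
real code would skip or reorder the zeroing is left open here.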