Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd

Vishal Annapurve <vannapurve@xxxxxxxxxx> · Sun, 29 Jun 2025 11:28:22 -0700

On Thu, Jun 19, 2025 at 1:59 AM Xiaoyao Li <xiaoyao.li@xxxxxxxxx> wrote:
>
> On 6/19/2025 4:13 PM, Yan Zhao wrote:
> > On Wed, May 14, 2025 at 04:41:39PM -0700, Ackerley Tng wrote:
> >> Hello,
> >>
> >> This patchset builds upon discussion at LPC 2024 and many guest_memfd
> >> upstream calls to provide 1G page support for guest_memfd by taking
> >> pages from HugeTLB.
> >>
> >> This patchset is based on Linux v6.15-rc6, and requires the mmap support
> >> for guest_memfd patchset (Thanks Fuad!) [1].
> >>
> >> For ease of testing, this series is also available, stitched together,
> >> at https://github.com/googleprodkernel/linux-cc/tree/gmem-1g-page-support-rfc-v2
> >
> > Just to record a found issue -- not one that must be fixed.
> >
> > In TDX, the initial memory region is added as private memory during TD's build
> > time, with its initial content copied from source pages in shared memory.
> > The copy operation requires simultaneous access to both shared source memory
> > and private target memory.
> >
> > Therefore, userspace cannot store the initial content in shared memory at the
> > mmap-ed VA of a guest_memfd that performs in-place conversion between shared and
> > private memory. This is because the guest_memfd will first unmap a PFN in shared
> > page tables and then check for any extra refcount held for the shared PFN before
> > converting it to private.
>
> I have an idea.
>
> If I understand correctly, the KVM_GMEM_CONVERT_PRIVATE of in-place
> conversion unmap the PFN in shared page tables while keeping the content
> of the page unchanged, right?

That's correct.

>
> So KVM_GMEM_CONVERT_PRIVATE can be used to initialize the private memory
> actually for non-CoCo case actually, that userspace first mmap() it and
> ensure it's shared and writes the initial content to it, after it
> userspace convert it to private with KVM_GMEM_CONVERT_PRIVATE.

I think you mean pKVM by non-coco VMs that care about private memory.
Yes, initial memory regions can start as shared which userspace can
populate and then convert the ranges to private.

>
> For CoCo case, like TDX, it can hook to KVM_GMEM_CONVERT_PRIVATE if it
> wants the private memory to be initialized with initial content, and
> just do in-place TDH.PAGE.ADD in the hook.

I think this scheme will be cleaner:
1) Userspace marks the guest_memfd ranges corresponding to initial
payload as shared.
2) Userspace mmaps and populates the ranges.
3) Userspace converts those guest_memfd ranges to private.
4) For both SNP and TDX, userspace continues to invoke corresponding
initial payload preparation operations via existing KVM ioctls e.g.
KVM_SEV_SNP_LAUNCH_UPDATE/KVM_TDX_INIT_MEM_REGION.
   - SNP/TDX KVM logic fetches the right pfns for the target gfns
using the normal paths supported by KVM and passes those pfns directly
to the right trusted module to initialize the "encrypted" memory
contents.
       - Avoiding any GUP or memcpy from source addresses.

i.e. for TDX VMs, KVM_TDX_INIT_MEM_REGION still does the in-place TDH.PAGE.ADD.

Since we need to support VMs that will/won't use in-place conversion,
I think operations like KVM_TDX_INIT_MEM_REGION can introduce explicit
flags to allow userspace to indicate whether to assume in-place
conversion or not. Maybe
kvm_tdx_init_mem_region.source_addr/kvm_sev_snp_launch_update.uaddr
can be null in the scenarios where in-place conversion is used.