On Thu, May 15, 2025, Rick P Edgecombe wrote:
> On Thu, 2025-05-15 at 11:42 -0700, Vishal Annapurve wrote:
> > On Thu, May 15, 2025 at 11:03 AM Edgecombe, Rick P
> > <rick.p.edgecombe@xxxxxxxxx> wrote:
> > >
> > > On Wed, 2025-05-14 at 16:41 -0700, Ackerley Tng wrote:
> > > > Hello,
> > > >
> > > > This patchset builds upon discussion at LPC 2024 and many guest_memfd
> > > > upstream calls to provide 1G page support for guest_memfd by taking
> > > > pages from HugeTLB.
> > >
> > > Do you have any more concrete numbers on benefits of 1GB huge pages for
> > > guestmemfd/coco VMs? I saw in the LPC talk it has the benefits as:
> > >  - Increase TLB hit rate and reduce page walks on TLB miss
> > >  - Improved IO performance
> > >  - Memory savings of ~1.6% from HugeTLB Vmemmap Optimization (HVO)
> > >  - Bring guest_memfd to parity with existing VMs that use HugeTLB pages
> > >    for backing memory
> > >
> > > Do you know how often the 1GB TDP mappings get shattered by shared pages?
> > >
> > > Thinking from the TDX perspective, we might have bigger fish to fry than
> > > 1.6% memory savings (for example dynamic PAMT), and the rest of the
> > > benefits don't have numbers. How much are we getting for all the
> > > complexity, over say buddy allocated 2MB pages?

TDX may have bigger fish to fry, but some of us have bigger fish to fry
than TDX :-)

> > This series should work for any page sizes backed by hugetlb memory.
> > Non-CoCo VMs, pKVM and Confidential VMs all need hugepages that are
> > essential for certain workloads and will emerge as guest_memfd users.
> > Features like KHO/memory persistence in addition also depend on
> > hugepage support in guest_memfd.
> >
> > This series takes strides towards making guest_memfd compatible with
> > usecases where 1G pages are essential and non-confidential VMs are
> > already exercising them.
> >
> > I think the main complexity here lies in supporting in-place
> > conversion which applies to any huge page size even for buddy
> > allocated 2MB pages or THP.
> >
> > This complexity arises because page structs work at a fixed
> > granularity, future roadmap towards not having page structs for guest
> > memory (at least private memory to begin with) should help towards
> > greatly reducing this complexity.
> >
> > That being said, DPAMT and huge page EPT mappings for TDX VMs remain
> > essential and complement this series well for better memory footprint
> > and overall performance of TDX VMs.
>
> Hmm, this didn't really answer my questions about the concrete benefits.
>
> I think it would help to include this kind of justification for the 1GB
> guestmemfd pages. "essential for certain workloads and will emerge" is a
> bit hard to review against...
>
> I think one of the challenges with coco is that it's almost like a sprint
> to reimplement virtualization. But enough things are changing at once that
> not all of the normal assumptions hold, so it can't copy all the same
> solutions. The recent example was that for TDX huge pages we found that
> normal promotion paths weren't actually yielding any benefit for
> surprising TDX specific reasons.
>
> On the TDX side we are also, at least currently, unmapping private pages
> while they are mapped shared, so any 1GB pages would get split to 2MB if
> there are any shared pages in them. I wonder how many 1GB pages there
> would be after all the shared pages are converted. At smaller TD sizes,
> it could be not much.

You're conflating two different things: guest_memfd allocating and
managing 1GiB physical pages, versus KVM mapping memory into the guest
at 1GiB/2MiB granularity. Allocating memory in 1GiB chunks is useful
even if KVM can only map memory into the guest using 4KiB pages.

> So for TDX in isolation, it seems like jumping out too far ahead to
> effectively consider the value.
> But presumably you guys are testing this on SEV or something? Have you
> measured any performance improvement? For what kind of applications? Or
> is the idea to basically to make guestmemfd work like however Google
> does guest memory?

The longer term goal of guest_memfd is to make it suitable for backing
all VMs, hence Vishal's "Non-CoCo VMs" comment. Yes, some of this is
useful for TDX, but we (and others) want to use guest_memfd for far more
than just CoCo VMs. And for non-CoCo VMs, 1GiB hugepages are mandatory
for various workloads.