Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd

Vishal Annapurve <vannapurve@xxxxxxxxxx> · Tue, 8 Jul 2025 15:54:49 -0700

On Tue, Jul 8, 2025 at 12:59 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Tue, Jul 08, 2025, Vishal Annapurve wrote:
> > On Tue, Jul 8, 2025 at 11:03 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > Few points that seem important here:
> > 1) Userspace can and should be able to only dictate if memory contents
> > need to be preserved on shared to private conversion.
>
> No, I was wrong, pKVM has use cases where it's desirable to preserve data on
> private => shared conversions.
>
> Side topic, if you're going to use fancy indentation, align the indentation so
> it's actually readable.
>
> >   -> For SNP/TDX VMs:
> >        * Only usecase for preserving contents is initial memory
> >          population, which can be achieved by:
> >               -  Userspace converting the ranges to shared, populating the contents,
> >                  converting them back to private and then calling SNP/TDX specific
> >                  existing ABI functions.
> >        * For runtime conversions, guest_memfd can't ensure memory contents are
> >          preserved during shared to private conversions as the architectures
> >          don't support that behavior.
> >        * So IMO, this "preserve" flag doesn't make sense for SNP/TDX VMs, even
>
> It makes sense, it's just not supported by the architecture *at runtime*.  Case
> in point, *something* needs to allow preserving data prior to launching the VM.
> If we want to go with the PRIVATE => SHARED => FILL => PRIVATE approach for TDX
> and SNP, then we'll probably want to allow PRESERVE only until the VM image is
> finalized.

Maybe we can simplify the story a bit here for today, how about:
1) For shared to private conversions:
       * Is it safe to say that the conversion itself is always
content preserving, it's upto the
           architecture what it does with memory contents on the private faults?
                 - During initial memory setup, userspace can control
how private memory would
                   be faulted in by architecture supported ABI operations.
                 - After initial memory setup, userspace can't control
how private memory would
                   be faulted in.
2) For private to shared conversions:
       * Architecture decides what should be done with the memory on
shared faults.
                 - guest_memfd can query architecture whether to zero
memory or not.

-> guest_memfd will only take on the responsibility of zeroing if
needed by the architecture on shared faults.
-> Architecture is responsible for the behavior on private faults.

In future, if there is a usecase for controlling runtime behavior of
private faults, architecture can expose additional ABI that userspace
can use after initiating guest_memfd conversion.