On Tue, Jul 8, 2025 at 12:59 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Tue, Jul 08, 2025, Vishal Annapurve wrote:
> > On Tue, Jul 8, 2025 at 11:03 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > Few points that seem important here:
> > 1) Userspace can and should be able to only dictate if memory contents
> > need to be preserved on shared to private conversion.
>
> No, I was wrong, pKVM has use cases where it's desirable to preserve data on
> private => shared conversions.
>
> Side topic, if you're going to use fancy indentation, align the indentation so
> it's actually readable.
>
> > -> For SNP/TDX VMs:
> >    * Only usecase for preserving contents is initial memory
> >      population, which can be achieved by:
> >      - Userspace converting the ranges to shared, populating the contents,
> >        converting them back to private and then calling SNP/TDX specific
> >        existing ABI functions.
> >    * For runtime conversions, guest_memfd can't ensure memory contents are
> >      preserved during shared to private conversions as the architectures
> >      don't support that behavior.
> >    * So IMO, this "preserve" flag doesn't make sense for SNP/TDX VMs, even
>
> It makes sense, it's just not supported by the architecture *at runtime*. Case
> in point, *something* needs to allow preserving data prior to launching the VM.
> If we want to go with the PRIVATE => SHARED => FILL => PRIVATE approach for TDX
> and SNP, then we'll probably want to allow PRESERVE only until the VM image is
> finalized.

Maybe we can simplify the story a bit here for today, how about:

1) For shared to private conversions:
  * Is it safe to say that the conversion itself is always content
    preserving, and that it's up to the architecture what it does with
    memory contents on private faults?
     - During initial memory setup, userspace can control how private
       memory would be faulted in by architecture supported ABI operations.
     - After initial memory setup, userspace can't control how private
       memory would be faulted in.
2) For private to shared conversions:
  * Architecture decides what should be done with the memory on shared
    faults.
     - guest_memfd can query the architecture whether to zero memory or not.

-> guest_memfd will only take on the responsibility of zeroing if needed
   by the architecture on shared faults.
-> Architecture is responsible for the behavior on private faults.

In the future, if there is a use case for controlling runtime behavior of
private faults, the architecture can expose additional ABI that userspace
can use after initiating guest_memfd conversion.
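
To make the split of responsibilities concrete, here is a rough toy model
of the proposal in Python. It is only a sketch under assumptions: none of
these names (Arch, zero_on_shared_fault, convert, shared_fault) exist in
KVM or guest_memfd today, and the two Arch subclasses are hypothetical
stand-ins rather than descriptions of any real architecture. Private
faults are deliberately not modeled, since they stay with the architecture.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    SHARED = auto()
    PRIVATE = auto()


class Arch:
    """Hypothetical per-architecture policy hook (not a real KVM callback)."""

    def zero_on_shared_fault(self) -> bool:
        raise NotImplementedError


class ZeroingArch(Arch):
    """Stand-in for an architecture that wants zeroed memory after private => shared."""

    def zero_on_shared_fault(self) -> bool:
        return True


class PreservingArch(Arch):
    """Stand-in for an architecture that wants contents kept after private => shared."""

    def zero_on_shared_fault(self) -> bool:
        return False


@dataclass
class Page:
    data: bytes
    state: State = State.PRIVATE
    # Set on private => shared; consumed by the first shared fault.
    pending_zero_check: bool = False


def convert(page: Page, new_state: State) -> None:
    """Conversion only flips state; it never touches the contents."""
    if page.state is State.PRIVATE and new_state is State.SHARED:
        page.pending_zero_check = True
    page.state = new_state


def shared_fault(page: Page, arch: Arch) -> bytes:
    """guest_memfd zeroes on a shared fault only if the architecture asks for it."""
    if page.state is not State.SHARED:
        raise ValueError("shared fault on a page that is not shared")
    if page.pending_zero_check:
        if arch.zero_on_shared_fault():
            page.data = bytes(len(page.data))
        page.pending_zero_check = False
    return page.data
```

In this model the conversion call itself is content preserving in both
directions, and the only policy decision guest_memfd makes is the zeroing
query on the first shared fault after a private => shared conversion.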