On Tue, Jul 8, 2025 at 8:31 AM Edgecombe, Rick P <rick.p.edgecombe@xxxxxxxxx> wrote:
>
> On Tue, 2025-07-08 at 08:07 -0700, Vishal Annapurve wrote:
> > On Tue, Jul 8, 2025 at 7:52 AM Edgecombe, Rick P
> > <rick.p.edgecombe@xxxxxxxxx> wrote:
> > >
> > > On Tue, 2025-07-08 at 07:20 -0700, Sean Christopherson wrote:
> > > > > For TDX if we don't zero on conversion from private->shared we
> > > > > will be dependent on behavior of the CPU when reading memory
> > > > > with keyid 0, which was previously encrypted and has some
> > > > > protection bits set. I don't *think* the behavior is
> > > > > architectural. So it might be prudent to either make it so, or
> > > > > zero it in the kernel in order to not make non-architectural
> > > > > behavior into userspace ABI.
> > > >
> > > > Ya, by "vendor specific", I was also lumping in cases where the
> > > > kernel would need to zero memory in order to not end up with
> > > > effectively undefined behavior.
> > >
> > > Yea, more of an answer to Vishal's question about if CC VMs need
> > > zeroing. And the answer is sort of yes, even though TDX doesn't
> > > require it. But we actually don't want to zero memory when
> > > reclaiming memory. So TDX KVM code needs to know that the operation
> > > is a to-shared conversion and not another type of private zap. Like
> > > a callback from gmem, or maybe more simply a kernel internal flag
> > > to set in gmem such that it knows it should zero it.
> >
> > If the answer is that "always zero on private to shared conversions"
> > for all CC VMs, then does the scheme outlined in [1] make sense for
> > handling the private -> shared conversions? For pKVM, there can be a
> > VM type check to avoid the zeroing during conversions and instead just
> > zero on allocations. This allows delaying zeroing until the fault time
> > for CC VMs and can be done in guest_memfd centrally. We will need more
> > inputs from the SEV side for this discussion.
> >
> > [1] https://lore.kernel.org/lkml/CAGtprH-83EOz8rrUjE+O8m7nUDjt=THyXx=kfft1xQry65mtQg@xxxxxxxxxxxxxx/
>
> It's nice that we don't double zero (since TDX module will do it too) for
> private allocation/mapping. Seems ok to me.
>
> > > > > Up the thread Vishal says we need to support operations that
> > > > > use in-place conversion (overloaded term now I think, btw). Why
> > > > > exactly is pKVM using private/shared conversion for this
> > > > > private data provisioning?
> > > >
> > > > Because it's literally converting memory from shared to private?
> > > > And IIUC, it's not a one-time provisioning, e.g. memory can go:
> > > >
> > > > shared => fill => private => consume => shared => fill => private => consume
> > > >
> > > > > Instead of a special provisioning operation like the others?
> > > > > (Xiaoyao's suggestion)
> > > >
> > > > Are you referring to this suggestion?
> > >
> > > Yea, in general to make it a specific operation preserving operation.
> > >
> > > > : And maybe a new flag for KVM_GMEM_CONVERT_PRIVATE for user space to
> > > > : explicitly request that the page range is converted to private and the
> > > > : content needs to be retained. So that TDX can identify which case needs
> > > > : to call in-place TDH.PAGE.ADD.
> > > >
> > > > If so, I agree with that idea, e.g. add a PRESERVE flag or
> > > > whatever. That way userspace has explicit control over what
> > > > happens to the data during conversion, and KVM can reject
> > > > unsupported conversions, e.g. PRESERVE is only allowed for
> > > > shared => private and only for select VM types.
> > >
> > > Ok, we should POC how it works with TDX.
> >
> > I don't think we need a flag to preserve memory as I mentioned in [2]. IIUC,
> > 1) Conversions are always content-preserving for pKVM.
> > 2) Shared to private conversions are always content-preserving for all
> > VMs as far as guest_memfd is concerned.
> > 3) Private to shared conversions are not content-preserving for CC VMs
> > as far as guest_memfd is concerned, subject to more discussions.
> >
> > [2] https://lore.kernel.org/lkml/CAGtprH-Kzn2kOGZ4JuNtUT53Hugw64M-_XMmhz_gCiDS6BAFtQ@xxxxxxxxxxxxxx/
>
> Right, I read that. I still don't see why pKVM needs to do normal private/shared
> conversion for data provisioning. Vs a dedicated operation/flag to make it a
> special case.

It's dictated by pKVM use cases: memory contents need to be preserved
for every conversion, not just for initial payload population.

>
> I'm trying to suggest there could be a benefit to making all gmem VM types
> behave the same. If conversions are always content preserving for pKVM, why
> can't userspace always use the operation that says preserve content? Vs
> changing the behavior of the common operations?

I don't see a benefit in userspace passing a flag that's effectively the
default for the VM type (assuming pKVM will use a special VM type).
Common operations in guest_memfd will need to check either the
userspace-passed flag or the VM type, so there is no major change in the
guest_memfd implementation for either mechanism.

>
> So for all VM types, the user ABI would be:
> private->shared          - Always zeroes page
> shared->private          - Always destructive
> shared->private (w/flag) - Always preserves data or return error if not possible
>
>
> Do you see a problem?
>
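
To make the two options discussed above easier to compare, here is a toy
Python model (not kernel code) of what each would do to page contents on
conversion. Everything named here (flag_based, vm_type_based, Direction,
Outcome, can_preserve, the "pkvm"/"cc" strings) is made up for
illustration. Treating "not content-preserving" as zeroing for CC VMs is
an assumption based on the "always zero on private to shared
conversions" wording earlier in the thread, which is still under
discussion.

```python
from enum import Enum


class Direction(Enum):
    TO_SHARED = "private->shared"
    TO_PRIVATE = "shared->private"


class Outcome(Enum):
    ZEROED = "page zeroed"
    DESTROYED = "contents not retained"
    PRESERVED = "contents preserved"


class ConversionError(Exception):
    """Raised when a requested preserving conversion is not possible."""


def flag_based(direction, preserve=False, can_preserve=True):
    """Quoted table: one rule for all VM types, opt-in preserve flag.

    can_preserve is a hypothetical stand-in for "this VM type is able
    to keep the contents across the conversion".
    """
    if direction is Direction.TO_SHARED:
        # Assumption: the quoted table has no flag variant for this
        # direction, so this sketch ignores `preserve` here.
        return Outcome.ZEROED
    if not preserve:
        return Outcome.DESTROYED
    if not can_preserve:
        raise ConversionError("preserving conversion not supported")
    return Outcome.PRESERVED


def vm_type_based(vm_type, direction):
    """Alternative: behavior keyed off the VM type, no userspace flag.

    vm_type is a hypothetical string: "pkvm" or "cc" (TDX/SEV-like).
    """
    if vm_type == "pkvm":
        # 1) Conversions are always content-preserving for pKVM.
        return Outcome.PRESERVED
    if vm_type != "cc":
        raise ValueError(f"unknown VM type: {vm_type!r}")
    if direction is Direction.TO_PRIVATE:
        # 2) Content-preserving as far as guest_memfd is concerned.
        return Outcome.PRESERVED
    # 3) Not content-preserving for CC VMs; modeled as zeroing (assumption).
    return Outcome.ZEROED


if __name__ == "__main__":
    for d in Direction:
        print("flag-based   ", d.value, "->", flag_based(d).value)
        for vm in ("pkvm", "cc"):
            print("vm-type-based", vm, d.value, "->", vm_type_based(vm, d).value)
```

In this model the two options differ in two places: shared->private
without a flag (destructive vs. preserved), and private->shared for pKVM
(zeroed vs. preserved).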