Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 8, 2025 at 8:31 AM Edgecombe, Rick P
<rick.p.edgecombe@xxxxxxxxx> wrote:
>
> On Tue, 2025-07-08 at 08:07 -0700, Vishal Annapurve wrote:
> > On Tue, Jul 8, 2025 at 7:52 AM Edgecombe, Rick P
> > <rick.p.edgecombe@xxxxxxxxx> wrote:
> > >
> > > On Tue, 2025-07-08 at 07:20 -0700, Sean Christopherson wrote:
> > > > > For TDX if we don't zero on conversion from private->shared we will be
> > > > > dependent
> > > > > on behavior of the CPU when reading memory with keyid 0, which was
> > > > > previously
> > > > > encrypted and has some protection bits set. I don't *think* the behavior is
> > > > > architectural. So it might be prudent to either make it so, or zero it in
> > > > > the
> > > > > kernel in order to not make non-architectual behavior into userspace ABI.
> > > >
> > > > Ya, by "vendor specific", I was also lumping in cases where the kernel would
> > > > need to zero memory in order to not end up with effectively undefined
> > > > behavior.
> > >
> > > Yea, more of an answer to Vishal's question about if CC VMs need zeroing. And
> > > the answer is sort of yes, even though TDX doesn't require it. But we actually
> > > don't want to zero memory when reclaiming memory. So TDX KVM code needs to know
> > > that the operation is a to-shared conversion and not another type of private
> > > zap. Like a callback from gmem, or maybe more simply a kernel internal flag to
> > > set in gmem such that it knows it should zero it.
> >
> > If the answer is that "always zero on private to shared conversions"
> > for all CC VMs, then does the scheme outlined in [1] make sense for
> > handling the private -> shared conversions? For pKVM, there can be a
> > VM type check to avoid the zeroing during conversions and instead just
> > zero on allocations. This allows delaying zeroing until the fault time
> > for CC VMs and can be done in guest_memfd centrally. We will need more
> > inputs from the SEV side for this discussion.
> >
> > [1] https://lore.kernel.org/lkml/CAGtprH-83EOz8rrUjE+O8m7nUDjt=THyXx=kfft1xQry65mtQg@xxxxxxxxxxxxxx/
>
> It's nice that we don't double zero (since TDX module will do it too) for
> private allocation/mapping. Seems ok to me.
>
> >
> > >
> > > >
> > > > > Up the thread Vishal says we need to support operations that use in-place
> > > > > conversion (overloaded term now I think, btw). Why exactly is pKVM using
> > > > > private/shared conversion for this private data provisioning?
> > > >
> > > > Because it's literally converting memory from shared to private?  And IICU,
> > > > it's
> > > > not a one-time provisioning, e.g. memory can go:
> > > >
> > > >   shared => fill => private => consume => shared => fill => private => consume
> > > >
> > > > > Instead of a special provisioning operation like the others? (Xiaoyao's
> > > > > suggestion)
> > > >
> > > > Are you referring to this suggestion?
> > >
> > > Yea, in general to make it a specific operation preserving operation.
> > >
> > > >
> > > >  : And maybe a new flag for KVM_GMEM_CONVERT_PRIVATE for user space to
> > > >  : explicitly request that the page range is converted to private and the
> > > >  : content needs to be retained. So that TDX can identify which case needs
> > > >  : to call in-place TDH.PAGE.ADD.
> > > >
> > > > If so, I agree with that idea, e.g. add a PRESERVE flag or whatever.  That way
> > > > userspace has explicit control over what happens to the data during
> > > > conversion,
> > > > and KVM can reject unsupported conversions, e.g. PRESERVE is only allowed for
> > > > shared => private and only for select VM types.
> > >
> > > Ok, we should POC how it works with TDX.
> >
> > I don't think we need a flag to preserve memory as I mentioned in [2]. IIUC,
> > 1) Conversions are always content-preserving for pKVM.
> > 2) Shared to private conversions are always content-preserving for all
> > VMs as far as guest_memfd is concerned.
> > 3) Private to shared conversions are not content-preserving for CC VMs
> > as far as guest_memfd is concerned, subject to more discussions.
> >
> > [2] https://lore.kernel.org/lkml/CAGtprH-Kzn2kOGZ4JuNtUT53Hugw64M-_XMmhz_gCiDS6BAFtQ@xxxxxxxxxxxxxx/
>
> Right, I read that. I still don't see why pKVM needs to do normal private/shared
> conversion for data provisioning. Vs a dedicated operation/flag to make it a
> special case.

It's dictated by pKVM usecases, memory contents need to be preserved
for every conversion not just for initial payload population.

>
> I'm trying to suggest there could be a benefit to making all gmem VM types
> behave the same. If conversions are always content preserving for pKVM, why
> can't userspace  always use the operation that says preserve content? Vs
> changing the behavior of the common operations?

I don't see a benefit of userspace passing a flag that's kind of
default for the VM type (assuming pKVM will use a special VM type).
Common operations in guest_memfd will need to either check for the
userspace passed flag or the VM type, so no major change in
guest_memfd implementation for either mechanism.

>
> So for all VM types, the user ABI would be:
> private->shared          - Always zero's page
> shared->private          - Always destructive
> shared->private (w/flag) - Always preserves data or return error if not possible
>
>
> Do you see a problem?
>





[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux