Re: [RFC PATCH v2 05/18] KVM: TDX: Drop superfluous page pinning in S-EPT management

"Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx> · Tue, 2 Sep 2025 18:55:46 +0000

On Tue, 2025-09-02 at 10:33 -0700, Sean Christopherson wrote:
> > Besides, a cache flush after 2 can essentially cause a memory write to the
> > page.
> > Though we could invoke tdh_phymem_page_wbinvd_hkid() after the KVM_BUG_ON(),
> > the SEAMCALL itself can fail.
> 
> I think this falls into the category of "don't screw up" flows.  Failure to
> remove a private SPTE is a near-catastrophic error.  Going out of our way to
> reduce the impact of such errors increases complexity without providing much
> in the way of value.
> 
> E.g. if VMCLEAR fails, KVM WARNs but continues on and hopes for the best, even
> though there's a decent chance failure to purge the VMCS cache entry could
> lead to UAF-like problems.  To me, this is largely the same.
> 
> If anything, we should try to prevent #2, e.g. by marking the entire
> guest_memfd as broken or something, and then deliberately leaking _all_ pages.

There was a marathon thread on this subject. We did discuss this option (link to
most relevant part I could find):
https://lore.kernel.org/kvm/a9affa03c7cdc8109d0ed6b5ca30ec69269e2f34.camel@xxxxxxxxx/

The high-level summary is that pinning the pages complicates guest_memfd's plans
to use the refcount for other tracking purposes, while dropping the refcounts
weakens the safety of the error handling.

I strongly agree that we should not optimize for the error path at all. If we
could bug the guest_memfd (kind of what we were discussing in that link), I think
it would be appropriate to use in these cases. I guess the question is whether
we are OK dropping the safety before we have a solution like that. In that
thread I was advocating for yes, partly to close it out because the conversation
was getting stuck. But there is probably a long tail of potential issues, or ways
of looking at it, that could put it in a grey area.