Re: [RFC PATCH v2 02/18] KVM: x86/mmu: Add dedicated API to map guest_memfd pfn into TDP MMU

Sean Christopherson <seanjc@xxxxxxxxxx> · Fri, 29 Aug 2025 13:27:14 -0700

On Fri, Aug 29, 2025, Rick P Edgecombe wrote:
> On Thu, 2025-08-28 at 17:06 -0700, Sean Christopherson wrote:
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -4994,6 +4994,65 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> >  	return min(range->size, end - range->gpa);
> >  }
> >  
> > +int kvm_tdp_mmu_map_private_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, kvm_pfn_t pfn)
> > +{
> > +	struct kvm_page_fault fault = {
> > +		.addr = gfn_to_gpa(gfn),
> > +		.error_code = PFERR_GUEST_FINAL_MASK | PFERR_PRIVATE_ACCESS,
> > +		.prefetch = true,
> > +		.is_tdp = true,
> > +		.nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(vcpu->kvm),
> 
> These fault's don't have fault->exec so nx_huge_page_workaround_enabled
> shouldn't be a factor. Not a functional issue though. Maybe it is more robust?

Whether or not the fault itself is EXEC is irrelevant, nx_huge_page_workaround_enabled
is used to ensure KVM doesn't create hugepage overtop an exiting EXEC 4KiB mapping.
Of course, this fault is irrelevant on that front as well.  But I don't see any
reason to get cute and let .nx_huge_page_workaround_enabled be stale.

> > +
> > +		.max_level = PG_LEVEL_4K,
> > +		.req_level = PG_LEVEL_4K,
> > +		.goal_level = PG_LEVEL_4K,
> > +		.is_private = true,
> > +
> > +		.gfn = gfn,
> > +		.slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn),
> > +		.pfn = pfn,
> > +		.map_writable = true,
> > +	};
> > +	struct kvm *kvm = vcpu->kvm;
> > +	int r;
> > +
> > +	lockdep_assert_held(&kvm->slots_lock);
> > +
> > +	if (KVM_BUG_ON(!tdp_mmu_enabled, kvm))
> > +		return -EIO;
> > +
> > +	if (kvm_gfn_is_write_tracked(kvm, fault.slot, fault.gfn))
> > +		return -EPERM;
> 
> If we care about this, why don't we care about the read only memslot flag?

Because private memory fundamentally can't support read-only memslots.  If we
wanted to be paranoid, this code could assert that the memslot can be private
but for me that reaches a pointless level of paranoia.

> TDX doesn't need this or the nx huge page part above. So this function is
> more general.

I don't see anything that makes nx_huge_page_workaround_enabled mutually exclusive
with TDX though.

> What about calling it __kvm_tdp_mmu_map_private_pfn() and making it a powerful
> "map this pfn at this GFN and don't ask questions" function. Otherwise, I'm not
> sure where to draw the line.

Eh, for me, the line is pretty clear.  This is obviously specific to private memory,
and so implies a guest_memfd source, a private pfn, and everything that comes
along with private gmem pfns.  Everything else should be accounted for.