On Tue, Jul 22, 2025, Fuad Tabba wrote:
> On Tue, 22 Jul 2025 at 06:36, Xiaoyao Li <xiaoyao.li@xxxxxxxxx> wrote:
> > - In 0010-KVM-x86-mmu-Rename-.private_max_mapping_level-to-.gm.patch,
> > there is a double "gmem" in the names of vmx/vt's callback implementations:
> >
> >     vt_gmem_gmem_max_mapping_level
> >     tdx_gmem_gmem_max_mapping_level
> >     vt_op_tdx_only(gmem_gmem_max_mapping_level)
>
> Sean's patches do that, then he fixes it in a later patch. I'll fix
> this at the source.

Dagnabbit.  I goofed a search+replace, caught it when re-reading things, and
fixed up the wrong commit.  Sorry :-(

> > - In 0013-KVM-x86-mmu-Extend-guest_memfd-s-max-mapping-level-t.patch,
> > kvm_x86_call(gmem_max_mapping_level)(...) returns 0 for the !private case.
> > That's not correct, though it works without issue currently.
> >
> > Because current gmem doesn't support hugepages, the max_level gotten from
> > gmem is always PG_LEVEL_4K, and kvm_gmem_max_mapping_level() returns early
> > on
> >
> >     if (max_level == PG_LEVEL_4K)
> >         return max_level;
> >
> > But just look at the following case:
> >
> >     return min(max_level,
> >                kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private));
> >
> > For the non-TDX and non-SNP cases, it will return 0, i.e. PG_LEVEL_NONE,
> > eventually.
> >
> > So either 1) return PG_LEVEL_NUM/PG_LEVEL_1G for the cases where the
> > .gmem_max_mapping_level callback doesn't have a specific restriction,
> >
> > or 2)
> >
> >     tmp = kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private);
> >     if (tmp)
> >         return min(max_level, tmp);
> >
> >     return max_level;
>
> Sean? What do you think?

#2, because KVM uses a "ret0" static call when TDX is disabled (and KVM should
do the same when SEV is disabled, but the SEV #ifdefs are still a bit messy).
Switching to any other value would require adding VMX stubs for the !TDX case.
(A rough sketch of the "ret0" plumbing is appended after the diff.)

I think it makes sense to explicitly call that out as the "CoCo level", to help
unfamiliar readers understand why vendor code has any say in the max mapping
level.

And I would say we adjust max_level instead of having an early return, e.g. to
reduce the probability of future bugs due to adding code between the call to
.gmem_max_mapping_level() and the final return.

This as fixup?

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index eead5dca6f72..a51013e0992a 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3279,9 +3279,9 @@ static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fau
 					 const struct kvm_memory_slot *slot, gfn_t gfn,
 					 bool is_private)
 {
+	u8 max_level, coco_level;
 	struct page *page;
 	kvm_pfn_t pfn;
-	u8 max_level;
 
 	/* For faults, use the gmem information that was resolved earlier. */
 	if (fault) {
@@ -3305,8 +3305,16 @@ static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fau
 	if (max_level == PG_LEVEL_4K)
 		return max_level;
 
-	return min(max_level,
-		   kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private));
+	/*
+	 * CoCo may influence the max mapping level, e.g. due to RMP or S-EPT
+	 * restrictions.  A return of '0' means "no additional restrictions",
+	 * to allow for using an optional "ret0" static call.
+	 */
+	coco_level = kvm_x86_call(gmem_max_mapping_level)(kvm, pfn, is_private);
+	if (coco_level)
+		max_level = min(max_level, coco_level);
+
+	return max_level;
 }
 
 int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
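
For anyone unfamiliar with the "ret0" plumbing referenced above, here's a rough
sketch.  This is not part of the fixup; the op name assumes the rename from
this series has landed, and the snippets paraphrase the existing declarations
in kvm-x86-ops.h and kvm_host.h:

	/*
	 * Sketch only: declaring the hook as an optional op with a
	 * zero-returning default means the static call behind kvm_x86_call()
	 * is patched to simply return 0 when no vendor implementation is
	 * registered (e.g. VMX with TDX disabled), which is why '0' can
	 * safely mean "no additional restrictions" in the hunk above.
	 */
	KVM_X86_OP_OPTIONAL_RET0(gmem_max_mapping_level)

	/* kvm_x86_call() is a thin wrapper around the static call machinery. */
	#define kvm_x86_call(func)	static_call(kvm_x86_##func)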