On Wed, Jun 25, 2025 at 02:35:59AM +0800, Edgecombe, Rick P wrote:
> On Tue, 2025-06-24 at 17:57 +0800, Yan Zhao wrote:
> > Could we provide the info via the private_max_mapping_level hook (i.e. via
> > tdx_gmem_private_max_mapping_level())?
>
> This is one of the previous two methods discussed. Can you elaborate on what you
> are trying to say?
I don't get why we can't use the existing tdx_gmem_private_max_mapping_level()
to convey the max_level at which a vendor wants a GFN to be mapped.

Before TDX huge pages, tdx_gmem_private_max_mapping_level() always returns 4KB;
after TDX huge pages, it returns
- 4KB during the TD build stage
- 4KB or 2MB at TD runtime

Why does KVM need to care how the vendor determines this max_level?
I think a vendor should be free to decide based on software limitations,
the guest's wishes, hardware bugs or whatever.

> > Or what about introducing a vendor hook in __kvm_mmu_max_mapping_level() for a
> > private fault?
> >
> > > Maybe we could have EPT violations that contain 4k accept sizes first update the
> > > attribute for the GFN to be accepted or not, like have tdx.c call out to set
> > > kvm_lpage_info->disallow_lpage in the rarer case of 4k accept size? Or something
> > Something like kvm_lpage_info->disallow_lpage would disallow later page
> > promotion, though we don't support it right now.
>
> Well I was originally thinking it would not set kvm_lpage_info->disallow_lpage
> directly, but rely on the logic that checks for mixed attributes. But more
> below...
>
> >
> > > like that. Maybe set a "accepted" attribute, or something. Not sure if could be
> > Setting "accepted" attribute in the EPT violation handler?
> > It's a little odd, as the accept operation is not yet completed.
>
> I guess the question in both of these comments is: what is the life cycle. Guest
> could call TDG.MEM.PAGE.RELEASE to unaccept it as well. Oh, geez. It looks like
> TDG.MEM.PAGE.RELEASE will give the same size hints in the EPT violation. So an
> accept attribute is not going work, at least without TDX module changes.
>
>
> Actually, the problem we have doesn't fit the mixed attributes behavior. If many
> vCPU's accept at 2MB region at 4k page size, the entire 2MB range could be
> non-mixed and then individual accepts would fail.
>
>
> So instead there could be a KVM_LPAGE_GUEST_INHIBIT that doesn't get cleared
Set KVM_LPAGE_GUEST_INHIBIT via a TDVMCALL?

Or just set KVM_LPAGE_GUEST_INHIBIT when an EPT violation contains 4KB
level info?

I guess it's the latter, as it avoids modifying both EDK2 and the Linux
guest. I observed ~2710 instances of "guest accepts at 4KB when KVM can map at
2MB" during the boot-up of a TD with 4GB memory.

But does it mean TDX needs to hold write mmu_lock in the EPT violation handler
and set KVM_LPAGE_GUEST_INHIBIT when it finds that a violation carries 4KB
level info?

> based on mixed attributes. It would be one way. It would need to get set by
> something like kvm_write_track_add_gfn() that lives in tdx.c and is called
> before going into the fault handler on 4k accept size. It would have to take mmu
> write lock I think, which would kill scalability in the 4k accept case (but not
> the normal 2MB one). But as long as mmu_write lock is held, demote will be no
> problem, which the operation would also need to do.
>
> I think it actually makes KVM's behavior easier to understand. We don't need to
> worry about races between multiple accept sizes and things like that. It also
> leaves the core MMU code mostly untouched. Performance/scalability wise it only
> punishes the rare case.
Let me write down my understanding to check if it's correct:

- When a TD is NOT configured to support the KVM_LPAGE_GUEST_INHIBIT TDVMCALL,
  KVM always maps at 4KB.

- When a TD is configured to support the KVM_LPAGE_GUEST_INHIBIT TDVMCALL:

(a)
1. guest accepts at 4KB
2. TDX sets KVM_LPAGE_GUEST_INHIBIT and tries splitting (with write mmu_lock)
3. KVM maps at 4KB (with read mmu_lock)
4. guest's 4KB accept succeeds.

(b)
1. guest accepts at 2MB.
2. KVM maps at 4KB due to a certain reason.
3. guest's 2MB accept fails with TDACCEPT_SIZE_MISMATCH.
4. guest accepts at 4KB
5. guest's 4KB accept succeeds.

> For leaving the option open to promote the GFNs in the future, a GHCI interface
> or similar could be defined for the guest to say "I don't care about page size
> anymore for this gfn". So it won't close it off forever.
OK.
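
To make sequences (a) and (b) above easier to check, here is a toy userspace
model of them in C. It is only a sketch of my understanding, not kernel code:
every toy_* name is hypothetical, guest_inhibit merely stands in for the
proposed KVM_LPAGE_GUEST_INHIBIT, the locks are represented by a counter, and
it assumes an accept succeeds only when its size matches the mapping size. It
covers only the second bullet (a TD that supports the inhibit).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page-size levels (hypothetical; loosely mirror KVM's PG_LEVEL_*). */
enum toy_level { TOY_LEVEL_4K = 1, TOY_LEVEL_2M = 2 };

/* Toy accept results; TOY_SIZE_MISMATCH stands in for TDACCEPT_SIZE_MISMATCH. */
enum toy_accept_result { TOY_ACCEPT_OK = 0, TOY_SIZE_MISMATCH = 1 };

/* Hypothetical state for one 2MB-aligned private region. */
struct toy_region {
	bool guest_inhibit;          /* stands in for KVM_LPAGE_GUEST_INHIBIT */
	bool mapped;                 /* is the region mapped at all? */
	enum toy_level mapped_level; /* valid only when mapped is true */
	int write_lock_taken;        /* faults that would need write mmu_lock */
};

/* Vendor-side limit: 4KB during TD build, 4KB or 2MB at TD runtime. */
enum toy_level toy_max_mapping_level(const struct toy_region *r,
				     bool td_finalized)
{
	if (!td_finalized || r->guest_inhibit)
		return TOY_LEVEL_4K;
	return TOY_LEVEL_2M;
}

/*
 * Fault path. accept_level is the size hint carried by the EPT violation;
 * host_limit models "KVM maps at 4KB due to a certain reason".
 */
void toy_handle_fault(struct toy_region *r, bool td_finalized,
		      enum toy_level accept_level, enum toy_level host_limit)
{
	enum toy_level level;

	if (accept_level == TOY_LEVEL_4K && !r->guest_inhibit) {
		/* Rare case: this part would run under write mmu_lock. */
		r->write_lock_taken++;
		r->guest_inhibit = true;
		if (r->mapped && r->mapped_level == TOY_LEVEL_2M)
			r->mapped_level = TOY_LEVEL_4K; /* demote (split) */
	}

	if (r->mapped)
		return;

	/* Common case: this part would run under read mmu_lock. */
	level = toy_max_mapping_level(r, td_finalized);
	if (host_limit < level)
		level = host_limit;
	r->mapped = true;
	r->mapped_level = level;
}

/* Assumption: accept succeeds only if its size equals the mapping size. */
enum toy_accept_result toy_guest_accept(const struct toy_region *r,
					enum toy_level accept_level)
{
	assert(r->mapped);
	return r->mapped_level == accept_level ? TOY_ACCEPT_OK
					       : TOY_SIZE_MISMATCH;
}
```

For (a), a fault with a 4KB hint on an unmapped region sets the inhibit once
and maps at 4KB, so the 4KB accept succeeds. For (b), a fault with a 2MB hint
and a 4KB host limit maps at 4KB without touching the inhibit, so the 2MB
accept reports a size mismatch and the 4KB retry succeeds.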