Sean Christopherson <seanjc@xxxxxxxxxx> writes:

> On Thu, Jul 24, 2025, Ackerley Tng wrote:
>> Fuad Tabba <tabba@xxxxxxxxxx> writes:
>> >  int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
>> > @@ -3362,8 +3371,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_page_fault *fault,
>> >  	if (max_level == PG_LEVEL_4K)
>> >  		return PG_LEVEL_4K;
>> >
>> > -	if (is_private)
>> > -		host_level = kvm_max_private_mapping_level(kvm, fault, slot, gfn);
>> > +	if (is_private || kvm_memslot_is_gmem_only(slot))
>> > +		host_level = kvm_gmem_max_mapping_level(kvm, fault, slot, gfn,
>> > +							is_private);
>> >  	else
>> >  		host_level = host_pfn_mapping_level(kvm, gfn, slot);
>>
>> No change required now, but I'd like to point out that this change
>> assumes that, for a slot where kvm_memslot_is_gmem_only() is true,
>> guest_memfd will be the only source of truth, even for shared pages.
>
> It's not an assumption, it's a hard requirement.
>
>> This holds now because shared pages are always split to 4K, but if
>> shared pages become larger, might the mapping in the host actually
>> turn out to be smaller?
>
> Yes, the host userspace mappings could be smaller, and supporting that
> scenario is very explicitly one of the design goals of guest_memfd.
> From commit a7800aa80ea4 ("KVM: Add KVM_CREATE_GUEST_MEMFD ioctl() for
> guest-specific backing memory"):
>
> : A guest-first memory subsystem allows for optimizations and enhancements
> : that are kludgy or outright infeasible to implement/support in a generic
> : memory subsystem. With guest_memfd, guest protections and mapping sizes
> : are fully decoupled from host userspace mappings. E.g. KVM currently
> : doesn't support mapping memory as writable in the guest without it also
> : being writable in host userspace, as KVM's ABI uses VMA protections to
> : define the allow guest protection. Userspace can fudge this by
> : establishing two mappings, a writable mapping for the guest and readable
> : one for itself, but that’s suboptimal on multiple fronts.
> :
> : Similarly, KVM currently requires the guest mapping size to be a strict
> : subset of the host userspace mapping size, e.g. KVM doesn’t support
> : creating a 1GiB guest mapping unless userspace also has a 1GiB guest
> : mapping. Decoupling the mappings sizes would allow userspace to precisely
> : map only what is needed without impacting guest performance, e.g. to
> : harden against unintentional accesses to guest memory.

Let me try to understand this better.

If/when guest_memfd supports larger folios for shared pages, and
guest_memfd returns a 2M folio from kvm_gmem_fault_shared(), can the
mapping in host userspace turn out to be 4K?

If that happens, should kvm_gmem_max_mapping_level() return 4K for a
memslot with kvm_memslot_is_gmem_only() == true? The above code would
skip host_pfn_mapping_level() and return just what guest_memfd reports,
which is 2M.

Or do you mean that guest_memfd will be the source of truth in that it
must also know/control, in the above scenario, that the host mapping is
also 2M?