David Hildenbrand <david@xxxxxxxxxx> writes:

> On 13.05.25 18:34, Fuad Tabba wrote:
>> From: Ackerley Tng <ackerleytng@xxxxxxxxxx>
>>
>> This patch adds kvm_gmem_max_mapping_level(), which always returns
>> PG_LEVEL_4K since guest_memfd only supports 4K pages for now.
>>
>> When guest_memfd supports shared memory, max_mapping_level (especially
>> when recovering huge pages - see call to __kvm_mmu_max_mapping_level()
>> from recover_huge_pages_range()) should take input from
>> guest_memfd.
>>
>> Input from guest_memfd should be taken in these cases:
>>
>> + if the memslot supports shared memory (guest_memfd is used for
>>   shared memory, or in future both shared and private memory) or
>> + if the memslot is only used for private memory and that gfn is
>>   private.
>>
>> If the memslot doesn't use guest_memfd, figure out the
>> max_mapping_level using the host page tables like before.
>>
>> This patch also refactors and inlines the other call to
>> __kvm_mmu_max_mapping_level().
>>
>> In kvm_mmu_hugepage_adjust(), guest_memfd's input is already
>> provided (if applicable) in fault->max_level. Hence, there is no need
>> to query guest_memfd.
>>
>> lpage_info is queried like before, and then if the fault is not from
>> guest_memfd, adjust fault->req_level based on input from host page
>> tables.
>>
>> Signed-off-by: Ackerley Tng <ackerleytng@xxxxxxxxxx>
>> Signed-off-by: Fuad Tabba <tabba@xxxxxxxxxx>
>> ---
>>  arch/x86/kvm/mmu/mmu.c   | 92 ++++++++++++++++++++++++++--------------
>>  include/linux/kvm_host.h |  7 +++
>>  virt/kvm/guest_memfd.c   | 12 ++++++
>>  3 files changed, 79 insertions(+), 32 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index cfbb471f7c70..9e0bc8114859 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -3256,12 +3256,11 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
>>  	return level;
>>  }

[...]
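As an aside, the two cases above can be sketched as a small decision helper. This is only an illustration with simplified, hypothetical types (`struct memslot`, `gmem_provides_max_level()` are made up here), not the actual kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the real KVM structures (hypothetical). */
struct memslot {
	bool has_gmem;		/* slot is backed by guest_memfd */
	bool supports_shared;	/* gmem provides shared (or mixed) memory */
};

/*
 * Decide whether guest_memfd should provide max_mapping_level,
 * per the two cases listed in the commit message; otherwise the
 * caller falls back to the host page tables.
 */
static bool gmem_provides_max_level(const struct memslot *slot,
				    bool gfn_is_private)
{
	if (!slot->has_gmem)
		return false;		/* not a gmem slot: use host page tables */
	if (slot->supports_shared)
		return true;		/* case 1: shared (or mixed) memory */
	return gfn_is_private;		/* case 2: private-only slot, private gfn */
}
```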
>> static u8 kvm_max_level_for_fault_and_order(struct kvm *kvm,
>> 					    struct kvm_page_fault *fault,
>> 					    int order)
>> @@ -4523,7 +4551,7 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
>>  {
>>  	unsigned int foll = fault->write ? FOLL_WRITE : 0;
>>
>> -	if (fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot))
>> +	if (fault_from_gmem(fault))
>
> Should this change rather have been done in the previous patch?
>
> (then only adjust fault_from_gmem() in this function as required)
>

Yes, that is a good idea, thanks!

>>  		return kvm_mmu_faultin_pfn_gmem(vcpu, fault);
>>
>>  	foll |= FOLL_NOWAIT;
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index de7b46ee1762..f9bb025327c3 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -2560,6 +2560,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>>  int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
>>  		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
>>  		     int *max_order);
>> +int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn);
>>  #else
>>  static inline int kvm_gmem_get_pfn(struct kvm *kvm,
>>  				   struct kvm_memory_slot *slot, gfn_t gfn,
>> @@ -2569,6 +2570,12 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
>>  	KVM_BUG_ON(1, kvm);
>>  	return -EIO;
>>  }
>> +static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
>> +					 gfn_t gfn)
>
> Probably should indent with two tabs here.

Yup!
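For clarity, going only by the condition being replaced in __kvm_mmu_faultin_pfn() above, the helper to be hoisted into the previous patch would presumably look roughly like the sketch below. The surrounding stub types here are hypothetical stand-ins so the snippet is self-contained; only the body of fault_from_gmem() reflects the diff:

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal, hypothetical stand-ins for the real KVM structures. */
struct kvm_memory_slot { bool gmem_shared; };
struct kvm_page_fault {
	bool is_private;
	struct kvm_memory_slot *slot;
};

/* Stub for the real query against the slot's guest_memfd flags. */
static bool kvm_gmem_memslot_supports_shared(const struct kvm_memory_slot *slot)
{
	return slot->gmem_shared;
}

/* Mirrors the condition replaced in __kvm_mmu_faultin_pfn() above. */
static bool fault_from_gmem(struct kvm_page_fault *fault)
{
	return fault->is_private ||
	       kvm_gmem_memslot_supports_shared(fault->slot);
}
```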