Re: [PATCH v9 10/17] KVM: x86: Compute max_mapping_level with input from guest_memfd

David Hildenbrand <david@xxxxxxxxxx> · Thu, 22 May 2025 10:56:24 +0200

On 22.05.25 09:22, Fuad Tabba wrote:
Hi David,

On Wed, 21 May 2025 at 09:01, David Hildenbrand <david@xxxxxxxxxx> wrote:

On 13.05.25 18:34, Fuad Tabba wrote:
From: Ackerley Tng <ackerleytng@xxxxxxxxxx>

This patch adds kvm_gmem_max_mapping_level(), which always returns
PG_LEVEL_4K since guest_memfd only supports 4K pages for now.

When guest_memfd supports shared memory, max_mapping_level (especially
when recovering huge pages - see call to __kvm_mmu_max_mapping_level()
from recover_huge_pages_range()) should take input from
guest_memfd.

Input from guest_memfd should be taken in these cases:

+ if the memslot supports shared memory (guest_memfd is used for
    shared memory, or in future both shared and private memory) or
+ if the memslot is only used for private memory and that gfn is
    private.

If the memslot doesn't use guest_memfd, figure out the
max_mapping_level using the host page tables like before.

This patch also refactors and inlines the other call to
__kvm_mmu_max_mapping_level().

In kvm_mmu_hugepage_adjust(), guest_memfd's input is already
provided (if applicable) in fault->max_level. Hence, there is no need
to query guest_memfd.

lpage_info is queried like before, and then if the fault is not from
guest_memfd, adjust fault->req_level based on input from host page
tables.

Signed-off-by: Ackerley Tng <ackerleytng@xxxxxxxxxx>
Signed-off-by: Fuad Tabba <tabba@xxxxxxxxxx>
---
   arch/x86/kvm/mmu/mmu.c   | 92 ++++++++++++++++++++++++++--------------
   include/linux/kvm_host.h |  7 +++
   virt/kvm/guest_memfd.c   | 12 ++++++
   3 files changed, 79 insertions(+), 32 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index cfbb471f7c70..9e0bc8114859 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3256,12 +3256,11 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,
       return level;
   }
[...]

   static u8 kvm_max_level_for_fault_and_order(struct kvm *kvm,
                                           struct kvm_page_fault *fault,
                                           int order)
@@ -4523,7 +4551,7 @@ static int __kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
   {
       unsigned int foll = fault->write ? FOLL_WRITE : 0;

-     if (fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot))
+     if (fault_from_gmem(fault))

Should this change rather have been done in the previous patch?

(then only adjust fault_from_gmem() in this function as required)

               return kvm_mmu_faultin_pfn_gmem(vcpu, fault);

       foll |= FOLL_NOWAIT;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index de7b46ee1762..f9bb025327c3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2560,6 +2560,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
   int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
                    gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
                    int *max_order);
+int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn);
   #else
   static inline int kvm_gmem_get_pfn(struct kvm *kvm,
                                  struct kvm_memory_slot *slot, gfn_t gfn,
@@ -2569,6 +2570,12 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
       KVM_BUG_ON(1, kvm);
       return -EIO;
   }
+static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
+                                      gfn_t gfn)

Probably should indent with two tabs here.

(I'm fixing the patch before respinning, hence it's me asking)

Not sure I understand. Indentation here matches the same style as that
for kvm_gmem_get_pfn() right above it in the alignment of the
parameters, i.e., the parameter `gfn_t gfn` is aligned with the
parameter `const struct kvm_memory_slot *slot` (four tabs and a
space).

Yeah, that way of indenting is rather bad practice. Especially for new 
code we're adding or when we touch existing code, we should just use two
tabs.

That way, we can fit more stuff into a single line, and when doing
simple changes, such as renaming the function or changing the return
type, we won't have to touch all the parameters.

Maybe KVM has its own rules on that ... that's why I said "probably" :)

--
Cheers,

David / dhildenb