Hi David,

On Wed, 4 Jun 2025 at 14:17, David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 27.05.25 20:02, Fuad Tabba wrote:
> > Add arm64 support for handling guest page faults on guest_memfd-backed
> > memslots. Until guest_memfd supports huge pages, the fault granule is
> > restricted to PAGE_SIZE.
> >
> > Signed-off-by: Fuad Tabba <tabba@xxxxxxxxxx>
> >
> > ---
> >
> > Note: This patch introduces a new function, gmem_abort(), rather than
> > expanding user_mem_abort() as previous attempts did. This is because
> > there are many differences in how faults are handled when backed by
> > guest_memfd vs. regular memslots with anonymous memory, e.g., the lack
> > of a VMA and, for now, the lack of huge page support for guest_memfd.
> > The function user_mem_abort() is already big and unwieldy; adding more
> > complexity to it made things more difficult to understand.
> >
> > Once larger page size support is added to guest_memfd, we could factor
> > out the common code between these two functions.
> >
> > ---
> >  arch/arm64/kvm/mmu.c | 89 +++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 87 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 9865ada04a81..896c56683d88 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1466,6 +1466,87 @@ static bool kvm_vma_mte_allowed(struct vm_area_struct *vma)
> >  	return vma->vm_flags & VM_MTE_ALLOWED;
> >  }
> >
> > +static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > +		      struct kvm_memory_slot *memslot, bool is_perm)
>
> TBH, I have no idea why the existing function is called "_abort". I am
> sure there is a good reason :)

The reason is ARM. They're called "memory aborts", see D8.15 Memory
aborts in the ARM ARM:
https://developer.arm.com/documentation/ddi0487/latest/

Warning: the PDF is 100MB+ with almost 15k pages :)

> > +{
> > +	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED;
> > +	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> > +	bool logging, write_fault, exec_fault, writable;
> > +	struct kvm_pgtable *pgt;
> > +	struct page *page;
> > +	struct kvm *kvm;
> > +	void *memcache;
> > +	kvm_pfn_t pfn;
> > +	gfn_t gfn;
> > +	int ret;
> > +
> > +	if (!is_perm) {
> > +		int min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
> > +
> > +		if (!is_protected_kvm_enabled()) {
> > +			memcache = &vcpu->arch.mmu_page_cache;
> > +			ret = kvm_mmu_topup_memory_cache(memcache, min_pages);
> > +		} else {
> > +			memcache = &vcpu->arch.pkvm_memcache;
> > +			ret = topup_hyp_memcache(memcache, min_pages);
> > +		}
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> > +	kvm = vcpu->kvm;
> > +	gfn = fault_ipa >> PAGE_SHIFT;
>
> These two can be initialized directly above.

I was trying to go with reverse Christmas tree order of declarations,
but I'll do that.

> > +
> > +	logging = memslot_is_logging(memslot);
> > +	write_fault = kvm_is_write_fault(vcpu);
> > +	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
> > +	VM_BUG_ON(write_fault && exec_fault);
>
> No VM_BUG_ON please.
>
> VM_WARN_ON_ONCE() maybe. Or just handle it along with the "Unexpected
> L2 read permission error" below cleanly.

I'm following the same pattern as the existing user_mem_abort(), but
I'll change it.
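Concretely, I'm thinking of something like this for both changes
(untested sketch, showing only the relevant lines of gmem_abort()):

        struct kvm *kvm = vcpu->kvm;
        gfn_t gfn = fault_ipa >> PAGE_SHIFT;
        bool logging, write_fault, exec_fault, writable;

        /* ... memcache top-up as in the patch above ... */

        logging = memslot_is_logging(memslot);
        write_fault = kvm_is_write_fault(vcpu);
        exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
        /* An abort is either on a fetch or a data access, never both. */
        VM_WARN_ON_ONCE(write_fault && exec_fault);

i.e., initialize at declaration where possible, and have the
should-be-impossible write+exec combination warn once rather than
bring the host down.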
> > +
> > +	if (is_perm && !write_fault && !exec_fault) {
> > +		kvm_err("Unexpected L2 read permission error\n");
> > +		return -EFAULT;
> > +	}
> > +
> > +	ret = kvm_gmem_get_pfn(vcpu->kvm, memslot, gfn, &pfn, &page, NULL);
> > +	if (ret) {
> > +		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, PAGE_SIZE,
> > +					      write_fault, exec_fault, false);
> > +		return ret;
> > +	}
> > +
> > +	writable = !(memslot->flags & KVM_MEM_READONLY) &&
> > +		   (!logging || write_fault);
> > +
> > +	if (writable)
> > +		prot |= KVM_PGTABLE_PROT_W;
> > +
> > +	if (exec_fault || cpus_have_final_cap(ARM64_HAS_CACHE_DIC))
> > +		prot |= KVM_PGTABLE_PROT_X;
> > +
> > +	pgt = vcpu->arch.hw_mmu->pgt;
>
> Can probably also initialize directly above.

Ack.

> > +
> > +	kvm_fault_lock(kvm);
> > +	if (is_perm) {
> > +		/*
> > +		 * Drop the SW bits in favour of those stored in the
> > +		 * PTE, which will be preserved.
> > +		 */
> > +		prot &= ~KVM_NV_GUEST_MAP_SZ;
> > +		ret = KVM_PGT_FN(kvm_pgtable_stage2_relax_perms)(pgt, fault_ipa, prot, flags);
> > +	} else {
> > +		ret = KVM_PGT_FN(kvm_pgtable_stage2_map)(pgt, fault_ipa, PAGE_SIZE,
> > +							 __pfn_to_phys(pfn), prot,
> > +							 memcache, flags);
> > +	}
> > +	kvm_release_faultin_page(kvm, page, !!ret, writable);
> > +	kvm_fault_unlock(kvm);
> > +
> > +	if (writable && !ret)
> > +		mark_page_dirty_in_slot(kvm, memslot, gfn);
> > +
> > +	return ret != -EAGAIN ? ret : 0;
> > +}
> > +
>
> Nothing else jumped out at me. But just like on the x86 code, I think
> we need some arch experts to take a look at this one ...

Thanks!
/fuad

> --
> Cheers,
>
> David / dhildenb
>