Hi Sean, Paolo, Oliver, + others,

Here is a v3 of KVM Userfault. Thanks for all the feedback on the v2,
Sean. I realize it has been 6 months since the v2; I hope that isn't an
issue.

I am working on the QEMU side of the changes as I get time. Let me know
if it's important for me to send those patches out for this series to be
merged.

Be aware that this series will have non-trivial conflicts with Fuad's
user mapping support for guest_memfd series[1]. For example, for the
arm64 change he is making, the newly introduced gmem_abort() would need
to be enlightened to handle KVM Userfault exits.

Changelog:
v2[2]->v3:
- Pull in Sean's changes to genericize struct kvm_page_fault and use it
  for arm64. Many of these patches now have Sean's SoB.
- Pull in Sean's small rename and squashing of the main patches.
- Add kvm_arch_userfault_enabled() in place of calling
  kvm_arch_flush_shadow_memslot() directly from generic code.
- Pull in Xin Li's documentation section number fix for
  KVM_CAP_ARM_WRITABLE_IMP_ID_REGS[3].
v1[4]->v2:
- For arm64, no longer zap stage 2 when disabling KVM_MEM_USERFAULT
  (thanks Oliver).
- Fix the userfault_bitmap validation and casts (thanks kernel test
  robot).
- Fix _Atomic cast for the userfault bitmap in the selftest (thanks
  kernel test robot).
- Pick up Reviewed-by on doc changes (thanks Bagas).

Below is the cover letter from v1, mostly unchanged:

Please see the RFC[5] for the problem description. In summary,
guest_memfd VMs have no mechanism for doing post-copy live migration.
KVM Userfault provides such a mechanism.

There is a second problem that KVM Userfault solves: userfaultfd-based
post-copy doesn't scale very well. KVM Userfault when used with
userfaultfd can scale much better in the common case that most post-copy
demand fetches are a result of vCPU access violations. This is a
continuation of the solution Anish was working on[6].
This aspect of KVM Userfault is important for userfaultfd-based live
migration when scaling up to hundreds of vCPUs with ~30us network
latency for a PAGE_SIZE demand-fetch.

The implementation in this series differs from the RFC[5]. It adds...
  1. a new memslot flag: KVM_MEM_USERFAULT,
  2. a new parameter, userfault_bitmap, in struct kvm_memory_slot,
  3. a new KVM_EXIT_MEMORY_FAULT flag reported by KVM_RUN:
     KVM_MEMORY_EXIT_FLAG_USERFAULT,
  4. a new KVM capability: KVM_CAP_USERFAULT.

KVM Userfault does not attempt to catch KVM's own accesses to guest
memory. That is left up to userfaultfd.

When enabling KVM_MEM_USERFAULT for a memslot, the second-stage mappings
are zapped, and new faults will check `userfault_bitmap` to see if the
fault should exit to userspace.

When KVM_MEM_USERFAULT is enabled, only PAGE_SIZE mappings are
permitted.

When disabling KVM_MEM_USERFAULT, huge mappings are reconstructed in the
same way as when dirty logging is disabled. So on x86, huge mappings
will be reconstructed, but on arm64, they won't be.

KVM Userfault is not compatible with async page faults. Nikita has
proposed a new, more userspace-driven implementation of async page
faults that *is* compatible with KVM Userfault[7].

See v1 for more performance details[4]. They are unchanged in this
version.

This series is based on the latest kvm-x86/next.
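
To make the bitmap check described above a bit more concrete, here is a
rough userspace model of it. This is only a sketch and is not code from
this series: struct model_slot and the model_*() helpers are
hypothetical names, and it assumes that a set bit means "a vCPU fault on
this page should exit to userspace" and that userspace clears the bit
once the page contents have been fetched.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model only; none of these names come from the series. */
#define MODEL_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

struct model_slot {
	uint64_t base_gfn;                        /* first gfn covered by the slot */
	uint64_t npages;                          /* number of PAGE_SIZE pages */
	_Atomic unsigned long *userfault_bitmap;  /* assumed: one bit per page */
};

/* Offset of @gfn within the slot, checked against the slot bounds. */
static uint64_t model_rel_gfn(const struct model_slot *slot, uint64_t gfn)
{
	assert(gfn >= slot->base_gfn && gfn - slot->base_gfn < slot->npages);
	return gfn - slot->base_gfn;
}

/* Fault path: true if a vCPU fault on @gfn should exit to userspace. */
static bool model_gfn_userfault(const struct model_slot *slot, uint64_t gfn)
{
	uint64_t rel = model_rel_gfn(slot, gfn);
	unsigned long word =
		atomic_load(&slot->userfault_bitmap[rel / MODEL_BITS_PER_WORD]);

	return (word >> (rel % MODEL_BITS_PER_WORD)) & 1UL;
}

/* Userspace side: the page has been fetched, so stop faulting on it. */
static void model_page_fetched(struct model_slot *slot, uint64_t gfn)
{
	uint64_t rel = model_rel_gfn(slot, gfn);
	unsigned long mask = 1UL << (rel % MODEL_BITS_PER_WORD);

	atomic_fetch_and(&slot->userfault_bitmap[rel / MODEL_BITS_PER_WORD],
			 ~mask);
}
```

In this model the bitmap is accessed atomically because vCPU threads may
read it while a migration thread clears bits in it.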

[1]: https://lore.kernel.org/kvm/20250611133330.1514028-1-tabba@xxxxxxxxxx/
[2]: https://lore.kernel.org/kvm/20250109204929.1106563-1-jthoughton@xxxxxxxxxx/
[3]: https://lore.kernel.org/kvm/20250414165146.2279450-1-xin@xxxxxxxxx/
[4]: https://lore.kernel.org/kvm/20241204191349.1730936-1-jthoughton@xxxxxxxxxx/
[5]: https://lore.kernel.org/kvm/20240710234222.2333120-1-jthoughton@xxxxxxxxxx/
[6]: https://lore.kernel.org/all/20240215235405.368539-1-amoorthy@xxxxxxxxxx/
[7]: https://lore.kernel.org/kvm/20241118123948.4796-1-kalyazin@xxxxxxxxxx/#t

James Houghton (11):
  KVM: Add common infrastructure for KVM Userfaults
  KVM: x86: Add support for KVM userfault exits
  KVM: arm64: Add support for KVM userfault exits
  KVM: Enable and advertise support for KVM userfault exits
  KVM: selftests: Fix vm_mem_region_set_flags docstring
  KVM: selftests: Fix prefault_mem logic
  KVM: selftests: Add va_start/end into uffd_desc
  KVM: selftests: Add KVM Userfault mode to demand_paging_test
  KVM: selftests: Inform set_memory_region_test of KVM_MEM_USERFAULT
  KVM: selftests: Add KVM_MEM_USERFAULT + guest_memfd toggle tests
  KVM: Documentation: Add KVM_CAP_USERFAULT and KVM_MEM_USERFAULT details

Sean Christopherson (3):
  KVM: x86/mmu: Move "struct kvm_page_fault" definition to asm/kvm_host.h
  KVM: arm64: Add "struct kvm_page_fault" to gather common fault variables
  KVM: arm64: x86: Require "struct kvm_page_fault" for memory fault exits

Xin Li (Intel) (1):
  KVM: Documentation: Fix section number for KVM_CAP_ARM_WRITABLE_IMP_ID_REGS

 Documentation/virt/kvm/api.rst               |  35 ++++-
 arch/arm64/include/asm/kvm_host.h            |   9 ++
 arch/arm64/kvm/Kconfig                       |   1 +
 arch/arm64/kvm/mmu.c                         |  48 +++---
 arch/x86/include/asm/kvm_host.h              |  68 +++++++-
 arch/x86/kvm/Kconfig                         |   1 +
 arch/x86/kvm/mmu/mmu.c                       |  13 +-
 arch/x86/kvm/mmu/mmu_internal.h              |  77 +---------
 arch/x86/kvm/x86.c                           |  27 ++--
 include/linux/kvm_host.h                     |  49 +++++-
 include/uapi/linux/kvm.h                     |   6 +-
 .../selftests/kvm/demand_paging_test.c       | 145 ++++++++++++++++--
 .../testing/selftests/kvm/include/kvm_util.h |   5 +
 .../selftests/kvm/include/userfaultfd_util.h |   2 +
 tools/testing/selftests/kvm/lib/kvm_util.c   |  42 ++++-
 .../selftests/kvm/lib/userfaultfd_util.c     |   2 +
 .../selftests/kvm/set_memory_region_test.c   |  33 ++++
 virt/kvm/Kconfig                             |   3 +
 virt/kvm/kvm_main.c                          |  57 ++++++-
 19 files changed, 489 insertions(+), 134 deletions(-)

base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
--
2.50.0.rc2.692.g299adb8693-goog