On Tue, May 6, 2025 at 8:01 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > On Thu, Jan 09, 2025, James Houghton wrote: > > Use one of the 14 reserved u64s in struct kvm_userspace_memory_region2 > > for the user to provide `userfault_bitmap`. > > > > The memslot flag indicates if KVM should be reading from the > > `userfault_bitmap` field from the memslot. The user is permitted to > > provide a bogus pointer. If the pointer cannot be read from, we will > > return -EFAULT (with no other information) back to the user. > > For the uAPI+infrastructure changelog, please elaborate on the design goals and > choices. The "what" is pretty obvious from the patch; describe why this is being > added. > > > Signed-off-by: James Houghton <jthoughton@xxxxxxxxxx> > > --- > > include/linux/kvm_host.h | 14 ++++++++++++++ > > include/uapi/linux/kvm.h | 4 +++- > > virt/kvm/Kconfig | 3 +++ > > virt/kvm/kvm_main.c | 35 +++++++++++++++++++++++++++++++++++ > > 4 files changed, 55 insertions(+), 1 deletion(-) > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > index 401439bb21e3..f7a3dfd5e224 100644 > > --- a/include/linux/kvm_host.h > > +++ b/include/linux/kvm_host.h > > @@ -590,6 +590,7 @@ struct kvm_memory_slot { > > unsigned long *dirty_bitmap; > > struct kvm_arch_memory_slot arch; > > unsigned long userspace_addr; > > + unsigned long __user *userfault_bitmap; > > u32 flags; > > short id; > > u16 as_id; > > @@ -724,6 +725,11 @@ static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm) > > } > > #endif > > > > +static inline bool kvm_has_userfault(struct kvm *kvm) > > +{ > > + return IS_ENABLED(CONFIG_HAVE_KVM_USERFAULT); > > +} > > Eh, don't think we need this wrapper. Just check the CONFIG_xxx manually in the > one or two places where code isn't guarded by an #ifdef. > > > struct kvm_memslots { > > u64 generation; > > atomic_long_t last_used_slot; > > @@ -2553,4 +2559,12 @@ long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu, > > struct kvm_pre_fault_memory *range); > > #endif > > > > +int kvm_gfn_userfault(struct kvm *kvm, struct kvm_memory_slot *memslot, > > + gfn_t gfn); > > + > > +static inline bool kvm_memslot_userfault(struct kvm_memory_slot *memslot) > > I strongly prefer kvm_is_userfault_memslot(). KVM's weird kvm_memslot_<flag>() > nomenclature comes from ancient code, i.e. isn't something I would follow. > > > +{ > > + return memslot->flags & KVM_MEM_USERFAULT; > > I think it's worth checking for a non-NULL memslot, even if all current callers > pre-check for a slot. > > > @@ -2042,6 +2051,9 @@ int __kvm_set_memory_region(struct kvm *kvm, > > if (r) > > goto out; > > } > > + if (mem->flags & KVM_MEM_USERFAULT) > > + new->userfault_bitmap = > > + (unsigned long __user *)(unsigned long)mem->userfault_bitmap; > > if (mem->flags & KVM_MEM_USERFAULT) > new->userfault_bitmap = u64_to_user_ptr(mem->userfault_bitmap); Applied this change to the other cast (where we do access_ok()) as well, thanks! > > > r = kvm_set_memslot(kvm, old, new, change); > > if (r) > > @@ -6426,3 +6438,26 @@ void kvm_exit(void) > > kvm_irqfd_exit(); > > } > > EXPORT_SYMBOL_GPL(kvm_exit); > > + > > +int kvm_gfn_userfault(struct kvm *kvm, struct kvm_memory_slot *memslot, > > + gfn_t gfn) > > I think this series is the perfect opportunity (read: victim) to introduce a > common "struct kvm_page_fault". With a common structure to provide the gfn, slot, > write, exec, and is_private fields, this helper can handle the checks and the call > to kvm_prepare_memory_fault_exit(). > > And with that in place, I would vote to name this something like kvm_do_userfault(), > return a boolean, and let the caller return -EFAULT. Returning 'true' from kvm_do_userfault() without a kvm_prepare_memory_fault_exit() looked a bit strange at first, but I don't have strong feelings. I'll add a small comment there. > > For making "struct kvm_page_fault" common, one thought would be to have arch code > define the entire struct, and simply assert on the few fields that common KVM needs > being defined by arch code. And wrap all references in CONFIG_KVM_GENERIC_PAGE_FAULT. > > I don't expect there will be a huge number of fields that common KVM needs, i.e. I > don't think the maintenance burden of punting to arch code will be high. And letting > arch code own the entire struct means we don't need to have e.g. fault->arch.present > vs. fault->write in KVM x86, which to me is a big net negative for readability. > > I'll respond to the cover letter with an attachment of seven patches to sketch out > the idea. Looks great! Thanks very much! > > > +{ > > + unsigned long bitmap_chunk = 0; > > + off_t offset; > > + > > + if (!kvm_memslot_userfault(memslot)) > > + return 0; > > + > > + if (WARN_ON_ONCE(!memslot->userfault_bitmap)) > > + return 0; > > '0' is technically a valid userspace address. I'd just drop this. If we have a > KVM bug that results in failure to generate usefaults, we'll notice quite quickly. > > > + > > + offset = gfn - memslot->base_gfn; > > + > > + if (copy_from_user(&bitmap_chunk, > > + memslot->userfault_bitmap + offset / BITS_PER_LONG, > > + sizeof(bitmap_chunk))) > > Since the address is checked during memslot creation, I'm pretty sure this can > use __get_user(). At the very least, it should be get_user(). Thanks! I agree, __get_user() should be fine. > > > + return -EFAULT; > > + > > + /* Set in the bitmap means that the gfn is userfault */ > > + return !!(bitmap_chunk & (1ul << (offset % BITS_PER_LONG))); > > test_bit()? Thanks for all the feedback and applying it for me in those patches you sent back. :)