On Wed, Oct 05, 2022, Thanos Makatos wrote:

Amusingly, I floated this exact idea internally without ever seeing this
patch (we ended up going a different direction). Sadly, I can't claim
infringement, as my suggestion was timestamped December 2022 :-D

If this is useful for y'all, I don't see a reason not to do it.

> ---
>  include/uapi/linux/kvm.h       | 5 ++++-
>  tools/include/uapi/linux/kvm.h | 2 ++
>  virt/kvm/eventfd.c             | 9 +++++++++
>  3 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index eed0315a77a6..0a884ac1cc76 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -804,6 +804,7 @@ enum {
>  	kvm_ioeventfd_flag_nr_deassign,
>  	kvm_ioeventfd_flag_nr_virtio_ccw_notify,
>  	kvm_ioeventfd_flag_nr_fast_mmio,
> +	kvm_ioeventfd_flag_nr_commit_write,
>  	kvm_ioeventfd_flag_nr_max,
>  };
>
> @@ -812,16 +813,18 @@ enum {
>  #define KVM_IOEVENTFD_FLAG_DEASSIGN	(1 << kvm_ioeventfd_flag_nr_deassign)
>  #define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
>  	(1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)
> +#define KVM_IOEVENTFD_FLAG_COMMIT_WRITE	(1 << kvm_ioeventfd_flag_nr_commit_write)

Maybe POST_WRITE, to try to capture the effective semantics?

As for read-after-write hazards, my vote is to document that KVM provides no
guarantees on that front. I can't envision a use case where it makes sense to
provide guarantees in the kernel, since doing so would largely defeat the
purpose of handling writes in the fastpath.

>  #define KVM_IOEVENTFD_VALID_FLAG_MASK	((1 << kvm_ioeventfd_flag_nr_max) - 1)
>
>  struct kvm_ioeventfd {
>  	__u64 datamatch;
>  	__u64 addr;	/* legal pio/mmio address */
> +	__u64 vaddr;	/* user address to write to if COMMIT_WRITE is set */

This needs to be placed at the end, i.e. it actually needs to consume the
pad[] bytes. Inserting into the middle changes the layout of the structure
and thus breaks the ABI.

And maybe post_addr (or commit_addr)?
Because vaddr might be interpreted as the host virtual address that
corresponds to "addr", which may or may not be the case.

>  	__u32 len;	/* 1, 2, 4, or 8 bytes; or 0 to ignore length */
>  	__s32 fd;
>  	__u32 flags;
> -	__u8  pad[36];
> +	__u8  pad[28];
>  };

...

> @@ -812,6 +813,7 @@ enum {
>  #define KVM_IOEVENTFD_FLAG_DEASSIGN	(1 << kvm_ioeventfd_flag_nr_deassign)
>  #define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
>  	(1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)
> +#define KVM_IOEVENTFD_FLAG_COMMIT_WRITE	(1 << kvm_ioeventfd_flag_nr_commit_write)
>
>  #define KVM_IOEVENTFD_VALID_FLAG_MASK	((1 << kvm_ioeventfd_flag_nr_max) - 1)
>
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 2a3ed401ce46..c98e7b54fafa 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -682,6 +682,8 @@ struct _ioeventfd {
>  	struct kvm_io_device dev;
>  	u8 bus_idx;
>  	bool wildcard;
> +	bool commit_write;
> +	void *vaddr;

There's no need for a separate bool, just pivot on the validity of the
pointer. The simplest approach is to disallow NULL pointers (which aren't
technically illegal for userspace, but I doubt any use case actually cares).
Alternatively, set the internal pointer to e.g. -EINVAL and then act on
!IS_ERR().

The pointer also needs to be tagged __user.

>  };
>
>  static inline struct _ioeventfd *
> @@ -753,6 +755,10 @@ ioeventfd_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this, gpa_t addr,
>  	if (!ioeventfd_in_range(p, addr, len, val))
>  		return -EOPNOTSUPP;
>
> +	if (p->commit_write) {
> +		if (unlikely(copy_to_user(p->vaddr, val, len)))

This needs to check that len > 0. I think it's also worth hoisting the
validity checks into kvm_assign_ioeventfd_idx() so that this can use the
slightly more optimal __copy_to_user(). E.g.
	if (args->flags & KVM_IOEVENTFD_FLAG_REDIRECT) {
		if (!args->len || !args->post_addr ||
		    args->post_addr != untagged_addr(args->post_addr) ||
		    !access_ok((void __user *)(unsigned long)args->post_addr,
			       args->len)) {
			ret = -EINVAL;
			goto fail;
		}
		p->post_addr = (void __user *)(unsigned long)args->post_addr;
	}

And then the usage here can be:

	if (p->post_addr && __copy_to_user(p->post_addr, val, len))
		return -EFAULT;

I assume the spinlock in eventfd_signal() provides ordering even on weakly
ordered architectures, but we should double-check that, i.e. verify that we
don't need an explicit barrier of some kind.

Lastly, I believe kvm_deassign_ioeventfd_idx() needs to check for a match on
post_addr (or whatever it gets named).

> +			return -EFAULT;
> +	}
>  	eventfd_signal(p->eventfd, 1);
>  	return 0;
>  }
>
> @@ -832,6 +838,9 @@ static int kvm_assign_ioeventfd_idx(struct kvm *kvm,
>  	else
>  		p->wildcard = true;
>
> +	p->commit_write = args->flags & KVM_IOEVENTFD_FLAG_COMMIT_WRITE;
> +	p->vaddr = (void *)args->vaddr;
> +
>  	mutex_lock(&kvm->slots_lock);
>
>  	/* Verify that there isn't a match already */
> --
> 2.22.3
>