Re: [RFC PATCH] KVM: optionally commit write on ioeventfd write

On Wed, Oct 05, 2022, Thanos Makatos wrote:

Amusingly, I floated this exact idea internally without ever seeing this patch
(we ended up going a different direction).  Sadly, I can't claim infringement,
as my suggestion is timestamped December 2022 :-D

If this is useful for y'all, I don't see a reason not to do it.

> ---
>  include/uapi/linux/kvm.h       | 5 ++++-
>  tools/include/uapi/linux/kvm.h | 2 ++
>  virt/kvm/eventfd.c             | 9 +++++++++
>  3 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index eed0315a77a6..0a884ac1cc76 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -804,6 +804,7 @@ enum {
>  	kvm_ioeventfd_flag_nr_deassign,
>  	kvm_ioeventfd_flag_nr_virtio_ccw_notify,
>  	kvm_ioeventfd_flag_nr_fast_mmio,
> +	kvm_ioevetnfd_flag_nr_commit_write,
>  	kvm_ioeventfd_flag_nr_max,
>  };
>  
> @@ -812,16 +813,18 @@ enum {
>  #define KVM_IOEVENTFD_FLAG_DEASSIGN  (1 << kvm_ioeventfd_flag_nr_deassign)
>  #define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
>  	(1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)
> +#define KVM_IOEVENTFD_FLAG_COMMIT_WRITE (1 << kvm_ioevetnfd_flag_nr_commit_write)

Maybe POST_WRITE to try to capture the effective semantics?  As for read-after-
write hazards, my vote is to document that KVM provides no guarantees on that
front.  I can't envision a use case where it makes sense to provide guarantees
in the kernel, since doing so would largely defeat the purpose of handling writes
in the fastpath.

>  #define KVM_IOEVENTFD_VALID_FLAG_MASK  ((1 << kvm_ioeventfd_flag_nr_max) - 1)
>  
>  struct kvm_ioeventfd {
>  	__u64 datamatch;
>  	__u64 addr;        /* legal pio/mmio address */
> +	__u64 vaddr;       /* user address to write to if COMMIT_WRITE is set */

This needs to be placed at the end, i.e. actually needs to consume the pad[]
bytes.  Inserting into the middle changes the layout of the structure and thus
breaks ABI.

And maybe post_addr (or commit_addr)?  Because vaddr might be interpreted as the
host virtual address that corresponds to "addr", which may or may not be the case.

>  	__u32 len;         /* 1, 2, 4, or 8 bytes; or 0 to ignore length */
>  	__s32 fd;
>  	__u32 flags;
> -	__u8  pad[36];
> +	__u8  pad[28];
>  };
 
...

> @@ -812,6 +813,7 @@ enum {
>  #define KVM_IOEVENTFD_FLAG_DEASSIGN  (1 << kvm_ioeventfd_flag_nr_deassign)
>  #define KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY \
>  	(1 << kvm_ioeventfd_flag_nr_virtio_ccw_notify)
> +#define KVM_IOEVENTFD_FLAG_COMMIT_WRITE (1 << kvm_ioevetnfd_flag_nr_commit_write)
>  
>  #define KVM_IOEVENTFD_VALID_FLAG_MASK  ((1 << kvm_ioeventfd_flag_nr_max) - 1)
>  
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 2a3ed401ce46..c98e7b54fafa 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -682,6 +682,8 @@ struct _ioeventfd {
>  	struct kvm_io_device dev;
>  	u8                   bus_idx;
>  	bool                 wildcard;
> +	bool                 commit_write;
> +	void                 *vaddr;

There's no need for a separate bool, just pivot on the validity of the pointer.
The simplest approach is to disallow NULL pointers (which aren't technically
illegal for userspace, but I doubt any use case actually cares).  Alternatively,
set the internal pointer to e.g. ERR_PTR(-EINVAL) and then act on !IS_ERR().

The pointer also needs to be tagged __user.

>  };
>  
>  static inline struct _ioeventfd *
> @@ -753,6 +755,10 @@ ioeventfd_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this, gpa_t addr,
>  	if (!ioeventfd_in_range(p, addr, len, val))
>  		return -EOPNOTSUPP;
>  
> +	if (p->commit_write) {
> +		if (unlikely(copy_to_user(p->vaddr, val, len)))

This needs to check that len > 0.  I think it's also worth hoisting the validity
checks into kvm_assign_ioeventfd_idx() so that this can use the slightly more
optimal __copy_to_user().

E.g. 

	if (args->flags & KVM_IOEVENTFD_FLAG_POST_WRITE) {
		if (!args->len || !args->post_addr ||
		    args->post_addr != untagged_addr(args->post_addr) ||
		    !access_ok((void __user *)(unsigned long)args->post_addr, args->len)) {
			ret = -EINVAL;
			goto fail;
		}

		p->post_addr = (void __user *)(unsigned long)args->post_addr;
	}

And then the usage here can be

	if (p->post_addr && __copy_to_user(p->post_addr, val, len))
		return -EFAULT;

I assume the spinlock in eventfd_signal() provides ordering even on weakly
ordered architectures, but we should double-check that, i.e. that we don't need
an explicit barrier of some kind.

Lastly, I believe kvm_deassign_ioeventfd_idx() needs to check for a match on
post_addr (or whatever it gets named).

> +			return -EFAULT;
> +	}
>  	eventfd_signal(p->eventfd, 1);
>  	return 0;
>  }
> @@ -832,6 +838,9 @@ static int kvm_assign_ioeventfd_idx(struct kvm *kvm,
>  	else
>  		p->wildcard = true;
>  
> +	p->commit_write = args->flags & KVM_IOEVENTFD_FLAG_COMMIT_WRITE;
> +	p->vaddr = (void *)args->vaddr;
> +
>  	mutex_lock(&kvm->slots_lock);
>  
>  	/* Verify that there isn't a match already */
> -- 
> 2.22.3
> 
