On Thu, Aug 7, 2025 at 7:58 PM Eugenio Pérez <eperezma@xxxxxxxxxx> wrote:
>
> This allows separating the different virtqueues into groups that share the
> same address space. The VDUSE device is asked for the group of each vq at
> the beginning, as the groups are needed for the DMA API.
>
> Allocating 3 vq groups, as net is the device that needs the most groups:
> * Dataplane (guest passthrough)
> * CVQ
> * Shadowed vrings.
>
> Future versions of the series can include dynamic allocation of the
> groups array so VDUSE can declare more groups.
>
> Signed-off-by: Eugenio Pérez <eperezma@xxxxxxxxxx>
> ---
> v2:
> * Cache group information in the kernel, as we need to provide the vq map
>   tokens properly.
> * Add descs vq group to optimize SVQ forwarding and support indirect
>   descriptors out of the box.
> ---
>  drivers/vdpa/vdpa_user/vduse_dev.c | 71 +++++++++++++++++++++++++++++-
>  include/uapi/linux/vduse.h         | 19 +++++++-
>  2 files changed, 88 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
> index d858c4389cc1..d1f6d00a9c71 100644
> --- a/drivers/vdpa/vdpa_user/vduse_dev.c
> +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> @@ -46,6 +46,11 @@
>  #define VDUSE_IOVA_SIZE (VDUSE_MAX_BOUNCE_SIZE + 128 * 1024 * 1024)
>  #define VDUSE_MSG_DEFAULT_TIMEOUT 30
>
> +/*
> + * Let's make it 3 for simplicity.
> + */
> +#define VDUSE_MAX_VQ_GROUPS 3

I think we can relax this to something like 64. Otherwise we might need to
bump the version again just to increase the limit. Or we could have a sysfs
entry like bounce_size?

> +
>  #define IRQ_UNBOUND -1
>
>  struct vduse_virtqueue {
> @@ -58,6 +63,8 @@ struct vduse_virtqueue {
>  	struct vdpa_vq_state state;
>  	bool ready;
>  	bool kicked;
> +	u32 vq_group;
> +	u32 vq_desc_group;
>  	spinlock_t kick_lock;
>  	spinlock_t irq_lock;
>  	struct eventfd_ctx *kickfd;
> @@ -114,6 +121,7 @@ struct vduse_dev {
>  	u8 status;
>  	u32 vq_num;
>  	u32 vq_align;
> +	u32 ngroups;
>  	struct vduse_umem *umem;
>  	struct mutex mem_lock;
>  	unsigned int bounce_size;
> @@ -592,6 +600,20 @@ static int vduse_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 idx,
>  	return 0;
>  }
>
> +static u32 vduse_get_vq_group(struct vdpa_device *vdpa, u16 idx)
> +{
> +	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> +
> +	return dev->vqs[idx]->vq_group;
> +}
> +
> +static u32 vduse_get_vq_desc_group(struct vdpa_device *vdpa, u16 idx)
> +{
> +	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> +
> +	return dev->vqs[idx]->vq_desc_group;
> +}
> +
>  static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx,
>  				   struct vdpa_vq_state *state)
>  {
> @@ -678,13 +700,48 @@ static u8 vduse_vdpa_get_status(struct vdpa_device *vdpa)
>  	return dev->status;
>  }
>
> +static int vduse_fill_vq_groups(struct vduse_dev *dev)
> +{
> +	if (dev->api_version < VDUSE_API_VERSION_1)
> +		return 0;
> +
> +	for (int i = 0; i < dev->vdev->vdpa.nvqs; ++i) {
> +		struct vduse_dev_msg msg = { 0 };
> +		int ret;
> +
> +		msg.req.type = VDUSE_GET_VQ_GROUP;
> +		msg.req.vq_group.index = i;
> +		ret = vduse_dev_msg_sync(dev, &msg);

I fail to understand why the default group mapping is not done during
device creation.
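If it's just about establishing a sane default, something along these lines
at vq creation time would avoid the round trips for devices that only ever
use group 0 (untested sketch, the helper name is made up; it would be called
from the existing vq init path):

/*
 * Untested sketch, not part of the patch: give every virtqueue a
 * default mapping (group 0 for both the rings and the descriptor
 * table) when the vqs are allocated, so only devices that actually
 * declare more than one group need the VDUSE_GET_VQ_GROUP /
 * VDUSE_GET_VRING_DESC_GROUP messages later.
 */
static void vduse_dev_init_vq_groups(struct vduse_dev *dev)
{
	int i;

	for (i = 0; i < dev->vq_num; i++) {
		dev->vqs[i]->vq_group = 0;
		dev->vqs[i]->vq_desc_group = 0;
	}
}
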
> +		if (ret)
> +			return ret;
> +
> +		dev->vqs[i]->vq_group = msg.resp.vq_group.num;
> +
> +		msg.req.type = VDUSE_GET_VRING_DESC_GROUP;
> +		ret = vduse_dev_msg_sync(dev, &msg);
> +		if (ret)
> +			return ret;
> +
> +		dev->vqs[i]->vq_desc_group = msg.resp.vq_group.num;
> +	}
> +
> +	return 0;
> +}
> +
>  static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status)
>  {
>  	struct vduse_dev *dev = vdpa_to_vduse(vdpa);
> +	u8 previous_status = dev->status;
>
>  	if (vduse_dev_set_status(dev, status))
>  		return;
>
> +	if ((dev->status ^ previous_status) &
> +	    BIT_ULL(VIRTIO_CONFIG_S_FEATURES_OK) &&
> +	    status & (1ULL << VIRTIO_CONFIG_S_FEATURES_OK))
> +		if (vduse_fill_vq_groups(dev))

Can we merge the two messages into a single one? Or can we use shared
memory for storing such a mapping? If we have 256 queues, for example,
this would be very slow.

Thanks
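Just to illustrate the "single message" option: a reply that carries both
groups for a vq could look roughly like the struct below. All names here are
hypothetical, not existing VDUSE uapi.

/*
 * Untested sketch: one request/reply returns both mappings for a vq,
 * halving the number of kernel<->userspace round trips per virtqueue.
 */
struct vduse_vq_groups_resp {
	__u32 index;      /* virtqueue index */
	__u32 group;      /* vq group for the avail/used rings */
	__u32 desc_group; /* vq group for the descriptor table */
};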