On Wed, Apr 23, 2025 at 03:09:41PM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 23, 2025 at 11:13:08AM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@xxxxxxxxxx>
> >
> > Remove intermediate scatter-gather table completely and
> > enable new DMA link API.
> >
> > Tested-by: Jens Axboe <axboe@xxxxxxxxx>
> > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxx>
> > ---
> >  drivers/vfio/pci/mlx5/cmd.c  | 298 ++++++++++++++++-------------------
> >  drivers/vfio/pci/mlx5/cmd.h  |  21 ++-
> >  drivers/vfio/pci/mlx5/main.c |  31 ----
> >  3 files changed, 147 insertions(+), 203 deletions(-)
>
> Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
>
> > +static int register_dma_pages(struct mlx5_core_dev *mdev, u32 npages,
> > +			      struct page **page_list, u32 *mkey_in,
> > +			      struct dma_iova_state *state,
> > +			      enum dma_data_direction dir)
> > +{
> > +	dma_addr_t addr;
> > +	size_t mapped = 0;
> > +	__be64 *mtt;
> > +	int i, err;
> >
> > -	return mlx5_core_create_mkey(mdev, mkey, mkey_in, inlen);
> > +	WARN_ON_ONCE(dir == DMA_NONE);
> > +
> > +	mtt = (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, klm_pas_mtt);
> > +
> > +	if (dma_iova_try_alloc(mdev->device, state, 0, npages * PAGE_SIZE)) {
> > +		addr = state->addr;
> > +		for (i = 0; i < npages; i++) {
> > +			err = dma_iova_link(mdev->device, state,
> > +					    page_to_phys(page_list[i]), mapped,
> > +					    PAGE_SIZE, dir, 0);
> > +			if (err)
> > +				goto error;
> > +			*mtt++ = cpu_to_be64(addr);
> > +			addr += PAGE_SIZE;
> > +			mapped += PAGE_SIZE;
> > +		}
>
> This is an area I'd like to see improvement on as a follow up.
>
> Given we know we are allocating contiguous IOVA we should be able to
> request a certain alignment so we can know that it can be put into the
> mkey as single mtt. That would eliminate the double translation cost in
> the HW.
>
> The RDMA mkey builder is able to do this from the scatterlist but the
> logic to do that was too complex to copy into vfio.
> This is close to being simple enough, just the alignment is the only
> problem.

I saw this improvement as well, but this "if (dma_iova_try_alloc) ... else ..."
code needs to be generalized first, as it will be used by all vfio HW drivers.

So the plan is:
1. Merge the code as is.
2. Convert a second vfio HW driver to the new API.
3. Propose something like dma_map_pages(..., struct page **page_list, ...)
   to map an array of pages.
4. Optimize mlx5 vfio MTT creation.

Thanks

>
> Jason
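
To make the alignment point from the review concrete, here is a rough
userspace sketch (not kernel code, and not part of the patch). It only
models the arithmetic: a contiguous IOVA range can be described by one
naturally aligned power-of-two block only if its start address is aligned
to that block size, otherwise one entry per page is needed, as in the
quoted loop. The helper names and the 4 KiB page size are assumptions made
for illustration; the real mlx5 mkey page-size encoding is not modelled.

```c
#include <stdint.h>

/* Assumption for this sketch: 4 KiB base pages. */
#define SKETCH_PAGE_SHIFT 12u

/* Hypothetical helper: smallest power-of-two shift (at least one page)
 * whose block size is >= len. */
static unsigned int sketch_block_shift(uint64_t len)
{
	unsigned int shift = SKETCH_PAGE_SHIFT;

	while (shift < 63 && ((uint64_t)1 << shift) < len)
		shift++;
	return shift;
}

/* Hypothetical helper: returns 1 if [iova, iova + len) starts on a boundary
 * of the power-of-two block that covers len, so a single translation entry
 * of that block size could describe the whole range. */
static int sketch_fits_single_entry(uint64_t iova, uint64_t len)
{
	uint64_t block = (uint64_t)1 << sketch_block_shift(len);

	return len != 0 && (iova & (block - 1)) == 0;
}

/* Entries needed when every page gets its own entry. */
static uint64_t sketch_per_page_entries(uint64_t len)
{
	uint64_t page = (uint64_t)1 << SKETCH_PAGE_SHIFT;

	return (len + page - 1) >> SKETCH_PAGE_SHIFT;
}
```

With these assumptions, a 2 MiB range starting at a 2 MiB boundary fits in
one entry, while the same range shifted by one page needs 512 per-page
entries. That is why being able to request alignment at IOVA allocation
time is the missing piece.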