On Wed, Apr 23, 2025 at 11:13:08AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@xxxxxxxxxx>
>
> Remove intermediate scatter-gather table completely and
> enable new DMA link API.
>
> Tested-by: Jens Axboe <axboe@xxxxxxxxx>
> Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxx>
> ---
>  drivers/vfio/pci/mlx5/cmd.c  | 298 ++++++++++++++++-------------------
>  drivers/vfio/pci/mlx5/cmd.h  |  21 ++-
>  drivers/vfio/pci/mlx5/main.c |  31 ----
>  3 files changed, 147 insertions(+), 203 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxx>

> +static int register_dma_pages(struct mlx5_core_dev *mdev, u32 npages,
> +			      struct page **page_list, u32 *mkey_in,
> +			      struct dma_iova_state *state,
> +			      enum dma_data_direction dir)
> +{
> +	dma_addr_t addr;
> +	size_t mapped = 0;
> +	__be64 *mtt;
> +	int i, err;
>
> -	return mlx5_core_create_mkey(mdev, mkey, mkey_in, inlen);
> +	WARN_ON_ONCE(dir == DMA_NONE);
> +
> +	mtt = (__be64 *)MLX5_ADDR_OF(create_mkey_in, mkey_in, klm_pas_mtt);
> +
> +	if (dma_iova_try_alloc(mdev->device, state, 0, npages * PAGE_SIZE)) {
> +		addr = state->addr;
> +		for (i = 0; i < npages; i++) {
> +			err = dma_iova_link(mdev->device, state,
> +					    page_to_phys(page_list[i]), mapped,
> +					    PAGE_SIZE, dir, 0);
> +			if (err)
> +				goto error;
> +			*mtt++ = cpu_to_be64(addr);
> +			addr += PAGE_SIZE;
> +			mapped += PAGE_SIZE;
> +		}

This is an area I'd like to see improvement on as a follow-up. Given
we know we are allocating contiguous IOVA, we should be able to request
a certain alignment so we can know that it can be put into the mkey as
a single mtt. That would eliminate the double translation cost in the
HW.

The RDMA mkey builder is able to do this from the scatterlist, but the
logic to do that was too complex to copy into vfio. This is close to
being simple enough; the alignment is the only problem.

Jason
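
To make the alignment point concrete, here is a minimal userspace sketch of the arithmetic only. It is not kernel code and not taken from the patch: the helper names and the 4 KiB page shift are assumptions for illustration, and it says nothing about how mlx5 actually encodes larger translation entries or whether the DMA API can be asked for a given alignment. It computes the smallest naturally aligned power-of-two block that contains a contiguous IOVA range, and how many entries the range needs at a chosen block size.

```c
#include <stdint.h>

/* Assumption for this sketch: 4 KiB base pages. */
#define SKETCH_PAGE_SHIFT 12u

/*
 * Hypothetical helper: smallest block shift (>= SKETCH_PAGE_SHIFT) such
 * that the whole range [iova, iova + len) lies inside one naturally
 * aligned block of size 2^shift. Requires len > 0.
 */
static unsigned int sketch_single_entry_shift(uint64_t iova, uint64_t len)
{
	uint64_t last = iova + len - 1;
	uint64_t diff = iova ^ last;	/* bits where first and last byte differ */
	unsigned int shift = SKETCH_PAGE_SHIFT;

	/* Grow the block until first and last byte share the same block. */
	while (shift < 64 && (diff >> shift) != 0)
		shift++;
	return shift;
}

/*
 * Hypothetical helper: number of 2^shift-sized entries needed to cover
 * [iova, iova + len). Requires len > 0 and shift < 64.
 */
static uint64_t sketch_entries_needed(uint64_t iova, uint64_t len,
				      unsigned int shift)
{
	return ((iova + len - 1) >> shift) - (iova >> shift) + 1;
}
```

Under these assumptions, a 1 MiB range starting at IOVA 0x100000 (1 MiB aligned) fits in one block of shift 20, so one entry replaces the 256 per-page entries written by the loop above. The same 1 MiB range starting at 0x101000 straddles a 1 MiB boundary: it needs two entries at shift 20, and the smallest single block containing it has shift 22, which would also cover addresses outside the allocation. That is why the alignment of the allocated IOVA decides whether a single entry is possible.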