On Tue, 13 May 2025 11:57:30 +0800 lizhe.67@xxxxxxxxxxxxx wrote: > From: Li Zhe <lizhe.67@xxxxxxxxxxxxx> > > When vfio_pin_pages_remote() is called with a range of addresses that > includes hugetlbfs folios, the function currently performs individual > statistics counting operations for each page. This can lead to significant > performance overheads, especially when dealing with large ranges of pages. > > This patch optimize this process by batching the statistics counting > operations. > > The performance test results for completing the 8G VFIO IOMMU DMA mapping, > obtained through trace-cmd, are as follows. In this case, the 8G virtual > address space has been mapped to physical memory using hugetlbfs with > pagesize=2M. > > Before this patch: > funcgraph_entry: # 33813.703 us | vfio_pin_map_dma(); > > After this patch: > funcgraph_entry: # 15635.055 us | vfio_pin_map_dma(); > > Signed-off-by: Li Zhe <lizhe.67@xxxxxxxxxxxxx> > --- > drivers/vfio/vfio_iommu_type1.c | 49 +++++++++++++++++++++++++++++++++ > 1 file changed, 49 insertions(+) Hi, Thanks for looking at improvements in this area... > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index 0ac56072af9f..bafa7f8c4cc6 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -337,6 +337,30 @@ static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova) > return NULL; > } > > +/* > + * Find a random vfio_pfn that belongs to the range > + * [iova, iova + PAGE_SIZE * npage) > + */ > +static struct vfio_pfn *vfio_find_vpfn_range(struct vfio_dma *dma, > + dma_addr_t iova, unsigned long npage) > +{ > + struct vfio_pfn *vpfn; > + struct rb_node *node = dma->pfn_list.rb_node; > + dma_addr_t end_iova = iova + PAGE_SIZE * npage; > + > + while (node) { > + vpfn = rb_entry(node, struct vfio_pfn, node); > + > + if (end_iova <= vpfn->iova) > + node = node->rb_left; > + else if (iova > vpfn->iova) > + node = node->rb_right; > + else > + return vpfn; > + } > + return NULL; > +} This essentially duplicates vfio_find_vpfn(), where the existing function only finds a single page. The existing function should be extended for this new use case and callers updated. Also the vfio_pfn is not "random", it's the first vfio_pfn overlapping the range. > + > static void vfio_link_pfn(struct vfio_dma *dma, > struct vfio_pfn *new) > { > @@ -670,6 +694,31 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr, > iova += (PAGE_SIZE * ret); > continue; > } > + Spurious new blank line. > + } A new blank line here would be appreciated. > + /* Handle hugetlbfs page */ > + if (likely(!disable_hugepages) && Isn't this already accounted for with npage = 1? > + folio_test_hugetlb(page_folio(batch->pages[batch->offset]))) { I don't follow how this guarantees the entire batch->size is contiguous. Isn't it possible that a batch could contain multiple hugetlb folios? Is the assumption here only true if folio_nr_pages() (or specifically the pages remaining) is >= batch->capacity? What happens if we try to map the last half of one 2MB hugetlb page and first half of the non-physically-contiguous next page? Or what if the hugetlb size is 64KB and the batch contains multiple folios that are not physically contiguous? > + if (pfn != *pfn_base + pinned) > + goto out; > + > + if (!rsvd && !vfio_find_vpfn_range(dma, iova, batch->size)) { > + if (!dma->lock_cap && > + mm->locked_vm + lock_acct + batch->size > limit) { > + pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n", > + __func__, limit << PAGE_SHIFT); > + ret = -ENOMEM; > + goto unpin_out; > + } > + pinned += batch->size; > + npage -= batch->size; > + vaddr += PAGE_SIZE * batch->size; > + iova += PAGE_SIZE * batch->size; > + lock_acct += batch->size; > + batch->offset += batch->size; > + batch->size = 0; > + continue; > + } There's a lot of duplication with the existing page-iterative loop. I think they could be consolidated if we extract the number of known contiguous pages based on the folio into a variable, default 1. Also, while this approach is an improvement, it leaves a lot on the table in scenarios where folio_nr_pages() exceeds batch->capacity. For example we're at best incrementing 1GB hugetlb pages in 2MB increments. We're also wasting a lot of cycles to fill pages points we mostly don't use. Thanks, Alex