On Wed, 21 May 2025 12:25:07 +0800
lizhe.67@xxxxxxxxxxxxx wrote:

> From: Li Zhe <lizhe.67@xxxxxxxxxxxxx>
>
> When vfio_pin_pages_remote() is called with a range of addresses that
> includes large folios, the function currently performs individual
> statistics counting operations for each page. This can lead to
> significant performance overhead, especially when dealing with large
> ranges of pages.
>
> This patch optimizes this process by batching the statistics counting
> operations.
>
> The performance test results for completing the 8G VFIO IOMMU DMA
> mapping, obtained through trace-cmd, are as follows. In this case, the
> 8G virtual address space has been mapped to physical memory using
> hugetlbfs with pagesize=2M.
>
> Before this patch:
>
> funcgraph_entry:      # 33813.703 us |  vfio_pin_map_dma();
>
> After this patch:
>
> funcgraph_entry:      # 16071.378 us |  vfio_pin_map_dma();
>
> Signed-off-by: Li Zhe <lizhe.67@xxxxxxxxxxxxx>
> Co-developed-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> ---

Given the discussion on v3, this is currently a Nak.  Follow up in that
thread if there are further ideas on how to salvage this.  Thanks,

Alex

> Changelogs:
>
> v3->v4:
> - Use min_t() to obtain the step size, rather than min().
> - Fix some issues in commit message and title.
>
> v2->v3:
> - Code simplification.
> - Fix some issues in comments.
>
> v1->v2:
> - Fix some issues in comments and formatting.
> - Consolidate vfio_find_vpfn_range() and vfio_find_vpfn().
> - Move the processing logic for hugetlbfs folios into the while (true)
>   loop and use a variable with a default value of 1 to indicate the
>   number of consecutive pages.
>
> v3 patch: https://lore.kernel.org/all/20250520070020.6181-1-lizhe.67@xxxxxxxxxxxxx/
> v2 patch: https://lore.kernel.org/all/20250519070419.25827-1-lizhe.67@xxxxxxxxxxxxx/
> v1 patch: https://lore.kernel.org/all/20250513035730.96387-1-lizhe.67@xxxxxxxxxxxxx/
>
>  drivers/vfio/vfio_iommu_type1.c | 48 +++++++++++++++++++++++++--------
>  1 file changed, 37 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 0ac56072af9f..bd46ed9361fe 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -319,15 +319,22 @@ static void vfio_dma_bitmap_free_all(struct vfio_iommu *iommu)
>  /*
>   * Helper Functions for host iova-pfn list
>   */
> -static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
> +
> +/*
> + * Find the first vfio_pfn that overlaps the range
> + * [iova, iova + PAGE_SIZE * npage) in the rb tree.
> + */
> +static struct vfio_pfn *vfio_find_vpfn_range(struct vfio_dma *dma,
> +		dma_addr_t iova, unsigned long npage)
>  {
>  	struct vfio_pfn *vpfn;
>  	struct rb_node *node = dma->pfn_list.rb_node;
> +	dma_addr_t end_iova = iova + PAGE_SIZE * npage;
>
>  	while (node) {
>  		vpfn = rb_entry(node, struct vfio_pfn, node);
>
> -		if (iova < vpfn->iova)
> +		if (end_iova <= vpfn->iova)
>  			node = node->rb_left;
>  		else if (iova > vpfn->iova)
>  			node = node->rb_right;
> @@ -337,6 +344,11 @@ static struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
>  	return NULL;
>  }
>
> +static inline struct vfio_pfn *vfio_find_vpfn(struct vfio_dma *dma, dma_addr_t iova)
> +{
> +	return vfio_find_vpfn_range(dma, iova, 1);
> +}
> +
>  static void vfio_link_pfn(struct vfio_dma *dma,
>  			  struct vfio_pfn *new)
>  {
> @@ -681,32 +693,46 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
>  	 * and rsvd here, and therefore continues to use the batch.
>  	 */
>  	while (true) {
> +		struct folio *folio = page_folio(batch->pages[batch->offset]);
> +		long nr_pages;
> +
>  		if (pfn != *pfn_base + pinned ||
>  		    rsvd != is_invalid_reserved_pfn(pfn))
>  			goto out;
>
> +		/*
> +		 * Note: The current nr_pages does not achieve the optimal
> +		 * performance in scenarios where folio_nr_pages() exceeds
> +		 * batch->capacity. It is anticipated that future enhancements
> +		 * will address this limitation.
> +		 */
> +		nr_pages = min_t(long, batch->size, folio_nr_pages(folio) -
> +			folio_page_idx(folio, batch->pages[batch->offset]));
> +		if (nr_pages > 1 && vfio_find_vpfn_range(dma, iova, nr_pages))
> +			nr_pages = 1;
> +
>  		/*
>  		 * Reserved pages aren't counted against the user,
>  		 * externally pinned pages are already counted against
>  		 * the user.
>  		 */
> -		if (!rsvd && !vfio_find_vpfn(dma, iova)) {
> +		if (!rsvd && (nr_pages > 1 || !vfio_find_vpfn(dma, iova))) {
>  			if (!dma->lock_cap &&
> -			    mm->locked_vm + lock_acct + 1 > limit) {
> +			    mm->locked_vm + lock_acct + nr_pages > limit) {
>  				pr_warn("%s: RLIMIT_MEMLOCK (%ld) exceeded\n",
>  					__func__, limit << PAGE_SHIFT);
>  				ret = -ENOMEM;
>  				goto unpin_out;
>  			}
> -			lock_acct++;
> +			lock_acct += nr_pages;
>  		}
>
> -		pinned++;
> -		npage--;
> -		vaddr += PAGE_SIZE;
> -		iova += PAGE_SIZE;
> -		batch->offset++;
> -		batch->size--;
> +		pinned += nr_pages;
> +		npage -= nr_pages;
> +		vaddr += PAGE_SIZE * nr_pages;
> +		iova += PAGE_SIZE * nr_pages;
> +		batch->offset += nr_pages;
> +		batch->size -= nr_pages;
>
>  		if (!batch->size)
>  			break;
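[Editor's note: for readers who want to experiment with the idea outside the
kernel tree, below is a minimal userspace sketch of the batched accounting
pattern the patch applies: rather than bumping pinned/lock_acct/iova one page
at a time, the loop computes how many contiguous pages of the current folio it
can consume in one step and advances every counter by that amount. All names
here (struct batch, folio_pages_left(), the 2M-folio and batch-size numbers)
are hypothetical stand-ins invented for illustration; this is not the kernel
API, and it deliberately skips the rsvd and vfio_find_vpfn_range() overlap
checks that the real loop performs.]

/*
 * Standalone sketch of the batched accounting loop. Hypothetical
 * stand-ins only; the stepping pattern is the point, not the names.
 */
#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Hypothetical stand-in for the kernel's pin batch. */
struct batch {
	long size;	/* pages left to consume in this batch */
	long offset;	/* index of the current page in the batch */
};

/*
 * Hypothetical stand-in for folio_nr_pages() - folio_page_idx():
 * how many contiguous pages remain in the folio backing the page
 * at 'offset', assuming 2M folios made of 4K pages.
 */
static long folio_pages_left(long offset)
{
	const long folio_pages = 512;

	return folio_pages - (offset % folio_pages);
}

int main(void)
{
	struct batch batch = { .size = 1024, .offset = 0 };
	unsigned long iova = 0;
	long pinned = 0, lock_acct = 0, steps = 0;

	while (batch.size) {
		/*
		 * Advance by as many pages as both the batch and the
		 * current folio allow, instead of one page per pass.
		 * The kernel version would also fall back to a single
		 * page when vfio_find_vpfn_range() reports an overlap.
		 */
		long nr_pages = folio_pages_left(batch.offset);

		if (nr_pages > batch.size)
			nr_pages = batch.size;

		lock_acct += nr_pages;	/* one accounting update per step */
		pinned += nr_pages;
		iova += PAGE_SIZE * nr_pages;
		batch.offset += nr_pages;
		batch.size -= nr_pages;
		steps++;
	}

	printf("pinned %ld pages in %ld steps, lock_acct %ld, end iova 0x%lx\n",
	       pinned, steps, lock_acct, iova);
	return 0;
}

[With these assumed sizes the sketch consumes its 1024 simulated pages in two
steps of 512 instead of 1024 single-page passes, which is the entire effect
the patch's batching aims for in the pin path.]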