On Thu, 10 Jul 2025 16:53:50 +0800 lizhe.67@xxxxxxxxxxxxx wrote:
> From: Li Zhe <lizhe.67@xxxxxxxxxxxxx>
>
> This patchset is an integration of the two previous patchsets[1][2].
>
> When vfio_pin_pages_remote() is called with a range of addresses that
> includes large folios, the function currently performs individual
> statistics counting operations for each page. This can lead to significant
> performance overheads, especially when dealing with large ranges of pages.
>
> The function vfio_unpin_pages_remote() has a similar issue, where executing
> put_pfn() for each pfn adds considerable overhead.
>
> This patchset primarily optimizes the performance of the relevant functions
> by batching the less efficient operations mentioned before.
>
> The first two patches optimize the performance of the function
> vfio_pin_pages_remote(), while the remaining patches optimize the
> performance of the function vfio_unpin_pages_remote().
>
> The performance test results, based on v6.16-rc4, for completing the 16G
> VFIO MAP/UNMAP DMA, obtained through unit test[3] with slight
> modifications[4], are as follows.
>
> Base(6.16-rc4):
> ./vfio-pci-mem-dma-map 0000:03:00.0 16
> ------- AVERAGE (MADV_HUGEPAGE) --------
> VFIO MAP DMA in 0.047 s (340.2 GB/s)
> VFIO UNMAP DMA in 0.135 s (118.6 GB/s)
> ------- AVERAGE (MAP_POPULATE) --------
> VFIO MAP DMA in 0.280 s (57.2 GB/s)
> VFIO UNMAP DMA in 0.312 s (51.3 GB/s)
> ------- AVERAGE (HUGETLBFS) --------
> VFIO MAP DMA in 0.052 s (310.5 GB/s)
> VFIO UNMAP DMA in 0.136 s (117.3 GB/s)
>
> With this patchset:
> ------- AVERAGE (MADV_HUGEPAGE) --------
> VFIO MAP DMA in 0.027 s (600.7 GB/s)
> VFIO UNMAP DMA in 0.045 s (357.0 GB/s)
> ------- AVERAGE (MAP_POPULATE) --------
> VFIO MAP DMA in 0.261 s (61.4 GB/s)
> VFIO UNMAP DMA in 0.288 s (55.6 GB/s)
> ------- AVERAGE (HUGETLBFS) --------
> VFIO MAP DMA in 0.031 s (516.4 GB/s)
> VFIO UNMAP DMA in 0.045 s (353.9 GB/s)
>
> For large folios, we achieve an over 40% performance improvement for VFIO
> MAP DMA and an over 66% performance improvement for VFIO UNMAP DMA. For
> small folios, the performance test results show a slight improvement over
> the performance before optimization.
>
> [1]: https://lore.kernel.org/all/20250529064947.38433-1-lizhe.67@xxxxxxxxxxxxx/
> [2]: https://lore.kernel.org/all/20250620032344.13382-1-lizhe.67@xxxxxxxxxxxxx/#t
> [3]: https://github.com/awilliam/tests/blob/vfio-pci-mem-dma-map/vfio-pci-mem-dma-map.c
> [4]: https://lore.kernel.org/all/20250610031013.98556-1-lizhe.67@xxxxxxxxxxxxx/
>
> Li Zhe (5):
>   mm: introduce num_pages_contiguous()
>   vfio/type1: optimize vfio_pin_pages_remote()
>   vfio/type1: batch vfio_find_vpfn() in function
>     vfio_unpin_pages_remote()
>   vfio/type1: introduce a new member has_rsvd for struct vfio_dma
>   vfio/type1: optimize vfio_unpin_pages_remote()
>
>  drivers/vfio/vfio_iommu_type1.c | 111 ++++++++++++++++++++++++++------
>  include/linux/mm.h              |  23 +++++++
>  2 files changed, 113 insertions(+), 21 deletions(-)

Applied to vfio next branch for v6.17.

Thanks,
Alex
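
For readers who want a feel for the batching idea in the cover letter, here
is a minimal userspace sketch. It is not the code from the series: the names
contiguous_run(), account_batched() and struct acct are hypothetical, and a
plain array of page frame numbers stands in for whatever the kernel helper
num_pages_contiguous() actually operates on. It only illustrates issuing one
accounting update per contiguous run instead of one per page.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not the kernel helper: return how many entries,
 * starting at pfns[0], form a run of consecutive page frame numbers.
 */
size_t contiguous_run(const unsigned long *pfns, size_t n)
{
	size_t i;

	if (n == 0)
		return 0;
	assert(pfns != NULL);

	for (i = 1; i < n; i++) {
		if (pfns[i] != pfns[0] + i)
			break;
	}
	return i;
}

/* Hypothetical counters standing in for the per-page statistics. */
struct acct {
	unsigned long pages;	/* total pages accounted */
	unsigned long updates;	/* accounting operations issued */
};

/*
 * Account n pages, issuing one update per contiguous run rather than
 * one update per page.
 */
void account_batched(struct acct *a, const unsigned long *pfns, size_t n)
{
	size_t done = 0;

	while (done < n) {
		size_t run = contiguous_run(pfns + done, n - done);

		a->pages += run;	/* one addition covers the whole run */
		a->updates++;
		done += run;
	}
}
```

In this model a fully contiguous range costs a single update regardless of
its length, while a range with no two adjacent frames costs one update per
page. That matches the shape of the quoted numbers, where the large-folio
cases improve substantially and the small-folio case only slightly.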