On Thu, May 15, 2025 at 03:19:46PM -0600, Alex Williamson wrote:
> On Tue, 13 May 2025 11:57:30 +0800
> lizhe.67@xxxxxxxxxxxxx wrote:
>
> > From: Li Zhe <lizhe.67@xxxxxxxxxxxxx>
> >
> > When vfio_pin_pages_remote() is called with a range of addresses that
> > includes hugetlbfs folios, the function currently performs individual
> > statistics counting operations for each page. This can lead to significant
> > performance overheads, especially when dealing with large ranges of pages.
> >
> > This patch optimize this process by batching the statistics counting
> > operations.
> >
> > The performance test results for completing the 8G VFIO IOMMU DMA mapping,
> > obtained through trace-cmd, are as follows. In this case, the 8G virtual
> > address space has been mapped to physical memory using hugetlbfs with
> > pagesize=2M.
> >
> > Before this patch:
> > funcgraph_entry: # 33813.703 us | vfio_pin_map_dma();
> >
> > After this patch:
> > funcgraph_entry: # 15635.055 us | vfio_pin_map_dma();
> >
> > Signed-off-by: Li Zhe <lizhe.67@xxxxxxxxxxxxx>
> > ---
> > drivers/vfio/vfio_iommu_type1.c | 49 +++++++++++++++++++++++++++++++++
> > 1 file changed, 49 insertions(+)
>
> Hi,
>
> Thanks for looking at improvements in this area...

Why not just use iommufd? Doesn't it already do all these optimizations?

Indeed, today you can use iommufd with a memfd handle, which should return
the huge folios directly from hugetlbfs, so we never iterate with 4K
pages.

Jason