On Tue, Jun 10, 2025 at 12:52:18PM +0200, Christian König wrote:
> >> dma_addr_t/len array now that the new DMA API supporting that has been
> >> merged. Is there any chance the dma-buf maintainers could start to kick this
> >> off? I'm of course happy to assist.
>
> Work on that is already underway for some time.
>
> Most GPU drivers already do sg_table -> DMA array conversion, I need
> to push on the remaining to clean up.

Do you have a pointer?

> >> Yes, that's really puzzling and should be addressed first.
> > With high CPU performance (e.g., 3GHz), GUP (get_user_pages) overhead
> > is relatively low (observed in 3GHz tests).
>
> Even on a low end CPU walking the page tables and grabbing references
> shouldn't be that much of an overhead.

Yes.

>
> There must be some reason why you see so much CPU overhead. E.g.
> compound pages are broken up or similar which should not happen in
> the first place.

pin_user_pages unfortunately outputs an array of PAGE_SIZE-sized (modulo
the offset and a shorter last length) struct pages.  The block direct I/O
code fairly recently grew code to reassemble folios from them, which did
speed up some workloads.

Is this test using the block device or iomap direct I/O code?  What
kernel version is it run on?
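
To illustrate what reassembling a per-page array into larger ranges means,
here is a minimal user-space sketch.  It is not the kernel code: plain page
frame numbers stand in for struct page pointers, a 4 KiB page size is
assumed, and struct segment and coalesce_pages() are hypothetical names
made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Hypothetical type: one physically contiguous byte range. */
struct segment {
	uint64_t addr;	/* start address: pfn * page size + offset */
	size_t len;	/* length in bytes */
};

/*
 * Merge a per-page PFN array into contiguous segments.
 *
 * pfns[i] stands in for the i-th pinned page, offset is the byte
 * offset into the first page, and len is the total transfer length.
 * Returns the number of segments written to out (at most npages).
 */
size_t coalesce_pages(const uint64_t *pfns, size_t npages,
		      size_t offset, size_t len, struct segment *out)
{
	size_t nsegs = 0;

	assert(offset < SKETCH_PAGE_SIZE);

	for (size_t i = 0; i < npages && len > 0; i++) {
		/* Only the first page starts at a non-zero offset. */
		size_t off = (i == 0) ? offset : 0;
		size_t chunk = SKETCH_PAGE_SIZE - off;
		uint64_t addr = pfns[i] * SKETCH_PAGE_SIZE + off;

		if (chunk > len)
			chunk = len;	/* shorter last page */

		if (nsegs > 0 &&
		    out[nsegs - 1].addr + out[nsegs - 1].len == addr) {
			/* Adjacent to the previous range: extend it. */
			out[nsegs - 1].len += chunk;
		} else {
			/* Not adjacent: start a new range. */
			out[nsegs].addr = addr;
			out[nsegs].len = chunk;
			nsegs++;
		}
		len -= chunk;
	}
	return nsegs;
}
```

With four pinned pages of which the first three are physically adjacent,
the loop produces two segments instead of four, so whatever consumes the
result has fewer entries to walk.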