> -----Original Message-----
> From: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> Sent: Monday, June 9, 2025 12:35 PM
> To: Christian König <christian.koenig@xxxxxxx>
> Cc: wangtao <tao.wangtao@xxxxxxxxx>; Christoph Hellwig <hch@xxxxxxxxxxxxx>;
> sumit.semwal@xxxxxxxxxx; kraxel@xxxxxxxxxx; vivek.kasireddy@xxxxxxxxx;
> viro@xxxxxxxxxxxxxxxxxx; brauner@xxxxxxxxxx; hughd@xxxxxxxxxx;
> akpm@xxxxxxxxxxxxxxxxxxxx; amir73il@xxxxxxxxx; benjamin.gaignard@xxxxxxxxxxxxx;
> Brian.Starkey@xxxxxxx; jstultz@xxxxxxxxxx; tjmercier@xxxxxxxxxx; jack@xxxxxxx;
> baolin.wang@xxxxxxxxxxxxxxxxx; linux-media@xxxxxxxxxxxxxxx;
> dri-devel@xxxxxxxxxxxxxxxxxxxxx; linaro-mm-sig@xxxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; linux-fsdevel@xxxxxxxxxxxxxxx;
> linux-mm@xxxxxxxxx; wangbintian(BintianWang) <bintian.wang@xxxxxxxxx>;
> yipengxiang <yipengxiang@xxxxxxxxx>; liulu 00013167 <liulu.liu@xxxxxxxxx>;
> hanfeng 00012985 <feng.han@xxxxxxxxx>
> Subject: Re: [PATCH v4 0/4] Implement dmabuf direct I/O via copy_file_range
>
> On Fri, Jun 06, 2025 at 01:20:48PM +0200, Christian König wrote:
> > > dmabuf acts as a driver and shouldn't be handled by VFS, so I made
> > > dmabuf implement copy_file_range callbacks to support direct I/O
> > > zero-copy. I'm open to both approaches. What's the preference of VFS
> > > experts?
> >
> > That would probably be illegal. Using the sg_table in the DMA-buf
> > implementation turned out to be a mistake.
>
> Two thing here that should not be directly conflated. Using the sg_table was
> a huge mistake, and we should try to move dmabuf to switch that to a pure
> dma_addr_t/len array now that the new DMA API supporting that has been
> merged. Is there any chance the dma-buf maintainers could start to kick this
> off? I'm of course happy to assist.

I'm a bit confused here: don't dma-buf importers need to traverse the sg_table
today to get at the folios or the dma_addr/len pairs? Do you mean restricting
how the sg_table may be accessed (e.g. only through an iov_iter), or are you
proposing an alternative representation?

> But that notwithstanding, dma-buf is THE buffer sharing mechanism in the
> kernel, and we should promote it instead of reinventing it badly.
> And there is a use case for having a fully DMA mapped buffer in the block
> layer and I/O path, especially on systems with an IOMMU.
> So having an iov_iter backed by a dma-buf would be extremely helpful.
> That's mostly lib/iov_iter.c code, not VFS, though.

Are you suggesting adding a new ITER_DMABUF type to iov_iter, or converting a
dma-buf into an ITER_BVEC iov_iter inside lib/iov_iter.c?
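To make the second option concrete, here is a rough, untested sketch of what
such a conversion could look like, built only from existing helpers
(for_each_sgtable_sg(), sg_page(), bvec_set_page(), iov_iter_bvec()). The
helper name dmabuf_sgt_to_bvec_iter() is made up for illustration, and note
that it still walks the sg_table's pages, i.e. exactly the usage being
questioned above:

#include <linux/scatterlist.h>
#include <linux/bvec.h>
#include <linux/uio.h>
#include <linux/slab.h>

/*
 * Hypothetical helper: wrap the pages behind an importer's sg_table into
 * an ITER_BVEC iov_iter so it can be fed to the direct I/O path.  The
 * bio_vec array is allocated here and must be freed by the caller once
 * the I/O has completed.
 */
static int dmabuf_sgt_to_bvec_iter(struct sg_table *sgt, size_t size,
                                   struct iov_iter *iter,
                                   struct bio_vec **pbvec)
{
        struct scatterlist *sg;
        struct bio_vec *bvec;
        unsigned int i;

        bvec = kcalloc(sgt->orig_nents, sizeof(*bvec), GFP_KERNEL);
        if (!bvec)
                return -ENOMEM;

        /* One bio_vec per CPU-side sg entry. */
        for_each_sgtable_sg(sgt, sg, i)
                bvec_set_page(&bvec[i], sg_page(sg), sg->length, sg->offset);

        /* ITER_DEST: the dma-buf is the destination of a file read. */
        iov_iter_bvec(iter, ITER_DEST, bvec, sgt->orig_nents, size);
        *pbvec = bvec;
        return 0;
}

An ITER_DMABUF type, by contrast, would presumably carry dma_addr_t/len pairs
in the iterator itself and never touch struct page, which sounds closer to the
pure dma_addr_t/len array direction you describe.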
> > The question Christoph raised was rather why is your CPU so slow that
> > walking the page tables has a significant overhead compared to the
> > actual I/O?
>
> Yes, that's really puzzling and should be addressed first.

With a fast CPU (3 GHz in the tests below), the GUP (get_user_pages) overhead
is relatively small:

| 32x32MB Read 1024MB       |Creat-ms|Close-ms| I/O-ms |I/O-MB/s| I/O% |
|---------------------------|--------|--------|--------|--------|------|
| 1) memfd direct R/W       |      1 |    118 |    312 |   3448 | 100% |
| 2) u+memfd direct R/W     |    196 |    123 |    295 |   3651 | 105% |
| 3) u+memfd direct sendfile|    175 |    102 |    976 |   1100 |  31% |
| 4) u+memfd direct splice  |    173 |    103 |    443 |   2428 |  70% |
| 5) udmabuf buffer R/W     |    183 |    100 |    453 |   2375 |  68% |
| 6) dmabuf buffer R/W      |     34 |      4 |    427 |   2519 |  73% |
| 7) udmabuf direct c_f_r   |    200 |    102 |    278 |   3874 | 112% |
| 8) dmabuf direct c_f_r    |     36 |      5 |    269 |   4002 | 116% |

With a slower CPU (1 GHz), the GUP overhead becomes much more significant:

| 32x32MB Read 1024MB       |Creat-ms|Close-ms| I/O-ms |I/O-MB/s| I/O% |
|---------------------------|--------|--------|--------|--------|------|
| 1) memfd direct R/W       |      2 |    393 |    969 |   1109 | 100% |
| 2) u+memfd direct R/W     |    592 |    424 |    570 |   1884 | 169% |
| 3) u+memfd direct sendfile|    587 |    356 |   2229 |    481 |  43% |
| 4) u+memfd direct splice  |    568 |    352 |    795 |   1350 | 121% |
| 5) udmabuf buffer R/W     |    597 |    343 |   1238 |    867 |  78% |
| 6) dmabuf buffer R/W      |     69 |     13 |   1128 |    952 |  85% |
| 7) udmabuf direct c_f_r   |    595 |    345 |    372 |   2889 | 260% |
| 8) dmabuf direct c_f_r    |     80 |     13 |    274 |   3929 | 354% |

Regards,
Wangtao.