Fengnan Chang <changfengnan@xxxxxxxxxxxxx> writes:

> Ritesh Harjani <ritesh.list@xxxxxxxxx> wrote on Wed, Aug 27, 2025 at 01:26:
>>
>> Fengnan Chang <changfengnan@xxxxxxxxxxxxx> writes:
>>
>> > Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote on Mon, Aug 25, 2025 at 17:21:
>> >>
>> >> On Mon, Aug 25, 2025 at 04:51:27PM +0800, Fengnan Chang wrote:
>> >> > No restrictions for now, I think we can enable this by default.
>> >> > Maybe a better solution is to modify bio.c? Let me do some tests first.
>>
>> If there are other implications to consider for using the per-cpu bio
>> cache by default, then maybe we can first get the optimizations for
>> iomap in for at least REQ_ALLOC_CACHE users, and later work on whether
>> this can be enabled by default for other users too.
>> Unless someone else thinks otherwise.
>>
>> Why I am thinking this is - since the per-cpu bio cache is limited, if
>> everyone uses it for their bio submission, we may not get the best
>> performance where needed. So that might require us to come up with a
>> different approach.
>
> Agree, if everyone uses it for their bio submission, we cannot get the
> best performance.
>
>>
>> >>
>> >> Any kind of numbers you see where this makes a difference, including
>> >> the workloads, would also be very valuable here.
>> > I'm testing random direct read performance on io_uring+ext4 and
>> > comparing it to io_uring+raw blkdev; io_uring+ext4 is quite poor, and
>> > I'm trying to improve this. I found ext4 behaves quite differently
>> > from blkdev when running bio_alloc_bioset. That's because the blkdev
>> > path uses the percpu bio cache, but the ext4 path does not. So I made
>> > this modification.
>>
>> I am assuming you meant to say - DIO with iouring+raw_blkdev uses the
>> per-cpu bio cache whereas iouring+(ext4/xfs) does not use it.
>> Hence you added this patch which will enable the use of it - which
>> should also improve the performance of iouring+(ext4/xfs).
>
> Yes. DIO+iouring+raw_blkdev vs DIO+iouring+(ext4/xfs).
>
>>
>> That makes sense to me.
>>
>> > My test command is:
>> > /fio/t/io_uring -p0 -d128 -b4096 -s1 -c1 -F1 -B1 -R1 -X1 -n1 -P1 -t0
>> > /data01/testfile
>> > Without this patch:
>> > BW is 1950MB
>> > With this patch:
>> > BW is 2001MB.

I guess here you meant BW: XXXX MB/s

>>
>> Ok. That's around a 2.6% improvement. Is that what you were expecting
>> to see too? Is that because you were testing with -p0 (non-polled I/O)?
>
> I don't have a quantitative target for expectations; 2.6% seems
> reasonable. It's not related to -p0; with -p1 it's about a 3.1%
> improvement. Why can't we get a 5-6% improvement? I think the biggest
> bottlenecks are in ext4/xfs, mostly in ext4_es_lookup_extent.
>

Sure, thanks for sharing the details. Could you add the performance
improvement numbers, along with the io_uring cmd you shared above, to
the commit message in v2?

With that please feel free to add:

Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx>

>>
>> Looking at the numbers here [1] & [2], I was hoping this could give
>> maybe around a 5-6% improvement ;)
>>
>> [1]: https://lore.kernel.org/io-uring/cover.1666347703.git.asml.silence@xxxxxxxxx/
>> [2]: https://lore.kernel.org/all/20220806152004.382170-3-axboe@xxxxxxxxx/
>>
>>
>> -ritesh
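
For reference, the kind of change being discussed would roughly mirror
what block/fops.c already does for raw block device DIO: when io_uring
has marked the kiocb with IOCB_ALLOC_CACHE, set REQ_ALLOC_CACHE on the
bio so bio_alloc_bioset() can serve it from the per-cpu bio cache. The
following is only a minimal sketch, assuming the current
iomap_dio_alloc_bio() helper in fs/iomap/direct-io.c (where the needed
headers are already included) and assuming the dio keeps its kiocb as
dio->iocb; the actual posted patch may look different.

/*
 * Sketch only - modelled on the blkdev DIO path, not the actual patch.
 * Opt into the per-cpu bio cache only when the submitter (io_uring)
 * asked for it via IOCB_ALLOC_CACHE.
 */
static struct bio *iomap_dio_alloc_bio(const struct iomap_iter *iter,
		struct iomap_dio *dio, unsigned short nr_vecs, blk_opf_t opf)
{
	/* io_uring sets IOCB_ALLOC_CACHE on its kiocb for DIO reads/writes */
	if (dio->iocb->ki_flags & IOCB_ALLOC_CACHE)
		opf |= REQ_ALLOC_CACHE;

	if (dio->dops && dio->dops->bio_set)
		return bio_alloc_bioset(iter->iomap.bdev, nr_vecs, opf,
					GFP_KERNEL, dio->dops->bio_set);
	return bio_alloc(iter->iomap.bdev, nr_vecs, opf, GFP_KERNEL);
}

Gating on IOCB_ALLOC_CACHE keeps the limited per-cpu cache reserved for
submitters that opted in (io_uring), which matches the concern raised
above about not enabling it unconditionally for every bio submitter.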