On Tue, Feb 18, 2025 at 11:42:43AM +0100, Daniel Gomez wrote:
> On Thu, Feb 13, 2025 at 08:00:40PM +0100, Ming Lei wrote:
> > PAGE_SIZE is applied when validating block device queue limits. This
> > is fragile and wrong:
> >
> > - queue limits are read from hardware, and are often read-only
> >   hardware properties
> >
> > - PAGE_SIZE is a config option which can be changed at build time
> >
> > In the RH lab it has been found that the max segment size of some mmc
> > cards is less than 64K, so these cards can't be probed successfully
> > when the same kernel is rebuilt with a 64K PAGE_SIZE.
> >
> > Fix this issue by using BLK_MIN_SEGMENT_SIZE in the code that deals
> > with queue limits and that checks, as a hint, whether a bio need not
> > be split. Define BLK_MIN_SEGMENT_SIZE as 4K (the minimum PAGE_SIZE).
> >
> > The following commits are dependencies for backporting:
> >
> > commit 6aeb4f836480 ("block: remove bio_add_pc_page")
> > commit 02ee5d69e3ba ("block: remove blk_rq_bio_prep")
> > commit b7175e24d6ac ("block: add a dma mapping iterator")
> >
> > Cc: Paul Bunyan <pbunyan@xxxxxxxxxx>
> > Cc: Yi Zhang <yi.zhang@xxxxxxxxxx>
> > Cc: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> > Cc: John Garry <john.g.garry@xxxxxxxxxx>
> > Cc: Bart Van Assche <bvanassche@xxxxxxx>
> > Cc: Keith Busch <kbusch@xxxxxxxxxx>
> > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > ---
> > V3:
> > - rephrase commit log & fix patch style (Christoph)
> > - more comment log (Christoph)
> > V2:
> > - cover bio_split_rw_at()
> > - add BLK_MIN_SEGMENT_SIZE
> >
> >  block/blk-merge.c      | 2 +-
> >  block/blk-settings.c   | 6 +++---
> >  block/blk.h            | 8 ++++++--
> >  include/linux/blkdev.h | 2 ++
> >  4 files changed, 12 insertions(+), 6 deletions(-)
> >
> > diff --git a/block/blk-merge.c b/block/blk-merge.c
> > index 15cd231d560c..b55c52a42303 100644
> > --- a/block/blk-merge.c
> > +++ b/block/blk-merge.c
> > @@ -329,7 +329,7 @@ int bio_split_rw_at(struct bio *bio, const struct queue_limits *lim,
> >
> >  		if (nsegs < lim->max_segments &&
> >  		    bytes + bv.bv_len <= max_bytes &&
> > -		    bv.bv_offset + bv.bv_len <= PAGE_SIZE) {
> > +		    bv.bv_offset + bv.bv_len <= BLK_MIN_SEGMENT_SIZE) {
>
> In which cases is this "new" condition (for systems where PAGE_SIZE >
> BLK_MIN_SEGMENT_SIZE) going to be true? In my test case below, it is
> always false, so it takes the else path.

Yes.

> And I think that is going to be the "normal" case on these systems, is
> that right?

It depends on the block size in your workload.

> Doing a 'quick' test using next-20250213 on a 16k PAGE_SIZE system with
> the NVMe driver and a 4k lbs disk + ~1h of fio sequential writes, I get
> results indicating a write performance degradation of ~0.8%. This is
> due to the new loop condition doing 4k steps rather than PAGE_SIZE
> steps. I guess it's going to be slightly worse the larger the system's
> PAGE_SIZE (and the bio) is. So, why not decrease the minimum segment

No, bvec_split_segs() is called just once more for any >4k page size;
the loop does not iterate in 4k steps (the first sketch below models
this).

Probably the opposite: the effect with a 64K page size could be
smaller, since a bio may have fewer bvecs for the same workload.

> size only for the cases where it's actually needed, rather than making
> it the default now?
>
> I've measured bio_split_rw_at() latency in the above test, with the
> following results:

Fine, I will add a min segment size limit to cover this case, since you
care about the small perf drop on >4k PAGE_SIZE (the last sketch below
shows one possible shape).

Thanks,
Ming
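---

To make the fast-path discussion above concrete, here is a simplified
userspace model of the condition. It is not the actual block/blk-merge.c
code; struct bvec and split_segs() below are stand-ins for the kernel's
struct bio_vec and bvec_split_segs(). It shows that a bvec larger than
4K costs a single fallback call, not extra iterations of the outer loop:

#include <stdio.h>

#define BLK_MIN_SEGMENT_SIZE	4096	/* smallest PAGE_SIZE Linux supports */

struct bvec {
	unsigned int bv_offset;		/* data offset inside the segment */
	unsigned int bv_len;		/* data length */
};

/*
 * Stand-in for the slower bvec_split_segs() fallback: count how many
 * minimum-sized segments this bvec contributes.
 */
static unsigned int split_segs(const struct bvec *bv)
{
	return (bv->bv_offset + bv->bv_len + BLK_MIN_SEGMENT_SIZE - 1) /
			BLK_MIN_SEGMENT_SIZE;
}

static void classify(const char *name, const struct bvec *bv)
{
	/*
	 * The condition under discussion: take the single-segment fast
	 * path only when the data fits in one minimum-sized segment.
	 */
	if (bv->bv_offset + bv->bv_len <= BLK_MIN_SEGMENT_SIZE)
		printf("%s: fast path, 1 segment\n", name);
	else
		printf("%s: one fallback call, %u segments\n", name,
				split_segs(bv));
}

int main(void)
{
	struct bvec small = { .bv_offset = 0, .bv_len = 4096 };	/* 4k write */
	struct bvec big = { .bv_offset = 0, .bv_len = 16384 };	/* full 16k page */

	classify("small", &small);	/* fast path on any PAGE_SIZE */
	classify("big", &big);		/* misses the fast path exactly once */
	return 0;
}

On a 16K PAGE_SIZE system a fully used page now misses the fast path,
but the cost is one extra call per bvec, which is consistent with the
small (~0.8%) delta measured above.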
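The quoted diff shows only the blk-merge.c hunk; per the diffstat,
blk-settings.c, blk.h and include/linux/blkdev.h change as well. Below
is a rough sketch of the idea on the limits side. The exact hunks are
not shown above, so treat the details as assumptions:

/* include/linux/blkdev.h (sketch): 4K is the smallest PAGE_SIZE */
#define BLK_MIN_SEGMENT_SIZE	SZ_4K

/*
 * block/blk-settings.c (sketch): validate hardware-reported limits
 * against the fixed constant instead of PAGE_SIZE, so a device whose
 * max segment size is below 64K still probes on a 64K PAGE_SIZE kernel.
 */
if (lim->max_segment_size < BLK_MIN_SEGMENT_SIZE)
	return -EINVAL;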
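Finally, Ming's closing remark suggests a follow-up: a per-queue
minimum segment size, defaulting to 4K, that preserves the wider fast
path on hardware which supports it. The field and helper names below
are invented purely for illustration:

#include <stdbool.h>

#define BLK_MIN_SEGMENT_SIZE	4096

/* Hypothetical subset of queue_limits with a per-queue minimum. */
struct limits_sketch {
	unsigned int max_segments;
	unsigned int max_segment_size;
	unsigned int min_segment_size;	/* >= BLK_MIN_SEGMENT_SIZE */
};

/*
 * The bio_split_rw_at() fast path would then compare against the
 * per-queue value, restoring PAGE_SIZE-sized steps on hardware whose
 * max segment size allows it.
 */
static inline bool seg_fits_fast_path(const struct limits_sketch *lim,
		unsigned int offset, unsigned int len)
{
	return offset + len <= lim->min_segment_size;
}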