On Sun, Apr 06, 2025 at 03:40:04PM -0400, Sean Anderson wrote:
> Hi all,
>
> I'm not really sure what guarantees the block layer makes regarding the
> segments in a bio as part of a request submitted to a block driver. As
> far as I can tell this is not documented anywhere. In particular,

First you need to define which kind of segment you mean. We have at
least two and a half historical uses of the name. One is each bio_vec
attached to the bio, either directly as submitted to ->submit_bio for
bio-based drivers (case 1a), or as generated by bio_split_to_limits
(case 1b), which is called for every blk-mq driver before calling into
->queue_rq(s) and is called explicitly by a few bio-based drivers. The
other is the bio_vec synthesized by bio_for_each_segment (case 2).

> - Is bv_len aligned to SECTOR_SIZE?

Yes.

> - To logical_sector_size?

Yes.

> - What if logical_sector_size > PAGE_SIZE?

Still always aligned to logical_sector_size.

> - What about bv_offset?

bv_offset is a memory offset and only needs to be aligned to the
dma_alignment limit.

> - Is it possible to have a bio where the total length is a multiple of
> logical_sector_size, but the data is split across several segments
> where each segment is a multiple of SECTOR_SIZE?

Yes.

> - Is it possible to have segments not even aligned to SECTOR_SIZE?

No.

> - Can I somehow request to only get segments with bv_len aligned to
> logical_sector_size?

For drivers that use bio_split_to_limits, implicitly or explicitly, you
can do that by setting the right seg_boundary_mask.

> make some big assumptions (which might be bugs?) For example, in
> drivers/mtd/mtd_blkdevs.c, do_blktrans_request looks like:

> - There is only one bio in a request. This one is a bit of a soft
> assumption since we should only flush the pages in the bio and not the
> whole request otherwise.

It always operates on the first bio in the request and then uses
blk_update_request to move the context past that.
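Taken together, the alignment answers above mean a conservative driver can rely on every bv_len being a multiple of SECTOR_SIZE and on the whole transfer covering complete logical blocks, while bv_offset is constrained only by dma_alignment. The following is a minimal userspace sketch of those checks, not kernel code: `struct model_bvec` and `model_bio_is_well_formed()` are hypothetical names, and treating `dma_alignment` as a mask (e.g. 511) is an assumption made for the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SIZE 512u

/* Hypothetical, simplified stand-in for struct bio_vec: the page pointer is
 * omitted because only the offset and length matter for these checks. */
struct model_bvec {
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* Hypothetical helper: checks only the properties described in the reply.
 * dma_alignment is modeled as a mask (e.g. 511 for 512-byte alignment). */
static bool model_bio_is_well_formed(const struct model_bvec *vecs, size_t n,
				     unsigned int logical_block_size,
				     unsigned int dma_alignment)
{
	unsigned long long total = 0;

	assert(logical_block_size >= SECTOR_SIZE);
	for (size_t i = 0; i < n; i++) {
		/* Each segment is a whole number of 512-byte sectors... */
		if (vecs[i].bv_len == 0 || vecs[i].bv_len % SECTOR_SIZE)
			return false;
		/* ...but its memory offset only has to honour dma_alignment. */
		if (vecs[i].bv_offset & dma_alignment)
			return false;
		total += vecs[i].bv_len;
	}
	/* The I/O as a whole covers whole logical blocks. */
	return total % logical_block_size == 0;
}
```

For example, segments of 512 and 3584 bytes pass for a 4096-byte logical block size even though neither segment is a full logical block on its own, so a driver should not assume one segment per logical block.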
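The pattern of handling only the first bio and then calling blk_update_request to advance can be modeled in a few lines of userspace C. This is a sketch under simplifying assumptions, not the mtd code: `struct model_request`, `model_update_request()` and `model_process_request()` are hypothetical, and the model assumes every step completes one whole bio without errors. The loop condition mirrors the fact that the real blk_update_request returns false once the request has no data left.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a request: an ordered list of bios, each with a
 * byte count, plus the index of the bio currently at the head. */
struct model_request {
	const unsigned int *bio_bytes;
	size_t nr_bios;
	size_t cur;
};

/* Rough analogue of blk_update_request(): complete `bytes` and report
 * whether anything is left. The real function also handles errors, partial
 * completion and accounting; this model only completes whole bios. */
static bool model_update_request(struct model_request *rq, unsigned int bytes)
{
	assert(rq->cur < rq->nr_bios && bytes == rq->bio_bytes[rq->cur]);
	rq->cur++;
	return rq->cur < rq->nr_bios;
}

/* Driver loop in the style described above: only ever look at the first
 * bio, then let the update step move the request past it. */
static unsigned long long model_process_request(struct model_request *rq)
{
	unsigned long long done = 0;
	bool more;

	if (rq->cur >= rq->nr_bios)
		return 0;
	do {
		unsigned int len = rq->bio_bytes[rq->cur]; /* first bio only */
		/* ... transfer `len` bytes for this bio here ... */
		done += len;
		more = model_update_request(rq, len);
	} while (more);
	return done;
}
```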
It is an old and somewhat arcane way to write drivers, but it should
work. The rq_for_each_segment loops that call flush_dcache_page look
horribly wrong for this model, though.

> - The data is in lowmem OR bv_offset + bv_len <= PAGE_SIZE. kmap() only
> maps a single page, so if we go past one page we end up in adjacent
> kmapped pages.

Yes, this looks broken.

> Am I missing something here? Handling highmem seems like a persistent
> issue. E.g. drivers/mtd/ubi/block.c doesn't even bother doing a kmap.
> Should both of these have BLK_FEAT_BOUNCE_HIGH?

BLK_FEAT_BOUNCE_HIGH needs to go away sooner rather than later. In the
short run, the best fix would be to synthesize a
bio_for_each_segment-like bio_vec that stays inside a single page (using
bio_iter_iovec) at the top of do_blktrans_request and use that for all
references to the data.
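Why single-page bio_vecs fix the kmap problem can be illustrated with a userspace model in which every page is a separate buffer, as with highmem pages that each need their own mapping. Splitting a segment into chunks that never cross a page boundary, the way bio_for_each_segment and bio_iter_iovec do, means each chunk needs exactly one mapping. All names here (`struct model_page`, `struct model_mp_bvec`, `model_map_page()`, `model_copy_from_bvec()`) are hypothetical, with `model_map_page()` standing in for kmap_local_page(); this is a sketch, not a patch for do_blktrans_request.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u

/* Hypothetical model of highmem: each page is its own buffer, so mapping
 * one page tells you nothing about where the next one lives. */
struct model_page {
	unsigned char data[MODEL_PAGE_SIZE];
};

/* A multi-page segment: starts in pages[0] at `offset` and runs for `len`
 * bytes, possibly spilling into pages[1], pages[2], ... */
struct model_mp_bvec {
	struct model_page **pages;
	unsigned int offset;
	unsigned int len;
};

/* Stand-in for kmap_local_page(): the result is valid for one page only.
 * There is nothing to unmap in this model. */
static unsigned char *model_map_page(struct model_page *page)
{
	assert(page != NULL);
	return page->data;
}

/* Copy a segment out in chunks that never cross a page boundary, mapping
 * each page separately. Returns the number of chunks (mappings) used. */
static unsigned int model_copy_from_bvec(unsigned char *dst,
					 const struct model_mp_bvec *bv)
{
	unsigned int done = 0, chunks = 0;

	while (done < bv->len) {
		unsigned int pos = bv->offset + done;
		unsigned int page_idx = pos / MODEL_PAGE_SIZE;
		unsigned int in_page = pos % MODEL_PAGE_SIZE;
		unsigned int n = MODEL_PAGE_SIZE - in_page; /* stay in one page */

		if (n > bv->len - done)
			n = bv->len - done;
		memcpy(dst + done, model_map_page(bv->pages[page_idx]) + in_page, n);
		done += n;
		chunks++;
	}
	return chunks;
}
```

With an offset of 4000 and a length of 200, the copy takes two chunks (96 bytes from the first page, 104 from the second); a single mapping of the first page would only cover the first 96 bytes.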