Re: bio segment constraints

On 4/7/25 03:07, Christoph Hellwig wrote:
On Sun, Apr 06, 2025 at 03:40:04PM -0400, Sean Anderson wrote:
Hi all,

I'm not really sure what guarantees the block layer makes regarding the
segments in a bio as part of a request submitted to a block driver. As
far as I can tell this is not documented anywhere. In particular,

First you need to define what segment you mean.  We have at least two and
a half historical uses of the name.  One is for each bio_vec attached to
the bio, either directly as submitted into ->submit_bio for bio based
drivers (case 1a), or generated by bio_split_to_limits (case 1b), which
is called for every blk-mq driver before calling into ->queue_rq(s) or
explicitly called by a few bio based drivers.

The other is the bio-vec synthesized by bio_for_each_segment (case 2).

I'm referring to the bio_vecs you get from queue_rq. Which I think is the
latter.

- Is bv_len aligned to SECTOR_SIZE?

Yes.

- To logical_sector_size?

Yes.

OK, but...

- What if logical_sector_size > PAGE_SIZE?

Still always aligned to logical_sector_size.

- What about bv_offset?

bv_offset is a memory offset and must only be aligned to the
dma_alignment limit.

- Is it possible to have a bio where the total length is a multiple of
   logical_sector_size, but the data is split across several segments
   where each segment is a multiple of SECTOR_SIZE?

Yes.

...if this is the case, then for some of those segments wouldn't bv_len
not be a multiple of logical_sector_size?
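
To make the arithmetic of that question concrete, here is a small userspace model. It is not kernel code: struct seg, total_len, all_lens_aligned and the example layout are all made up for illustration, and a 4096-byte logical block size is assumed. It only shows that a total length can be block-aligned while the individual lengths are merely SECTOR_SIZE-aligned; whether the block layer can actually hand a driver such a layout is exactly what is being asked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SIZE 512u

/* Hypothetical stand-in for the bv_len/bv_offset fields of a bio_vec. */
struct seg {
	unsigned int len;
	unsigned int offset;
};

/* Sum of all segment lengths, i.e. the total data length of the model bio. */
static unsigned int total_len(const struct seg *segs, size_t n)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += segs[i].len;
	return sum;
}

/* True if every segment length is a multiple of align. */
static bool all_lens_aligned(const struct seg *segs, size_t n,
			     unsigned int align)
{
	assert(align != 0);
	for (size_t i = 0; i < n; i++)
		if (segs[i].len % align)
			return false;
	return true;
}

/* Assumed layout: one 4096-byte block split as 1536 + 2560 bytes. */
static const struct seg example_segs[] = {
	{ .len = 1536, .offset = 2560 },
	{ .len = 2560, .offset = 0 },
};
```

With this layout the total is a multiple of 4096 and every length is a multiple of SECTOR_SIZE, but neither length is a multiple of 4096.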

- Is it possible to have segments not even aligned to SECTOR_SIZE?

No.

- Can I somehow request to only get segments with bv_len aligned to
   logical_sector_size?

For drivers that use bio_split_to_limits implicitly or explicitly you can
do that by setting the right seg_boundary_mask.

Is that the right knob? It operates on the physical address, so it looked
more like something for broken DMA engines. For example (if I recall correctly)
MMC SDMA can't cross a page boundary, so you could use seg_boundary_mask to
enforce that.
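
As a rough illustration of why this reads like an address constraint rather than a length constraint, here is a userspace sketch. The helper crosses_boundary is hypothetical and only models my understanding of a boundary mask (a segment may not span two different mask + 1 sized windows of physical address space); it is not the block layer's implementation. A 4 KiB boundary is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of a segment boundary check: returns true if the
 * byte range [phys, phys + len) touches two different (mask + 1)-sized
 * windows. mask is assumed to be a power of two minus one.
 */
static bool crosses_boundary(uint64_t phys, uint32_t len, uint64_t mask)
{
	assert(len > 0);
	return (phys & ~mask) != ((phys + len - 1) & ~mask);
}
```

Under this model, with mask 0xfff a 1024-byte segment at physical address 0x1e00 is rejected because it crosses 0x2000, while a 512-byte segment at 0x1000 is accepted even though 512 is not a multiple of a 4096-byte logical block size. The mask restricts where a segment sits, not how long it is.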

make some big assumptions (which might be bugs?) For example, in
drivers/mtd/mtd_blkdevs.c, do_blktrans_request looks like:

- There is only one bio in a request. This one is a bit of a soft
   assumption since we should only flush the pages in the bio and not the
   whole request otherwise.

It always operates on the first bio in the request and then uses
blk_update_request to move the context past that.  It is an old
and somewhat arcane way to write drivers, but should work.  The
rq_for_each_segment loops that call flush_dcache_page look horribly
wrong for this model, though.

- The data is in lowmem OR bv_offset + bv_len <= PAGE_SIZE. kmap() only
   maps a single page, so if we go past one page we end up in adjacent
   kmapped pages.

Yes, this looks broken.

Am I missing something here? Handling highmem seems like a persistent
issue. E.g. drivers/mtd/ubi/block.c doesn't even bother doing a kmap.
Should both of these have BLK_FEAT_BOUNCE_HIGH?

BLK_FEAT_BOUNCE_HIGH needs to go away sooner rather than later.

In the short run the best fix would be to synthesize a
bio_for_each_segment-like bio_vec that stays inside a single page
(using bio_iter_iovec) at the top of do_blktrans_request and use
that for all references to the data.
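
The idea of staying inside a single page can be sketched in userspace as follows. This is not the kernel's bio_iter_iovec; struct chunk, split_by_page and MODEL_PAGE_SIZE are hypothetical names, and a 4096-byte page is assumed. It only shows the arithmetic of cutting a multi-page (offset, length) range into pieces that each fit in one page, which is the property a single kmap() mapping needs.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/* One piece of the original range that lies entirely within one page. */
struct chunk {
	unsigned int page_idx;	/* index of the page relative to the first */
	unsigned int offset;	/* offset inside that page */
	unsigned int len;	/* bytes in this piece */
};

/*
 * Split the byte range starting at offset (relative to the first page)
 * with length len into per-page chunks. Writes at most max chunks to
 * out and returns how many were written.
 */
static size_t split_by_page(unsigned int offset, unsigned int len,
			    struct chunk *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL);
	while (len > 0 && n < max) {
		unsigned int in_page = offset % MODEL_PAGE_SIZE;
		unsigned int room = MODEL_PAGE_SIZE - in_page;
		unsigned int take = len < room ? len : room;

		out[n].page_idx = offset / MODEL_PAGE_SIZE;
		out[n].offset = in_page;
		out[n].len = take;
		n++;

		offset += take;
		len -= take;
	}
	return n;
}
```

For example, a 1536-byte range starting at offset 3584 comes out as 512 bytes at the end of page 0 followed by 1024 bytes at the start of page 1.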


OK, but if you have to stay inside a single page couldn't you end up
with a sector spanning a page boundary due to only being aligned to
dma_alignment? Or maybe we set seg_boundary_mask to PAGE_SIZE - 1 to
enforce that?
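
A quick userspace check of the case I have in mind. The helpers is_aligned and sector_straddles_page are hypothetical, and a 4096-byte page and a dma_alignment mask of 3 (4-byte alignment) are assumptions picked for the example, not values taken from any driver.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/* True if offset satisfies an alignment mask (mask = alignment - 1). */
static bool is_aligned(unsigned int offset, unsigned int dma_alignment_mask)
{
	return (offset & dma_alignment_mask) == 0;
}

/*
 * True if a sector of sector_size bytes starting at offset extends
 * past the end of the page that offset falls in.
 */
static bool sector_straddles_page(unsigned int offset,
				  unsigned int sector_size)
{
	assert(sector_size > 0 && sector_size <= MODEL_PAGE_SIZE);
	return (offset % MODEL_PAGE_SIZE) + sector_size > MODEL_PAGE_SIZE;
}
```

An offset of 3840 passes the 4-byte alignment check, yet a 512-byte sector starting there runs 256 bytes into the next page. An offset of 3584 ends exactly on the page boundary and does not straddle.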

--Sean



