On 4/6/25 21:40, Sean Anderson wrote:
Hi all,
I'm not really sure what guarantees the block layer makes regarding the
segments in a bio as part of a request submitted to a block driver. As
far as I can tell this is not documented anywhere. In particular,
- Is bv_len aligned to SECTOR_SIZE?
The block layer always uses a 512 byte sector size, so yes.
- To logical_sector_size?
Not necessarily. Bvecs are a consecutive list of byte ranges which
make up the data portion of a bio.
The logical sector size is a property of the request queue, which is
applied when a request is formed from one or several bios.
For the request, the overall length needs to be a multiple of the logical
sector size, but not necessarily the individual bios.
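To make the distinction concrete, here is a minimal userspace sketch (not
kernel code). 'struct seg' is a made-up stand-in for a bvec that only carries
a length, and all three helper names are hypothetical. It checks the two
properties separately: every segment is a multiple of 512 bytes, while only
the sum has to be a multiple of the logical block size.

```c
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SIZE 512u

/* Hypothetical stand-in for a bvec: only the byte length is modelled. */
struct seg {
	unsigned int len;
};

/* True if every segment length is a multiple of the 512-byte sector size. */
static bool segs_sector_aligned(const struct seg *segs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (segs[i].len % SECTOR_SIZE)
			return false;
	return true;
}

/* True if the summed length is a multiple of 'lbs' (must be non-zero). */
static bool total_lbs_aligned(const struct seg *segs, size_t n,
			      unsigned int lbs)
{
	unsigned long long total = 0;

	for (size_t i = 0; i < n; i++)
		total += segs[i].len;
	return total % lbs == 0;
}

/* True if each individual segment is a multiple of 'lbs' (non-zero). */
static bool each_lbs_aligned(const struct seg *segs, size_t n,
			     unsigned int lbs)
{
	for (size_t i = 0; i < n; i++)
		if (segs[i].len % lbs)
			return false;
	return true;
}
```

With a 4096-byte logical block size, two segments of 1024 and 3072 bytes pass
the first two checks but fail the third, which is exactly the case a driver
has to be prepared for.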
- What if logical_sector_size > PAGE_SIZE?
See above.
- What about bv_offset?
Same story. The eventual request needs to observe that the offset
and the length are aligned to the logical block size, but the individual
bios might not.
- Is it possible to have a bio where the total length is a multiple of
logical_sector_size, but the data is split across several segments
where each segment is a multiple of SECTOR_SIZE?
Sure.
- Is it possible to have segments not even aligned to SECTOR_SIZE?
Nope.
- Can I somehow request to only get segments with bv_len aligned to
logical_sector_size? Or do I need to do my own coalescing and bounce
buffering for that?
The driver surely can. You should be able to set 'max_segment_size' to
the logical block size, and that should give you what you want.
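For comparison, the do-it-yourself coalescing you mention would look roughly
like the sketch below. This is a userspace model only: 'struct seg',
'struct seg_iter' and 'next_block()' are made-up names, not block layer API,
and the logical block size 'lbs' is assumed to be non-zero. It copies data
out of arbitrarily sized segments into a bounce buffer, one logical block at
a time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bvec: a data pointer and a byte length. */
struct seg {
	const unsigned char *data;
	size_t len;
};

/* Cursor over an array of segments. */
struct seg_iter {
	const struct seg *segs;
	size_t nsegs;
	size_t idx;	/* index of the current segment */
	size_t done;	/* bytes already consumed from the current segment */
};

/*
 * Fill 'bounce' with the next 'lbs' bytes taken from the segments.
 * Returns true if a full block was assembled, false once the segments
 * run out before the block is complete.
 */
static bool next_block(struct seg_iter *it, unsigned char *bounce, size_t lbs)
{
	size_t fill = 0;

	while (fill < lbs && it->idx < it->nsegs) {
		const struct seg *s = &it->segs[it->idx];
		size_t chunk = s->len - it->done;

		/* Never copy more than what is still missing in the block. */
		if (chunk > lbs - fill)
			chunk = lbs - fill;
		memcpy(bounce + fill, s->data + it->done, chunk);
		fill += chunk;
		it->done += chunk;

		/* Segment exhausted: advance to the next one. */
		if (it->done == s->len) {
			it->idx++;
			it->done = 0;
		}
	}
	return fill == lbs;
}
```

The cost is one extra copy per block, which is why constraining the segments
through the queue limits is preferable whenever it works for the driver.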
I've been reading some drivers (as well as stuff in block/) to try and
figure things out, but it's hard to figure out all the places where
constraints are enforced. In particular, I've read several drivers that
make some big assumptions (which might be bugs?) For example, in
drivers/mtd/mtd_blkdevs.c, do_blktrans_request looks like:
In general, the block layer has two major data items, bios and requests.
'struct bio' is the central structure for any 'upper' layers to submit
data (via the 'submit_bio()' function), and 'struct request' is the
central structure for drivers to fetch data for submission to the
hardware (via the 'queue_rq()' callback in 'struct blk_mq_ops').
And the task of the block layer is to convert 'struct bio' into
'struct request'.
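As a rough mental model (userspace sketch, not the kernel structures): a
request carries a chain of bios, and each bio carries an array of segments.
The types 'struct model_bio' and 'struct model_req' and both helpers are
invented for illustration; they only loosely mirror the way a request links
its bios.

```c
#include <stddef.h>

/* Hypothetical stand-in for a bvec: only the byte length is modelled. */
struct seg {
	unsigned int len;
};

/* Model of a bio: an array of segments plus a link to the next bio. */
struct model_bio {
	const struct seg *segs;
	size_t nsegs;
	struct model_bio *next;
};

/* Model of a request: just the head of a chain of bios. */
struct model_req {
	struct model_bio *head;
};

/* Count the segments in every bio of the request. */
static size_t req_count_segments(const struct model_req *rq)
{
	size_t n = 0;

	for (const struct model_bio *b = rq->head; b; b = b->next)
		n += b->nsegs;
	return n;
}

/* Sum the byte lengths of all segments in all bios of the request. */
static unsigned long long req_total_bytes(const struct model_req *rq)
{
	unsigned long long total = 0;

	for (const struct model_bio *b = rq->head; b; b = b->next)
		for (size_t i = 0; i < b->nsegs; i++)
			total += b->segs[i].len;
	return total;
}
```

A driver that only looks at the first segment of the first bio handles the
whole request correctly only when both loops above run exactly once.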
[ .. ]
For context, tr->blkshift is either 512 or 4096, depending on the
backend. From what I can tell, this code assumes the following:
MTD is probably not a good example, as MTD has its own set of
limitations which might result in certain shortcuts being taken.
- There is only one bio in a request. This one is a bit of a soft
assumption since we should only flush the pages in the bio and not the
whole request otherwise.
- There is only one segment in a bio. This one could be reasonable if
max_segments was set to 1, but it's not as far as I can tell. So I
guess we just go off the end of the bio if there's a second segment?
- The data is in lowmem OR bv_offset + bv_len <= PAGE_SIZE. kmap() only
maps a single page, so if we go past one page we end up in adjacent
kmapped pages.
Well, that code _does_ look suspicious. It really should be converted
to using the iov iterators.
But then again, it _might_ be okay if there are underlying MTD
restrictions which would devolve into MTD only having a single bvec.
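To show why the single-page mapping matters, here is a small userspace sketch
of the bookkeeping needed when a segment crosses a page boundary. It is not
kernel code: 'MODEL_PAGE_SIZE', 'struct page_chunk' and 'split_by_page()' are
made-up names, and a 4096-byte page is assumed. A proper iterator hides this
kind of splitting from the driver.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* One piece of a segment that lies entirely within a single page. */
struct page_chunk {
	unsigned int page_idx;	/* page index relative to the first page */
	unsigned int offset;	/* byte offset within that page */
	unsigned int len;	/* byte length of the piece */
};

/*
 * Split the byte range [offset, offset + len) into per-page pieces.
 * At most 'max' pieces are written to 'out'; the return value is the
 * number of pieces the range needs, which may exceed 'max'.
 */
static size_t split_by_page(unsigned int offset, unsigned int len,
			    struct page_chunk *out, size_t max)
{
	unsigned int page = offset / MODEL_PAGE_SIZE;
	size_t n = 0;

	offset %= MODEL_PAGE_SIZE;
	while (len) {
		unsigned int chunk = MODEL_PAGE_SIZE - offset;

		if (chunk > len)
			chunk = len;
		if (n < max) {
			out[n].page_idx = page;
			out[n].offset = offset;
			out[n].len = chunk;
		}
		n++;
		len -= chunk;
		offset = 0;	/* every following piece starts a new page */
		page++;
	}
	return n;
}
```

A segment with offset 3584 and length 1024 comes out as two pieces of 512
bytes each, so code that maps only the first page and then copies 1024 bytes
runs past the end of that mapping.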
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich