On 7/28/25 9:55 AM, Yu Kuai wrote:
> Hi,
>
> On 2025/07/28 8:39, Damien Le Moal wrote:
>> md setting its io_opt to 64K * number of drives in the array is strange... It
>> does not have to be that large since io_opt is an upper bound and not an "issue
>> that IO size for optimal performance". io_opt is simply a limit saying: if you
>> exceed that IO size, performance may suffer.
>>
>
> At least from the Documentation, for raid arrays, a multiple of io_opt is the
> preferred IO size for optimal IO performance, and for raid5, this is
> chunksize * data disks.
>
>> So a default of stride size x number of drives for the io_opt may be OK, but
>> that should be bound to some reasonable value. Furthermore, this is likely
>> suboptimal. I would think that setting the md array io_opt initially to
>> min(all drives io_opt) x number of drives would be a better default.
>
> For raid5, this is not OK: the value has to be chunksize * data disks,
> regardless of the io_opt of the member disks, otherwise raid5 has to issue
> additional IO to other disks to build the xor data.
>
> For example:
>
> - a write aligned to chunksize on one disk actually means reading chunksize of
>   old xor data, then writing chunksize of data and chunksize of new xor data.
> - for a write aligned to chunksize * data disks, the new xor data can be built
>   directly without reading the old xor data.

I understand all of that. But you missed my point: io_opt simply indicates an
upper bound for an IO size. If it is exceeded, performance may be degraded.
This has *nothing* to do with the io granularity, which for a RAID array should
ideally be equal to stride size x number of data disks. This is the confusion
here. md setting io_opt to stride x number of disks in the array is simply not
what io_opt is supposed to indicate.

--
Damien Le Moal
Western Digital Research
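
To make the cost difference discussed above concrete, here is a minimal
userspace sketch of the RAID5 parity arithmetic. This is not md code: the
4-byte "chunks", the 3 data disks, and the simple textbook read-modify-write
model (read old data and old parity, write new data and new parity) are
illustrative assumptions only; md's actual stripe cache behavior is more
involved.

/*
 * Minimal sketch of the RAID5 parity arithmetic behind the argument
 * above. Not md code: the 4-byte "chunks" and 3 data disks are just
 * an illustration.
 */
#include <stdio.h>
#include <inttypes.h>

#define DATA_DISKS 3

int main(void)
{
	/* Current on-disk data: one "chunk" per data disk. */
	uint32_t d[DATA_DISKS] = { 0x11111111, 0x22222222, 0x33333333 };

	/*
	 * Full-stripe write (chunksize * data disks): parity is the XOR
	 * of the new data only, so no reads from member disks are needed.
	 */
	uint32_t p = 0;
	for (int i = 0; i < DATA_DISKS; i++)
		p ^= d[i];
	printf("full-stripe write : parity 0x%08" PRIx32 " (0 reads, %d writes)\n",
	       p, DATA_DISKS + 1);

	/*
	 * Sub-stripe update of disk 1 only: the new parity depends on the
	 * old data and the old parity, so they must be read back first
	 * (read-modify-write), then the new data and new parity written.
	 */
	uint32_t d1_new = 0xdeadbeef;
	uint32_t p_new = p ^ d[1] ^ d1_new;	/* P' = P ^ Dold ^ Dnew */
	printf("single-chunk write: parity 0x%08" PRIx32 " (2 reads, 2 writes)\n",
	       p_new);

	return 0;
}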