On 7/28/25 9:55 AM, Yu Kuai wrote:
> Hi,
>
> On 2025/07/28 8:39, Damien Le Moal wrote:
>> md setting its io_opt to 64K * number of drives in the array is strange... It
>> does not have to be that large since io_opt is an upper bound and not an "issue
>> that IO size for optimal performance". io_opt is simply a limit saying: if you
>> exceed that IO size, performance may suffer.
>>
>
> At least from the Documentation, for raid arrays, a multiple of io_opt is the
> preferred IO size for optimal IO performance, and for raid5, this is
> chunksize * data disks.
>
>> So a default of stride size x number of drives for the io_opt may be OK, but
>> that should be bound to some reasonable value. Furthermore, this is likely
>> suboptimal. I would think that setting the md array io_opt initially to
>> min(all drives io_opt) x number of drives would be a better default.
>
> For raid5, this is not OK: the value has to be chunksize * data disks,
> regardless of the io_opt of the member disks, otherwise raid5 has to issue
> additional IO to other disks to build the xor data.
>
> For example:
>
> - a write aligned to chunksize on one disk actually means reading chunksize of
>   old xor data, then writing chunksize of data and chunksize of new xor data.
> - for a write aligned to chunksize * data disks, the new xor data can be built
>   directly without reading the old xor data.

I understand all of that. But you missed my point: io_opt simply indicates an
upper bound for an IO size. If it is exceeded, performance may be degraded.
This has *nothing* to do with the io granularity, which for a RAID array should
ideally be equal to stride size x number of data disks. This is the confusion
here. md setting io_opt to stride x number of disks in the array is simply not
what io_opt is supposed to indicate.

--
Damien Le Moal
Western Digital Research
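
To make the cost difference discussed above concrete, here is a minimal
userspace sketch of the RAID5 parity arithmetic. This is not md code: the
4-byte "chunks", the 3 data disks, and the simple textbook read-modify-write
model (read old data and old parity, write new data and new parity) are
illustrative assumptions only; md's actual stripe cache behavior is more
involved.

/*
 * Minimal sketch of the RAID5 parity arithmetic behind the argument
 * above. Not md code: the 4-byte "chunks" and 3 data disks are just
 * an illustration.
 */
#include <stdio.h>
#include <inttypes.h>

#define DATA_DISKS 3

int main(void)
{
	/* Current on-disk data: one "chunk" per data disk. */
	uint32_t d[DATA_DISKS] = { 0x11111111, 0x22222222, 0x33333333 };

	/*
	 * Full-stripe write (chunksize * data disks): parity is the XOR
	 * of the new data only, so no reads from member disks are needed.
	 */
	uint32_t p = 0;
	for (int i = 0; i < DATA_DISKS; i++)
		p ^= d[i];
	printf("full-stripe write : parity 0x%08" PRIx32 " (0 reads, %d writes)\n",
	       p, DATA_DISKS + 1);

	/*
	 * Sub-stripe update of disk 1 only: the new parity depends on the
	 * old data and the old parity, so they must be read back first
	 * (read-modify-write), then the new data and new parity written.
	 */
	uint32_t d1_new = 0xdeadbeef;
	uint32_t p_new = p ^ d[1] ^ d1_new;	/* P' = P ^ Dold ^ Dnew */
	printf("single-chunk write: parity 0x%08" PRIx32 " (2 reads, 2 writes)\n",
	       p_new);

	return 0;
}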