Re: Improper io_opt setting for md raid5

Hi,

On 2025/07/29 14:13, Hannes Reinecke wrote:
On 7/28/25 11:02, Yu Kuai wrote:
Hi,

On 2025/07/28 15:44, Damien Le Moal wrote:
On 7/28/25 4:14 PM, Yu Kuai wrote:
Going through git log, starting from commit 7e5f5fb09e6f ("block: Update
topology documentation"), the documentation contains a special explanation
for RAID arrays, and the optimal_io_size entry says:

For RAID arrays it is usually the
stripe width or the internal track size.  A properly aligned
multiple of optimal_io_size is the preferred request size for
workloads where sustained throughput is desired.

And this explanation is exactly what raid5 does; it's important that the
I/O size is a properly aligned multiple of io_opt.
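To make that concrete, here is a minimal user-space sketch (illustrative
only, not the actual raid5.c code; all identifiers below are made up) of
how io_min and io_opt would be derived from the chunk size and the number
of data disks for a RAID5 array:

#include <stdio.h>

int main(void)
{
	unsigned int chunk_size = 64 * 1024;	/* per-disk chunk, in bytes */
	unsigned int raid_disks = 8;		/* total member disks */
	unsigned int max_degraded = 1;		/* parity disks for RAID5 */
	unsigned int data_disks = raid_disks - max_degraded;

	unsigned int io_min = chunk_size;		/* one chunk (stride) */
	unsigned int io_opt = chunk_size * data_disks;	/* one full stripe */

	printf("io_min=%uK io_opt=%uK\n", io_min / 1024, io_opt / 1024);
	return 0;
}

With 8 disks and the default 64k chunk this prints io_min=64K io_opt=448K,
which matches the calculation further down in this thread.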

Looking at the sysfs doc for the above fields, they are described as follows:

* /sys/block/<disk>/queue/minimum_io_size

[RO] Storage devices may report a granularity or preferred
minimum I/O size which is the smallest request the device can
perform without incurring a performance penalty.  For disk
drives this is often the physical block size.  For RAID arrays
it is often the stripe chunk size.  A properly aligned multiple
of minimum_io_size is the preferred request size for workloads
where a high number of I/O operations is desired.

So this matches the SCSI OPTIMAL TRANSFER LENGTH GRANULARITY limit, and for
a RAID array this indeed should be the stride x number of data disks.

Do you mean stripe here? io_min for a raid array is always just one
chunksize.

My bad, yes, that is the definition in sysfs. So io_min is the stride size, where:

stride size x number of data disks == stripe_size.

Yes.

Note that the chunk_sectors limit is the *stripe* size, not the per-drive
stride. Beware of the wording here to avoid confusion (this is all already
super confusing!).

This is something we're not on the same page about :( For example, take an
8-disk raid5 with the default chunk size. Then the above calculation is:

64k * 7 = 448k

The chunksize I meant is 64k...

Hmm. I always thought that the 'chunksize' is the limit which an I/O must
not cross to avoid being split.
So for RAID 4/5/6 I would have thought this to be the stride size,
as MD must split larger I/Os across two disks.
Sure, one could argue that the stripe size is the chunk size, but then
MD will have to split that I/O...
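To illustrate that splitting rule, here is a small sketch (illustrative
only, assuming a power-of-two chunk size; not actual block-layer code) of
how far a single I/O can extend from a given sector before it reaches the
next chunk boundary and would have to be split:

/*
 * Remaining sectors before the next chunk boundary; an I/O longer than
 * this, starting at 'sector', would have to be split.
 * Assumes chunk_sectors is a power of two.
 */
static unsigned int sectors_to_chunk_boundary(unsigned long long sector,
					      unsigned int chunk_sectors)
{
	return chunk_sectors - (unsigned int)(sector & (chunk_sectors - 1));
}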

BTW, I always took chunksize to be the stride size, simply because there
is a metadata field in the mddev superblock named 'chunk_size', which is
the stride size.

Thanks,
Kuai


Cheers,

Hannes




