> Ok, looks like there are two problems now:
>
> a) io_min, size to prevent performance penalty;
>
> 1) For raid5, to avoid read-modify-write, this value should be 448k,
> but now it's 64k;

You have two penalties for RAID5: writes smaller than the stripe chunk
size and writes smaller than the full stripe width.

> 2) For raid0/raid10, this value is set to 64k now, however, this value
> should not be set. If the value in the member disks is 4k, issuing 4k
> is just fine, there won't be any performance penalty;

Correct.

> 3) For raid1, this value is not set, and the member disks' value will
> be used, this is correct.

Correct.

> b) io_opt, size to ???
>
> 4) For raid0/raid10/raid5, this value is set to the minimal IO size to
> get best performance.

For RAID 0 you want to set io_opt to the stripe width. io_opt is for
sequential, throughput-optimized I/O. Presumably the MD stripe chunk
size has been chosen based on knowledge about the underlying disks and
their performance. And thus maximum throughput will be achieved when
doing full stripe writes across all drives.

For software RAID I am not sure how much this really matters in a
modern context. It certainly did 25 years ago when we benchmarked
things for XFS. Full stripe writes were a big improvement with both
software and hardware RAID. But how much this matters today, I am not
sure.

> 5) For raid1, this value is not set, and the member disks' value will
> be used.

Correct.

> If io_opt should be the *upper bound*, problem 4) should be fixed like
> case 5), and other places like blk_apply_bdi_limits() setting ra_pages
> from io_opt should be fixed as well.

I understand Damien's "upper bound" interpretation but it does not take
alignment and granularity into account. And both are imperative for
io_opt.

> If io_opt should be the *minimal IO size to get best performance*,

What is "best performance"? IOPS or throughput?

io_min is about IOPS. io_opt is about throughput.

--
Martin K. Petersen
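To make the arithmetic behind the numbers above concrete, here is a
minimal userspace sketch (illustration only, not md or block layer
code). It assumes an 8-drive RAID5 with a 64k chunk; the drive count is
inferred from the 448k figure (448k = 7 data chunks x 64k), not stated
explicitly in the thread.

/*
 * Minimal sketch, assuming an 8-drive RAID5 with a 64k chunk.
 * Derives the io_min/io_opt candidates being discussed above.
 */
#include <stdio.h>

#define KiB 1024u

int main(void)
{
	unsigned int chunk = 64 * KiB;	/* chunk size from the report */
	unsigned int raid_disks = 8;	/* assumed member count */
	unsigned int parity_disks = 1;	/* raid5: one parity chunk per stripe */
	unsigned int data_disks = raid_disks - parity_disks;

	/*
	 * Full stripe width: the smallest write that avoids
	 * read-modify-write on raid5, and the size at which every
	 * drive in a raid0 stripe is kept busy.
	 */
	unsigned int stripe_width = data_disks * chunk;

	/*
	 * io_min is about IOPS: the smallest I/O without a penalty.
	 * io_opt is about throughput: the preferred granularity and
	 * alignment, i.e. full stripe writes.
	 */
	printf("chunk size (current io_min for raid0/raid10/raid5): %u KiB\n",
	       chunk / KiB);
	printf("full stripe width (io_opt per the discussion): %u KiB\n",
	       stripe_width / KiB);
	return 0;
}

This prints 64 KiB and 448 KiB for the example array. Note that the
choice also reaches beyond MD itself: as mentioned above,
blk_apply_bdi_limits() derives the default readahead window (ra_pages)
from io_opt, so an io_opt of 448k rather than 64k changes readahead
behavior as well.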