Re: Improper io_opt setting for md raid5


 



Hi,

On 2025/07/15 23:56, Coly Li wrote:
Then when my raid5 array sets its queue limits, because its io_opt is 64KiB*7,
and the raid component SATA hard drive has io_opt with 32767 sectors, by
calculation in block/blk-settings.c:blk_stack_limits() at line 753,
753         t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
the calculated opt_io_size of my raid5 array is more than 1GiB. It is too large.
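
For reference, the arithmetic above can be reproduced in userspace. This is only a sketch, not kernel code: gcd_u64(), lcm_not_zero_u64() and stacked_io_opt_example() are hypothetical stand-ins written for illustration, and the values are the ones quoted above (64KiB*7 for the array, 32767 sectors of 512 bytes for the drive).

```c
#include <assert.h>
#include <stdint.h>

/* Greatest common divisor, Euclid's algorithm. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
	while (b) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Userspace stand-in for lcm_not_zero(): if either argument is zero,
 * return the other one; otherwise return the least common multiple.
 */
static uint64_t lcm_not_zero_u64(uint64_t a, uint64_t b)
{
	uint64_t g;

	if (!a || !b)
		return a ? a : b;

	g = gcd_u64(a, b);
	assert(a / g <= UINT64_MAX / b);	/* guard against overflow */
	return a / g * b;
}

/* io_opt that results from stacking the two values from the report. */
static uint64_t stacked_io_opt_example(void)
{
	uint64_t raid5_io_opt = 64ULL * 1024 * 7;	/* 458752 bytes */
	uint64_t disk_io_opt = 32767ULL * 512;		/* 16776704 bytes */

	return lcm_not_zero_u64(raid5_io_opt, disk_io_opt);
}
```

Since 32767 = 7 * 31 * 151, the two values share only the factor 512 * 7, so the result is 65536 * 32767 = 2147418112 bytes, just under 2GiB.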

Perhaps we should at least provide a way for raid5 to prefer its own
io_opt over the underlying disks' io_opt. Because of the raid5 internal
implementation, chunk_size * data disks is the best choice; performance
differs significantly when IO is not aligned with that io_opt.
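
To illustrate what "aligned with io_opt" means here, a small userspace sketch follows. The helper names are hypothetical and do not exist in the kernel; the stripe width formula mirrors what raid5_set_limits() computes, and the 8-disk array in the note below is an assumption chosen to match the 64KiB*7 figure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: full stripe width in bytes, i.e.
 * chunk size * number of data disks.
 */
static uint64_t raid5_full_stripe_bytes(uint32_t chunk_sectors,
					unsigned int raid_disks,
					unsigned int max_degraded)
{
	assert(raid_disks > max_degraded);
	return ((uint64_t)chunk_sectors << 9) * (raid_disks - max_degraded);
}

/*
 * Hypothetical helper: true if an IO starts on a stripe boundary and
 * covers a whole number of full stripes.
 */
static bool is_full_stripe_io(uint64_t offset, uint64_t len,
			      uint64_t stripe_bytes)
{
	return stripe_bytes && len &&
	       offset % stripe_bytes == 0 && len % stripe_bytes == 0;
}
```

With a 64KiB chunk (128 sectors) and an assumed 8-disk raid5, raid5_full_stripe_bytes(128, 8, 1) gives 458752 bytes. An IO of that size at offset 0 is a full-stripe write, while an IO sized after the drive's own 32767-sector io_opt (16776704 bytes) is not a multiple of the stripe width.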

Something like the following:

diff --git a/block/blk-settings.c b/block/blk-settings.c
index a000daafbfb4..04e7b4808e7a 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -700,6 +700,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
                t->features &= ~BLK_FEAT_POLL;

        t->flags |= (b->flags & BLK_FLAG_MISALIGNED);
+       t->flags |= (b->flags & BLK_FLAG_STACK_IO_OPT);

        t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
        t->max_user_sectors = min_not_zero(t->max_user_sectors,
@@ -750,7 +751,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
                                     b->physical_block_size);

        t->io_min = max(t->io_min, b->io_min);
-       t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
+       if (!t->io_opt || !(t->flags & BLK_FLAG_STACK_IO_OPT) ||
+           (b->flags & BLK_FLAG_STACK_IO_OPT))
+           t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
+
        t->dma_alignment = max(t->dma_alignment, b->dma_alignment);

        /* Set non-power-of-2 compatible chunk_sectors boundary */
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 5b270d4ee99c..bb482ec40506 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7733,6 +7733,7 @@ static int raid5_set_limits(struct mddev *mddev)
        lim.io_min = mddev->chunk_sectors << 9;
        lim.io_opt = lim.io_min * (conf->raid_disks - conf->max_degraded);
        lim.features |= BLK_FEAT_RAID_PARTIAL_STRIPES_EXPENSIVE;
+       lim.flags |= BLK_FLAG_STACK_IO_OPT;
        lim.discard_granularity = stripe;
        lim.max_write_zeroes_sectors = 0;
        mddev_stack_rdev_limits(mddev, &lim, 0);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 332b56f323d9..65317e93790e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -360,6 +360,9 @@ typedef unsigned int __bitwise blk_flags_t;
 /* passthrough command IO accounting */
 #define BLK_FLAG_IOSTATS_PASSTHROUGH   ((__force blk_flags_t)(1u << 2))

+/* ignore underlying disks io_opt */
+#define BLK_FLAG_STACK_IO_OPT          ((__force blk_flags_t)(1u << 3))
+
 struct queue_limits {
        blk_features_t          features;
        blk_flags_t             flags;
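
The effect of the io_opt hunk can be modelled in userspace as below. This is a simplified sketch under stated assumptions, not the kernel code: struct model_limits, MODEL_FLAG_STACK_IO_OPT and the model_* functions are hypothetical names, and only the flag and io_opt handling from the diff is reproduced.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for BLK_FLAG_STACK_IO_OPT from the diff. */
#define MODEL_FLAG_STACK_IO_OPT	(1u << 3)

/* Reduced model of struct queue_limits: only what this rule needs. */
struct model_limits {
	unsigned int flags;
	uint64_t io_opt;
};

static uint64_t model_gcd(uint64_t a, uint64_t b)
{
	while (b) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Zero on either side returns the other value, otherwise the lcm. */
static uint64_t model_lcm_not_zero(uint64_t a, uint64_t b)
{
	uint64_t g;

	if (!a || !b)
		return a ? a : b;

	g = model_gcd(a, b);
	assert(a / g <= UINT64_MAX / b);	/* guard against overflow */
	return a / g * b;
}

/* Mirrors the two io_opt related changes in the diff. */
static void model_stack_io_opt(struct model_limits *t,
			       const struct model_limits *b)
{
	t->flags |= b->flags & MODEL_FLAG_STACK_IO_OPT;

	if (!t->io_opt || !(t->flags & MODEL_FLAG_STACK_IO_OPT) ||
	    (b->flags & MODEL_FLAG_STACK_IO_OPT))
		t->io_opt = model_lcm_not_zero(t->io_opt, b->io_opt);
}

/* Stack a 64KiB*7 array over a 32767-sector disk with the given flags. */
static uint64_t model_raid5_over_disk(unsigned int top_flags)
{
	struct model_limits t = { .flags = top_flags, .io_opt = 64ULL * 1024 * 7 };
	struct model_limits b = { .flags = 0, .io_opt = 32767ULL * 512 };

	model_stack_io_opt(&t, &b);
	return t.io_opt;
}
```

In this model, model_raid5_over_disk(MODEL_FLAG_STACK_IO_OPT) keeps the array's 458752 bytes, while model_raid5_over_disk(0) falls back to the lcm of 2147418112 bytes. A top device with io_opt still zero takes the bottom device's value even when the flag is set.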




