Re: Improper io_opt setting for md raid5

On Wed, Jul 16, 2025 at 03:26:02PM +0800, Yu Kuai wrote:
> Hi,
> 
> On 2025/07/15 23:56, Coly Li wrote:
> > Then when my raid5 array sets its queue limits, its own io_opt is 64KiB*7
> > and the component SATA hard drives report an io_opt of 32767 sectors. By
> > the calculation in block/blk-settings.c:blk_stack_limits() at line 753,
> > 753         t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
> > the resulting opt_io_size of my raid5 array is more than 1GiB, which is too large.
> 
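For reference, the arithmetic behind that figure can be reproduced in
userspace. This is only a sketch, not kernel code: gcd_u64() and
lcm_not_zero_u64() are hypothetical stand-ins, written on the assumption
that the kernel's lcm_not_zero() returns the LCM when both inputs are
non-zero and the non-zero input otherwise, and that io_opt is in bytes.

```c
#include <stdint.h>

/* Greatest common divisor via Euclid's algorithm. */
static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
	while (b) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Hypothetical userspace stand-in for lcm_not_zero(): the LCM of a and b,
 * or whichever of the two is non-zero when the other is 0.
 */
static uint64_t lcm_not_zero_u64(uint64_t a, uint64_t b)
{
	if (!a || !b)
		return a ? a : b;
	return a / gcd_u64(a, b) * b;
}

/* raid5 io_opt from the report: 64KiB chunk * 7 data disks, in bytes. */
static uint64_t raid5_io_opt_bytes(void)
{
	return 64ULL * 1024 * 7;
}

/* Member disk io_opt from the report: 32767 sectors of 512 bytes. */
static uint64_t disk_io_opt_bytes(void)
{
	return 32767ULL * 512;
}
```

With these inputs, lcm_not_zero_u64(raid5_io_opt_bytes(), disk_io_opt_bytes())
gives 2147418112 bytes, i.e. 2GiB minus 64KiB, since 32767 = 7 * 31 * 151
shares only the factor 7 (and 2^9) with 64KiB*7.
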
> Perhaps we should at least provide a helper for raid5 so that raid5's
> own io_opt is preferred over the underlying disks' io_opt. Because of
> raid5's internal implementation, chunk_size * data disks is the best
> choice; performance differs significantly when I/O is not aligned with
> this io_opt.
> 
> Something like the following:
> 

Yeah, this one also solves my issue. Thanks.

Coly Li


> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index a000daafbfb4..04e7b4808e7a 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -700,6 +700,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>                 t->features &= ~BLK_FEAT_POLL;
> 
>         t->flags |= (b->flags & BLK_FLAG_MISALIGNED);
> +       t->flags |= (b->flags & BLK_FLAG_STACK_IO_OPT);
> 
>         t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
>         t->max_user_sectors = min_not_zero(t->max_user_sectors,
> @@ -750,7 +751,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
>                                      b->physical_block_size);
> 
>         t->io_min = max(t->io_min, b->io_min);
> -       t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
> +       if (!t->io_opt || !(t->flags & BLK_FLAG_STACK_IO_OPT) ||
> +           (b->flags & BLK_FLAG_STACK_IO_OPT))
> +           t->io_opt = lcm_not_zero(t->io_opt, b->io_opt);
> +
>         t->dma_alignment = max(t->dma_alignment, b->dma_alignment);
> 
>         /* Set non-power-of-2 compatible chunk_sectors boundary */
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 5b270d4ee99c..bb482ec40506 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -7733,6 +7733,7 @@ static int raid5_set_limits(struct mddev *mddev)
>         lim.io_min = mddev->chunk_sectors << 9;
>         lim.io_opt = lim.io_min * (conf->raid_disks - conf->max_degraded);
>         lim.features |= BLK_FEAT_RAID_PARTIAL_STRIPES_EXPENSIVE;
> +       lim.flags |= BLK_FLAG_STACK_IO_OPT;
>         lim.discard_granularity = stripe;
>         lim.max_write_zeroes_sectors = 0;
>         mddev_stack_rdev_limits(mddev, &lim, 0);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 332b56f323d9..65317e93790e 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -360,6 +360,9 @@ typedef unsigned int __bitwise blk_flags_t;
>  /* passthrough command IO accounting */
>  #define BLK_FLAG_IOSTATS_PASSTHROUGH   ((__force blk_flags_t)(1u << 2))
> 
> +/* ignore underlying disks io_opt */
> +#define BLK_FLAG_STACK_IO_OPT          ((__force blk_flags_t)(1u << 3))
> +
>  struct queue_limits {
>         blk_features_t          features;
>         blk_flags_t             flags;
> 
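To see how the new condition behaves, here is a small userspace model of
just the io_opt part of the patched blk_stack_limits(). It is a sketch
under assumptions, not the kernel implementation: struct limits_model,
STACK_IO_OPT, model_gcd(), model_lcm_not_zero() and stack_io_opt() are
hypothetical names, and only the flag propagation and the io_opt
condition from the diff above are mirrored.

```c
#include <stdint.h>

/* Hypothetical mirror of BLK_FLAG_STACK_IO_OPT from the patch. */
#define STACK_IO_OPT (1u << 3)

/* Reduced model of struct queue_limits: only the fields used here. */
struct limits_model {
	unsigned int flags;
	uint64_t io_opt;	/* bytes */
};

static uint64_t model_gcd(uint64_t a, uint64_t b)
{
	while (b) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/* Assumed lcm_not_zero() semantics: LCM, or the non-zero input. */
static uint64_t model_lcm_not_zero(uint64_t a, uint64_t b)
{
	if (!a || !b)
		return a ? a : b;
	return a / model_gcd(a, b) * b;
}

/* Stack bottom limits b into top limits t, io_opt part only. */
static void stack_io_opt(struct limits_model *t, const struct limits_model *b)
{
	/* The flag is inherited from the bottom device, as in the patch. */
	t->flags |= (b->flags & STACK_IO_OPT);

	/*
	 * Combine via LCM unless the top device already has an io_opt,
	 * carries the flag, and the bottom device does not carry it.
	 */
	if (!t->io_opt || !(t->flags & STACK_IO_OPT) ||
	    (b->flags & STACK_IO_OPT))
		t->io_opt = model_lcm_not_zero(t->io_opt, b->io_opt);
}
```

In this model, a flagged top device with io_opt 458752 (64KiB*7) stacked
on an unflagged disk with io_opt 16776704 (32767 sectors) keeps 458752,
while the same stacking without the flag produces 2147418112.
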



