Re: [PATCH] dm-delay: Prevent zoned write reordering on suspend

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 4/10/25 1:33 PM, Damien Le Moal wrote:
> When a dm-delay devie is being suspended, the .presuspend() operation is
> first executed (delay_presuspend()) to immediately issue all the BIOs
> present in the delayed list of the device and also sets the device
> may_delay boolean to false. At the same time, any new BIO is issued to
> the device will not be delayed and immediately issued with delay_bio()
> returning DM_MAPIO_REMAPPED. This creates a situation where potentially
> 2 different contexts may be issuing write BIOs to the same zone of a
> zone device without respecting the issuing order from the user, that is,
> a newly issued write BIO may be issued before other write BIOs for the
> same target zone that are in the device delayed list. If such situation
> occurs, write BIOs may be failed by the underlying zoned device due to
> an unaligned write error.
> 
> Prevent this situation from happening by always handling newly issued
> write BIOs using the delayed list of BIOs, even when the device is being
> suspended. This is done by forcing the use of the worker kthread for
> zoned devices, and by modifying flush_worker_fn() to always flush all
> delayed BIOs if the device may_delay boolean is false.
> 
> Fixes: d43929ef65a6 ("dm-delay: support zoned devices")
> Signed-off-by: Damien Le Moal <dlemoal@xxxxxxxxxx>

I probably should also add:

Reported-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>

> ---
>  drivers/md/dm-delay.c | 29 +++++++++++++++++++++--------
>  1 file changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/md/dm-delay.c b/drivers/md/dm-delay.c
> index d4cf0ac2a7aa..c665b2ab1115 100644
> --- a/drivers/md/dm-delay.c
> +++ b/drivers/md/dm-delay.c
> @@ -128,7 +128,7 @@ static int flush_worker_fn(void *data)
>  	struct delay_c *dc = data;
>  
>  	while (!kthread_should_stop()) {
> -		flush_delayed_bios(dc, false);
> +		flush_delayed_bios(dc, !dc->may_delay);
>  		spin_lock(&dc->delayed_bios_lock);
>  		if (unlikely(list_empty(&dc->delayed_bios))) {
>  			set_current_state(TASK_INTERRUPTIBLE);
> @@ -213,6 +213,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	struct delay_c *dc;
>  	int ret;
>  	unsigned int max_delay;
> +	bool is_zoned = false;
>  
>  	if (argc != 3 && argc != 6 && argc != 9) {
>  		ti->error = "Requires exactly 3, 6 or 9 arguments";
> @@ -236,6 +237,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	if (ret)
>  		goto bad;
>  	max_delay = dc->read.delay;
> +	is_zoned = bdev_is_zoned(dc->read.dev->bdev);
>  
>  	if (argc == 3) {
>  		ret = delay_class_ctr(ti, &dc->write, argv);
> @@ -251,6 +253,7 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	if (ret)
>  		goto bad;
>  	max_delay = max(max_delay, dc->write.delay);
> +	is_zoned = is_zoned || bdev_is_zoned(dc->write.dev->bdev);
>  
>  	if (argc == 6) {
>  		ret = delay_class_ctr(ti, &dc->flush, argv + 3);
> @@ -263,13 +266,16 @@ static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	if (ret)
>  		goto bad;
>  	max_delay = max(max_delay, dc->flush.delay);
> +	is_zoned = is_zoned || bdev_is_zoned(dc->flush.dev->bdev);
>  
>  out:
> -	if (max_delay < 50) {
> -		/*
> -		 * In case of small requested delays, use kthread instead of
> -		 * timers and workqueue to achieve better latency.
> -		 */
> +	/*
> +	 * In case of small requested delays, use kthread instead of timers and
> +	 * workqueue to achieve better latency. Also use a kthread for a zoned
> +	 * device so that we can preserve the order of write operations during
> +	 * suspend.
> +	 */
> +	if (max_delay < 50 || is_zoned) {
>  		dc->worker = kthread_run(&flush_worker_fn, dc, "dm-delay-flush-worker");
>  		if (IS_ERR(dc->worker)) {
>  			ret = PTR_ERR(dc->worker);
> @@ -313,8 +319,15 @@ static int delay_bio(struct delay_c *dc, struct delay_class *c, struct bio *bio)
>  
>  	spin_lock(&dc->delayed_bios_lock);
>  	if (unlikely(!dc->may_delay)) {
> -		spin_unlock(&dc->delayed_bios_lock);
> -		return DM_MAPIO_REMAPPED;
> +		/*
> +		 * Issue the BIO immediately if the device is not zoned. FOr a
> +		 * zoned device, preserver the ordering of write operations by
> +		 * using the delay list.
> +		 */
> +		if (!bdev_is_zoned(c->dev->bdev) || c != &dc->write) {
> +			spin_unlock(&dc->delayed_bios_lock);
> +			return DM_MAPIO_REMAPPED;
> +		}
>  	}
>  	c->ops++;
>  	list_add_tail(&delayed->list, &dc->delayed_bios);


-- 
Damien Le Moal
Western Digital Research




[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux