Re: [PATCH v3 4/5] dm: dm-crypt: Do not partially accept write BIOs with zoned targets

On Wed, 25 Jun 2025, Damien Le Moal wrote:

> Read and write operations issued to a dm-crypt target may be split
> according to the dm-crypt internal limits defined by the max_read_size
> and max_write_size module parameters (default is 128 KB). The intent is
> to improve processing time of large BIOs by splitting them into smaller
> operations that can be parallelized on different CPUs.
> 
> For zoned dm-crypt targets, this BIO splitting is still done, but
> without parallel execution, to ensure that the issuing order of write
> operations to the underlying device remains sequential. However, the
> splitting itself causes other problems:
> 
> 1) Since dm-crypt relies on the block layer zone write plugging to
>    handle zone append emulation using regular write operations, the
>    remainder of a split write BIO will always be plugged into the
>    target zone write plug. Once the ongoing write BIO finishes, this
>    remainder BIO is unplugged and issued from the zone write plug
>    work. If this remainder BIO itself needs to be split, its remainder
>    will be re-issued and plugged again, but that causes a call to
>    blk_queue_enter(), which may block if a queue freeze operation was
>    initiated. This results in a deadlock as the DM submission path
>    still holds BIOs that the queue freeze side is waiting for.
> 
> 2) dm-crypt relies on the emulation done by the block layer using
>    regular write operations for processing zone append operations.
>    This still requires properly returning the written sector as the
>    sector of the original BIO. However, this can be done correctly if
>    and only if there is a single clone BIO used for processing the
>    original zone append operation issued by the user. If the size of a
>    zone append operation is larger than the dm-crypt max_write_size,
>    then the original BIO will be split and processed as a chain of
>    regular write operations. Such chaining results in an incorrect
>    written sector being returned to the zone append issuer through the
>    original BIO sector. This in turn results in file system data
>    corruption with xfs or btrfs.
> 
> Fix this by modifying get_max_request_size() to return the size of the
> BIO for write operations directed at a zoned target, so that the BIO
> is not split with dm_accept_partial_bio() in crypt_map().
> get_max_request_size() is renamed to get_max_request_sectors() to
> clarify the unit of the value returned, and its interface is changed
> to take a struct dm_target pointer and a pointer to the struct bio
> being processed. In addition to this change, to ensure that
> crypt_alloc_buffer() works correctly, set the dm-crypt device
> max_hw_sectors limit to be at most
> BIO_MAX_VECS << PAGE_SECTORS_SHIFT (1 MB with a 4 KB page size). This
> forces DM core to split write BIOs before passing them to crypt_map(),
> thus guaranteeing that dm-crypt can always accept an entire write BIO
> without needing to split it.
> 
> This change does not have any effect on the read path of dm-crypt. Read
> operations can still be split and the BIO fragments processed in
> parallel. There is also no impact on the performance of the write path
> given that all zone write BIOs were already processed inline instead of
> in parallel.
> 
> This change also does not affect regular dm-crypt block devices in any
> way.
> 
> Fixes: f211268ed1f9 ("dm: Use the block layer zone append emulation")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Damien Le Moal <dlemoal@xxxxxxxxxx>

Reviewed-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>

> ---
>  drivers/md/dm-crypt.c | 49 ++++++++++++++++++++++++++++++++++---------
>  1 file changed, 39 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
> index 17157c4216a5..4e80784d1734 100644
> --- a/drivers/md/dm-crypt.c
> +++ b/drivers/md/dm-crypt.c
> @@ -253,17 +253,35 @@ MODULE_PARM_DESC(max_read_size, "Maximum size of a read request");
>  static unsigned int max_write_size = 0;
>  module_param(max_write_size, uint, 0644);
>  MODULE_PARM_DESC(max_write_size, "Maximum size of a write request");
> -static unsigned get_max_request_size(struct crypt_config *cc, bool wrt)
> +
> +static unsigned get_max_request_sectors(struct dm_target *ti, struct bio *bio)
>  {
> +	struct crypt_config *cc = ti->private;
>  	unsigned val, sector_align;
> -	val = !wrt ? READ_ONCE(max_read_size) : READ_ONCE(max_write_size);
> -	if (likely(!val))
> -		val = !wrt ? DM_CRYPT_DEFAULT_MAX_READ_SIZE : DM_CRYPT_DEFAULT_MAX_WRITE_SIZE;
> -	if (wrt || cc->used_tag_size) {
> -		if (unlikely(val > BIO_MAX_VECS << PAGE_SHIFT))
> -			val = BIO_MAX_VECS << PAGE_SHIFT;
> -	}
> -	sector_align = max(bdev_logical_block_size(cc->dev->bdev), (unsigned)cc->sector_size);
> +	bool wrt = op_is_write(bio_op(bio));
> +
> +	if (wrt) {
> +		/*
> +		 * For zoned devices, splitting write operations creates the
> +		 * risk of deadlocking queue freeze operations with zone write
> +		 * plugging BIO work when the remainder of a split BIO is
> +		 * issued. So always allow the entire BIO to proceed.
> +		 */
> +		if (ti->emulate_zone_append)
> +			return bio_sectors(bio);
> +
> +		val = min_not_zero(READ_ONCE(max_write_size),
> +				   DM_CRYPT_DEFAULT_MAX_WRITE_SIZE);
> +	} else {
> +		val = min_not_zero(READ_ONCE(max_read_size),
> +				   DM_CRYPT_DEFAULT_MAX_READ_SIZE);
> +	}
> +
> +	if (wrt || cc->used_tag_size)
> +		val = min(val, BIO_MAX_VECS << PAGE_SHIFT);
> +
> +	sector_align = max(bdev_logical_block_size(cc->dev->bdev),
> +			   (unsigned)cc->sector_size);
>  	val = round_down(val, sector_align);
>  	if (unlikely(!val))
>  		val = sector_align;
> @@ -3496,7 +3514,7 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
>  	/*
>  	 * Check if bio is too large, split as needed.
>  	 */
> -	max_sectors = get_max_request_size(cc, bio_data_dir(bio) == WRITE);
> +	max_sectors = get_max_request_sectors(ti, bio);
>  	if (unlikely(bio_sectors(bio) > max_sectors))
>  		dm_accept_partial_bio(bio, max_sectors);
>  
> @@ -3733,6 +3751,17 @@ static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
>  		max_t(unsigned int, limits->physical_block_size, cc->sector_size);
>  	limits->io_min = max_t(unsigned int, limits->io_min, cc->sector_size);
>  	limits->dma_alignment = limits->logical_block_size - 1;
> +
> +	/*
> +	 * For zoned dm-crypt targets, write BIOs are not split internally,
> +	 * so they must not exceed BIO_MAX_VECS vectors per BIO, otherwise
> +	 * crypt_alloc_buffer() will trigger a BUG(). Avoid this by forcing
> +	 * DM core to split write BIOs exceeding this limit before they
> +	 * reach crypt_map().
> +	 */
> +	if (ti->emulate_zone_append)
> +		limits->max_hw_sectors = min(limits->max_hw_sectors,
> +					     BIO_MAX_VECS << PAGE_SECTORS_SHIFT);
>  }
>  
>  static struct target_type crypt_target = {
> -- 
> 2.49.0
> 