Re: [PATCH blktests] zbd/013: Test stacked drivers and queue freezing

On May 23, 2025 / 09:49, Bart Van Assche wrote:
> Since there is no test yet in the blktests repository for a device mapper
> driver stacked on top of a zoned block device, add such a test. This test
> triggers the following deadlock in kernel versions 6.10..6.14:
> 
> Call Trace:
>  <TASK>
>  __schedule+0x43f/0x12f0
>  schedule+0x27/0xf0
>  __bio_queue_enter+0x10e/0x230
>  __submit_bio+0xf0/0x280
>  submit_bio_noacct_nocheck+0x185/0x3d0
>  blk_zone_wplug_bio_work+0x1ad/0x1f0
>  process_one_work+0x17b/0x330
>  worker_thread+0x2ce/0x3f0
>  kthread+0xec/0x220
>  ret_from_fork+0x31/0x50
>  ret_from_fork_asm+0x1a/0x30
>  </TASK>
> Call Trace:
>  <TASK>
>  __schedule+0x43f/0x12f0
>  schedule+0x27/0xf0
>  blk_mq_freeze_queue_wait+0x6f/0xa0
>  queue_attr_store+0x14f/0x1b0
>  kernfs_fop_write_iter+0x13b/0x1f0
>  vfs_write+0x253/0x420
>  ksys_write+0x64/0xe0
>  do_syscall_64+0x82/0x160
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>  </TASK>

Thanks for the patch. I ran this test case and observed the deadlock with
kernel v6.15-rc7 and the linux-block/for-next tip (git hash efe615cd8823).
I hope the kernel-side fix becomes available before this patch is applied to
the blktests master branch, because this new test case makes blktests runs
hang.

Please find my comments below.

> 
> Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
> ---
>  tests/zbd/013     | 110 ++++++++++++++++++++++++++++++++++++++++++++++
>  tests/zbd/013.out |   2 +
>  2 files changed, 112 insertions(+)
>  create mode 100755 tests/zbd/013
>  create mode 100644 tests/zbd/013.out
> 
> diff --git a/tests/zbd/013 b/tests/zbd/013
> new file mode 100755
> index 000000000000..88aea23ee68a
> --- /dev/null
> +++ b/tests/zbd/013
> @@ -0,0 +1,110 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-3.0+
> +# Copyright (C) 2025 Google LLC
> +

Let's add some more description of the test here. How about text like this?

# Test the race between writes on a zoned dm-crypt device and queue freeze via
# the sysfs attribute "queue/read_ahead_kb", using zoned null_blk.

> +. tests/zbd/rc
> +. common/null_blk

Nit: this line can be removed since tests/zbd/rc sources common/null_blk.

> +
> +DESCRIPTION="test stacked drivers and queue freezing"
> +QUICK=1

TIMED=1 should be here instead of QUICK=1 since this test case refers to
$TIMEOUT.
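
Putting the comments so far together, the top of the test would then look
something like this (just a sketch, with nothing else changed):

  #!/bin/bash
  # SPDX-License-Identifier: GPL-3.0+
  # Copyright (C) 2025 Google LLC
  #
  # Test the race between writes on a zoned dm-crypt device and queue freeze via
  # the sysfs attribute "queue/read_ahead_kb", using zoned null_blk.

  . tests/zbd/rc

  DESCRIPTION="test stacked drivers and queue freezing"
  TIMED=1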

> +
> +requires() {
> +	_have_driver dm-crypt
> +	_have_fio
> +	_have_module null_blk

Nit: the line above can be removed since group_requires() in tests/zbd/rc checks it.
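
With that dropped, requires() would then be just (a sketch):

  requires() {
  	_have_driver dm-crypt
  	_have_fio
  	_have_program cryptsetup
  }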

> +	_have_program cryptsetup
> +}
> +
> +# Trigger blk_mq_freeze_queue() repeatedly because there is a bug in the
> +# Linux kernel 6.10..6.14 zoned block device code that triggers a deadlock
> +# between zoned writes and queue freezing.
> +queue_freeze_loop() {
> +	while true; do
> +		echo 4 >"$1"
> +		sleep .1
> +		echo 8 >"$1"
> +		sleep .1
> +	done
> +}
> +
> +test() {
> +	set -e

Is this required? When I commented out this line and the "set +e" below, I was
still able to recreate the deadlock.

> +
> +	echo "Running ${TEST_NAME}"
> +
> +	_init_null_blk nr_devices=0 queue_mode=2

This line can be removed by renaming the null_blk devices as follows:

  nullb0 -> nullb1
  nullb1 -> nullb2

nullb0 is not recommended since it cannot be reconfigured when the
null_blk driver is built-in.
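
Concretely, the two setup spots would then become something like this (sketch
only, with the null_blk_params arrays staying as they are):

  _configure_null_blk nullb1 "${null_blk_params[@]}"
  local hdev=/dev/nullb1
  ...
  _configure_null_blk nullb2 "${null_blk_params[@]}"
  local zdev=/dev/nullb2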

> +
> +	# A small conventional block device for the LUKS header.
> +	local null_blk_params=(
> +		blocksize=4096
> +		completion_nsec=0
> +		memory_backed=1
> +		size=4            # MiB
> +		submit_queues=1
> +		power=1
> +	)
> +	_configure_null_blk nullb0 "${null_blk_params[@]}"
> +	local hdev=/dev/nullb0

As noted above, I suggest renaming nullb0 to nullb1.

> +
> +	# A larger zoned block device for the data.
> +	local null_blk_params=(
> +		blocksize=4096
> +		completion_nsec=0
> +		memory_backed=1
> +		size=1024         # MiB
> +		submit_queues=1
> +		zoned=1
> +		power=1
> +	)
> +	_configure_null_blk nullb1 "${null_blk_params[@]}"
> +	local zdev=/dev/nullb1

As noted above, I suggest renaming nullb1 to nullb2.

> +
> +	local luks_passphrase=this-passphrase-is-not-secret
> +	{ echo "${luks_passphrase}" |
> +		  cryptsetup luksFormat --batch-mode ${zdev} \
> +			     --header ${hdev}; }
> +	local luks_vol_name=zbd-013
> +	{ echo "${luks_passphrase}" |
> +		  cryptsetup luksOpen \
> +			     --batch-mode "${zdev}" "${luks_vol_name}" \
> +			     --header ${hdev}; }
> +	local luksdev="/dev/mapper/${luks_vol_name}"
> +	local dmdev
> +	dmdev="$(basename "$(readlink "${luksdev}")")"
> +	ls -ld "${hdev}" "${zdev}" "${luksdev}" "/dev/${dmdev}" >>"${FULL}"
> +	local zdev_basename
> +	zdev_basename=$(basename "$zdev")
> +	local max_sectors_zdev
> +	max_sectors_zdev=/sys/block/"${zdev_basename}"/queue/max_sectors_kb
> +	echo 4 > "${max_sectors_zdev}"

Is this max_sectors_kb change a key condition to recreate the deadlock? I tried
some runs without this change, and was still able to recreate the hang on my
test node.
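
If it turns out not to be needed, this part could perhaps be reduced to just
logging the default value, e.g. (a sketch):

  local max_sectors_zdev
  max_sectors_zdev=/sys/block/"${zdev_basename}"/queue/max_sectors_kb
  echo "${zdev_basename}: max_sectors_kb=$(<"${max_sectors_zdev}")" >>"${FULL}"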

> +	echo "${zdev_basename}: max_sectors_kb=$(<"${max_sectors_zdev}")" >>"${FULL}"
> +	local max_sectors_dm
> +	max_sectors_dm=/sys/block/"${dmdev}"/queue/max_sectors_kb
> +	echo "${dmdev}: max_sectors_kb=$(<"${max_sectors_dm}")" >>"${FULL}"
> +	queue_freeze_loop /sys/block/"$dmdev"/queue/read_ahead_kb &
> +	local loop_pid=$!
> +	local fio_args=(
> +		--bs=64M
> +		--direct=1
> +		--filename="${luksdev}"
> +		--runtime="$TIMEOUT"
> +		--time_based
> +		--zonemode=zbd
> +	)
> +	if ! _run_fio_verify_io "${fio_args[@]}" >>"${FULL}" 2>&1; then
> +		fail=true
> +	fi
> +
> +	set +e

As I noted above, I wonder if this line is required.

Thanks!



