Hi,
在 2025/04/22 11:24, Xiao Ni 写道:
commit 777d0961ff95b ("fs: move the bdex_statx call to vfs_getattr_nosec")
introduces a regression problem, like this:
[ 1479.474440] INFO: task kdevtmpfs:506 blocked for more than 491 seconds.
[ 1479.478569] __wait_for_common+0x99/0x1c0
[ 1479.478823] ? __pfx_schedule_timeout+0x10/0x10
[ 1479.479466] __flush_workqueue+0x138/0x400
[ 1479.479684] md_alloc+0x21/0x370 [md_mod]
[ 1479.479926] md_probe+0x24/0x50 [md_mod]
[ 1479.480150] blk_probe_dev+0x62/0x90
[ 1479.480368] blk_request_module+0xf/0x70
[ 1479.480604] blkdev_get_no_open+0x5e/0xa0
[ 1479.480809] bdev_statx+0x1f/0xf0
[ 1479.481364] vfs_getattr_nosec+0x10a/0x120
[ 1479.481602] handle_remove+0x68/0x290
[ 1479.481812] ? __update_idle_core+0x23/0xb0
[ 1479.482023] devtmpfs_work_loop+0x10d/0x2a0
[ 1479.482231] ? __pfx_devtmpfsd+0x10/0x10
[ 1479.482464] devtmpfsd+0x2f/0x30
[ 1479.485397] INFO: task kworker/33:1:532 blocked for more than 491 seconds.
[ 1479.535380] __wait_for_common+0x99/0x1c0
[ 1479.566876] ? __pfx_schedule_timeout+0x10/0x10
[ 1479.591156] devtmpfs_submit_req+0x6e/0x90
[ 1479.591389] devtmpfs_delete_node+0x60/0x90
[ 1479.591631] ? process_sysctl_arg+0x270/0x2f0
[ 1479.592256] device_del+0x315/0x3d0
[ 1479.592839] ? mutex_lock+0xe/0x30
[ 1479.593455] del_gendisk+0x216/0x320
[ 1479.593672] md_kobj_release+0x34/0x40 [md_mod]
[ 1479.594317] kobject_cleanup+0x3a/0x130
[ 1479.594562] process_one_work+0x19d/0x3d0
Now del_gendisk and put_disk are called asynchronously in workqueue work.
del_gendisk deletes device node by devtmpfs. devtmpfs tries to open this
array again and it flush the workqueue at the bigging of open process. So
a deadlock happens.
The asynchronous way also has a problem that the device node can still
exist after mdadm --stop command returns in a short window. So udev rule
can open this device node and create the struct mddev in kernel again.
So put del_gendisk in ioctl path and still leave put_disk in
md_kobj_release to avoid uaf.
Fixes: 777d0961ff95 ("fs: move the bdex_statx call to vfs_getattr_nosec")
Signed-off-by: Xiao Ni <xni@xxxxxxxxxx>
---
drivers/md/md.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 9daa78c5fe33..c3f793296ccc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5746,7 +5746,6 @@ static void md_kobj_release(struct kobject *ko)
if (mddev->sysfs_level)
sysfs_put(mddev->sysfs_level);
- del_gendisk(mddev->gendisk);
put_disk(mddev->gendisk);
}
@@ -6597,6 +6596,8 @@ static int do_md_stop(struct mddev *mddev, int mode)
md_clean(mddev);
if (mddev->hold_active == UNTIL_STOP)
mddev->hold_active = 0;
+
+ del_gendisk(mddev->gendisk);
I'm still trying to understand the problem here, however, this change
will break sysfs api md/array_state, do_md_stop will be called as well,
del_gendisk in this context will deadlock, because it will wait for all
sysfs writers, include itself, to be done.
Thanks,
Kuai
}
md_new_event();
sysfs_notify_dirent_safe(mddev->sysfs_state);