On Tue, Apr 22, 2025 at 2:21 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: > > Hi, > > 在 2025/04/22 11:24, Xiao Ni 写道: > > commit 777d0961ff95b ("fs: move the bdex_statx call to vfs_getattr_nosec") > > introduces a regression problem, like this: > > [ 1479.474440] INFO: task kdevtmpfs:506 blocked for more than 491 seconds. > > [ 1479.478569] __wait_for_common+0x99/0x1c0 > > [ 1479.478823] ? __pfx_schedule_timeout+0x10/0x10 > > [ 1479.479466] __flush_workqueue+0x138/0x400 > > [ 1479.479684] md_alloc+0x21/0x370 [md_mod] > > [ 1479.479926] md_probe+0x24/0x50 [md_mod] > > [ 1479.480150] blk_probe_dev+0x62/0x90 > > [ 1479.480368] blk_request_module+0xf/0x70 > > [ 1479.480604] blkdev_get_no_open+0x5e/0xa0 > > [ 1479.480809] bdev_statx+0x1f/0xf0 > > [ 1479.481364] vfs_getattr_nosec+0x10a/0x120 > > [ 1479.481602] handle_remove+0x68/0x290 > > [ 1479.481812] ? __update_idle_core+0x23/0xb0 > > [ 1479.482023] devtmpfs_work_loop+0x10d/0x2a0 > > [ 1479.482231] ? __pfx_devtmpfsd+0x10/0x10 > > [ 1479.482464] devtmpfsd+0x2f/0x30 > > > > [ 1479.485397] INFO: task kworker/33:1:532 blocked for more than 491 seconds. > > [ 1479.535380] __wait_for_common+0x99/0x1c0 > > [ 1479.566876] ? __pfx_schedule_timeout+0x10/0x10 > > [ 1479.591156] devtmpfs_submit_req+0x6e/0x90 > > [ 1479.591389] devtmpfs_delete_node+0x60/0x90 > > [ 1479.591631] ? process_sysctl_arg+0x270/0x2f0 > > [ 1479.592256] device_del+0x315/0x3d0 > > [ 1479.592839] ? mutex_lock+0xe/0x30 > > [ 1479.593455] del_gendisk+0x216/0x320 > > [ 1479.593672] md_kobj_release+0x34/0x40 [md_mod] > > [ 1479.594317] kobject_cleanup+0x3a/0x130 > > [ 1479.594562] process_one_work+0x19d/0x3d0 > > > > Now del_gendisk and put_disk are called asynchronously in workqueue work. > > del_gendisk deletes device node by devtmpfs. devtmpfs tries to open this > > array again and it flush the workqueue at the bigging of open process. So > > a deadlock happens. > > > > The asynchronous way also has a problem that the device node can still > > exist after mdadm --stop command returns in a short window. So udev rule > > can open this device node and create the struct mddev in kernel again. > > > > So put del_gendisk in ioctl path and still leave put_disk in > > md_kobj_release to avoid uaf. > > > > Fixes: 777d0961ff95 ("fs: move the bdex_statx call to vfs_getattr_nosec") > > Signed-off-by: Xiao Ni <xni@xxxxxxxxxx> > > --- > > drivers/md/md.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/md/md.c b/drivers/md/md.c > > index 9daa78c5fe33..c3f793296ccc 100644 > > --- a/drivers/md/md.c > > +++ b/drivers/md/md.c > > @@ -5746,7 +5746,6 @@ static void md_kobj_release(struct kobject *ko) > > if (mddev->sysfs_level) > > sysfs_put(mddev->sysfs_level); > > > > - del_gendisk(mddev->gendisk); > > put_disk(mddev->gendisk); > > } > > > > @@ -6597,6 +6596,8 @@ static int do_md_stop(struct mddev *mddev, int mode) > > md_clean(mddev); > > if (mddev->hold_active == UNTIL_STOP) > > mddev->hold_active = 0; > > + > > + del_gendisk(mddev->gendisk); > > I'm still trying to understand the problem here, however, this change > will break sysfs api md/array_state, do_md_stop will be called as well, > del_gendisk in this context will deadlock, because it will wait for all > sysfs writers, include itself, to be done. Thanks for pointing out this. Is it ok to remove the support of md_probe now? Now all the things that are difficult to handle is that array can be created again when opening the block device node. 78b6350dcaadb03b4a2970b16387227ba6744876 ("md: support disabling of create-on-open semantics.") tries to disable create-on-open. And the probe method will be removed too (451f0b6f4c44d7b649ae609157b114b71f6d7875 block: default BLOCK_LEGACY_AUTOLOAD to y). To make things easy, we can remove the support of md_probe. md_probe is used for system without udev environment. So it needs to create a device node by mknod and open the device node to create the array. I guess no system doesn't have udev environment, right? Regards Xiao > > Thanks, > Kuai > > > } > > md_new_event(); > > sysfs_notify_dirent_safe(mddev->sysfs_state); > > >