Re: [RFC PATCH 1/1] md: delete gendisk in ioctl path

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Apr 22, 2025 at 2:21 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> 在 2025/04/22 11:24, Xiao Ni 写道:
> > commit 777d0961ff95b ("fs: move the bdex_statx call to vfs_getattr_nosec")
> > introduces a regression problem, like this:
> > [ 1479.474440] INFO: task kdevtmpfs:506 blocked for more than 491 seconds.
> > [ 1479.478569]  __wait_for_common+0x99/0x1c0
> > [ 1479.478823]  ? __pfx_schedule_timeout+0x10/0x10
> > [ 1479.479466]  __flush_workqueue+0x138/0x400
> > [ 1479.479684]  md_alloc+0x21/0x370 [md_mod]
> > [ 1479.479926]  md_probe+0x24/0x50 [md_mod]
> > [ 1479.480150]  blk_probe_dev+0x62/0x90
> > [ 1479.480368]  blk_request_module+0xf/0x70
> > [ 1479.480604]  blkdev_get_no_open+0x5e/0xa0
> > [ 1479.480809]  bdev_statx+0x1f/0xf0
> > [ 1479.481364]  vfs_getattr_nosec+0x10a/0x120
> > [ 1479.481602]  handle_remove+0x68/0x290
> > [ 1479.481812]  ? __update_idle_core+0x23/0xb0
> > [ 1479.482023]  devtmpfs_work_loop+0x10d/0x2a0
> > [ 1479.482231]  ? __pfx_devtmpfsd+0x10/0x10
> > [ 1479.482464]  devtmpfsd+0x2f/0x30
> >
> > [ 1479.485397] INFO: task kworker/33:1:532 blocked for more than 491 seconds.
> > [ 1479.535380]  __wait_for_common+0x99/0x1c0
> > [ 1479.566876]  ? __pfx_schedule_timeout+0x10/0x10
> > [ 1479.591156]  devtmpfs_submit_req+0x6e/0x90
> > [ 1479.591389]  devtmpfs_delete_node+0x60/0x90
> > [ 1479.591631]  ? process_sysctl_arg+0x270/0x2f0
> > [ 1479.592256]  device_del+0x315/0x3d0
> > [ 1479.592839]  ? mutex_lock+0xe/0x30
> > [ 1479.593455]  del_gendisk+0x216/0x320
> > [ 1479.593672]  md_kobj_release+0x34/0x40 [md_mod]
> > [ 1479.594317]  kobject_cleanup+0x3a/0x130
> > [ 1479.594562]  process_one_work+0x19d/0x3d0
> >
> > Now del_gendisk and put_disk are called asynchronously in workqueue work.
> > del_gendisk deletes device node by devtmpfs. devtmpfs tries to open this
> > array again and it flush the workqueue at the bigging of open process. So
> > a deadlock happens.
> >
> > The asynchronous way also has a problem that the device node can still
> > exist after mdadm --stop command returns in a short window. So udev rule
> > can open this device node and create the struct mddev in kernel again.
> >
> > So put del_gendisk in ioctl path and still leave put_disk in
> > md_kobj_release to avoid uaf.
> >
> > Fixes: 777d0961ff95 ("fs: move the bdex_statx call to vfs_getattr_nosec")
> > Signed-off-by: Xiao Ni <xni@xxxxxxxxxx>
> > ---
> >   drivers/md/md.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 9daa78c5fe33..c3f793296ccc 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -5746,7 +5746,6 @@ static void md_kobj_release(struct kobject *ko)
> >       if (mddev->sysfs_level)
> >               sysfs_put(mddev->sysfs_level);
> >
> > -     del_gendisk(mddev->gendisk);
> >       put_disk(mddev->gendisk);
> >   }
> >
> > @@ -6597,6 +6596,8 @@ static int do_md_stop(struct mddev *mddev, int mode)
> >               md_clean(mddev);
> >               if (mddev->hold_active == UNTIL_STOP)
> >                       mddev->hold_active = 0;
> > +
> > +             del_gendisk(mddev->gendisk);
>
> I'm still trying to understand the problem here, however, this change
> will break sysfs api md/array_state, do_md_stop will be called as well,
> del_gendisk in this context will deadlock, because it will wait for all
> sysfs writers, include itself, to be done.

Thanks for pointing out this.

Is it ok to remove the support of md_probe now? Now all the things
that are difficult to handle is that array can be created again when
opening the block device node.

78b6350dcaadb03b4a2970b16387227ba6744876 ("md: support disabling of
create-on-open semantics.") tries to disable create-on-open. And the
probe method will be removed too
(451f0b6f4c44d7b649ae609157b114b71f6d7875 block: default
BLOCK_LEGACY_AUTOLOAD to y). To make things easy, we can remove the
support of md_probe. md_probe is used for system without udev
environment. So it needs to create a device node by mknod and open the
device node to create the array.  I guess no system doesn't have udev
environment, right?

Regards
Xiao
>
> Thanks,
> Kuai
>
> >       }
> >       md_new_event();
> >       sysfs_notify_dirent_safe(mddev->sysfs_state);
> >
>






[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux