Re: [RFC PATCH 1/1] md: delete gendisk in ioctl path

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Apr 22, 2025 at 4:31 PM Xiao Ni <xni@xxxxxxxxxx> wrote:
>
> On Tue, Apr 22, 2025 at 1:51 PM Christoph Hellwig <hch@xxxxxx> wrote:
> >
> > On Tue, Apr 22, 2025 at 11:24:03AM +0800, Xiao Ni wrote:
> > > Now del_gendisk and put_disk are called asynchronously in workqueue work.
> > > del_gendisk deletes device node by devtmpfs. devtmpfs tries to open this
> > > array again and it flush the workqueue at the bigging of open process. So
> > > a deadlock happens.
> > >
> > > The asynchronous way also has a problem that the device node can still
> > > exist after mdadm --stop command returns in a short window. So udev rule
> > > can open this device node and create the struct mddev in kernel again.
> > >
> > > So put del_gendisk in ioctl path and still leave put_disk in
> > > md_kobj_release to avoid uaf.
> >
> > The md lifetime rules are complicated enough as-is.  So while I won't
> > object to this change per-see I'd rather have it reviewed by the md
> > maintainers independently.
> >
> > In the meantime this should ensure devtmpfs doesn't call into
> > blkdev_get_no_open and thus put_disk:
> >
> > diff --git a/block/bdev.c b/block/bdev.c
> > index 6a34179192c9..97d4c0ab1670 100644
> > --- a/block/bdev.c
> > +++ b/block/bdev.c
> > @@ -1274,18 +1274,23 @@ void sync_bdevs(bool wait)
> >   */
> >  void bdev_statx(const struct path *path, struct kstat *stat, u32 request_mask)
> >  {
> > -       struct inode *backing_inode;
> >         struct block_device *bdev;
> >
> > -       backing_inode = d_backing_inode(path->dentry);
> > -
> >         /*
> > -        * Note that backing_inode is the inode of a block device node file,
> > -        * not the block device's internal inode.  Therefore it is *not* valid
> > -        * to use I_BDEV() here; the block device has to be looked up by i_rdev
> > -        * instead.
> > +        * Note that d_backing_inode() returnsthe inode of a block device node
> > +        * file, not the block device's internal inode.
> > +        *
> > +        * Therefore it is *not* valid to use I_BDEV() here; the block device
> > +        * has to be looked up by i_rdev instead.
> > +        *
> > +        * Only do this lookup if actually needed to avoid the performance
> > +        * overhead of the lookup, and to avoid injecting bdev lifetime issues
> > +        * into devtmpfs.
> >          */
> > -       bdev = blkdev_get_no_open(backing_inode->i_rdev);
> > +       if (!(request_mask & (STATX_DIOALIGN | STATX_WRITE_ATOMIC)))
> > +               return;
> > +
> > +       bdev = blkdev_get_no_open(d_backing_inode(path->dentry)->i_rdev);
> >         if (!bdev)
> >                 return;
> >
> >
>
> Hi Christoph
>
> Thanks for the point. I tried this patch and it's stuck on the device
> mapper device during shutdown.

Sorry, please ignore the following error. I made a mistake when
building the kernel.

Regards
Xiao
>
> [  848.520483] systemd-shutdow[-- MARK -- Tue Apr 22 08:20:00 2025]
> [  989.175210] INFO: task systemd-shutdow:1 blocked for more than 122 seconds.
> [  989.175667]       Not tainted 6.15.0-rc3+ #3
> [  989.176327] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  989.176728] task:systemd-shutdow state:D stack:0     pid:1
> tgid:1     ppid:0      task_flags:0x480100 flags:0x00004002
> [  989.177287] Call Trace:
> [  989.177429]  <TASK>
> [  989.177910]  __schedule+0x2c7/0x690
> [  989.178501]  schedule+0x23/0x80
> [  989.179265]  schedule_timeout+0xe8/0x100
> [  989.179488]  ? sched_clock_cpu+0xb/0x190
> [  989.179703]  ? __smp_call_single_queue+0xac/0x120
> [  989.180436]  ? ttwu_queue_wakelist+0xd0/0xf0
> [  989.181065]  __wait_for_common+0x99/0x1c0
> [  989.181287]  ? __pfx_schedule_timeout+0x10/0x10
> [  989.181935]  devtmpfs_submit_req+0x6e/0x90
> [  989.182154]  devtmpfs_delete_node+0x60/0x90
> [  989.182374]  ? kernfs_remove+0x5e/0x70
> [  989.182579]  ? sysfs_remove_group+0x46/0x90
> [  989.182787]  device_del+0x315/0x3d0
> [  989.183389]  ? mutex_lock+0xe/0x30
> [  989.183963]  del_gendisk+0x216/0x320
> [  989.184180]  cleanup_mapped_device+0x13f/0x150 [dm_mod]
> [  989.184502]  __dm_destroy+0x12e/0x1f0 [dm_mod]
> [  989.185162]  dev_remove+0x119/0x1a0 [dm_mod]
> [  989.185830]  ctl_ioctl+0x21c/0x370 [dm_mod]
> [  989.186090]  dm_ctl_ioctl+0xa/0x20 [dm_mod]
> [  989.186325]  __x64_sys_ioctl+0x92/0xc0
> [  989.186533]  do_syscall_64+0x79/0x160
> [  989.186738]  ? blkg_alloc+0x10f/0x1c0
> [  989.186976]  ? __memcg_slab_free_hook+0xf2/0x140
> [  989.187616]  ? __x64_sys_close+0x39/0x80
> [  989.187836]  ? kmem_cache_free+0x33c/0x450
> [  989.188057]  ? syscall_exit_to_user_mode+0xc/0x1d0
> [  989.188696]  ? do_syscall_64+0x85/0x160
> [  989.188904]  ? __pfx_submit_bio_wait_endio+0x10/0x10
> [  989.189222]  ? blkdev_fsync+0x41/0x60
> [  989.189468]  ? syscall_exit_to_user_mode+0xc/0x1d0
> [  989.190096]  ? do_syscall_64+0x85/0x160
> [  989.190316]  ? do_sys_openat2+0x8c/0xd0
> [  989.190524]  ? syscall_exit_to_user_mode+0xc/0x1d0
> [  989.191160]  ? do_syscall_64+0x85/0x160
> [  989.191374]  ? syscall_exit_to_user_mode+0xc/0x1d0
> [  989.192001]  ? do_syscall_64+0x85/0x160
> [  989.192226]  ? do_syscall_64+0x85/0x160
> [  989.192434]  ? do_syscall_64+0x85/0x160
> [  989.192641]  ? syscall_exit_to_user_mode+0xc/0x1d0
> [  989.193287]  ? do_syscall_64+0x85/0x160
> [  989.193509]  ? do_syscall_64+0x85/0x160
> [  989.193719]  ? exc_page_fault+0x64/0x140
> [  989.2: 00000246 ORIG_RAX: 0000000000000010
> [  989.294592] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007fef13903b0b
> [  989.295388] RDX: 00007ffef60fbe50 RSI: 00000000c138fd04 RDI: 0000000000000005
> [  989.296153] RBP: 00007ffef60fbfc0 R08: 0000000000000000 R09: 00007ffef60fb190
> [  989.296936] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000005
> [  989.297708] R13: 0000000000000000 R14: 00007ffef60fbe10 R15: 0000000000000000
> [  989.298512]  </TASK>
> [  989.298700] INFO: task kdevtmpfs:506 blocked for more than 123 seconds.
> [  989.299139]       Not tainted 6.15.0-rc3+ #3
> [  989.299795] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [  989.300279] task:kdevtmpfs       state:D stack:0     pid:506
> tgid:506   ppid:2      task_flags:0x208140 flags:0x00004000
> [  989.300837] Call Trace:
> [  989.300994]  <TASK>
> [  68/0x290
> [  989.701395]  ? __update_idle_core+0x23/0xb0
> [  989.701611]  devtmpfs_work_loop+0x10d/0x2a0
> [  989.701979]  ? __pfx_devtmpfsd+0x10/0x10
> [  989.700x10/0x10
> [  989.802290]  ret_from_fork+0x30/0x50
> [  989.802498]  ? __pfx_kthread+0x10/0x10
> [  989.802694]  ret_from_fork_asm+0x1a/0x30
> [  989.802926]  </TASK>
> [  989.803103] INFO: task kworker/38:1:586 blocked for more than 123 seconds.
> [  989.803453]       Not tainted 6.15.0-rc3+ #3
>
>
> There is another reason that I want to put del_gendisk in the ioctl
> path. Because sometimes device node can still exist after command
> mdadm --stop. del_gendisk removes inode first and then removes device
> node (e.g. /dev/md0). So there is a small window that device node can
> be open again. Then some strange things happen. Sometimes the array is
> created but device node can't be created (I guess it's removed by
> devtempfs?). Sometimes the kernel message prints "block device
> autoloading is deprecated and will be removed".
>
> If your patch can work well, md array can call
> flush_workqueue(md_misc_wq) in do_md_stop to avoid the deadlock that
> del_gendisk needs to wait all sysfs calls to return.
>
>
> Regards
> Xiao
>
> Xiao






[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux