Re: md regression caused by commit 9e59d609763f70a992a8f3808dabcce60f14eb5c

On Thu, Aug 7, 2025 at 10:18 PM Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
>
>
>
> On Thu, 7 Aug 2025, Luca Boccassi wrote:
>
> > On Thu, 7 Aug 2025 at 01:04, Xiao Ni <xni@xxxxxxxxxx> wrote:
> > >
> > > Hi all
> > >
> > > This needs the latest upstream mdadm
> > > https://github.com/md-raid-utilities/mdadm/, which has fixed this
> > > problem. Fedora hasn't been updated to the latest upstream yet, so
> > > it still has this problem. I'll update Fedora's mdadm to the latest upstream.
> > >
> > > Best Regards
> > > Xiao
> >
> > Thank you for looking into it and providing a solution - however,
> > isn't it against the rules to break existing released userspace
> > components and requiring new versions to be released in order to use a
> > new kernel version? Is there any way this kernel patch could be
> > amended to avoid breaking the existing userspace as it is?
> >
> > Thanks
>
> I also think that the misbehavior should be fixed in the kernel.
>
> We shouldn't use arbitrary timeouts to clean up the sysfs entries, because
> it would introduce race conditions.
>
> What about destroying the sysfs entries when the file descriptor is
> closed? (instead of on the STOP_ARRAY ioctl) That wouldn't interfere with
> other code trying to stop the array and it would make it work with the
> buggy mdadm that calls STOP_ARRAY and then tries to find the sysfs entries
> and then calls SET_ARRAY_INFO.
>
> Mikulas
>

Hi all

Before the kernel change, the assemble process was:
1. create the array
2. stop it (STOP_ARRAY). Before the kernel change, del_gendisk was
called at the last release of the mddev rather than in the STOP_ARRAY ioctl
3. access /sys/block/md0/md

The kernel change moves the del_gendisk call into STOP_ARRAY, so /dev/md0
is removed and no one can access it any more. Otherwise, the array can be
created again, because md supports create-on-open.

After the kernel change, the assemble process is:
1. create the array
2. stop it (del_gendisk runs and /sys/block/md0 is removed)
3. access /sys/block/md0/xx (this fails)
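To make the ordering difference concrete, here is a toy Python model. It is not kernel or mdadm code: the names ToyMdDevice and assemble_sequence are hypothetical, and it assumes the tool keeps its /dev/md0 descriptor open from STOP_ARRAY until after the sysfs lookup. It only tracks when /sys/block/md0 disappears relative to STOP_ARRAY and the last release. The del_gendisk_on_stop=False case also roughly approximates Mikulas's suggestion of tearing sysfs down on close.

```python
from dataclasses import dataclass


# Toy model, not kernel code: it only records whether /sys/block/md0
# still exists, depending on where del_gendisk is called.
@dataclass
class ToyMdDevice:
    del_gendisk_on_stop: bool  # True models the behaviour after the kernel change
    open_count: int = 0
    sysfs_present: bool = False

    def create(self) -> None:
        self.sysfs_present = True

    def open(self) -> None:
        self.open_count += 1

    def stop_array(self) -> None:
        # New behaviour: del_gendisk (and with it the sysfs tree) runs in STOP_ARRAY.
        if self.del_gendisk_on_stop:
            self.sysfs_present = False

    def close(self) -> None:
        self.open_count -= 1
        # Old behaviour: del_gendisk runs at the last release of the mddev.
        if self.open_count == 0 and not self.del_gendisk_on_stop:
            self.sysfs_present = False


def assemble_sequence(dev: ToyMdDevice) -> bool:
    """Create, stop, then look for /sys/block/md0/md.

    Assumption (hypothetical): the tool holds its fd open across all steps.
    Returns whether the sysfs lookup after STOP_ARRAY succeeds.
    """
    dev.create()
    dev.open()
    dev.stop_array()
    found = dev.sysfs_present  # step 3: access /sys/block/md0/md
    dev.close()
    return found


print(assemble_sequence(ToyMdDevice(del_gendisk_on_stop=False)))  # True
print(assemble_sequence(ToyMdDevice(del_gendisk_on_stop=True)))   # False
```

In both cases sysfs is gone once the last reference is dropped; the only difference is whether it is still there in the window between STOP_ARRAY and that last release, which is exactly the window the existing mdadm relies on.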

So it is del_gendisk that destroys the sysfs entries. If we destroy the
sysfs entries at the last release of the mddev instead, we go back to the
old state in which /dev/md0 can still be opened after it is stopped. I
don't want to go back to that, because some customers have hit bugs where
shutdown hangs because /dev/md0 can't be stopped, and the regression
tests usually fail because of this too.

I know it's not good to break mdadm with a kernel change. But sometimes
the userspace tool and the kernel need to work together to fix a problem,
right?
Sorry for causing this problem, and thanks for the suggestions. Any
other good suggestions?

Best Regards
Xiao