Re: Kernel mistakenly "starts" resync on fully-degraded, newly-created raid10 array

On 6/22/25 09:30, Wol wrote:
On 22/06/2025 02:39, Omari Stephens wrote:
I tried asking on Reddit, and ended up resolving the issue myself:
https://www.reddit.com/r/linuxquestions/comments/1lh9to0/kernel_is_stuck_resyncing_a_4drive_raid10_array/

I run Debian SID, and am using kernel 6.12.32-amd64

#apt-cache policy linux-image-amd64
linux-image-amd64:
   Installed: 6.12.32-1
   Candidate: 6.12.32-1
   Version table:
  *** 6.12.32-1 500
         500 http://mirrors.kernel.org/debian unstable/main amd64 Packages
         500 http://http.us.debian.org/debian unstable/main amd64 Packages
         100 /var/lib/dpkg/status

#uname -r
6.12.32-amd64

To summarize the issue and my diagnostic steps, I ran this command to create a new raid10 array:

#mdadm --create md13 --name=media --level=10 --layout=f2 -n 4 /dev/sdb1 missing /dev/sdf1 missing

At that point, /proc/mdstat showed the following, which makes no sense:

Why doesn't it make any sense?

Don't forget a raid-10 in Linux is NOT two raid-0s in a raid-1; it's its own thing entirely.

Understood. I've been using Linux MD raid10 for over a decade. I've read through this (and other references) in depth:
https://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10

My question is this: Suppose you create a 4-drive array. 2 drives are missing. What data is there to synchronize? What should get copied where, or what should get recomputed and written where?

To my understanding, in that situation, each block in the array only appears in one place on the physical media, and there is no redundancy or parity for any block that could be out of sync.
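For concreteness, my mental model of the far-2 layout on 4 devices looks roughly like this (chunk letters are just for illustration; the kernel's exact copy rotation may differ slightly, but the conclusion is the same):

              slot0  slot1  slot2  slot3
  near half:    A      B      C      D   ...
  far half:     D      A      B      C   ...

With slots 1 and 3 missing, every chunk still has exactly one surviving copy (A and D on slot 0, B and C on slot 2), so there is nothing for a resync to compare or copy.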

When you read from the array, yes, you're going to get interleaved bits of whatever happened to be on the physical media to start with, but that's basically the same as reading directly from any new physical media -- it's not initialized until it's initialized, and until it is, you don't know what you're going to read.


md127 : active raid10 sdb1[2] sdc1[0]
       23382980608 blocks super 1.2 512K chunks 2 far-copies [4/2] [U_U_]
       [>....................]  resync =  0.0% (8594688/23382980608) finish=25176161501.3min speed=0K/sec
       bitmap: 175/175 pages [700KB], 65536KB chunk

With 2 drives present and 2 drives absent, the array can only start if the present drives are considered in sync.  The kernel spent most of a day in this state.  The "8594688" count increased very slowly over time, but after 24 hours, it was only up to 0.1%.  During that time, I had mounted the array and transferred 11TB of data onto it.

I can't see any mention of drive size, so your % complete is meaningless, but I would say my raid with about 12TB of disk takes a couple of days to sort itself out ...

The rate of resync completion was ~0. The estimated time to completion was over 17 million days (roughly 48,000 years). Again, my hypothesis is that this is because the system was confused and wasn't actually doing anything meaningful. (Although the md127_resync process was sitting at 100% CPU usage the entire time; no clue what it was spending those cycles on.)
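For reference, the kernel also exposes its view of the sync state in sysfs, which is handy for this kind of digging (md127 assumed here):

#cat /sys/block/md127/md/sync_action
#cat /sys/block/md127/md/sync_completed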

Here's my disk layout, currently, after successfully adding the last two drives:
$lsblk /dev/sda /dev/sdb /dev/sdc /dev/sde
NAME              MAJ:MIN RM   SIZE RO TYPE   MOUNTPOINTS
sda                 8:0    0  10.9T  0 disk
└─sda1              8:1    0  10.9T  0 part
  └─md127           9:127  0  21.8T  0 raid10
    ├─media_crypt 253:0    0  21.8T  0 crypt  /mnt/home_media
    └─md127p1     259:0    0 492.2G  0 part
sdb                 8:16   0  10.9T  0 disk
└─sdb1              8:17   0  10.9T  0 part
  └─md127           9:127  0  21.8T  0 raid10
    ├─media_crypt 253:0    0  21.8T  0 crypt  /mnt/home_media
    └─md127p1     259:0    0 492.2G  0 part
sdc                 8:32   0  10.9T  0 disk
└─sdc1              8:33   0  10.9T  0 part
  └─md127           9:127  0  21.8T  0 raid10
    ├─media_crypt 253:0    0  21.8T  0 crypt  /mnt/home_media
    └─md127p1     259:0    0 492.2G  0 part
sde                 8:64   0  10.9T  0 disk
└─sde1              8:65   0  10.9T  0 part
  └─md127           9:127  0  21.8T  0 raid10
    ├─media_crypt 253:0    0  21.8T  0 crypt  /mnt/home_media
    └─md127p1     259:0    0 492.2G  0 part


Then, when I power-cycled, swapped SATA cables, and added the remaining drives, they were marked as spares and weren't added to the array (likely because the array was considered to be already resyncing):

I think you're right - if the array is already rebuilding, it can't start a new, different rebuild halfway through the old one ...

#mdadm --detail /dev/md127
/dev/md127:
[...]
     Number   Major   Minor   RaidDevice State
        0       8       33        0      active sync   /dev/sdc1
        -       0        0        1      removed
        2       8       17        2      active sync   /dev/sdb1
        -       0        0        3      removed

        4       8        1        -      spare   /dev/sda1
        5       8       65        -      spare   /dev/sde1
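(An untested thought: freezing the bogus resync first, e.g.

#echo frozen > /sys/block/md127/md/sync_action

might have let the spares be absorbed without a re-create, but that's pure speculation on my part.)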


I ended up resolving the issue by recreating the array with --assume-clean:

Bad idea !!! It's okay, especially with new drives and a new array, but it will leave the array in a random state. Not good ...

#mdadm --create md19 --name=media3 --assume-clean --readonly --level=10 --layout=f2 -n 4 /dev/sdc1 missing /dev/sdb1 missing
To optimalize recovery speed, it is recommended to enable write-indent bitmap, do you want to enable it now? [y/N]? y
mdadm: /dev/sdc1 appears to be part of a raid array:
        level=raid10 devices=4 ctime=Sun Jun 22 00:51:33 2025
mdadm: /dev/sdb1 appears to be part of a raid array:
        level=raid10 devices=4 ctime=Sun Jun 22 00:51:33 2025
Continue creating array [y/N]? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md/md19 started.

#cat /proc/mdstat
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4]
md127 : active (read-only) raid10 sdb1[2] sdc1[0]
       23382980608 blocks super 1.2 512K chunks 2 far-copies [4/2] [U_U_]
       bitmap: 175/175 pages [700KB], 65536KB chunk
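(Side note for anyone following the same path: before re-creating, it's worth grabbing the old superblock parameters so the new --create matches them exactly, e.g.

#mdadm --examine /dev/sdb1 /dev/sdc1

and confirming that chunk size, layout, data offset, and device order all line up.)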

At which point, I was able to add the new devices and have the array (start to) resync as expected:

Yup. Now the two-drive array is not resyncing, the new drives can be added and will resync.

#mdadm --manage /dev/md127 --add /dev/sda1 --add /dev/sde1
mdadm: added /dev/sda1
mdadm: added /dev/sde1

#cat /proc/mdstat
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4]
md127 : active raid10 sde1[5] sda1[4] sdc1[0] sdb1[2]
       23382980608 blocks super 1.2 512K chunks 2 far-copies [4/2] [U_U_]
       [>....................]  recovery =  0.0% (714112/11691490304) finish=1091.3min speed=178528K/sec
       bitmap: 0/175 pages [0KB], 65536KB chunk

#mdadm --detail /dev/md127
/dev/md127:
[...]
     Number   Major   Minor   RaidDevice State
        0       8       33        0      active sync   /dev/sdc1
        5       8       65        1      spare rebuilding   /dev/sde1
        2       8       17        2      active sync   /dev/sdb1
        4       8        1        3      spare rebuilding   /dev/sda1

--xsdg


Now you have an array where anything you have written will be okay (which I guess is what you care about), but the rest of the disk is uninitialised garbage that will instantly trigger a read fault if you try to read it.

Because these happen to be brand-new (but pre-zeroed) hard disks, the on-disk state happens to be all zeroes anyway. Where would the read fault come from, though? What is there to be out of sync? raid10 has no parity, and there's no redundancy across 2 disks in a 4-disk raid10.

#dd if=/dev/md127 bs=1 skip=20T count=1M | hd
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00100000
1048576+0 records in
1048576+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 1.37091 s, 765 kB/s

And to be clear, that's way past where the resync status is right now:
$cat /proc/mdstat
Personalities : [raid1] [raid10] [raid0] [raid6] [raid5] [raid4]
md127 : active raid10 sde1[5] sda1[4] sdc1[0] sdb1[2]
      23382980608 blocks super 1.2 512K chunks 2 far-copies [4/2] [U_U_]
      [=============>.......] recovery = 69.8% (8170708928/11691490304) finish=415.6min speed=141166K/sec
      bitmap: 13/175 pages [52KB], 65536KB chunk

--xsdg


You need to set off a scrub, which will do those reads and get the array itself (not just your data) into a sane state.
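Something like this (using md127 from this thread) kicks one off, with progress showing up in /proc/mdstat:

#echo check > /sys/block/md127/md/sync_action

Writing "repair" instead of "check" will also rewrite any mismatched copies it finds.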

https://archive.kernel.org/oldwiki/raid.wiki.kernel.org/

Ignore the "obsolete content" banners. Somebody clearly thinks that replacing USER documentation with double-dutch programmer documentation (aimed at a completely different audience) is a good idea ...

Cheers,
Wol




