[PATCH] mdadm: Fix IMSM Raid assembly after disk link failure and reboot

This patch addresses a scenario observed in production where disk links
go down. After a system reboot, depending on which disk becomes available
first, the IMSM RAID array may either fully assemble or come up with
missing disks.

Below is an example of the production case simulating disk link failures
and subsequent system reboot.

(note: "echo "1" | sudo tee /sys/class/scsi_device/x:x:x:x/device/delete"
is used here to fail/unplug/disconnect disks)

RAID configuration: IMSM RAID1 with two disks

- When sda is unplugged first, then sdb, and after reboot sdb is
reconnected first followed by sda, the container (/dev/md127) and the
subarrays (/dev/md125, /dev/md126) assemble correctly and become active.
- However, when sda is reconnected first, then sdb, the subarrays fail to
fully reconstruct: sda remains missing from the assembled subarrays
because of its stale metadata. (A reproduction sketch follows below.)
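
For concreteness, here is a reproduction sketch of the procedure above,
assuming a two-disk IMSM RAID1 on sda/sdb and assuming each disk sits on
its own SCSI host so the reattach order can be controlled (the
scsi_device and scsi_host IDs are placeholders):

    # Simulate link failures: drop sda first, then sdb
    echo 1 | sudo tee /sys/class/scsi_device/0:0:0:0/device/delete
    echo 1 | sudo tee /sys/class/scsi_device/0:0:1:0/device/delete
    sudo reboot
    # After reboot, bring the links back in the order under test by
    # rescanning the corresponding SCSI hosts, then inspect the result
    echo "- - -" | sudo tee /sys/class/scsi_host/host1/scan   # sda's host
    echo "- - -" | sudo tee /sys/class/scsi_host/host0/scan   # sdb's host
    cat /proc/mdstat
    sudo mdadm --detail --scan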

The above behaviors are driven by udev event handling:

- When a disk disconnects, the rule ACTION=="remove", ENV{ID_PATH}=="?*",
RUN+="/usr/sbin/mdadm -If $devnode --path $env{ID_PATH}" is triggered to
inform mdadm of the removal.
- When a disk reconnects (i.e., ACTION!="remove"), the rule
IMPORT{program}="/usr/sbin/mdadm --incremental --export $devnode
--offroot $env{DEVLINKS}" is triggered to incrementally assemble the
RAID arrays.
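
To confirm which of these rules fires for a given disk event, udev's
test mode can be used (a sketch; /dev/sda is a placeholder):

    udevadm test "$(udevadm info --query=path --name=/dev/sda)" 2>&1 | grep mdadm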

During incremental assembly, the array may not be fully assembled
because one or more disks carry stale metadata.
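
The stale state can be confirmed by comparing the per-disk superblocks
after the failing reconnect order (device names assumed):

    sudo mdadm --examine /dev/sda
    sudo mdadm --examine /dev/sdb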

This patch adds a udev-triggered script that detects this failure and
brings the missing disks back into the array. It inspects the RAID
configuration reported by /usr/sbin/mdadm --detail --scan --export,
identifies disks that belong to a container array but are missing from
their corresponding member (sub)arrays, and restores them by performing
a hot remove-and-re-add cycle.
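
For reference, the cycle the script performs per missing disk is roughly
equivalent to the following manual steps (a sketch; /dev/md127 and
/dev/sda stand in for the affected container and the missing member, and
the script additionally passes --run and --export to the add step):

    # Resolve the persistent path of the missing member
    ID_PATH=$(udevadm info --query=property --name=/dev/sda | sed -n 's/^ID_PATH=//p')
    # Incrementally fail/remove it, then hot-add it back to the container
    sudo mdadm -If /dev/sda --path "$ID_PATH"
    sudo mdadm --add /dev/md127 /dev/sda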

The patch improves resilience by ensuring consistent array reconstruction
regardless of disk detection order. This aligns system behavior with the
expected RAID redundancy and reduces the risk of unnecessary manual
recovery steps after reboots in degraded hardware environments.

Signed-off-by: Richard Li <tianqi.li@xxxxxxxxxx>
---
 imsm_rescue.sh              | 148 ++++++++++++++++++++++++++++++++++++
 udev-md-raid-assembly.rules |   3 +
 2 files changed, 151 insertions(+)
 create mode 100644 imsm_rescue.sh

diff --git a/imsm_rescue.sh b/imsm_rescue.sh
new file mode 100644
index 00000000..7dcb0773
--- /dev/null
+++ b/imsm_rescue.sh
@@ -0,0 +1,148 @@
+#!/bin/bash
+# Check IMSM RAID array health and bring failed/missing member disks back into their arrays
+
+mdadm_output=$(/usr/sbin/mdadm --detail --scan --export)
+export MDADM_INFO="$mdadm_output"
+
+lines=$(echo "$MDADM_INFO" | grep '^MD_')
+
+arrays=()
+array_indexes=()
+index=0
+current=()
+
+# Parse mdadm_output into arrays
+while IFS= read -r line; do
+    if [[ $line == MD_LEVEL=* ]]; then
+        if [[ ${#current[@]} -gt 0 ]]; then
+            arrays[index]="${current[*]}"
+            array_indexes+=($index)
+            current=()
+            index=$((index + 1))
+        fi
+    fi
+    current+=("$line")
+done <<< "$lines"
+
+if [[ ${#current[@]} -gt 0 ]]; then
+    arrays[index]="${current[*]}"
+    array_indexes+=($index)
+fi
+
+# Parse containers and map them to disks
+container_names=()
+container_disks=()
+
+for i in "${array_indexes[@]}"; do
+    IFS=' ' read -r -a props <<< "${arrays[$i]}"
+
+    level=""
+    devname=""
+    disks=""
+
+    for entry in "${props[@]}"; do
+        key="${entry%%=*}"
+        val="${entry#*=}"
+
+        case "$key" in
+            MD_LEVEL) level="$val" ;;
+            MD_DEVNAME) devname="$val" ;;
+            MD_DEVICE_dev*_DEV) disks+=" $val" ;;
+        esac
+    done
+
+    if [[ "$level" == "container" && -n "$devname" ]]; then
+        container_names+=("$devname")
+        container_disks+=("${disks# }")
+    fi
+done
+
+# Check and find missing disks of each container and their subarrays
+containers_with_missing_disks_in_subarray=()
+missing_disks_list=()
+
+for i in "${array_indexes[@]}"; do
+    IFS=' ' read -r -a props <<< "${arrays[$i]}"
+
+    level=""
+    container_path=""
+    devname=""
+    devices=""
+    present=()
+
+    for entry in "${props[@]}"; do
+        key="${entry%%=*}"
+        val="${entry#*=}"
+
+        case "$key" in
+            MD_LEVEL) level="$val" ;;
+            MD_DEVNAME) devname="$val" ;;
+            MD_DEVICES) devices="$val" ;;
+            MD_CONTAINER) container_path="$val" ;;
+            MD_DEVICE_dev*_DEV) present+=("$val") ;;
+        esac
+    done
+
+    if [[ "$level" == "container" || -z "$devices" ]]; then
+        continue
+    fi
+
+    present_count="${#present[@]}"
+    if (( present_count < devices )); then
+        container_name=$(basename "$container_path")
+        # if MD_CONTAINER is empty, then it's a regular raid
+        if [[ -z "$container_name" ]]; then
+            continue
+        fi
+
+        container_real=$(realpath "$container_path")
+
+        if [[ -z "$container_real" ]]; then
+            continue
+        fi
+
+        # Find disks in container
+        container_idx=-1
+        for j in "${!container_names[@]}"; do
+            if [[ "${container_names[$j]}" == "$container_name" ]]; then
+                container_idx=$j
+                break
+            fi
+        done
+
+        if (( container_idx >= 0 )); then
+            container_disk_line="${container_disks[$container_idx]}"
+            container_missing=()
+
+            for dev in $container_disk_line; do
+                found=false
+                for pd in "${present[@]}"; do
+                    [[ "$pd" == "$dev" ]] && found=true && break
+                done
+                $found || container_missing+=("$dev")
+            done
+
+            if (( ${#container_missing[@]} > 0 )); then
+                containers_with_missing_disks_in_subarray+=("$container_real")
+                missing_disks_list+=("${container_missing[*]}")
+            fi
+        fi
+    fi
+done
+
+# Perform a hot remove-and-re-add cycle to bring missing disks back
+for idx in "${!containers_with_missing_disks_in_subarray[@]}"; do
+    container="${containers_with_missing_disks_in_subarray[$idx]}"
+    missing_disks="${missing_disks_list[$idx]}"
+
+    for dev in $missing_disks; do
+        id_path=$(udevadm info --query=property --name="$dev" | grep '^ID_PATH=' | cut -d= -f2)
+
+        if [[ -z "$id_path" ]]; then
+            continue
+        fi
+
+        /usr/sbin/mdadm -If "$dev" --path "$id_path"
+        /usr/sbin/mdadm --add --run --export "$container" "$dev"
+    done
+done
diff --git a/udev-md-raid-assembly.rules b/udev-md-raid-assembly.rules
index 4cd2c6f4..fc210437 100644
--- a/udev-md-raid-assembly.rules
+++ b/udev-md-raid-assembly.rules
@@ -41,6 +41,9 @@ ACTION=="change", KERNEL!="dm-*|md*", GOTO="md_inc_end"
 ACTION!="remove", IMPORT{program}="BINDIR/mdadm --incremental --export $devnode --offroot $env{DEVLINKS}"
 ACTION!="remove", ENV{MD_STARTED}=="*unsafe*", ENV{MD_FOREIGN}=="no", ENV{SYSTEMD_WANTS}+="mdadm-last-resort@$env{MD_DEVICE}.timer"
 
+# do a health check and try to bring up missing disk members
+ACTION=="add", RUN+="./imsm_rescue.sh"
+
 ACTION=="remove", ENV{ID_PATH}=="?*", RUN+="BINDIR/mdadm -If $devnode --path $env{ID_PATH}"
 ACTION=="remove", ENV{ID_PATH}!="?*", RUN+="BINDIR/mdadm -If $devnode"
 
-- 
2.43.5




