Re: [PATCH] md: ensure consistent action state in md_do_sync

On 2025/9/1 10:16, Li Nan wrote:


On 2025/8/30 17:51, Paul Menzel wrote:
Dear Nan,


Thank you for your patch.

On 30.08.25 at 11:05, linan666@xxxxxxxxxxxxxxx wrote:
From: Li Nan <linan122@xxxxxxxxxx>

The 'mddev->recovery' flags can change during md_do_sync(), leading to
inconsistencies. For example, starting with MD_RECOVERY_RECOVER and
ending with MD_RECOVERY_SYNC can cause incorrect offset updates.

Can you give a concrete example?


T1                    T2
md_do_sync
  action = ACTION_RECOVER
                     (write sysfs)
                     action_store
                      set MD_RECOVERY_SYNC
  [ do recovery ]
  update resync_offset

The corresponding code is:
```
	if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery) &&
	    mddev->curr_resync > MD_RESYNC_ACTIVE) {
		if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) { /* SYNC is set, but what we did was a recovery */
			if (test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
				if (mddev->curr_resync >= mddev->resync_offset) {
					pr_debug("md: checkpointing %s of %s.\n",
						 desc, mdname(mddev));
					if (test_bit(MD_RECOVERY_ERROR,
						     &mddev->recovery))
						mddev->resync_offset =
							mddev->curr_resync_completed;
					else
						mddev->resync_offset =
							mddev->curr_resync;
				}
```

To avoid this, use the 'action' determined at the beginning of the
function instead of repeatedly checking 'mddev->recovery'.

Do you have a reproducer?


I don't have a reproducer because reproducing it requires modifying the
kernel. The approximate steps are:

- Modify the kernel to add a delay before the above check.
- Trigger recovery by removing and adding disks.
- After recovery completes, while md_do_sync() is held at the delay point,
write to the sysfs interface to set the sync flag.


Please ignore my previous reply; it was wrong. While MD_RECOVERY_RUNNING
is set, the recovery state should not be changed, so this patch is only a
cleanup. I will further improve the code that handles sync completion in v2.

--
Thanks,
Nan