On 2025/08/23 10:54, Li Nan wrote: > > > 在 2025/8/18 1:27, Kenta Akagi 写道: >> A super_write IO failure with MD_FAILFAST must not cause the array >> to fail. >> >> Because a failfast bio may fail even when the rdev is not broken, >> so IO must be retried rather than failing the array when a metadata >> write with MD_FAILFAST fails on the last rdev. >> >> A metadata write with MD_FAILFAST is retried after failure as >> follows: >> >> 1. In super_written, MD_SB_NEED_REWRITE is set in sb_flags. >> >> 2. In md_super_wait, which is called by the function that >> executed md_super_write and waits for completion, >> -EAGAIN is returned because MD_SB_NEED_REWRITE is set. >> >> 3. The caller of md_super_wait (such as md_update_sb) >> receives a negative return value and then retries md_super_write. >> >> 4. The md_super_write function, which is called to perform >> the same metadata write, issues a write bio without MD_FAILFAST >> this time. >> >> When a write from super_written without MD_FAILFAST fails, >> the array may broken, and MD_BROKEN should be set. >> >> After commit 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10"), >> calling md_error on the last rdev in RAID1/10 always sets >> the MD_BROKEN flag on the array. >> As a result, when failfast IO fails on the last rdev, the array >> immediately becomes failed. >> >> This commit prevents MD_BROKEN from being set when a super_write with >> MD_FAILFAST fails on the last rdev, ensuring that the array does >> not become failed due to failfast IO failures. >> >> Failfast IO failures on any rdev except the last one are not retried >> and are marked as Faulty immediately. This minimizes array IO latency >> when an rdev fails. >> >> Fixes: 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10") >> Signed-off-by: Kenta Akagi <k@xxxxxxx> > > > [...] > >> --- a/drivers/md/raid1.c >> +++ b/drivers/md/raid1.c >> @@ -1746,8 +1746,12 @@ static void raid1_status(struct seq_file *seq, struct mddev *mddev) >> * - recovery is interrupted. >> * - &mddev->degraded is bumped. >> * >> - * @rdev is marked as &Faulty excluding case when array is failed and >> - * &mddev->fail_last_dev is off. >> + * If @rdev is marked with &FailfastIOFailure, it means that super_write >> + * failed in failfast and will be retried, so the @mddev did not fail. >> + * >> + * @rdev is marked as &Faulty excluding any cases: >> + * - when @mddev is failed and &mddev->fail_last_dev is off >> + * - when @rdev is last device and &FailfastIOFailure flag is set >> */ >> static void raid1_error(struct mddev *mddev, struct md_rdev *rdev) >> { >> @@ -1758,6 +1762,10 @@ static void raid1_error(struct mddev *mddev, struct md_rdev *rdev) >> if (test_bit(In_sync, &rdev->flags) && >> (conf->raid_disks - mddev->degraded) == 1) { >> + if (test_bit(FailfastIOFailure, &rdev->flags)) { >> + spin_unlock_irqrestore(&conf->device_lock, flags); >> + return; >> + } >> set_bit(MD_BROKEN, &mddev->flags); >> if (!mddev->fail_last_dev) { > > At this point, users who try to fail this rdev will get a successful return > without Faulty flag. Should we consider it? Hi, Sorry for the late reply. I will submit a v3 patch that sets the flag just before md_error and modifies raid{1,10}_error to use test_and_clear_bit. Therefore, echo "faulty" to dev-*/state should always correctly mark the device as faulty. Thanks, Akagi >