Hi, On 10/04/2025 10:15, Yu Kuai wrote: > Hi, > > 在 2025/04/08 22:38, Meir Elisha 写道: >> During recovery/check operations, the process_checks function loops >> through available disks to find a 'primary' source with successfully >> read data. >> >> If no suitable source disk is found after checking all possibilities, >> the 'primary' index will reach conf->raid_disks * 2. Add an explicit >> check for this condition after the loop. If no source disk was found, >> print an error message and return early to prevent further processing >> without a valid primary source. >> >> Signed-off-by: Meir Elisha <meir.elisha@xxxxxxxxxxx> >> --- > > Just to be sure, do you tested this version? > > Thanks, > Kuai I wasn't able to reproduce the crash when using this patch. >> Changes from v1: >> - Don't fix read errors on recovery/check >> >> This was observed when forcefully disconnecting all iSCSI devices backing >> a RAID1 array(using --failfast flag) during a check operation, causing >> all reads within process_checks to fail. The resulting kernel oops shows: >> >> BUG: kernel NULL pointer dereference, address: 0000000000000040 >> RIP: 0010:process_checks+0x25e/0x5e0 [raid1] >> Code: ... <4c> 8b 53 40 ... // mov r10,[rbx+0x40] >> Call Trace: >> process_checks >> sync_request_write >> raid1d >> md_thread >> kthread >> ret_from_fork >> >> drivers/md/raid1.c | 26 ++++++++++++++++---------- >> 1 file changed, 16 insertions(+), 10 deletions(-) >> >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c >> index 0efc03cea24e..de9bccbe7337 100644 >> --- a/drivers/md/raid1.c >> +++ b/drivers/md/raid1.c >> @@ -2200,14 +2200,9 @@ static int fix_sync_read_error(struct r1bio *r1_bio) >> if (!rdev_set_badblocks(rdev, sect, s, 0)) >> abort = 1; >> } >> - if (abort) { >> - conf->recovery_disabled = >> - mddev->recovery_disabled; >> - set_bit(MD_RECOVERY_INTR, &mddev->recovery); >> - md_done_sync(mddev, r1_bio->sectors, 0); >> - put_buf(r1_bio); >> + if (abort) >> return 0; >> - } >> + >> /* Try next page */ >> sectors -= s; >> sect += s; >> @@ -2346,10 +2341,21 @@ static void sync_request_write(struct mddev *mddev, struct r1bio *r1_bio) >> int disks = conf->raid_disks * 2; >> struct bio *wbio; >> - if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) >> - /* ouch - failed to read all of that. */ >> - if (!fix_sync_read_error(r1_bio)) >> + if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) { >> + /* >> + * ouch - failed to read all of that. >> + * No need to fix read error for check/repair >> + * because all member disks are read. >> + */ >> + if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) || >> + !fix_sync_read_error(r1_bio)) { >> + conf->recovery_disabled = mddev->recovery_disabled; >> + set_bit(MD_RECOVERY_INTR, &mddev->recovery); >> + md_done_sync(mddev, r1_bio->sectors, 0); >> + put_buf(r1_bio); >> return; >> + } >> + } >> if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) >> process_checks(r1_bio); >> >