Changes from V2:
- Fix to prevent the array from being marked broken for all Failfast IOs,
  not just metadata.
- Reflecting the review, update raid{1,10}_error to clear FailfastIOFailure
  so that devices are properly marked Faulty.

Changes from V1:
- Avoid setting MD_BROKEN instead of clearing it
- Add pr_crit() when setting MD_BROKEN
- Fix the message that may be shown after all rdevs have failed:
  "Operation continuing on 0 devices"

v2: https://lore.kernel.org/linux-raid/20250817172710.4892-1-k@xxxxxxx/
v1: https://lore.kernel.org/linux-raid/20250812090119.153697-1-k@xxxxxxx/

A failfast bio fails immediately if the connection to the target is briefly
lost and the device enters a reconnecting state - as happens with nvme-tcp,
for example - even though it would recover given a few seconds. This
behavior is by design in failfast.

However, md treats failfast IO failures as fatal, potentially marking the
array as MD_BROKEN when a connection is lost. For example, if an initiator -
that is, a machine loading the md module - briefly loses all connections,
the array is marked as MD_BROKEN, preventing subsequent writes. This is the
issue I am currently facing, and which this patch series aims to fix.

The 1st patch changes the behavior on MD_FAILFAST IO failures on the last
rdev. The 2nd and 3rd patches modify the pr_crit() messages.

Kenta Akagi (3):
  md/raid1,raid10: Do not set MD_BROKEN on failfast io failure
  md/raid1,raid10: Add error message when setting MD_BROKEN
  md/raid1,raid10: Fix: Operation continuing on 0 devices.

 drivers/md/md.c     | 14 +++++++++-----
 drivers/md/md.h     | 13 +++++++------
 drivers/md/raid1.c  | 32 ++++++++++++++++++++++++++------
 drivers/md/raid10.c | 35 ++++++++++++++++++++++++++++-------

 4 files changed, 70 insertions(+), 24 deletions(-)

--
2.50.1
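
For illustration only, below is a minimal userspace C model of the intended
behavior described in this cover letter, not the kernel patch itself. All
names here (FAILFAST_IO_FAILURE, handle_error, array_state, in_sync_disks)
are made up for the sketch: a failfast failure on the last in-sync device
does not mark the array broken, and the failfast hint is consumed so that a
later genuine failure is still handled normally.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-device flags for this sketch. */
enum rdev_flag { FAULTY = 1 << 0, FAILFAST_IO_FAILURE = 1 << 1 };

struct rdev { unsigned int flags; };
struct array_state { bool broken; int in_sync_disks; };

/* Called when an IO to @rdev fails. */
static void handle_error(struct array_state *md, struct rdev *rdev)
{
	bool failfast = rdev->flags & FAILFAST_IO_FAILURE;

	/* Consume the hint so a later genuine failure is not ignored. */
	rdev->flags &= ~FAILFAST_IO_FAILURE;

	if (md->in_sync_disks == 1) {
		if (failfast)
			return;		/* transient path loss: keep the array usable */
		md->broken = true;	/* genuine failure of the last device */
		return;
	}

	/* Other devices remain: mark this one Faulty as usual. */
	rdev->flags |= FAULTY;
	md->in_sync_disks--;
}

int main(void)
{
	struct array_state md = { .broken = false, .in_sync_disks = 1 };
	struct rdev r = { .flags = FAILFAST_IO_FAILURE };

	handle_error(&md, &r);		/* failfast failure: array stays usable */
	printf("after failfast failure: broken=%d\n", md.broken);

	handle_error(&md, &r);		/* hint was cleared: treated as a real failure */
	printf("after real failure:     broken=%d\n", md.broken);
	return 0;
}

Running the model prints broken=0 after the failfast failure and broken=1
after the subsequent real failure, mirroring the two points in the V2->V3
changelog above.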