Changes from V1:
- Avoid setting MD_BROKEN instead of clearing it
- Add pr_crit() when setting MD_BROKEN
- Fix the message that may be shown after all rdevs have failed:
  "Operation continuing on 0 devices"

A failfast bio, for example in the case of nvme-tcp, will fail
immediately if the connection to the target is lost for a few seconds
and the device enters a reconnecting state - even though it would
recover if given a few seconds. This behavior is exactly as intended
by the design of failfast.

However, md treats super_write operations that fail with failfast as
fatal. For example, if an initiator - that is, a machine loading the
md module - loses all connections for a few seconds, the array becomes
broken and subsequent writes are no longer possible.

This is the issue I am currently facing, and which this patch series
aims to fix.

The 1st patch changes the behavior on super_write MD_FAILFAST IO
failures (a simplified sketch of the intended policy follows below).
The 2nd and 3rd patches modify the pr_crit() output.

Kenta Akagi (3):
  md/raid1,raid10: don't broken array on failfast metadata write fails
  md/raid1,raid10: Add error message when setting MD_BROKEN
  md/raid1,raid10: Fix: Operation continuing on 0 devices.

 drivers/md/md.c     |  9 ++++++---
 drivers/md/md.h     |  7 ++++---
 drivers/md/raid1.c  | 26 ++++++++++++++++++++------
 drivers/md/raid10.c | 26 ++++++++++++++++++++------
 4 files changed, 50 insertions(+), 18 deletions(-)

--
2.50.1
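
For illustration only, here is a minimal user-space sketch of the
policy the 1st patch is aiming for: a failfast superblock write failure
on the last working rdev is retried without failfast instead of marking
the array broken. The types and function names (array_state,
handle_super_write_error, working_rdevs) are simplified stand-ins and
not the md driver's real internals; only MD_BROKEN is referenced from
the series itself.

/*
 * Minimal user-space model of the intended error-handling policy.
 * Names here are illustrative stand-ins, not md's real types or
 * functions.
 */
#include <stdbool.h>
#include <stdio.h>

struct array_state {
	int  working_rdevs;   /* devices still considered healthy */
	bool broken;          /* models the MD_BROKEN flag */
};

/*
 * Called when a superblock (metadata) write completes with an error.
 * Returns true if the write should be retried without failfast.
 */
static bool handle_super_write_error(struct array_state *a, bool was_failfast)
{
	if (was_failfast && a->working_rdevs == 1) {
		/*
		 * A failfast failure on the last working device may be a
		 * transient path loss (e.g. nvme-tcp reconnecting), so do
		 * not mark the array broken; retry without failfast.
		 */
		printf("super_write: failfast error on last rdev, retry without failfast\n");
		return true;
	}

	/* Otherwise fail the device as before. */
	a->working_rdevs--;
	if (a->working_rdevs == 0) {
		a->broken = true;
		printf("super_write: no working rdevs left, marking array broken\n");
	} else {
		printf("super_write: failing rdev, %d device(s) remaining\n",
		       a->working_rdevs);
	}
	return false;
}

int main(void)
{
	struct array_state a = { .working_rdevs = 1, .broken = false };

	/* Transient failfast failure: the array stays usable. */
	handle_super_write_error(&a, true);
	printf("broken=%d working=%d\n", a.broken, a.working_rdevs);

	/* Persistent failure on the retried, non-failfast write. */
	handle_super_write_error(&a, false);
	printf("broken=%d working=%d\n", a.broken, a.working_rdevs);
	return 0;
}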