On Thu, May 29, 2025 at 02:36:30PM +1000, Dave Chinner wrote:
> In these situations writeback could fail for several attempts before
> the storage timed out and came back online. Then the next write
> retry would succeed, and everything would be good. Linux never gave
> us a specific IO error for this case, so we just had to retry on EIO
> and hope that the storage came back eventually.

Linux has had differentiated I/O error codes for quite a while. But
more importantly, dm-multipath doesn't just return errors to the upper
layer during failover; instead it is expected to queue the I/O until
it either has a working path or an internal timeout has passed.

In other words, write errors in Linux are in general expected to be
persistent, modulo explicit failfast requests like REQ_NOWAIT.

Which also leaves me a bit puzzled about what the XFS metadata retries
are actually trying to solve, especially without even having a
corresponding data I/O version.