On Sun, Jun 01, 2025 at 10:38:07PM -0700, Christoph Hellwig wrote:
> On Thu, May 29, 2025 at 02:36:30PM +1000, Dave Chinner wrote:
> > In these situations writeback could fail for several attempts before
> > the storage timed out and came back online. Then the next write
> > retry would succeed, and everything would be good. Linux never gave
> > us a specific IO error for this case, so we just had to retry on EIO
> > and hope that the storage came back eventually.
>
> Linux has had differentiated I/O error codes for quite a while. But
> more importantly dm-multipath doesn't just return errors to the upper
> layer during failover, but is instead expected to queue the I/O up
> until it either has a working path or an internal timeout has passed.
>
> In other words, write errors in Linux are in general expected to be
> persistent, modulo explicit failfast requests like REQ_NOWAIT.

Say what? The blk_errors array defines multiple block layer error
codes that are transient in nature - ENOSPC, ETIMEDOUT, EILSEQ,
ENOLINK, EBUSY - all of which indicate that a transient, retryable
error occurred somewhere in the block/storage layers.

What is permanent about dm-thinp returning ENOSPC to a write request?
Once the pool has been GC'd to free up space or expanded, the ENOSPC
error goes away.

What is permanent about an IO failing with EILSEQ because a T10
checksum failed due to a random bit error detected between the HBA
and the storage device? Retry the IO, and it goes through just fine
without any failures.

These transient error types typically only need a write retry after
some time period to resolve, and that's what XFS does by default.
What makes these sorts of errors persistent in the Linux block layer,
such that they require an immediate filesystem shutdown and a complete
denial of service to the storage?

I ask this seriously, because you are effectively saying the Linux
storage stack no longer behaves like the model we've been using for
decades. What has changed, and when did it change?

> Which also leaves me a bit puzzled what the XFS metadata retries are
> actually trying to solve, especially without even having a corresponding
> data I/O version.

They have always been about preventing an immediate filesystem
shutdown when spurious, transient IO errors occur below XFS. Data IO
errors don't cause filesystem shutdowns - errors get propagated to
the application - so there isn't a full-system DoS potential from
incorrect classification of data IO errors...

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
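
[ For reference, the mapping I'm talking about is the blk_errors[]
  table in block/blk-core.c, which blk_status_to_errno() uses to turn
  BLK_STS_* codes into errnos. An abridged sketch from memory - check
  the current tree for the exact entries and name strings:

    /* block/blk-core.c, abridged from memory */
    static const struct {
            int             errno;
            const char      *name;
    } blk_errors[] = {
            [BLK_STS_OK]            = { 0,           "" },
            [BLK_STS_TIMEOUT]       = { -ETIMEDOUT,  "timeout" },
            [BLK_STS_NOSPC]         = { -ENOSPC,     "critical space allocation" },
            [BLK_STS_TRANSPORT]     = { -ENOLINK,    "recoverable transport" },
            [BLK_STS_PROTECTION]    = { -EILSEQ,     "protection" },
            [BLK_STS_DEV_RESOURCE]  = { -EBUSY,      "device resource" },
            /* ... further entries elided ... */
            [BLK_STS_IOERR]         = { -EIO,        "I/O" },
    };

  i.e. the differentiated codes exist precisely to distinguish these
  transient conditions from a hard EIO. ]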