On Tue, Jun 03, 2025 at 09:19:10AM +1000, Dave Chinner wrote:
> > In other words, write errors in Linux are in general expected to be
> > persistent, modulo explicit failfast requests like REQ_NOWAIT.
>
> Say what? the blk_errors array defines multiple block layer errors
> that are transient in nature - stuff like ENOSPC, ETIMEDOUT, EILSEQ,
> ENOLINK, EBUSY - all indicate a transient, retryable error occurred
> somewhere in the block/storage layers.

Let's use the block layer codes reported all the way up to the file
systems, and their descriptions, instead of the errnos they are mapped
to for compatibility.  The above would be, in order:

	[BLK_STS_NOSPC]		= { -ENOSPC,	"critical space allocation" },
	[BLK_STS_TIMEOUT]	= { -ETIMEDOUT,	"timeout" },
	[BLK_STS_PROTECTION]	= { -EILSEQ,	"protection" },
	[BLK_STS_TRANSPORT]	= { -ENOLINK,	"recoverable transport" },
	[BLK_STS_DEV_RESOURCE]	= { -EBUSY,	"device resource" },

> What is permanent about dm-thinp returning ENOSPC to a write
> request? Once the pool has been GC'd to free up space or expanded,
> the ENOSPC error goes away.

Everything.  ENOSPC means there is no space.  There might be space at
some non-deterministic point in the future, but if the layer just
needs to GC, it must not report the error.

> What is permanent about an IO failing with EILSEQ because a t10
> checksum failed due to a random bit error detected between the HBA
> and the storage device? Retry the IO, and it goes through just fine
> without any failures.

Normally it means your checksum was wrong.  If you have bit errors in
the cable they will show up again, maybe not on the next I/O but soon.

> These transient error types typically only need a write retry after
> some time period to resolve, and that's what XFS does by default.
> What makes these sorts of errors persistent in the linux block layer
> and hence requiring an immediate filesystem shutdown and complete
> denial of service to the storage?
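To make the mapping concrete, here is a minimal, self-contained sketch of how such a table flattens a rich block status into a plain errno before the filesystem sees it.  This mirrors the entries quoted above; the enum values and the `blk_status_to_errno_sketch()` helper are simplified illustrations, not the kernel's actual definitions.

```c
#include <errno.h>

/*
 * Illustrative sketch: a reduced blk_errors[] table showing how
 * each BLK_STS_* code carries both an errno and a description,
 * but only the errno survives the trip up to the filesystem.
 * Names and values mirror the mail; this is not kernel code.
 */
enum blk_status_sketch {
	BLK_STS_NOSPC,
	BLK_STS_TIMEOUT,
	BLK_STS_PROTECTION,
	BLK_STS_TRANSPORT,
	BLK_STS_DEV_RESOURCE,
};

struct blk_error_sketch {
	int errno_val;
	const char *name;
};

static const struct blk_error_sketch blk_errors_sketch[] = {
	[BLK_STS_NOSPC]        = { -ENOSPC,    "critical space allocation" },
	[BLK_STS_TIMEOUT]      = { -ETIMEDOUT, "timeout" },
	[BLK_STS_PROTECTION]   = { -EILSEQ,    "protection" },
	[BLK_STS_TRANSPORT]    = { -ENOLINK,   "recoverable transport" },
	[BLK_STS_DEV_RESOURCE] = { -EBUSY,     "device resource" },
};

/* What the filesystem actually receives: the errno, status lost. */
static int blk_status_to_errno_sketch(enum blk_status_sketch sts)
{
	return blk_errors_sketch[sts].errno_val;
}
```

Note how "recoverable transport" and "device resource" both end up as bare errnos once flattened; the transient-vs-persistent hint in the status code is gone.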
> > I ask this seriously, because you are effectively saying the linux
> > storage stack now doesn't behave the same as the model we've been
> > using for decades. What has changed, and when did it change?

Hey, you can retry.  You're unlikely to improve the situation, though;
you'll just keep deferring the inevitable shutdown.

> > Which also leaves me a bit puzzled what the XFS metadata retries are
> > actually trying to solve, especially without even having a corresponding
> > data I/O version.
>
> It's always been for preventing immediate filesystem shutdown when
> spurious transient IO errors occur below XFS. Data IO errors don't
> cause filesystem shutdowns - errors get propagated to the
> application - so there isn't a full system DOS potential for
> incorrect classification of data IO errors...

Except, as we see in this thread, for a fairly common use case
(buffered I/O without fsync) they don't get propagated.  And I agree
with you that this is not how you write applications that care about
data integrity - but the entire rest of the system, and just about
every common utility, is written that way.  And even applications
that do fsync won't see your fancy error code: the only thing stored
in the address_space for fsync to catch is EIO or ENOSPC.
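The last point can be sketched in a few lines: the page cache historically remembers a writeback failure only as one of two flag bits (AS_EIO / AS_ENOSPC in the kernel), so whatever errno the block layer reported is collapsed before a later fsync() can observe it.  All names below are illustrative stand-ins, not the kernel's actual implementation.

```c
#include <errno.h>

/*
 * Hypothetical sketch of the behaviour described above: a writeback
 * error is recorded as one of two bits, so any richer block-layer
 * errno (ENOLINK, EILSEQ, ETIMEDOUT, ...) degenerates to EIO.
 */
enum { SKETCH_AS_EIO = 1 << 0, SKETCH_AS_ENOSPC = 1 << 1 };

struct sketch_address_space {
	unsigned long flags;
};

/* Record a writeback error: everything except ENOSPC becomes EIO. */
static void sketch_set_wb_err(struct sketch_address_space *m, int err)
{
	if (err == -ENOSPC)
		m->flags |= SKETCH_AS_ENOSPC;
	else if (err)
		m->flags |= SKETCH_AS_EIO;
}

/* What a later fsync() can report: only -ENOSPC or -EIO, once. */
static int sketch_filemap_check_errors(struct sketch_address_space *m)
{
	if (m->flags & SKETCH_AS_ENOSPC) {
		m->flags &= ~SKETCH_AS_ENOSPC;
		return -ENOSPC;
	}
	if (m->flags & SKETCH_AS_EIO) {
		m->flags &= ~SKETCH_AS_EIO;
		return -EIO;
	}
	return 0;
}
```

So a transport error recorded via sketch_set_wb_err(&m, -ENOLINK) surfaces to fsync() as plain -EIO; the original code is unrecoverable by the application.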