On Tue, Jun 03, 2025 at 09:19:10AM +1000, Dave Chinner wrote:
> > In other words, write errors in Linux are in general expected to be
> > persistent, modulo explicit failfast requests like REQ_NOWAIT.
>
> Say what? the blk_errors array defines multiple block layer errors
> that are transient in nature - stuff like ENOSPC, ETIMEDOUT, EILSEQ,
> ENOLINK, EBUSY - all indicate a transient, retryable error occurred
> somewhere in the block/storage layers.

Let's use the block layer codes reported all the way up to the file
systems, and their descriptions, instead of the errnos they are mapped
to for compatibility.  The above would be, in order:

	[BLK_STS_NOSPC]		= { -ENOSPC,	"critical space allocation" },
	[BLK_STS_TIMEOUT]	= { -ETIMEDOUT,	"timeout" },
	[BLK_STS_PROTECTION]	= { -EILSEQ,	"protection" },
	[BLK_STS_TRANSPORT]	= { -ENOLINK,	"recoverable transport" },
	[BLK_STS_DEV_RESOURCE]	= { -EBUSY,	"device resource" },

> What is permanent about dm-thinp returning ENOSPC to a write
> request? Once the pool has been GC'd to free up space or expanded,
> the ENOSPC error goes away.

Everything.  ENOSPC means there is no space.  There might be space at
some non-deterministic point in the future, but if the layer just
needs to GC, it must not report the error.

> What is permanent about an IO failing with EILSEQ because a t10
> checksum failed due to a random bit error detected between the HBA
> and the storage device? Retry the IO, and it goes through just fine
> without any failures.

Normally it means your checksum was wrong.  If you have bit errors in
the cable they will show up again, maybe not on the next I/O but soon.

> These transient error types typically only need a write retry after
> some time period to resolve, and that's what XFS does by default.
> What makes these sorts of errors persistent in the linux block layer
> and hence requiring an immediate filesystem shutdown and complete
> denial of service to the storage?
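To make the mapping concrete, here is a minimal, self-contained sketch of how such a table flattens a rich block status into a plain errno before the filesystem sees it.  This mirrors the entries quoted above; the enum values and the `blk_status_to_errno_sketch()` helper are simplified illustrations, not the kernel's actual definitions.

```c
#include <errno.h>

/*
 * Illustrative sketch: a reduced blk_errors[] table showing how
 * each BLK_STS_* code carries both an errno and a description,
 * but only the errno survives the trip up to the filesystem.
 * Names and values mirror the mail; this is not kernel code.
 */
enum blk_status_sketch {
	BLK_STS_NOSPC,
	BLK_STS_TIMEOUT,
	BLK_STS_PROTECTION,
	BLK_STS_TRANSPORT,
	BLK_STS_DEV_RESOURCE,
};

struct blk_error_sketch {
	int errno_val;
	const char *name;
};

static const struct blk_error_sketch blk_errors_sketch[] = {
	[BLK_STS_NOSPC]        = { -ENOSPC,    "critical space allocation" },
	[BLK_STS_TIMEOUT]      = { -ETIMEDOUT, "timeout" },
	[BLK_STS_PROTECTION]   = { -EILSEQ,    "protection" },
	[BLK_STS_TRANSPORT]    = { -ENOLINK,   "recoverable transport" },
	[BLK_STS_DEV_RESOURCE] = { -EBUSY,     "device resource" },
};

/* What the filesystem actually receives: the errno, status lost. */
static int blk_status_to_errno_sketch(enum blk_status_sketch sts)
{
	return blk_errors_sketch[sts].errno_val;
}
```

Note how "recoverable transport" and "device resource" both end up as bare errnos once flattened; the transient-vs-persistent hint in the status code is gone.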
> > I ask this seriously, because you are effectively saying the linux
> > storage stack now doesn't behave the same as the model we've been
> > using for decades. What has changed, and when did it change?

Hey, you can retry.  You're unlikely to improve the situation, though;
you'll just keep deferring the inevitable shutdown.

> > Which also leaves me a bit puzzled what the XFS metadata retries are
> > actually trying to solve, especially without even having a corresponding
> > data I/O version.
>
> It's always been for preventing immediate filesystem shutdown when
> spurious transient IO errors occur below XFS. Data IO errors don't
> cause filesystem shutdowns - errors get propagated to the
> application - so there isn't a full system DOS potential for
> incorrect classification of data IO errors...

Except, as we see in this thread, for a fairly common use case
(buffered I/O without fsync) they don't get propagated.  And I agree
with you that this is not how you write applications that care about
data integrity - but the entire rest of the system, and just about
every common utility, is written that way.  And even applications
that do fsync won't see your fancy error code: the only thing stored
in the address_space for fsync to catch is EIO or ENOSPC.
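The last point can be sketched in a few lines: the page cache historically remembers a writeback failure only as one of two flag bits (AS_EIO / AS_ENOSPC in the kernel), so whatever errno the block layer reported is collapsed before a later fsync() can observe it.  All names below are illustrative stand-ins, not the kernel's actual implementation.

```c
#include <errno.h>

/*
 * Hypothetical sketch of the behaviour described above: a writeback
 * error is recorded as one of two bits, so any richer block-layer
 * errno (ENOLINK, EILSEQ, ETIMEDOUT, ...) degenerates to EIO.
 */
enum { SKETCH_AS_EIO = 1 << 0, SKETCH_AS_ENOSPC = 1 << 1 };

struct sketch_address_space {
	unsigned long flags;
};

/* Record a writeback error: everything except ENOSPC becomes EIO. */
static void sketch_set_wb_err(struct sketch_address_space *m, int err)
{
	if (err == -ENOSPC)
		m->flags |= SKETCH_AS_ENOSPC;
	else if (err)
		m->flags |= SKETCH_AS_EIO;
}

/* What a later fsync() can report: only -ENOSPC or -EIO, once. */
static int sketch_filemap_check_errors(struct sketch_address_space *m)
{
	if (m->flags & SKETCH_AS_ENOSPC) {
		m->flags &= ~SKETCH_AS_ENOSPC;
		return -ENOSPC;
	}
	if (m->flags & SKETCH_AS_EIO) {
		m->flags &= ~SKETCH_AS_EIO;
		return -EIO;
	}
	return 0;
}
```

So a transport error recorded via sketch_set_wb_err(&m, -ENOLINK) surfaces to fsync() as plain -EIO; the original code is unrecoverable by the application.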