[taking this private to discuss the mpt drivers]

> Hmmm... DID_SOFT_ERROR... Normally, this is an immediate retry as this normally
> is used to indicate that a command is a collateral abort due to an NCQ error,
> and per ATA spec, that command should be retried. However, the *BAD* thing
> about Broadcom HBAs using this is that it increments the command retry counter,
> so if a command ends up being retried more than 5 times due to other commands
> failing, the command runs out of retries and is failed like this. The command
> retry counter should *not* be incremented for NCQ collateral aborts. I tried to
> fix this, but it is impossible as we actually do not know if this is a
> collateral abort or something else. The HBA events used to handle completion do
> not allow differentiation. Waiting on Broadcom to do something about this (the
> mpi3mr HBA driver has the same nasty issue).

Maybe we should just change the mpt3 sas/mr drivers to use DID_SOFT_ERROR
less?

In fact there aren't a whole lot of other DID_SOFT_ERROR users, and
whatever they are doing can probably be translated to better status codes
that do not increment the retry counter.