On Tue, 2025-06-03 at 07:41 -0700, Christoph Hellwig wrote:
> [taking this private to discuss the mpt drivers]
>
> > Hmmm... DID_SOFT_ERROR... Normally, this is an immediate retry as
> > this normally is used to indicate that a command is a collateral
> > abort due to an NCQ error, and per ATA spec, that command should be
> > retried. However, the *BAD* thing about Broadcom HBAs using this is
> > that it increments the command retry counter, so if a command ends
> > up being retried more than 5 times due to other commands failing,
> > the command runs out of retries and is failed like this. The
> > command retry counter should *not* be incremented for NCQ
> > collateral aborts. I tried to fix this, but it is impossible as we
> > actually do not know if this is a collateral abort or something
> > else. The HBA events used to handle completion do not allow
> > differentiation. Waiting on Broadcom to do something about this
> > (the mpi3mr HBA driver has the same nasty issue).
>
> Maybe we should just change the mpt3 sas/mr drivers to use
> DID_SOFT_ERROR less? In fact there's not really a whole lot of
> DID_SOFT_ERROR users otherwise, and there's probably better status
> codes whatever they are doing can be translated to that do not
> increment the retry counter.

The status code that does that (retry without incrementing the counter)
is DID_IMM_RETRY. The driver has to be a bit careful about using this
because we can get into infinite retry loops.

Regards,

James
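
For illustration only, here is a toy userspace model of the two retry
behaviours discussed in this thread. It is not kernel code and not a
proposed patch: every toy_* name is hypothetical, the budget of 5 comes
from the quoted text, and the cap of 16 uncounted retries is an arbitrary
assumption standing in for whatever guard a driver would need to avoid
the infinite retry loop.

```c
#include <stdbool.h>

/* Toy stand-ins for the two host status codes; NOT the kernel's definitions. */
enum toy_host_status { TOY_SOFT_ERROR, TOY_IMM_RETRY };

struct toy_cmd {
	int retries;     /* counted retries consumed so far */
	int allowed;     /* retry budget (5 in the thread) */
	int imm_retries; /* hypothetical driver-side count of uncounted retries */
};

/* Arbitrary, assumed cap on uncounted retries per command. */
#define TOY_IMM_RETRY_CAP 16

/* Returns true if the command is retried, false if it is failed upward. */
static bool toy_should_retry(struct toy_cmd *cmd, enum toy_host_status st)
{
	if (st == TOY_IMM_RETRY)
		return true;                     /* retry, budget untouched */
	return ++cmd->retries <= cmd->allowed;   /* retry that eats the budget */
}

/*
 * Hypothetical driver-side guard: report the uncounted status only a
 * bounded number of times, then fall back to the counted one so a
 * command that never succeeds cannot loop forever.
 */
static enum toy_host_status toy_pick_status(struct toy_cmd *cmd)
{
	if (cmd->imm_retries++ < TOY_IMM_RETRY_CAP)
		return TOY_IMM_RETRY;
	return TOY_SOFT_ERROR;
}

/* Count the retries granted to a command whose every completion is aborted. */
static int toy_run(struct toy_cmd *cmd, bool use_imm_retry)
{
	int attempts = 0;

	for (;;) {
		enum toy_host_status st = use_imm_retry ?
			toy_pick_status(cmd) : TOY_SOFT_ERROR;

		if (!toy_should_retry(cmd, st))
			return attempts;
		attempts++;
	}
}
```

In this model a command that only ever sees the counted status is failed
once its budget of 5 is spent, while the guarded variant gets the capped
number of uncounted retries first and then still terminates. Without the
cap, toy_run() would never return for the uncounted case, which is the
loop hazard mentioned above.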