Re: [PATCH v7 08/19] scsi: detect support for command duration limits

On 6/3/25 20:28, Friedrich Weber wrote:
>>> They provided controller information via `sas3ircu` and `storcli`:
>>>
>>> sas3ircu:
>>>
>>>   Controller type                         : SAS3008
>>>   BIOS version                            : 8.37.00.00
>>>   Firmware version                        : 16.00.16.00
>>
>> Is this the latest available FW for this HBA ? (see below)
> 
> It seems 16.00.16.00 is even newer than the latest version available on
> the Broadcom website, which is a bit strange -- I only found [1] there
> which has an older 16.00.14.00 (3008_FW_PH16.00.14.00.rar).

So this is an old, now EOL, 9300 series HBA, right ? Or is this a 3008 controller
chip integrated on the server motherboard (e.g. a Supermicro HBA) ?
Looking at the Broadcom support page for legacy products, the latest FW version
seems to be 16.00.10.00.

>>> storcli:
>>>
>>> Firmware Package Build = 24.18.0-0021
>>> Firmware Version = 4.670.00-6500
>>> CPLD Version = 26515-00A
>>> Bios Version = 6.34.01.0_4.19.08.00_0x06160200
>>> HII Version = 03.23.06.00
>>> Ctrl-R Version = 5.18-0400
>>> Preboot CLI Version = 01.07-05:#%0000
>>> NVDATA Version = 3.1611.00-0005
>>> Boot Block Version = 3.07.00.00-0003
>>> Driver Name = megaraid_sas
>>> Driver Version = 07.727.03.00-rc1
>>
>> Unfortunately, I do not have any megaraid model so I cannot test/recreate. I
>> only have mpt3sas (9300, 9400 and 9500 series HBAs) and mpi3mr models (9600 HBA
>> series).
> 
> We just realized this is actually the firmware information for a
> different unrelated controller on the same host (a LSI MegaRAID SAS-3
> 3108 using the megaraid_sas driver). But the megaraid_sas one is not
> used in our tests, so please ignore the storcli output we provided.
> Sorry for the confusion.
> 
> The controller we're testing with is the SAS3008 I mentioned initially,
> with firmware version 16.00.16.00 as reported by sas3ircu above.

I do not have this FW... I am not sure what the HBA itself is either. I only have
some Broadcom 9300-XX HBAs that have the 3008 controller.

> FWIW, the user reports they have also seen the same issue with a
> SAS3-9500-8e Tri-mode HBA.

This one had a FW update last month or so. So checking that it runs the latest
FW is required.

>>> And the disk information from `smartctl --xall`
>>>
>>> 20T:
>>>
>>> === START OF INFORMATION SECTION ===
>>> Vendor:               WDC
>>> Product:              WUH722020BL5204

...

>>> Product:              WUH721818AL5204

I have these. I will try to check. But again, I seriously doubt this has
anything to do with the drives, since neither these drives nor the HBAs you
listed support CDL. So when calling scsi_report_opcode() to check for CDL, we
should always see the HBA SAT return "CDL not supported".
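
For reference, the check being discussed can be sketched roughly as below. This
is a minimal userspace sketch, not the kernel code: cdl_supported() is a
hypothetical helper name, and the bit positions (SUPPORT in bits 2:0 and CDLP in
bits 4:3 of byte 1 of the one_command parameter data returned by REPORT
SUPPORTED OPERATION CODES) are my reading of SPC-6 and should be checked
against the standard.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel function): decide whether the
 * one_command parameter data returned by REPORT SUPPORTED OPERATION CODES
 * advertises command duration limits for the queried command.
 */
static bool cdl_supported(const uint8_t *buf, size_t len)
{
	uint8_t support, cdlp;

	/* We need at least byte 1 of the parameter data. */
	if (!buf || len < 2)
		return false;

	/* SUPPORT field, byte 1 bits 2:0: 3 means supported per the standard. */
	support = buf[1] & 0x07;
	if (support != 0x03)
		return false;

	/* CDLP field, byte 1 bits 4:3: 1 or 2 means a CDL mode page applies. */
	cdlp = (buf[1] & 0x18) >> 3;
	return cdlp == 0x01 || cdlp == 0x02;
}
```

With a drive and HBA that do not support CDL, the CDLP field should be 0 (or
the command should fail outright), so a check like this is expected to return
false and nothing further should be sent to the device for CDL.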


>> I do not think that the drives are relevant for this issue. How the HBA react
>> to a command error from the drive resulting from the HBA command translation
>> likely is the issue.
> 
> I see, but it is certainly strange that 18T vs 20T drives do seem to
> make a difference (hotplug works with 18T and doesn't work with 20T).

Probably a timing difference since these drives are not the same generation.
They have different timing on scan.

>>> If you need any additional information, please let us know!
>>
>> Adding the Broadcom folks to this thread, since as suspected, this seems to be
>> an HBA issue. I strongly suspect that it relates to a recent very similar issue
>> I have seen with the mpi3mr driver and a 9600 Broadcom HBA: any hotplug of a
>> drive would completely crash the HBA and a full power cycle was needed to
>> recover. A simple reboot would not be sufficient. I think the latest HBA FW
>> version fixes that problem.
>>
>> Broadcom team,
>>
>> Any comment ?

Broadcom ? Would you care to comment ?

At this point, I have no idea what is going on. My hunch is that it is the HBA
SAT misbehaving. But that is only a hunch. To prove it, we would likely need a
bus trace and have Broadcom look at HBA logs (which can be extracted using
storcli). All of this likely means involving the technical support of the vendors.


-- 
Damien Le Moal
Western Digital Research



