On Wed, 9 Jul 2025, Ilpo Järvinen wrote:

> > I wonder if it shouldn't have to see some kind of actual link activity
> > as a prereq to entering the quirk.
>
> How would you observe that "link activity"? Doesn't LBMS itself imply
> "link activity" occurred?

It does, although in this case it shouldn't have been set in the first
place, because after reset the link never comes up (i.e. goes into the
Link Active state) and only keeps flipping between training and not
training, as indicated by the LT bit.

FAOD, with the affected link the LBMS bit doesn't ever retrigger once
cleared while the link is in its broken state.  Once the speed has been
clamped and the link retrained, it comes up right away (i.e. goes into
the Link Active state) and stays up steadily, including after the speed
has been unclamped.

I ran a test once and left the system up for half a year or so.  The
LBMS bit was set once, a couple of days after system reset.  I cleared
it by hand and it never retriggered for the rest of the experiment, so
this single occasion must have been a glitch and not a link quality
issue.

During that half a year the system and the link in question were both
used heavily in remote GNU toolchain verification over a network
interface placed downstream of the problematic link.  Traffic included
NFS and SSH.  No issues ever triggered, so I must conclude the link
training issue is specific to speed negotiation, likely at the protocol
level, rather than at the physical layer.

Last year I tried to make an alternative setup using a PCIe switch
option card built around the same ASMedia device.  The card has turned
out not to work at all (the switch shows up in configuration space, but
all its downstream ports are permanently down) owing to the host leaving
the Vaux line disconnected in the slot, which is a conforming
configuration.  I was told by the option card manufacturer that this is
an erratum in the ASMedia switch device and that the workaround is to
drive Vaux.
I think this just tells you what the quality of these devices is.  Sigh.

Anyway, I chose to rework the card and tracked down a suitable miniature
SMD switch to mount onto the PCB so as to let me select whether to drive
the ASMedia device's Vaux input from the Vaux or a regular 3.3V slot
position, but owing to other commitments I've never got round to
completing this effort, as it requires a couple of hours of precise
manual work at the workshop.  I'll get back to it sometime and report
the results.

> Any good suggestions how to realize that check more precisely to
> differentiate if there was some link activity or not?

The LT bit is an obvious candidate, and it is also what I used for the
corresponding quirk in U-Boot.  The problem, however, is that while in
U-Boot it's fine to busy-loop polling the LT bit for a second or so, it
absolutely isn't in Linux, where we have the rest of the OS running.
Sampling at random intervals isn't going to help, as we could well miss
the active state.  FWIW it's all documented with the description of the
quirk.

> > One thing that honestly doesn't make any sense to me is the ID list in the
> > quirk. If the link comes up after forcing to Gen1 then it would only restore
> > TLS if the device is the ASMedia switch, but also ignoring what device is
> > detected downstream. If we allow ASMedia to restore the speed for any downstream
> > device when we only saw the initial issue with the Pericom switch then why
> > do we exclude Intel Root Ports or AMD Root Ports or any other bridge from the
> > list which did not have any issues reported.
>
> I think it's because the restore has been tested on that device
> (whitelist).

Correct, the idea has been to err on the side of caution.  The ASMedia
device seems to cope well with this unclamping, so it's been listed, and
so should any other device that has been confirmed to work.
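
To illustrate the whitelist logic, here is a minimal standalone sketch,
not the actual quirk code.  The names `unclamp_ok', `id_listed' and
`may_unclamp' are made up for this message, the register bit values are
as I recall them from pci_regs.h, and the ASMedia vendor:device pair is
an assumption, so please check all of them against the tree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values assumed to match include/uapi/linux/pci_regs.h. */
#define LNKSTA_DLLLA      0x2000  /* Link Status: Data Link Layer Link Active */
#define LNKCTL2_TLS       0x000f  /* Link Control 2: Target Link Speed field */
#define LNKCTL2_TLS_2_5GT 0x0001  /* Target Link Speed clamped to 2.5GT/s */

/* Hypothetical ID record; the kernel uses struct pci_device_id instead. */
struct port_id {
	uint16_t vendor;
	uint16_t device;
};

/* Upstream devices confirmed to cope with unclamping (IDs are assumed). */
static const struct port_id unclamp_ok[] = {
	{ 0x1b21, 0x2824 },	/* ASMedia switch, assumed ID */
};

/* Return true if the upstream device is on the confirmed-good list. */
bool id_listed(uint16_t vendor, uint16_t device)
{
	size_t n = sizeof(unclamp_ok) / sizeof(unclamp_ok[0]);

	for (size_t i = 0; i < n; i++)
		if (unclamp_ok[i].vendor == vendor &&
		    unclamp_ok[i].device == device)
			return true;
	return false;
}

/*
 * Lift the speed restriction only if the link is up, the target speed
 * is currently clamped to 2.5GT/s and the upstream device is listed.
 * Whatever sits downstream does not enter into the decision.
 */
bool may_unclamp(uint16_t vendor, uint16_t device,
		 uint16_t lnksta, uint16_t lnkctl2)
{
	return (lnksta & LNKSTA_DLLLA) &&
	       (lnkctl2 & LNKCTL2_TLS) == LNKCTL2_TLS_2_5GT &&
	       id_listed(vendor, device);
}
```

Only the upstream port's ID and link state are inputs here, so adding
another confirmed device would be a one-line change to the table.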

Matching both the downstream and the upstream device at the same time
instead, once this quirk has triggered and succeeded, seems to make no
sense: if the device downstream turns out affected, then it matches the
behaviour observed, so it should be enough to have the upstream device
checked.  I did want to run it at full speed anyway.

OTOH matching the downstream device likely makes sense if the quirk has
been bypassed, such as when the link speed had already been clamped by
the firmware.  In this case we do not really know if the clamping has
been triggered by this erratum or something else, so such a check would
be justified.  I don't think it's going to matter for the problems
discussed though.

Apologies for the irregular replies, lots on my head right now and I had
to write this all down properly.

  Maciej