Re: [PATCH v2 1/1] PCI/bwctrl: Replace lbms_count with PCI_LINK_LBMS_SEEN flag

Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> · Fri, 25 Apr 2025 15:24:45 +0300 (EEST)

On Fri, 25 Apr 2025, Lukas Wunner wrote:
> On Thu, Apr 24, 2025 at 03:37:38PM +0300, Ilpo Järvinen wrote:
> > On Thu, 24 Apr 2025, Lukas Wunner wrote:
> > >   The only concern here is whether the cached
> > >   link speed is updated.  pcie_bwctrl_change_speed() does call
> > >   pcie_update_link_speed() after calling pcie_retrain_link(), so that
> > >   looks fine.  But there's a second caller of pcie_retrain_link():
> > >   pcie_aspm_configure_common_clock().  It doesn't update the cached
> > >   link speed after calling pcie_retrain_link().  Not sure if this can
> > >   lead to a change in link speed and therefore the cached link speed
> > >   should be updated?  The Target Link Speed isn't changed, but maybe
> > >   the link fails to retrain to the same speed for electrical reasons?
> > 
> > I've never seen that to happen but it would seem odd if that is forbidden 
> > (as the alternative is probably that the link remains down).
> > 
> > Perhaps pcie_reset_lbms() should just call pcie_update_link_speed() as the 
> > last step, then the irq handler returning IRQ_NONE doesn't matter.
> 
> Why pcie_reset_lbms()?  I was rather thinking that pcie_update_link_speed()
> should be called from pcie_retrain_link().  Maybe right after the final
> pcie_wait_for_link_status().

My reasonale for having it in pcie_reset_lbms() is that LBMS is cleared
there which races with the irq handler reading LBMS. If LBMS is cleared 
before the irq handler reads linksta register, it returns IRQ_NONE and 
will misses the LBMS event. So this race problem is directly associated 
with the write-to-clear of LBMS.

> That would ensure that the speed is updated in case retraining from
> pcie_aspm_configure_common_clock() happens to lead to a lower speed.
> And the call to pcie_update_link_speed() from pcie_bwctrl_change_speed()
> could then be dropped.
>
> PCIe r6.2 sec 7.5.3.19 says the Target Link Speed "sets an upper limit
> on Link operational speed", which implies that the actual negotiated
> speed might be lower.

While I don't disagree with that spec interpretation, in case of ASPM, the 
question is more complex than that. The link was already trained to speed 
x, can the new link training result in failing to train to x (in 
practice).

The funny problem here is that all 3 places have a different, but good 
reason to call pcie_update_link_speed():

1) pcie_reset_lbms() because of the LBMS race mentioned above.

2) pcie_retrain_link() because Link Speed could have changed because of 
   the link training.

3) pcie_bwctrl_change_speed() because it asked to change link speed, and 
   also due to the known HW issue that some platforms do not reliably send 
   LBMS (I don't recall if it was interrupt triggering issue or issue 
   with asserting LBMS itself, the effect is the same regardless).

In addition, in the code 3) calls 2) and 2) calls 1), which leaves 
pcie_reset_lbms() as the function where all roads always lead to including 
those that only call pcie_reset_lbms(). So if pcie_update_link_speed() is 
to be placed not all those three places but only one, the best place seems 
to be pcie_reset_lbms() as it covers all cases including those that only 
calls it directly.

-- 
 i.