Re: [Bug 219984] New: [BISECTED] High power usage since 'PCI/ASPM: Correct LTR_L1.2_THRESHOLD computation'

Dear Bjorn,

one (probably the main) power consumer is the CPU, which stays in
shallow C-states after 7afeb84d14ea. With 7afeb84d14ea reverted, the
CPU spends most of its time in C7 even under some load (such as web
browsing), whereas the original 6.14.0 stays in C3 even at idle. So
the main question is: what can keep the CPU busy with larger
LTR_L1.2_THRESHOLDs?

I do have Win10 too, but neither the Windows binaries of pciutils nor
Device Manager show LTR_L1.2_THRESHOLD. lspci -vv run as Administrator
does report some "latencies", though. Compared with the
LTR_L1.2_THRESHOLDs calculated by Linux, some of them are
significantly smaller, e.g. "Exit Latency L0s <1us, L1 <16us" for the
bridge 00:1d.6, and others are significantly larger, e.g. "Exit
Latency L1 unlimited" for the NVMe 6e:00.0. The full log is attached.

But do we need to care about precise values? At least we now know that
7afeb84d14ea has only increased the thresholds, and only slightly.
What happens if they are underestimated? Can this lead to severe
problems, e.g. data corruption on NVMes? If not (and I have never seen
any in 4 years of using 5.15 kernels), can we reprogram
LTR_L1.2_THRESHOLDs at runtime? As for the CPU, we could introduce
'performance' and 'powersave' governors for PCI, which set the
thresholds to, say, 2x and 0.5x of (2 + 4 + t_common_mode +
t_power_on), respectively.
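
To make the proposal concrete, here is a minimal sketch of the scaled
threshold in Python. It assumes the base formula is exactly (2 + 4 +
t_common_mode + t_power_on) in usec, as written above. The governor
names, the GOVERNOR_FACTOR table and the l12_threshold_us() helper are
hypothetical illustrations, not an existing kernel interface.

```python
import math

# Hypothetical governor -> scaling factor table, taken from the proposal above.
GOVERNOR_FACTOR = {"performance": 2.0, "powersave": 0.5}


def l12_threshold_us(t_common_mode_us, t_power_on_us, governor=None):
    """Return the LTR_L1.2_THRESHOLD in usec, optionally scaled by a governor."""
    # Base formula as quoted in this thread: 2 + 4 + t_common_mode + t_power_on.
    base = 2 + 4 + t_common_mode_us + t_power_on_us
    if governor is None:
        return base
    # Round up so a scaled threshold is never silently truncated.
    return math.ceil(base * GOVERNOR_FACTOR[governor])


# Values for 02:00.0 from the dmesg diff: t_common_mode 0x28, t_power_on 0x2c.
print(l12_threshold_us(0x28, 0x2C))                 # unscaled
print(l12_threshold_us(0x28, 0x2C, "performance"))  # 2x
print(l12_threshold_us(0x28, 0x2C, "powersave"))    # 0.5x
```

With the 02:00.0 inputs the unscaled result is the same 0x5a (90 usec)
that the kernel reports, so only the scaling step is new here.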

Thanks.
Sergey.


On Wed, Apr 9, 2025 at 12:18 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Tue, Apr 08, 2025 at 09:02:46PM +0100, Sergey Dolgov wrote:
> > Dear Bjorn,
> >
> > here are both dmesg from the kernels with your info patch.
>
> Thanks again!  Here's the difference:
>
>   - pre  7afeb84d14ea
>   + post 7afeb84d14ea
>
>    pci 0000:02:00.0: parent CMRT 0x28 child CMRT 0x00
>    pci 0000:02:00.0: parent T_POWER_ON 0x2c usec (val 0x16 scale 0)
>    pci 0000:02:00.0: child  T_POWER_ON 0x0a usec (val 0x5 scale 0)
>    pci 0000:02:00.0: t_common_mode 0x28 t_power_on 0x2c l1_2_threshold 0x5a
>   -pci 0000:02:00.0: encoded LTR_L1.2_THRESHOLD value 0x02 scale 3
>   +pci 0000:02:00.0: encoded LTR_L1.2_THRESHOLD value 0x58 scale 2
>
> We computed LTR_L1.2_THRESHOLD == 0x5a == 90 usec == 90000 nsec.
>
> Prior to 7afeb84d14ea, we computed *scale = 3, *value = (90000 >> 15)
> == 0x2.  But per PCIe r6.0, sec 6.18, this is a latency value of only
> 0x2 * 32768 == 65536 ns, which is less than the 90000 ns we requested.
>
> After 7afeb84d14ea, we computed *scale = 2, *value =
> roundup(threshold_ns, 1024) / 1024 == 0x58, which is a latency value
> of 90112 ns, which is almost exactly what we requested.
>
> In essence, before 7afeb84d14ea we tell the Root Port that it can
> enter L1.2 and get back to L0 in 65536 ns or less, and after
> 7afeb84d14ea, we tell it that it may take up to 90112 ns.
>
> It's possible that the calculation of LTR_L1.2_THRESHOLD itself in
> aspm_calc_l12_info() is too conservative, and we don't actually need
> 90 usec, but I think the encoding done by 7afeb84d14ea itself is more
> correct.  I don't have any information about how to improve the 90
> usec estimate.  (If you happen to have Windows on that box, it would
> be really interesting to see how it sets LTR_L1.2_THRESHOLD.)
>
> If the device has sent LTR messages indicating a latency requirement
> between 65536 ns and 90112 ns, the pre-7afeb84d14ea kernel would allow
> L1.2 while post 7afeb84d14ea would not.  I don't think we can actually
> see the LTR messages sent by the device, but my guess is they must be
> in that range.  I don't know if that's enough to account for the major
> difference in power consumption you're seeing.
>
> The AX200 at 6f:00.0 is in exactly the same situation as the
> Thunderbolt bridge at 02:00.0 (LTR_L1.2_THRESHOLD 90 usec, RP set to
> 65536 ns before 7afeb84d14ea and 90112 ns after).
>
> For the NVMe devices at 6d:00.0 and 6e:00.0, LTR_L1.2_THRESHOLD is
> 3206 usec (!), and we set the RP to 3145728 ns (slightly too small)
> before, 3211264 ns after.
>
> For the RTS525A at 70:00.0, LTR_L1.2_THRESHOLD is 126 usec, and we set
> the RP to 98304 ns before, 126976 ns after.
>
> Sorry, no real answers here yet, still puzzled.
>
> Bjorn
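
The encoding arithmetic quoted above can be reproduced with a short
Python sketch. It is not the kernel code: the function names are made
up for illustration, and the branch boundaries are an assumption chosen
to match the numbers in this thread (a truncating shift before
7afeb84d14ea, a round-up into the smallest scale that fits the 10-bit
value field after it). The l12_allowed() check likewise assumes that
L1.2 is permitted only when the device's LTR latency is at least the
programmed threshold, and the 80000 ns LTR value is purely hypothetical.

```python
# Nanoseconds per unit for LTR scale codes 0..5 (each step is a factor of 32).
SCALE_NS = [1, 32, 1024, 32768, 1048576, 33554432]
VALUE_MAX = 0x3FF  # LTR_L1.2_THRESHOLD_Value is a 10-bit field


def encode_before(threshold_us):
    """Assumed pre-7afeb84d14ea behaviour: pick a scale, then truncate."""
    ns = threshold_us * 1000
    for scale in range(5):
        if ns < SCALE_NS[scale + 1]:
            return scale, ns // SCALE_NS[scale]
    return 5, ns // SCALE_NS[5]


def encode_after(threshold_us):
    """Assumed post-7afeb84d14ea behaviour: smallest fitting scale, rounded up."""
    ns = threshold_us * 1000
    for scale, unit in enumerate(SCALE_NS):
        if ns <= unit * VALUE_MAX:
            return scale, -(-ns // unit)  # ceiling division
    return 5, VALUE_MAX  # saturate at the largest encodable value


def decode_ns(scale, value):
    """Latency in ns represented by an encoded (scale, value) pair."""
    return value * SCALE_NS[scale]


def l12_allowed(ltr_ns, programmed_ns):
    """Assumption: L1.2 is allowed only if the LTR latency >= threshold."""
    return ltr_ns >= programmed_ns


for name, us in [("02:00.0 / 6f:00.0", 90), ("NVMe", 3206), ("RTS525A", 126)]:
    before = decode_ns(*encode_before(us))
    after = decode_ns(*encode_after(us))
    print(f"{name}: requested {us * 1000} ns, before {before} ns, after {after} ns")

# Hypothetical device LTR of 80000 ns, between the two programmed values.
print(l12_allowed(80000, decode_ns(*encode_before(90))))
print(l12_allowed(80000, decode_ns(*encode_after(90))))
```

Under these assumptions a device reporting any LTR between 65536 ns and
90112 ns flips from "L1.2 allowed" to "L1.2 not allowed" across the
commit, which is the window described above.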

Attachment: lspci-win.log
Description: Binary data

