On 7/17/2025 7:29 PM, Manivannan Sadhasivam wrote:
> On Thu, Jul 17, 2025 at 06:46:12PM GMT, Baochen Qiang wrote:
>>
>>
>> On 7/17/2025 6:31 PM, Manivannan Sadhasivam wrote:
>>> On Thu, Jul 17, 2025 at 05:24:13PM GMT, Baochen Qiang wrote:
>>>
>>> [...]
>>>
>>>>> @@ -16,6 +16,8 @@
>>>>> #include "mhi.h"
>>>>> #include "debug.h"
>>>>>
>>>>> +#include "../ath.h"
>>>>> +
>>>>> #define ATH12K_PCI_BAR_NUM 0
>>>>> #define ATH12K_PCI_DMA_MASK 36
>>>>>
>>>>> @@ -928,8 +930,7 @@ static void ath12k_pci_aspm_disable(struct ath12k_pci *ab_pci)
>>>>> u16_get_bits(ab_pci->link_ctl, PCI_EXP_LNKCTL_ASPM_L1));
>>>>>
>>>>> /* disable L0s and L1 */
>>>>> - pcie_capability_clear_word(ab_pci->pdev, PCI_EXP_LNKCTL,
>>>>> - PCI_EXP_LNKCTL_ASPMC);
>>>>> + pci_disable_link_state(ab_pci->pdev, PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1);
>>>>
>>>> Not always, but sometimes seems the 'disable' does not work:
>>>>
>>>> [ 279.920507] ath12k_pci_power_up 1475: link_ctl 0x43 //before disable
>>>> [ 279.920539] ath12k_pci_power_up 1482: link_ctl 0x43 //after disable
>>>>
>>>>
>>>>>
>>>>> set_bit(ATH12K_PCI_ASPM_RESTORE, &ab_pci->flags);
>>>>> }
>>>>> @@ -958,10 +959,7 @@ static void ath12k_pci_aspm_restore(struct ath12k_pci *ab_pci)
>>>>> {
>>>>> if (ab_pci->ab->hw_params->supports_aspm &&
>>>>> test_and_clear_bit(ATH12K_PCI_ASPM_RESTORE, &ab_pci->flags))
>>>>> - pcie_capability_clear_and_set_word(ab_pci->pdev, PCI_EXP_LNKCTL,
>>>>> - PCI_EXP_LNKCTL_ASPMC,
>>>>> - ab_pci->link_ctl &
>>>>> - PCI_EXP_LNKCTL_ASPMC);
>>>>> + pci_enable_link_state(ab_pci->pdev, ath_pci_aspm_state(ab_pci->link_ctl));
>>>>
>>>> always, the 'enable' is not working:
>>>>
>>>> [ 280.561762] ath12k_pci_start 1180: link_ctl 0x43 //before restore
>>>> [ 280.561809] ath12k_pci_start 1185: link_ctl 0x42 //after restore
>>>>
>>>
>>> Interesting! I applied your diff and I never see this issue so far (across 10+
>>> reboots):
>>
>> I was not testing reboot. Here is what I am doing:
>>
>> step1: rmmod ath12k
>> step2: force LinkCtrl using setpci (make sure it is 0x43, which seems more likely to see
>> the issue)
>>
>> sudo setpci -s 02:00.0 0x80.B=0x43
>>
>> step3: insmod ath12k and check linkctrl
>>
>
> So I did the same and got:
>
> [ 3283.363569] ath12k_pci_power_up 1475: link_ctl 0x43
> [ 3283.363769] ath12k_pci_power_up 1480: link_ctl 0x40
> [ 3284.007661] ath12k_pci_start 1180: link_ctl 0x40
> [ 3284.007826] ath12k_pci_start 1185: link_ctl 0x42
>
> My host machine is Qcom based Thinkpad T14s and it doesn't support L0s. So
> that's why the lnkctl value once enabled becomes 0x42. This is exactly the
> reason why the drivers should not muck around LNKCTL register manually.

Thanks, then the 0x43 -> 0x40 -> 0x40 -> 0x42 sequence should not be a concern. But still
the random 0x43 -> 0x43 -> 0x43 -> 0x42 sequence seems problematic.

How many iterations have you done with above steps? From my side it seems random so better
to do some stress test.

>
>>>
>>> [ 3.758239] ath12k_pci_power_up 1475: link_ctl 0x42
>>> [ 3.758315] ath12k_pci_power_up 1480: link_ctl 0x40
>>> [ 4.383900] ath12k_pci_start 1180: link_ctl 0x40
>>> [ 4.384026] ath12k_pci_start 1185: link_ctl 0x42
>>>
>>> Are you sure that you applied all the 6 patches in the series and not just the
>>> ath patches? Because, the first 3 PCI core patches are required to make the API
>>> work as intended.
>>
>> pretty sure all of them:
>>
>> $ git log --oneline
>> 07387d1bc17f (HEAD -> VALIDATE-pci-enable-link-state-behavior) wifi: ath12k: dump linkctrl reg
>> dbb3e5a7828b wifi: ath10k: Use pci_{enable/disable}_link_state() APIs to enable/disable
>> ASPM states
>> 392d7b3486b3 wifi: ath11k: Use pci_{enable/disable}_link_state() APIs to enable/disable
>> ASPM states
>> f2b0685c456d wifi: ath12k: Use pci_{enable/disable}_link_state() APIs to enable/disable
>> ASPM states
>> b1c8fad998f1 PCI/ASPM: Improve the kernel-doc for pci_disable_link_state*() APIs
>> b8f5204ba4b0 PCI/ASPM: Transition the device to D0 (if required) inside
>> pci_enable_link_state_locked() API
>> 186b1bbd4c62 PCI/ASPM: Fix the behavior of pci_enable_link_state*() APIs
>> 5a1ad8faaa16 (tag: ath-202507151704, origin/master, origin/main, origin/HEAD) Add
>> localversion-wireless-testing-ath
>>
>
> Ok!
>
>>
>>>
>>>>
>>>>> }
>>>>>
>>>>> static void ath12k_pci_cancel_workqueue(struct ath12k_base *ab)
>>>>>
>>>>
>>>> In addition, frequently I can see below AER warnings:
>>>>
>>>> [ 280.383143] aer_ratelimit: 30 callbacks suppressed
>>>> [ 280.383151] pcieport 0000:00:1c.0: AER: Correctable error message received from
>>>> 0000:00:1c.0
>>>> [ 280.383177] pcieport 0000:00:1c.0: PCIe Bus Error: severity=Correctable, type=Data Link
>>>> Layer, (Transmitter ID)
>>>> [ 280.383184] pcieport 0000:00:1c.0: device [8086:7ab8] error status/mask=00001000/00002000
>>>> [ 280.383193] pcieport 0000:00:1c.0: [12] Timeout
>>>>
>>>
>>> I don't see any AER errors either.
>>
>> My WLAN chip is attached via a PCIe-to-M.2 adapter, maybe some hardware issue? However I
>> never saw them until your changes applied.
>>
>
> I don't think it should matter. I have an Intel NUC lying around with QCA6390
> attached via M.2. Let me test this change on that and report back the result.
>
> - Mani
>
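
For reference when reading the link_ctl values in the logs above, here is a
minimal standalone sketch that decodes the ASPM Control field of the Link
Control register. The bit values are the ones defined in
include/uapi/linux/pci_regs.h; the LNKCTL_* names and the helpers aspm_bits(),
aspm_state_name() and common_clock_set() are made up for illustration and are
not part of the series.

```c
#include <stdint.h>

/* Link Control register bits, same values as in include/uapi/linux/pci_regs.h */
#define LNKCTL_ASPM_L0S 0x0001 /* L0s entry enabled */
#define LNKCTL_ASPM_L1  0x0002 /* L1 entry enabled */
#define LNKCTL_ASPMC    0x0003 /* ASPM Control field (both bits) */
#define LNKCTL_CCC      0x0040 /* Common Clock Configuration */

/* Hypothetical helper: extract only the ASPM Control bits. */
uint16_t aspm_bits(uint16_t link_ctl)
{
	return link_ctl & LNKCTL_ASPMC;
}

/* Hypothetical helper: name the ASPM states enabled in link_ctl. */
const char *aspm_state_name(uint16_t link_ctl)
{
	switch (aspm_bits(link_ctl)) {
	case LNKCTL_ASPM_L0S:
		return "L0s";
	case LNKCTL_ASPM_L1:
		return "L1";
	case LNKCTL_ASPMC:
		return "L0s+L1";
	default:
		return "disabled";
	}
}

/* Hypothetical helper: report whether Common Clock Configuration is set. */
int common_clock_set(uint16_t link_ctl)
{
	return (link_ctl & LNKCTL_CCC) != 0;
}
```

With these helpers, 0x43 reads as L0s+L1 enabled, 0x40 as ASPM disabled, and
0x42 as L1 only. The 0x40 bit itself is Common Clock Configuration, which stays
set in every dump, so only the low two bits matter for this discussion.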