On Mon, Aug 11, 2025 at 1:00 PM Baochen Qiang <baochen.qiang@xxxxxxxxxxxxxxxx> wrote: > > > > On 8/11/2025 5:08 PM, Balsam Chihi wrote: > > On Fri, Jul 25, 2025 at 5:25 AM Baochen Qiang > > <baochen.qiang@xxxxxxxxxxxxxxxx> wrote: > >> > >> > >> > >> On 6/12/2025 12:40 AM, balsam.chihi@xxxxxxxxxxx wrote: > >>> From: Balsam CHIHI <balsam.chihi@xxxxxxxxxxx> > >>> > > > > Hello, sorry for the delay. I was OoO. > > > >>> During system reboot or module removal (rmmod), a crash was observed > >>> due to a synchronous external abort caused by invalid register access > >>> in ath11k_pci_clear_dbg_registers(). This happens when the device is > >>> already powered off and the driver attempts to read debug registers. > >> > >> What does 'powered off' mean exactly here? is WLAN powered by an internal/external supply > >> that will be removed during rmmod/reboot? > > > > The "powered off" means that the wlan card is rmmod or reboot. > > It is not powered by external supply. > > OK. > > > > >> > >>> > >>> The crash log shows: > >>> root@OpenWrt:~# reboot > >>> [ 343.663492] Internal error: synchronous external abort: > >>> 0000000096000210 [#1] SMP > >>> [ 343.670992] Modules linked in: [...] > >>> [ 343.842432] CPU: 7 PID: 9435 Comm: procd Tainted: G O 6.6.86 #0 > >>> [ 343.849746] Hardware name: LS1088A > >>> [ 343.856969] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > >>> [ 343.863933] pc : ath11k_pci_get_msi_irq+0x18a0/0x1900 [ath11k_pci] > >> > >> I don't see any relationship between ath11k_pci_clear_dbg_registers() and this PC pointer. > >> So how did you locate the problematic register access then? > > > > I tyied to look for the problem based the PC pointer but I was not > > abled to find it. > > So, I added prints in ath11k driver to find out where it stucks, > > classical debug method. > > Yeah, classical and always work. > > > Then I found out that any "ath11k_pcic_read32()" call fails on rmmod > > ath11k_pci or reboot, > > I worry PCIe link is not stable on your platform when rmmod/reboot. Can you try with ASPM > disabled? ASPM is indeed disabled in the Kernel config. I will also try with your patch and get back to you with the results. And in the meanwhile, I would like to let you know the following information : Other PCIe devices are working well even with ASPM enabled in the same slot, and the issue is present in all PCIe slots. I have many board variants of the same platform that showed the same output. Maybe it is only related to the Qualcomm QCN9074 WiFi6E chipset? I tried 2 wifi cards from two manufacturers of the same wifi chip and the issue is the same. Is it related to the SoC Freescale Layerscape LS1088A? It seems like the kernel is accessing a freed resource or no longer existing memory address because the wifi card is turned off. > > --- a/drivers/net/wireless/ath/ath11k/pci.c > +++ b/drivers/net/wireless/ath/ath11k/pci.c > @@ -605,11 +605,13 @@ static void ath11k_pci_aspm_disable(struct ath11k_pci *ab_pci) > > static void ath11k_pci_aspm_restore(struct ath11k_pci *ab_pci) > { > +#if 0 > if (test_and_clear_bit(ATH11K_PCI_ASPM_RESTORE, &ab_pci->flags)) > pcie_capability_clear_and_set_word(ab_pci->pdev, PCI_EXP_LNKCTL, > PCI_EXP_LNKCTL_ASPMC, > ab_pci->link_ctl & > PCI_EXP_LNKCTL_ASPMC); > +#endif > } > > #ifdef CONFIG_DEV_COREDUMP > > > > but they work fine when insmod/boot ath11k_pci. > > > >> > >>> [ 343.870125] lr : ath11k_pcic_init_msi_config+0x98/0xc4 [ath11k] > >>> [...] > >>> [ 343.950800] Call trace: > >>> [ 343.953241] ath11k_pci_get_msi_irq+0x18a0/0x1900 [ath11k_pci] > >>> [ 343.959080] ath11k_pcic_init_msi_config+0x98/0xc4 [ath11k] > >>> [ 343.964671] ath11k_pcic_read32+0x30/0xb4 [ath11k] > >>> [ 343.969477] ath11k_pci_get_msi_irq+0x528/0x1900 [ath11k_pci] > >>> [ 343.975230] ath11k_pci_get_msi_irq+0x1460/0x1900 [ath11k_pci] > >>> [ 343.981068] ath11k_pci_get_msi_irq+0x1750/0x1900 [ath11k_pci] > >>> [ 343.986906] pci_device_shutdown+0x34/0x44 > >>> [ 343.991004] device_shutdown+0x160/0x268 > >>> [ 343.994928] kernel_restart+0x40/0xc0 > >>> [ 343.998594] __do_sys_reboot+0x104/0x23c > >>> [ 344.002518] __arm64_sys_reboot+0x24/0x30 > >>> [ 344.006529] do_el0_svc+0x6c/0x100 > >>> [ 344.009931] el0_svc+0x28/0x9c > >>> [ 344.012986] el0t_64_sync_handler+0x120/0x12c > >>> [ 344.017344] el0t_64_sync+0x178/0x17c > >>> [ 344.021009] Code: f94e0a80 92404a73 91420273 8b130013 (b9400273) > >>> [ 344.027102] ---[ end trace 0000000000000000 ]--- > >>> > >>> This patch adds a `power_on` flag to conditionally skip debug register > >>> reads when the device is not powered, preventing invalid memory access. > >> > >> why is only 'read' concerned here? > > > > Good question, I can't explain why only the reading fails but not the writing. > > > >> > >> You are skipping several 'read' in this change, could you help narrow down which > >> individual read instance causes this? > >> > > > > All read calls fail on rmmod/reboot, when > > ath11k_pci_clear_dbg_registers() is called. > > But they all succeed when insmod/boot ath11k_pci. > > > >>> > >>> With this change, the system no longer crashes on reboot or rmmod, > >>> and the driver continues to function correctly when reloaded. > >>> > >>> Tested-on: QCN9074 hw2.0 PCIe on LS1088A > >>> Tested-on: OpenWrt 24.10.1 > >>> Fixes: crash on reboot/rmmod due to invalid register access > >>> Signed-off-by: Balsam CHIHI <balsam.chihi@xxxxxxxxxxx> > >>> --- > >>> drivers/net/wireless/ath/ath11k/pci.c | 36 ++++++++++++++++----------- > >>> 1 file changed, 21 insertions(+), 15 deletions(-) > >>> > >>> diff --git a/drivers/net/wireless/ath/ath11k/pci.c b/drivers/net/wireless/ath/ath11k/pci.c > >>> index 78444f8ea153..0c1b55957557 100644 > >>> --- a/drivers/net/wireless/ath/ath11k/pci.c > >>> +++ b/drivers/net/wireless/ath/ath11k/pci.c > >>> @@ -203,16 +203,18 @@ static void ath11k_pci_soc_global_reset(struct ath11k_base *ab) > >>> ath11k_warn(ab, "link down error during global reset\n"); > >>> } > >>> > >>> -static void ath11k_pci_clear_dbg_registers(struct ath11k_base *ab) > >>> +static void ath11k_pci_clear_dbg_registers(struct ath11k_base *ab, bool power_on) > >>> { > >>> - u32 val; > >>> + if (power_on) { > >>> + u32 val; > >>> > >>> - /* read cookie */ > >>> - val = ath11k_pcic_read32(ab, PCIE_Q6_COOKIE_ADDR); > >>> - ath11k_dbg(ab, ATH11K_DBG_PCI, "pcie_q6_cookie_addr 0x%x\n", val); > >>> + /* read cookie */ > >>> + val = ath11k_pcic_read32(ab, PCIE_Q6_COOKIE_ADDR); > >>> + ath11k_dbg(ab, ATH11K_DBG_PCI, "pcie_q6_cookie_addr 0x%x\n", val); > >>> > >>> - val = ath11k_pcic_read32(ab, WLAON_WARM_SW_ENTRY); > >>> - ath11k_dbg(ab, ATH11K_DBG_PCI, "wlaon_warm_sw_entry 0x%x\n", val); > >>> + val = ath11k_pcic_read32(ab, WLAON_WARM_SW_ENTRY); > >>> + ath11k_dbg(ab, ATH11K_DBG_PCI, "wlaon_warm_sw_entry 0x%x\n", val); > >>> + } > >>> > >>> /* TODO: exact time to sleep is uncertain */ > >>> mdelay(10); > >>> @@ -223,14 +225,18 @@ static void ath11k_pci_clear_dbg_registers(struct ath11k_base *ab) > >>> ath11k_pcic_write32(ab, WLAON_WARM_SW_ENTRY, 0); > >>> mdelay(10); > >>> > >>> - val = ath11k_pcic_read32(ab, WLAON_WARM_SW_ENTRY); > >>> - ath11k_dbg(ab, ATH11K_DBG_PCI, "wlaon_warm_sw_entry 0x%x\n", val); > >>> + if (power_on) { > >>> + u32 val; > >>> > >>> - /* A read clear register. clear the register to prevent > >>> - * Q6 from entering wrong code path. > >>> - */ > >>> - val = ath11k_pcic_read32(ab, WLAON_SOC_RESET_CAUSE_REG); > >>> - ath11k_dbg(ab, ATH11K_DBG_PCI, "soc reset cause %d\n", val); > >>> + val = ath11k_pcic_read32(ab, WLAON_WARM_SW_ENTRY); > >>> + ath11k_dbg(ab, ATH11K_DBG_PCI, "wlaon_warm_sw_entry 0x%x\n", val); > >>> + > >>> + /* A read clear register. clear the register to prevent > >>> + * Q6 from entering wrong code path. > >>> + */ > >>> + val = ath11k_pcic_read32(ab, WLAON_SOC_RESET_CAUSE_REG); > >>> + ath11k_dbg(ab, ATH11K_DBG_PCI, "soc reset cause %d\n", val); > >>> + } > >>> } > >>> > >>> static int ath11k_pci_set_link_reg(struct ath11k_base *ab, > >>> @@ -368,7 +374,7 @@ static void ath11k_pci_sw_reset(struct ath11k_base *ab, bool power_on) > >>> } > >>> > >>> ath11k_mhi_clear_vector(ab); > >>> - ath11k_pci_clear_dbg_registers(ab); > >>> + ath11k_pci_clear_dbg_registers(ab, power_on); > >>> ath11k_pci_soc_global_reset(ab); > >>> ath11k_mhi_set_mhictrl_reset(ab); > >>> } > >> > > >