On Mon, Jun 23, 2025 at 4:27 AM Kang Yang <kang.yang@xxxxxxxxxxxxxxxx> wrote:
>
> In rare cases, ath10k may lose connection with the PCIe bus due to
> some unknown reasons, which could further lead to system crashes during
> resuming due to watchdog timeout:
>
> ath10k_pci 0000:01:00.0: wmi command 20486 timeout, restarting hardware
> ath10k_pci 0000:01:00.0: already restarting
> ath10k_pci 0000:01:00.0: failed to stop WMI vdev 0: -11
> ath10k_pci 0000:01:00.0: failed to stop vdev 0: -11
> ieee80211 phy0: PM: **** DPM device timeout ****
> Call Trace:
> panic+0x125/0x315
> dpm_watchdog_set+0x54/0x54
> dpm_watchdog_handler+0x57/0x57
> call_timer_fn+0x31/0x13c
>
> At this point, all WMI commands will timeout and attempt to restart
> device. So set a threshold for consecutive restart failures. If the
> threshold is exceeded, consider the hardware is unreliable and all
> ath10k operations should be skipped to avoid system crash.
>
> fail_cont_count and pending_recovery are atomic variables, and
> do not involve complex conditional logic. Therefore, even if recovery
> check and reconfig complete are executed concurrently, the recovery
> mechanism will not be broken.
>
> Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00288-QCARMSWPZ-1
>
> Signed-off-by: Kang Yang <kang.yang@xxxxxxxxxxxxxxxx>

Reviewed-by: Loic Poulain <loic.poulain@xxxxxxxxxxxxxxxx>
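
For readers following along, the idea in the quoted commit message (stop restarting once consecutive failures reach a threshold, and track state with two atomics) can be sketched roughly as below. This is a userspace illustration using C11 <stdatomic.h>, not the patch itself. Only the names fail_cont_count and pending_recovery come from the message above; the threshold value, the struct and every function name are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold; the quoted message does not state a value. */
#define MAX_CONSECUTIVE_FAILURES 4

/* Hypothetical container for the two counters named in the message. */
struct recovery_state {
	atomic_int fail_cont_count;  /* consecutive failed restarts */
	atomic_int pending_recovery; /* nonzero while a restart is in flight */
};

static void recovery_init(struct recovery_state *rs)
{
	assert(rs != NULL);
	atomic_init(&rs->fail_cont_count, 0);
	atomic_init(&rs->pending_recovery, 0);
}

/*
 * Called when a command times out. Returns true if the caller should
 * start a restart, false if the restart must be skipped.
 */
static bool recovery_check(struct recovery_state *rs)
{
	assert(rs != NULL);

	/* Too many failures in a row: treat the hardware as unreliable. */
	if (atomic_load(&rs->fail_cont_count) >= MAX_CONSECUTIVE_FAILURES)
		return false;

	/*
	 * atomic_exchange returns the previous value, so a nonzero result
	 * means another restart is already pending.
	 */
	if (atomic_exchange(&rs->pending_recovery, 1) != 0)
		return false;

	return true;
}

/* Called when a restart attempt did not bring the device back. */
static void recovery_failed(struct recovery_state *rs)
{
	assert(rs != NULL);
	atomic_fetch_add(&rs->fail_cont_count, 1);
	atomic_store(&rs->pending_recovery, 0);
}

/* Called when reconfiguration completes: the failure streak is over. */
static void reconfig_complete(struct recovery_state *rs)
{
	assert(rs != NULL);
	atomic_store(&rs->fail_cont_count, 0);
	atomic_store(&rs->pending_recovery, 0);
}
```

Each step is a single atomic load, exchange, add or store with no multi-variable invariant between them, which is the property the message relies on when it says concurrent recovery check and reconfig complete do not break the mechanism. In the worst case a racing reset lets one extra restart through or delays the cutoff by one attempt; the counters never end up in an inconsistent state.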