On 6/12/2025 3:02 PM, Sergey Senozhatsky wrote:
> On (25/06/12 13:47), Baochen Qiang wrote:
>>> [..]
>>>>> diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
>>>>> index 8cb1505a5a0c..cab11a35f911 100644
>>>>> --- a/drivers/net/wireless/ath/ath11k/hal.c
>>>>> +++ b/drivers/net/wireless/ath/ath11k/hal.c
>>>>> @@ -1346,6 +1346,10 @@ EXPORT_SYMBOL(ath11k_hal_srng_init);
>>>>>  void ath11k_hal_srng_deinit(struct ath11k_base *ab)
>>>>>  {
>>>>>  	struct ath11k_hal *hal = &ab->hal;
>>>>> +	int i;
>>>>> +
>>>>> +	for (i = 0; i < HAL_SRNG_RING_ID_MAX; i++)
>>>>> +		ab->hal.srng_list[i].initialized = 0;
>>>>
>>>> With this flag reset, srng stats would not be dumped in ath11k_hal_dump_srng_stats().
>>>
>>> I think un-initialized lists should not be dumped.
>>>
>>> ath11k_hal_srng_deinit() releases wrp.vaddr and rdp.vaddr, which are
>>> accessed, as far as I understand it, in ath11k_hal_dump_srng_stats()
>>> as *srng->u.src_ring.tp_addr and *srng->u.dst_ring.hp_addr, presumably,
>>> causing things like:
>>
>> But ath11k_hal_dump_srng_stats() is called before ath11k_hal_srng_deinit(), right?
>>
>> The sequence is ath11k_hal_dump_srng_stats() is called in reset process, then restart_work
>> is queued and in ath11k_core_restart() we call ath11k_core_reconfigure_on_crash(), there
>> ath11k_hal_srng_deinit() is called, right?
>
> My understanding is that the driver first fails to reconfigure
>
> <4>[163874.555825] ath11k_pci 0000:01:00.0: already resetting count 2
> <4>[163884.606490] ath11k_pci 0000:01:00.0: failed to wait wlan mode request (mode 4): -110
> <4>[163884.606508] ath11k_pci 0000:01:00.0: qmi failed to send wlan mode off: -110
> <3>[163884.606550] ath11k_pci 0000:01:00.0: failed to reconfigure driver on crash recovery
>
> so ath11k_core_reconfigure_on_crash() calls ath11k_hal_srng_deinit(),
> which destroys the srng lists, but leaves the stale initialized flag.
> So next time ath11k_hal_dump_srng_stats() is called everything looks ok,
> but in fact everything is not quite ok.

OK, so we have a second crash while recovery from the first crash is still in
progress. I guess the first recovery fails, so the srng is not reinitialized.
Then, after the wait-for-first-recovery timeout, the second recovery starts,
which results in ath11k_hal_dump_srng_stats() being called and hence the
kernel crash.

Could you please share the complete verbose kernel log? You can enable it with:

	modprobe ath11k debug_mask=0xffffffff
	modprobe ath11k_pci

>
> Regardless of that, I do think that resetting the initialized flag
> when srng list is de-initialized/destroyed is the right thing to do.

Yeah, correct.
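
To make the failure sequence concrete, here is a minimal userspace sketch of
the pattern under discussion. It is not driver code: struct model_hal,
struct model_srng, MODEL_RING_ID_MAX and the model_hal_*() helpers are all
hypothetical stand-ins, and the assumption that the dump path skips rings
whose initialized flag is clear is taken from the discussion above, not from
a reading of the actual function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical; stands in for HAL_SRNG_RING_ID_MAX. */
#define MODEL_RING_ID_MAX 4

/* Hypothetical, heavily simplified stand-in for one srng entry. */
struct model_srng {
	int initialized;
	uint32_t *hp_addr;	/* points into the shared rdp/wrp-like buffer */
};

/* Hypothetical stand-in for the hal state that owns the shared buffer. */
struct model_hal {
	uint32_t *rdp_vaddr;
	struct model_srng srng_list[MODEL_RING_ID_MAX];
};

/* Allocate the shared buffer and mark every ring as initialized. */
int model_hal_init(struct model_hal *hal)
{
	int i;

	hal->rdp_vaddr = calloc(MODEL_RING_ID_MAX, sizeof(*hal->rdp_vaddr));
	if (!hal->rdp_vaddr)
		return -1;

	for (i = 0; i < MODEL_RING_ID_MAX; i++) {
		hal->srng_list[i].hp_addr = &hal->rdp_vaddr[i];
		hal->srng_list[i].initialized = 1;
	}
	return 0;
}

/*
 * Release the shared buffer. Clearing the flags first mirrors the
 * proposed patch: without that loop, hp_addr would dangle while
 * initialized still reads 1.
 */
void model_hal_deinit(struct model_hal *hal)
{
	int i;

	for (i = 0; i < MODEL_RING_ID_MAX; i++)
		hal->srng_list[i].initialized = 0;

	free(hal->rdp_vaddr);
	hal->rdp_vaddr = NULL;
}

/* Dump only rings that are still initialized; return how many were dumped. */
int model_hal_dump_stats(const struct model_hal *hal)
{
	int i, dumped = 0;

	assert(hal != NULL);

	for (i = 0; i < MODEL_RING_ID_MAX; i++) {
		const struct model_srng *srng = &hal->srng_list[i];

		if (!srng->initialized)
			continue;	/* never touch freed memory */

		printf("ring %d: hp %u\n", i, (unsigned int)*srng->hp_addr);
		dumped++;
	}
	return dumped;
}
```

In this model, a dump issued after model_hal_deinit() walks the list and
touches nothing, which is the behaviour the patch aims for when a second
recovery starts after the first one has failed.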