Thanks for the quick reply Ping-Ke Shih wrote on Thu, May 29, 2025 at 01:32:44AM +0000: > My suggestions are: > > (1) The latest firmware of RTL8852BTE is 0.29.122.0. Please try it. I'm apparently already on that: rtw89_8852bte 0000:03:00.0: Firmware version 0.29.122.0 (5659197d), cmd version 0, type 5 > (2) To disable PCI ASPM is also worth to try: > > /etc/modprobe.d/70-rtw89.conf > options rtw89_pci disable_aspm_l1ss=y disable_aspm_l1=y disable_clkreq=y > options rtw89_core disable_ps_mode=y > > After setup these options, you need to cold reboot your machine to > make sure ASPM isn't enabled. Also, please check the setting via > /sys/module/rtw89_pci/parameters/ after rebooting. > > (3) Another is to disable power save mode via > sudo iw wlan0 set power_save off > > If this can work to you, you can add option to configuration file of > network manager to disable it. Thank you. I'm not too interested in workarounds as I don't really need to use the card; if you want me to try a kernel fix for the kernel crash I'll be happy to re-enable the card a few weeks to check but there's little point in testing these in my opinion unless it would help you fix the firmware. > > The line in question would be this line: > > > txwd = &wd_ring->pages[seq]; > > (which matches as pages is an array of 512 rtw89_pci_tx_wd structs) > > > > > > Checking seq < RTW89_PCI_TXWD_NUM_MAX is trivial and I could send a > > patch, but if that data is really bogus I assume any local check could > > be fooled e.g. the data could be < 512 and still incorrect. > > You're right. Right, I also checked the earlier "Cannot map qsel to dma" message but it's also from the same value obtained from the card, so failing at that point is not enough (even if it can probably still be better than a hard crash...) I'm not sure the crash happens in the same call of rtw89_pci_interrupt_threadfn() but I think so given the end of the dump and the map qsel error are 8 nsec appart (if I read this netconsole format right) In that case perhaps it'd make sense from the call to mark the card as no longer running (and perhaps bail out) in case of error? Something like this: ``` diff --git a/drivers/net/wireless/realtek/rtw89/pci.c b/drivers/net/wireless/realtek/rtw89/pci.c index c2fe5a898dc7..793e4a4cf5a2 100644 --- a/drivers/net/wireless/realtek/rtw89/pci.c +++ b/drivers/net/wireless/realtek/rtw89/pci.c @@ -896,8 +896,12 @@ static irqreturn_t rtw89_pci_interrupt_threadfn(int irq, void *dev) if (unlikely(isrs.isrs[0] & gen_def->isr_rdu)) rtw89_pci_isr_rxd_unavail(rtwdev, rtwpci); - if (unlikely(isrs.halt_c2h_isrs & gen_def->isr_halt_c2h)) + if (unlikely(isrs.halt_c2h_isrs & gen_def->isr_halt_c2h)) { rtw89_ser_notify(rtwdev, rtw89_mac_get_err_status(rtwdev)); + // maybe? + rtwpci->running = false; + return IRQ_HANDLED; + } if (unlikely(isrs.halt_c2h_isrs & gen_def->isr_wdt_timeout)) rtw89_ser_notify(rtwdev, MAC_AX_ERR_L2_ERR_WDT_TIMEOUT_INT); ``` Or is that flag also supposed to set itself? Anyway, as said above this is mostly curiosity for me, please deal with it as you see fit and let me know if you want me to test something. Thanks, -- Dominique Martinet | Asmadeus