Hi,

On Tue, Apr 29, 2025 at 05:23:35PM +0500, Muhammad Usama Anjum wrote:
> Fix dma_direct_alloc() failure at resume time during bhie_table
> allocation. There is a crash report where at resume time, the memory
> from the dma doesn't get allocated and MHI fails to re-initialize.
> There is fragmentation/memory pressure.
>
> To fix it, don't free the memory at power down during suspend /
> hibernation. Instead, use the same allocated memory again after every
> resume / hibernation. This patch has been tested with resume and
> hibernation both.
>
> The rddm is of constant size for a given hardware. While the fbc_image
> size depends on the firmware. If the firmware changes, we'll free and
> allocate new memory for it.
>
> Here are the crash logs:
>
> [ 3029.338587] mhi mhi0: Requested to power ON
> [ 3029.338621] mhi mhi0: Power on setup success
> [ 3029.668654] kworker/u33:8: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
> [ 3029.668682] CPU: 4 UID: 0 PID: 2744 Comm: kworker/u33:8 Not tainted 6.11.11-valve10-1-neptune-611-gb69e902b4338 #1ed779c892334112fb968aaa3facf9686b5ff0bd7
> [ 3029.668690] Hardware name: Valve Galileo/Galileo, BIOS F7G0112 08/01/2024
> [ 3029.668694] Workqueue: mhi_hiprio_wq mhi_pm_st_worker [mhi]
> [ 3029.668717] Call Trace:
> [ 3029.668722] <TASK>
> [ 3029.668728] dump_stack_lvl+0x4e/0x70
> [ 3029.668738] warn_alloc+0x164/0x190
> [ 3029.668747] ? srso_return_thunk+0x5/0x5f
> [ 3029.668754] ? __alloc_pages_direct_compact+0xaf/0x360
> [ 3029.668761] __alloc_pages_slowpath.constprop.0+0xc75/0xd70
> [ 3029.668774] __alloc_pages_noprof+0x321/0x350
> [ 3029.668782] __dma_direct_alloc_pages.isra.0+0x14a/0x290
> [ 3029.668790] dma_direct_alloc+0x70/0x270
> [ 3029.668796] mhi_alloc_bhie_table+0xe8/0x190 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
> [ 3029.668814] mhi_fw_load_handler+0x1bc/0x310 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
> [ 3029.668830] mhi_pm_st_worker+0x5c8/0xaa0 [mhi faa917c5aa23a5f5b12d6a2c597067e16d2fedc0]
> [ 3029.668844] ? srso_return_thunk+0x5/0x5f
> [ 3029.668853] process_one_work+0x17e/0x330
> [ 3029.668861] worker_thread+0x2ce/0x3f0
> [ 3029.668868] ? __pfx_worker_thread+0x10/0x10
> [ 3029.668873] kthread+0xd2/0x100
> [ 3029.668879] ? __pfx_kthread+0x10/0x10
> [ 3029.668885] ret_from_fork+0x34/0x50
> [ 3029.668892] ? __pfx_kthread+0x10/0x10
> [ 3029.668898] ret_from_fork_asm+0x1a/0x30
> [ 3029.668910] </TASK>
>
> Tested-on: WCN6855 WLAN.HSP.1.1-03926.13-QCAHSPSWPL_V2_SILICONZ_CE-2.52297.6
>
> Signed-off-by: Muhammad Usama Anjum <usama.anjum@xxxxxxxxxxxxx>
> ---

This breaks ath12k on my T14s Snapdragon with WCN785x. After a
suspend/resume cycle the following is in my logs (and the resume is
super slow). Additionally at shutdown ath12k crashes with a NULL
pointer dereference in mhi_deinit_dev_ctxt, which got called by
mhi_unprepare_after_power_down, which got called by ath12k_mhi_stop.
This happens after filesystem umount and I don't have anything
configured right now to get logs from that point, so it is not
included in the log from the suspend/resume cycle down below:

...
[ 28.385370] ath12k_pci 0004:01:00.0: failed to set mhi state INIT(0) in current mhi state (0x1)
[ 28.385379] ath12k_pci 0004:01:00.0: failed to set mhi state: INIT(0)
[ 28.385383] ath12k_pci 0004:01:00.0: failed to start mhi: -22
[ 28.385387] ath12k_pci 0004:01:00.0: failed to power up hif during resume: -22
[ 28.385391] ath12k_pci 0004:01:00.0: failed to early resume core: -22
[ 28.385393] ath12k_pci 0004:01:00.0: PM: dpm_run_callback(): pci_pm_resume_early returns -22
[ 28.385413] ath12k_pci 0004:01:00.0: PM: failed to resume async early: error -22
[ 28.385513] qcom_mhi_qrtr mhi0_IPCR: Current EE: DISABLE Required EE Mask: 0x4
[ 28.385521] qcom_mhi_qrtr mhi0_IPCR: failed to prepare for autoqueue transfer -107
[ 28.385526] qcom_mhi_qrtr mhi0_IPCR: PM: dpm_run_callback(): qcom_mhi_qrtr_pm_resume_early [qrtr_mhi] returns -107
[ 28.385541] qcom_mhi_qrtr mhi0_IPCR: PM: failed to resume early: error -107
[ 50.146823] ath12k_pci 0004:01:00.0: timeout while waiting for restart complete
[ 50.146830] ath12k_pci 0004:01:00.0: failed to resume core: -110
[ 50.146834] ath12k_pci 0004:01:00.0: PM: dpm_run_callback(): pci_pm_resume returns -110
[ 50.146849] ath12k_pci 0004:01:00.0: PM: failed to resume async: error -110
[ 53.218794] ath12k_pci 0004:01:00.0: wmi command 16387 timeout
[ 53.218801] ath12k_pci 0004:01:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 53.218808] ath12k_pci 0004:01:00.0: failed to set ac override for ARP: -11
[ 53.218813] ath12k_pci 0004:01:00.0: fail to start mac operations in pdev idx 0 ret -11
[ 53.218817] ------------[ cut here ]------------
[ 53.218820] Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
[ 53.218855] WARNING: CPU: 2 PID: 1958 at net/mac80211/util.c:1829 ieee80211_reconfig+0x37c/0x1718 [mac80211]
[ 53.218936] Modules linked in: reset_gpio snd_soc_wsa884x q6prm_clocks q6apm_dai q6apm_lpass_dais snd_q6dsp_common q6prm michael_mic rfcomm wireguard libchacha20poly1305 chacha_neon libchacha poly1305_neon ip6_udp_tunnel udp_tunnel libcurve25519_generic binfmt_misc qrtr_mhi ath12k mac80211 libarc4 cfg80211 mhi hci_uart btqca btbcm snd_soc_x1e80100 snd_soc_qcom_sdw snd_soc_qcom_common bluetooth ecdh_generic ecc qcom_spmi_temp_alarm rfkill snd_q6apm snd_soc_hdmi_codec fastrpc snd_soc_lpass_va_macro snd_soc_lpass_tx_macro snd_soc_lpass_rx_macro snd_soc_lpass_wsa_macro soundwire_qcom snd_soc_wcd938x slimbus snd_soc_lpass_macro_common snd_soc_wcd938x_sdw pci_pwrctrl_pwrseq regmap_sdw snd_soc_wcd_mbhc coresight_stm coresight_funnel coresight_tmc snd_soc_wcd_classh coresight_cti stm_core coresight_replicator soundwire_bus coresight mux_gpio fuse nfnetlink ip_tables x_tables ipv6 gpio_sbu_mux panel_edp msm hid_multitouch drm_exec ocmem gpu_sched drm_dp_aux_bus rpmsg_ctrl apr rpmsg_char qrtr_smd i2c_hid_of qcom_pd_mapper
[ 53.219100] ps883x phy_nxp_ptn3222 i2c_hid drm_display_helper nvme phy_qcom_qmp_combo leds_qcom_lpg ucsi_glink pmic_glink_altmode nvme_core aux_hpd_bridge typec_ucsi qcom_battmgr sm3_ce sm3 led_class_multicolor qcom_q6v5_pas sha3_ce rtc_pm8xxx phy_qcom_eusb2_repeater qcom_pbs drm_client_lib aux_bridge sha512_ce qcom_pil_info drm_kms_helper qcom_common qcom_pon sha512_arm64 qcom_glink_smem typec qcom_q6v5 nvmem_qcom_spmi_sdam dispcc_x1e80100 drm pwrseq_qcom_wcn pinctrl_sm8550_lpass_lpi pwrseq_core i2c_qcom_geni qcom_stats pinctrl_lpass_lpi phy_qcom_edp phy_qcom_qmp_usb qcom_sysmon tcsrcc_x1e80100 llcc_qcom gpucc_x1e80100 phy_qcom_snps_eusb2 mdt_loader lpasscc_sc8280xp pcie_qcom qcom_cpucp_mbox icc_bwmon phy_qcom_qmp_pcie qrtr pmic_glink pdr_interface qcom_pdr_msg pwm_bl socinfo backlight qmi_helpers
[ 53.219234] CPU: 2 UID: 0 PID: 1958 Comm: kworker/u49:49 Not tainted 6.15.0-rc4+ #95 PREEMPT
[ 53.219241] Hardware name: LENOVO 21N1CTO1WW/21N1CTO1WW, BIOS N42ET85W (2.15 ) 11/22/2024
[ 53.219245] Workqueue: async async_run_entry_fn
[ 53.219258] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 53.219265] pc : ieee80211_reconfig+0x37c/0x1718 [mac80211]
[ 53.219315] lr : ieee80211_reconfig+0x37c/0x1718 [mac80211]
[ 53.219362] sp : ffff8000853ebb30
[ 53.219364] x29: ffff8000853ebbf0 x28: 0000000000000000 x27: 0000000000000000
[ 53.219373] x26: ffff1ce140047428 x25: 0000000000000000 x24: ffff1ce1408f7c05
[ 53.219380] x23: ffff1ce14aaa05b8 x22: 0000000000000010 x21: 00000000fffffff5
[ 53.219387] x20: 0000000000000000 x19: ffff1ce14aaa0900 x18: 00000000fffffffe
[ 53.219394] x17: 72617774666f7320 x16: 6120656220646c75 x15: 6f63207369685420
[ 53.219401] x14: 2e656d7573657220 x13: 0a2e657573736920 x12: 6572617764726168
[ 53.219408] x11: 0000000000000058 x10: 0000000000000018 x9 : ffffdacf6aa7749c
[ 53.219415] x8 : 0000000000000507 x7 : ffffdacf6d031138 x6 : ffffdacf6d031138
[ 53.219422] x5 : ffff1ce8bbe76508 x4 : 0000000000000000 x3 : ffff42194efd1000
[ 53.219429] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff1ce149caa300
[ 53.219437] Call trace:
[ 53.219440] ieee80211_reconfig+0x37c/0x1718 [mac80211] (P)
[ 53.219490] ieee80211_resume+0x54/0x78 [mac80211]
[ 53.219541] wiphy_resume+0x8c/0x200 [cfg80211]
[ 53.219603] dpm_run_callback+0x50/0x188
[ 53.219614] device_resume+0xc4/0x1f8
[ 53.219621] async_resume+0x2c/0x50
[ 53.219628] async_run_entry_fn+0x3c/0x160
[ 53.219634] process_one_work+0x158/0x3c8
[ 53.219643] worker_thread+0x2e0/0x418
[ 53.219650] kthread+0x14c/0x230
[ 53.219657] ret_from_fork+0x10/0x20
[ 53.219666] ---[ end trace 0000000000000000 ]---
[ 53.220154] ------------[ cut here ]------------
[ 53.220158] WARNING: CPU: 2 PID: 1958 at net/mac80211/driver-ops.c:41 drv_stop+0x1cc/0x1e8 [mac80211]
[ 53.220235] Modules linked in: reset_gpio snd_soc_wsa884x q6prm_clocks q6apm_dai q6apm_lpass_dais snd_q6dsp_common q6prm michael_mic rfcomm wireguard libchacha20poly1305 chacha_neon libchacha poly1305_neon ip6_udp_tunnel udp_tunnel libcurve25519_generic binfmt_misc qrtr_mhi ath12k mac80211 libarc4 cfg80211 mhi hci_uart btqca btbcm snd_soc_x1e80100 snd_soc_qcom_sdw snd_soc_qcom_common bluetooth ecdh_generic ecc qcom_spmi_temp_alarm rfkill snd_q6apm snd_soc_hdmi_codec fastrpc snd_soc_lpass_va_macro snd_soc_lpass_tx_macro snd_soc_lpass_rx_macro snd_soc_lpass_wsa_macro soundwire_qcom snd_soc_wcd938x slimbus snd_soc_lpass_macro_common snd_soc_wcd938x_sdw pci_pwrctrl_pwrseq regmap_sdw snd_soc_wcd_mbhc coresight_stm coresight_funnel coresight_tmc snd_soc_wcd_classh coresight_cti stm_core coresight_replicator soundwire_bus coresight mux_gpio fuse nfnetlink ip_tables x_tables ipv6 gpio_sbu_mux panel_edp msm hid_multitouch drm_exec ocmem gpu_sched drm_dp_aux_bus rpmsg_ctrl apr rpmsg_char qrtr_smd i2c_hid_of qcom_pd_mapper
[ 53.220351] ps883x phy_nxp_ptn3222 i2c_hid drm_display_helper nvme phy_qcom_qmp_combo leds_qcom_lpg ucsi_glink pmic_glink_altmode nvme_core aux_hpd_bridge typec_ucsi qcom_battmgr sm3_ce sm3 led_class_multicolor qcom_q6v5_pas sha3_ce rtc_pm8xxx phy_qcom_eusb2_repeater qcom_pbs drm_client_lib aux_bridge sha512_ce qcom_pil_info drm_kms_helper qcom_common qcom_pon sha512_arm64 qcom_glink_smem typec qcom_q6v5 nvmem_qcom_spmi_sdam dispcc_x1e80100 drm pwrseq_qcom_wcn pinctrl_sm8550_lpass_lpi pwrseq_core i2c_qcom_geni qcom_stats pinctrl_lpass_lpi phy_qcom_edp phy_qcom_qmp_usb qcom_sysmon tcsrcc_x1e80100 llcc_qcom gpucc_x1e80100 phy_qcom_snps_eusb2 mdt_loader lpasscc_sc8280xp pcie_qcom qcom_cpucp_mbox icc_bwmon phy_qcom_qmp_pcie qrtr pmic_glink pdr_interface qcom_pdr_msg pwm_bl socinfo backlight qmi_helpers
[ 53.220444] CPU: 2 UID: 0 PID: 1958 Comm: kworker/u49:49 Tainted: G W 6.15.0-rc4+ #95 PREEMPT
[ 53.220452] Tainted: [W]=WARN
[ 53.220455] Hardware name: LENOVO 21N1CTO1WW/21N1CTO1WW, BIOS N42ET85W (2.15 ) 11/22/2024
[ 53.220458] Workqueue: async async_run_entry_fn
[ 53.220467] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 53.220472] pc : drv_stop+0x1cc/0x1e8 [mac80211]
[ 53.220521] lr : ieee80211_stop_device+0x8c/0xa8 [mac80211]
[ 53.220580] sp : ffff8000853eb9f0
[ 53.220582] x29: ffff8000853eb9f0 x28: 0000000000000000 x27: 0000000000000000
[ 53.220591] x26: ffff1ce140047428 x25: ffff8000853eba50 x24: ffff8000853eba50
[ 53.220598] x23: 0000000000000000 x22: 0000000000000001 x21: 0000000000000000
[ 53.220604] x20: 0000000000000000 x19: ffff1ce14aaa0900 x18: 00000000fffffffe
[ 53.220611] x17: ffff42194efd1000 x16: ffff800080010000 x15: 6f63207369685420
[ 53.220618] x14: 000000000000037f x13: 000000000000037f x12: 071c71c71c71c71c
[ 53.220625] x11: ffff1ce8bbe88b8c x10: 1f0348adc6bb8584 x9 : ffffdacf67622b3c
[ 53.220633] x8 : ffff1ce149e1e550 x7 : 0000000000000000 x6 : 000000000000003f
[ 53.220640] x5 : 0000000000000040 x4 : 0000000000000000 x3 : 0000000000000003
[ 53.220646] x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000000
[ 53.220652] Call trace:
[ 53.220654] drv_stop+0x1cc/0x1e8 [mac80211] (P)
[ 53.220702] ieee80211_stop_device+0x8c/0xa8 [mac80211]
[ 53.220751] ieee80211_do_stop+0x644/0x830 [mac80211]
[ 53.220798] ieee80211_stop+0x60/0x1b0 [mac80211]
[ 53.220845] __dev_close_many+0xbc/0x1f0
[ 53.220857] dev_close_many+0x94/0x160
[ 53.220863] netif_close+0x78/0xa0
[ 53.220868] dev_close+0x3c/0x70
[ 53.220876] cfg80211_shutdown_all_interfaces+0x4c/0x118 [cfg80211]
[ 53.220935] wiphy_resume+0xc0/0x200 [cfg80211]
[ 53.220985] dpm_run_callback+0x50/0x188
[ 53.220992] device_resume+0xc4/0x1f8
[ 53.220999] async_resume+0x2c/0x50
[ 53.221006] async_run_entry_fn+0x3c/0x160
[ 53.221012] process_one_work+0x158/0x3c8
[ 53.221020] worker_thread+0x2e0/0x418
[ 53.221027] kthread+0x14c/0x230
[ 53.221033] ret_from_fork+0x10/0x20
[ 53.221039] ---[ end trace 0000000000000000 ]---
[ 53.221223] ieee80211 phy0: PM: dpm_run_callback(): wiphy_resume [cfg80211] returns -11
[ 53.221277] ieee80211 phy0: PM: failed to resume async: error -11
[ 53.667179] OOM killer enabled.
[ 53.667182] Restarting tasks ... done.
[ 53.668270] random: crng reseeded on system resumption
[ 53.668317] PM: suspend exit
[ 56.804822] ath12k_pci 0004:01:00.0: wmi command 16387 timeout
[ 56.804845] ath12k_pci 0004:01:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 56.804859] ath12k_pci 0004:01:00.0: failed to enable PMF QOS: (-11
[ 56.804872] ath12k_pci 0004:01:00.0: fail to start mac operations in pdev idx 0 ret -11
...

--
Sebastian