On Fri, Apr 25, 2025 at 04:41:43PM +0500, Muhammad Usama Anjum wrote:
> On 4/25/25 1:59 PM, Manivannan Sadhasivam wrote:
> > On Fri, Apr 25, 2025 at 12:42:38PM +0500, Muhammad Usama Anjum wrote:
> >> On 4/25/25 12:32 PM, Manivannan Sadhasivam wrote:
> >>> On Fri, Apr 25, 2025 at 12:14:39PM +0500, Muhammad Usama Anjum wrote:
> >>>> On 4/25/25 12:04 PM, Manivannan Sadhasivam wrote:
> >>>>> On Thu, Apr 10, 2025 at 07:56:54PM +0500, Muhammad Usama Anjum wrote:
> >>>>>> Fix dma_direct_alloc() failure at resume time during bhie_table
> >>>>>> allocation. There is a crash report where at resume time, the memory
> >>>>>> from the dma doesn't get allocated and MHI fails to re-initialize.
> >>>>>> There may be fragmentation of some kind which fails the allocation
> >>>>>> call.
> >>>>>>
> >>>>>
> >>>>> If dma_direct_alloc() fails, then it is a platform limitation/issue. We cannot
> >>>>> work around that in the device drivers. What is the guarantee that other drivers
> >>>>> will also continue to work? Will you go ahead and patch all of them which
> >>>>> release memory during suspend?
> >>>>>
> >>>>> Please investigate why the allocation fails. Even this is not a device issue, so
> >>>>> we cannot add quirks :/
> >>>> This isn't a platform specific quirk. We are only hitting it because
> >>>> there is high memory pressure during suspend/resume. This dma allocation
> >>>> failure can happen with memory pressure on any device.
> >>>>
> >>>
> >>> Yes.
> >> Thanks for understanding.
> >>
> >>>
> >>>> The purpose of this patch is just to make the driver more robust to memory
> >>>> pressure during resume.
> >>>>
> >>>> I'm not sure about MHI. But other drivers already have such patches as
> >>>> dma_direct_alloc() is susceptible to failures when memory pressure is
> >>>> high. This patch was motivated from ath12k [1] and ath11k [2].
> >>>>
> >>>
> >>> Even if we patch the MHI driver, the issue is going to trip some other driver.
> >>> How does the DMA memory go low during resume? So some other driver is
> >>> consuming more than it did during probe()?
> >> Think of it like this. The first probe happens just after boot. Most of the
> >> RAM was empty. Then let's say user launches applications which not only
> >> consume entire RAM but also the Swap. The DMA memory area is the first
> >> ~4GB on x86_64 (if I'm not mistaken). Now at resume time when we want to
> >> allocate memory from dma, it may not be available entirely or because of
> >> fragmentation we cannot allocate that much contiguous memory.
> >>
> >
> > Looks like you have a workload that consumes the limited DMA coherent memory.
> > Most likely the GPU applications I think.
> >
> >> In our testing and real world cases, right now only wifi driver is
> >> misbehaving. Wifi is also very important. So we are hoping to make wifi
> >> driver robust.
> >>
> >
> > Sounds fair. If you want to move forward, please modify the existing
> > mhi_power_down_keep_dev() to include this partial unprepare as well:
> >
> > mhi_power_down_unprepare_keep_dev()
> >
> > Since both APIs are anyway going to be used together, I don't see a need to
> > introduce yet another API.
> I've looked at usages of mhi_power_down_keep_dev(). It's getting used by
> ath12k and ath11k both. We would have to look at ath12k as well before
> we can change mhi_power_down_keep_dev(). Unfortunately, I don't have
> device using ath12k at hand.
>

ath12k conversion looks trivial. So please go ahead with this new API conversion
for that driver as well.

- Mani

--
மணிவண்ணன் சதாசிவம்