On Fri, Apr 25, 2025 at 12:42:38PM +0500, Muhammad Usama Anjum wrote:
> On 4/25/25 12:32 PM, Manivannan Sadhasivam wrote:
> > On Fri, Apr 25, 2025 at 12:14:39PM +0500, Muhammad Usama Anjum wrote:
> >> On 4/25/25 12:04 PM, Manivannan Sadhasivam wrote:
> >>> On Thu, Apr 10, 2025 at 07:56:54PM +0500, Muhammad Usama Anjum wrote:
> >>>> Fix dma_direct_alloc() failure at resume time during bhie_table
> >>>> allocation. There is a crash report where at resume time, the memory
> >>>> from the dma doesn't get allocated and MHI fails to re-initialize.
> >>>> There may be fragmentation of some kind which fails the allocation
> >>>> call.
> >>>>
> >>>
> >>> If dma_direct_alloc() fails, then it is a platform limitation/issue. We cannot
> >>> workaround that in the device drivers. What is the guarantee that other drivers
> >>> will also continue to work? Will you go ahead and patch all of them which
> >>> release memory during suspend?
> >>>
> >>> Please investigate why the allocation fails. Even this is not a device issue, so
> >>> we cannot add quirks :/
> >> This isn't a platform specific quirk. We are only hitting it because
> >> there is high memory pressure during suspend/resume. This dma allocation
> >> failure can happen with memory pressure on any device.
> >>
> >
> > Yes.
> Thanks for understanding.
> >
> >> The purpose of this patch is just to make driver more robust to memory
> >> pressure during resume.
> >>
> >> I'm not sure about MHI. But other drivers already have such patches as
> >> dma_direct_alloc() is susceptible to failures when memory pressure is
> >> high. This patch was motivated from ath12k [1] and ath11k [2].
> >>
> >
> > Even if we patch the MHI driver, the issue is going to trip some other driver.
> > How does the DMA memory goes low during resume? So some other driver is
> > consuming more than it did during probe()?
> Think it like this. The first probe happens just after boot. Most of the
> RAM was empty. Then let's say user launches applications which not only
> consume entire RAM but also the Swap. The DMA memory area is the first
> ~4GB on x86_64 (if I'm not mistaken). Now at resume time when we want to
> allocate memory from dma, it may not be available entirely or because of
> fragmentation we cannot allocate that much contiguous memory.
>

Looks like you have a workload that consumes the limited DMA coherent memory.
Most likely the GPU applications, I think.

> In our testing and real world cases, right now only wifi driver is
> misbehaving. Wifi is also very important. So we are hoping to make wifi
> driver robust.
>

Sounds fair. If you want to move forward, please modify the existing
mhi_power_down_keep_dev() to include this partial unprepare as well:

mhi_power_down_unprepare_keep_dev()

Since both APIs are anyway going to be used together, I don't see a need to
introduce yet another API.

- Mani

> >
> >> [1]
> >> https://lore.kernel.org/all/20240419034034.2842-1-quic_bqiang@xxxxxxxxxxx/
> >> [2]
> >> https://lore.kernel.org/all/20220506141448.10340-1-quic_akolli@xxxxxxxxxxx/
> >>
> >> What do you think can be the way forward for this patch?
> >>
> >
> > Let's try first to analyze why the memory pressure happens during suspend. As I
> > can see, even if we fix the MHI driver, you are likely to hit this issue
> > somewhere else.
> >
> > - Mani
> >
> >>>
> >
> > [...]
> >
> >>> Did you intend to leak this information? If not, please remove it from
> >>> stacktrace.
> >> The device isn't private. Its fine.
> >>
> >
> > Okay.
> >
> > - Mani
> >
> >
> --
> Regards,
> Usama

--
மணிவண்ணன் சதாசிவம்
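
[A rough sketch of the combined API suggested above, not the actual patch. It
assumes the "partial unprepare" keeps the BHIe table allocated across suspend
so it does not have to be re-allocated from the constrained DMA pool at
resume. mhi_power_down_keep_dev() is the existing MHI host API, while
mhi_partial_unprepare_after_power_down() is a hypothetical stand-in for
whatever helper the patch under review introduces.]

/* Sketch only: would sit next to mhi_power_down_keep_dev() in the MHI
 * host driver (e.g. drivers/bus/mhi/host/pm.c).
 */
void mhi_power_down_unprepare_keep_dev(struct mhi_controller *mhi_cntrl,
				       bool graceful)
{
	/* Existing behaviour: power down the stack but keep the struct
	 * device (and the client drivers bound to it) around.
	 */
	mhi_power_down_keep_dev(mhi_cntrl, graceful);

	/* Hypothetical helper standing in for the partial unprepare from
	 * the patch: tear down whatever must be re-initialised at the next
	 * power up, but keep the BHIe table so no large DMA coherent
	 * allocation is needed at resume time.
	 */
	mhi_partial_unprepare_after_power_down(mhi_cntrl);
}
EXPORT_SYMBOL_GPL(mhi_power_down_unprepare_keep_dev);

A controller driver's suspend path would then call this single function
instead of pairing mhi_power_down_keep_dev() with a separate unprepare call.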