Adding ath/mhi and dma API developers to the discussion.

On 7/22/25 10:32 AM, Greg KH wrote:
> On Mon, Jul 21, 2025 at 06:13:10PM +0100, Matthew Wilcox wrote:
>> On Mon, Jul 21, 2025 at 08:03:12PM +0500, Muhammad Usama Anjum wrote:
>>> Hello,
>>>
>>> When 10-12GB out of total 16GB RAM is being used as page cache
>>> (active_file + inactive_file) at suspend time, the drivers fail to allocate
>>> dma memory at resume as dma memory is either occupied by the page cache or
>>> fragmented. Example:
>>>
>>> kworker/u33:5: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
>>
>> Just to be clear, this is not a page cache problem. The driver is asking
>> us to do a 512kB allocation without doing I/O! This is a ridiculous
>> request that should be expected to fail.
>>
>> The solution, whatever it may be, is not related to the page cache.
>> I reject your diagnosis. Almost all of the page cache is clean and
>> could be dropped (as far as I can tell from the output below).
>>
>> Now, I'm not too familiar with how the page allocator chooses to fail
>> this request. Maybe it should be trying harder to drop bits of the page
>> cache. Maybe it should be doing some compaction.

That's very thoughtful. I'll look into why the page allocator isn't
dropping cache or doing compaction.

>> I am not inclined to
>> go digging on your behalf, because frankly I'm offended by the suggestion
>> that the page cache is at fault.

I apologize. That wasn't my intention.

>>
>> Perhaps somebody else will help you, or you can dig into this yourself.
>
> I'm with Matthew, this really looks like a driver bug somehow. If there
> is page cache memory that is "clean", the driver should be able to
> access it just fine if really required.
>
> What exact driver(s) is having this problem? What is the exact error,
> and on what lines of code?

The issue occurs in both the ath11k and mhi drivers during resume, when
dma_alloc_coherent(GFP_KERNEL) fails and the driver ends up returning
-ENOMEM. This failure has been observed at multiple points in these
drivers.

For example, in the mhi driver, the failure is triggered when MHI's
st_worker gets scheduled in at resume:

mhi_pm_st_worker()
-> mhi_fw_load_handler()
   -> mhi_load_image_bhi()
      -> mhi_alloc_bhi_buffer()
         -> dma_alloc_coherent(GFP_KERNEL) fails (-ENOMEM)

Thank you,
- Usama
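
For reference, the relation between the "order:7" in the allocation
failure log and the 512kB figure mentioned in the quoted reply can be
checked with a small userspace sketch. It assumes a 4 KiB base page size,
which is not stated anywhere in this thread, and order_to_bytes() is a
hypothetical helper for illustration only, not a kernel API.

```c
/* Assumption: 4 KiB base pages (not stated in the thread). */
#define ASSUMED_PAGE_SIZE 4096UL

/*
 * Hypothetical helper, not a kernel API: number of bytes covered by a
 * physically contiguous allocation of 2^order pages.
 */
unsigned long order_to_bytes(unsigned int order)
{
	return ASSUMED_PAGE_SIZE << order;
}
```

Under that assumption, order_to_bytes(7) is 128 pages, i.e. 524288 bytes
(512 KiB), which has to be found as one contiguous block.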