Re: Excessive page cache occupies DMA32 memory

Adding ath/mhi and dma API developers to the discussion.

On 7/22/25 10:32 AM, Greg KH wrote:
> On Mon, Jul 21, 2025 at 06:13:10PM +0100, Matthew Wilcox wrote:
>> On Mon, Jul 21, 2025 at 08:03:12PM +0500, Muhammad Usama Anjum wrote:
>>> Hello,
>>>
>>> When 10-12GB out of the total 16GB of RAM is being used as page cache
>>> (active_file + inactive_file) at suspend time, the drivers fail to allocate
>>> dma memory at resume as dma memory is either occupied by the page cache or
>>> fragmented. Example:
>>>
>>> kworker/u33:5: page allocation failure: order:7, mode:0xc04(GFP_NOIO|GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
>>
>> Just to be clear, this is not a page cache problem.  The driver is asking
>> us to do a 512kB allocation without doing I/O!  This is a ridiculous
>> request that should be expected to fail.
>>
>> The solution, whatever it may be, is not related to the page cache.
>> I reject your diagnosis.  Almost all of the page cache is clean and
>> could be dropped (as far as I can tell from the output below).
>>
>> Now, I'm not too familiar with how the page allocator chooses to fail
>> this request.  Maybe it should be trying harder to drop bits of the page
>> cache.  Maybe it should be doing some compaction. 
That's very thoughtful. I'll look into why the page allocator isn't dropping
cache or doing compaction in this case.

>> I am not inclined to
>> go digging on your behalf, because frankly I'm offended by the suggestion
>> that the page cache is at fault.
I apologize; that wasn't my intention.

>>
>> Perhaps somebody else will help you, or you can dig into this yourself.
> 
> I'm with Matthew, this really looks like a driver bug somehow.  If there
> is page cache memory that is "clean", the driver should be able to
> access it just fine if really required.
> 
> What exact driver(s) is having this problem?  What is the exact error,
> and on what lines of code?
The issue occurs in both the ath11k and mhi drivers during resume, when
dma_alloc_coherent(GFP_KERNEL) fails and the driver returns -ENOMEM. The
failure has been observed at multiple call sites in these drivers.

For example, in the mhi driver, the failure is triggered when MHI's
st_worker is scheduled in at resume:

mhi_pm_st_worker()
-> mhi_fw_load_handler()
   -> mhi_load_image_bhi()
      -> mhi_alloc_bhi_buffer()
         -> dma_alloc_coherent(GFP_KERNEL) fails (-ENOMEM)
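
For scale, the order:7 in the allocation failure quoted above means
2^7 = 128 physically contiguous pages. A minimal userspace sketch of that
arithmetic, assuming 4 KiB base pages (the helper names and the
ASSUMED_PAGE_SIZE constant are illustrative, not kernel APIs):

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages, as on typical x86-64 configurations. */
#define ASSUMED_PAGE_SIZE 4096u

/* Number of contiguous pages in a buddy allocation of the given order. */
static uint64_t order_to_pages(unsigned int order)
{
	assert(order < 32);	/* keep the shift well defined */
	return (uint64_t)1 << order;
}

/* Size in bytes of an allocation of the given order. */
static uint64_t order_to_bytes(unsigned int order, uint64_t page_size)
{
	return order_to_pages(order) * page_size;
}
```

Under these assumptions, order_to_bytes(7, ASSUMED_PAGE_SIZE) is 524288
bytes, which matches the 512kB figure Matthew mentions.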


Thank you,
- Usama




