On Fri, Aug 1, 2025 at 2:58 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 8/1/25 04:46, Mike Galbraith wrote:
> > On Thu, 2025-07-31 at 20:41 +0200, Vlastimil Babka wrote:
> >> On 7/31/25 20:34, Frank van der Linden wrote:
> >> > Not sure what the right thing to do would be. Either explicitly boost
> >> > the priority of a thread temporarily during migrate_pages_batch, or
> >> > mitigate the issue by dealing with 'busy' pages more quickly in
> >> > migrate_pages_batch.
> >>
> >> There's a workaround for realtime tasks. If you mlock[all]() their memory,
> >> setting sysctl vm.compact_unevictable_allowed to 0 should exclude these
> >> pages from migration by compaction.
> >
> > Hm, per documentation that's done automatically for PREEMPT_RT...
>
> Oh I see.
>
> > On CONFIG_PREEMPT_RT the default value is 0 in order to avoid a page fault, due
> > to compaction, which would block the task from becoming active until the fault
> > is resolved.
>
> So it's probably the mlock() part missing since that should otherwise apply
> to kcompactd.
>
> > ...but rummaging, seems other stuff can step on it (contiguous alloc?).
>
> Yeah, there was time CMA was just something for mobile phone hardware. As
> usage increases beyond that maybe we'll have to tackle it. Ideally by not
> having mlock'd pages in CMA areas at all. And if contiguous alloc is
> attempted outside of CMA areas, respect the sysctl there too.
>
> There are also things like mbind() migrating pages for NUMA locality but I
> assume people just wouldn't try to do that with realtime workloads.
>

Another idea is to minimize the time that a migration PTE is in place
for an mlocked page. Hugh (cc-ed) mentioned this in an offline
discussion. E.g. skip any mlocked pages in the first pass, and just add
them to a list. Then process that list separately, one page at a time.
There is somewhat similar logic in migrate_pages_sync for pages that
might need extra work / locking.
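
To make the ordering concrete, here is a rough userspace sketch of that
two-pass idea. It is not kernel code and not a proposed patch: struct
toy_page, struct toy_stats, toy_migrate_one() and toy_migrate_two_pass()
are all made-up names for illustration, and the "migration" is just a
flag flip. It only shows the control flow (batch the non-mlocked pages,
defer the mlocked ones, then handle the deferred ones individually), not
the actual PTE handling or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page; NOT the kernel's struct page/folio. */
struct toy_page {
	int id;
	bool mlocked;
	bool migrated;
};

/* Hypothetical counters so the two passes can be told apart. */
struct toy_stats {
	size_t batched;     /* non-mlocked pages handled in the first pass */
	size_t one_by_one;  /* mlocked pages handled individually afterwards */
};

#define TOY_MAX_PAGES 64

/* Placeholder for the real unmap/copy/remap work: just marks the page. */
static void toy_migrate_one(struct toy_page *p)
{
	p->migrated = true;
}

/*
 * Pass 1: migrate everything that is not mlocked, and put mlocked pages
 * on a deferred list instead of touching them.
 * Pass 2: walk the deferred list and migrate each mlocked page on its
 * own, so that (in a real implementation) each one would sit behind a
 * migration entry only for the duration of its own migration.
 */
void toy_migrate_two_pass(struct toy_page *pages, size_t n,
			  struct toy_stats *st)
{
	struct toy_page *deferred[TOY_MAX_PAGES];
	size_t ndeferred = 0;

	assert(n <= TOY_MAX_PAGES);
	st->batched = 0;
	st->one_by_one = 0;

	for (size_t i = 0; i < n; i++) {
		if (pages[i].mlocked) {
			deferred[ndeferred++] = &pages[i];
			continue;
		}
		toy_migrate_one(&pages[i]);
		st->batched++;
	}

	for (size_t i = 0; i < ndeferred; i++) {
		toy_migrate_one(deferred[i]);
		st->one_by_one++;
	}
}
```

Whether the second pass would actually shorten the window enough for
realtime tasks is exactly the open question; the sketch says nothing
about that.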

Not sure if avoiding mlocked pages in CMA would work out. I mean, it's
not hard to implement, as it would be pretty much the same as for
pin_user_pages: just move them out of CMA on mlock. I'm just a bit
worried about scenarios where the kernel might run out of space for
unmovable allocations if you have a larger amount of CMA, which would
be made worse by moving more allocations out of CMA. Then again, the
amount of mlocked memory is probably generally small.

- Frank