On Thu 03-04-25 14:58:25, Vlastimil Babka wrote:
> On 4/3/25 14:29, Matt Fleming wrote:
> > On Wed, Mar 26, 2025 at 10:59 AM Matt Fleming <matt@xxxxxxxxxxxxxxxx> wrote:
> >>
> >> Hi there,
>
> + Cc also Michal
>
> >> I'm also seeing this PF_MEMALLOC WARN triggered from kswapd in 6.12.19.
>
> We're talking about __alloc_pages_slowpath() doing
> WARN_ON_ONCE(current->flags & PF_MEMALLOC); for __GFP_NOFAIL allocations.
>
> kswapd() sets:
>
> tsk->flags |= PF_MEMALLOC | PF_KSWAPD;
>
> so any __GFP_NOFAIL allocation done in the kswapd context risks this
> warning. It's also objectively bad IMHO because for direct reclaim we can
> loop and hope kswapd rescues us, but kswapd would then have to rely on
> direct reclaimers to get unstuck. I don't see an easy generic solution?

Right. I do not think NOFAIL request from the reclaim context is really
something we can commit to support. This really needs to be addressed on
the shrinker side.

> >> Does overlayfs need some kind of background inode reclaim support?
> >
> > Hey everyone, I know there was some off-list discussion last week at
> > LSFMM, but I don't think a definite solution has been proposed for the
> > below stacktrace.
> >
> > What is the shrinker API policy wrt memory allocation and I/O? Should
> > overlayfs do something more like XFS and background reclaim to avoid
> > GFP_NOFAIL allocations when kswapd is shrinking caches?
> >
> >> Call Trace:
> >>  <TASK>
> >>  __alloc_pages_noprof+0x31c/0x330
> >>  alloc_pages_mpol_noprof+0xe3/0x1d0
> >>  folio_alloc_noprof+0x5b/0xa0
> >>  __filemap_get_folio+0x1f3/0x380
> >>  __getblk_slow+0xa3/0x1e0
> >>  __ext4_get_inode_loc+0x121/0x4b0
> >>  ext4_get_inode_loc+0x40/0xa0
> >>  ext4_reserve_inode_write+0x39/0xc0
> >>  __ext4_mark_inode_dirty+0x5b/0x220
> >>  ext4_evict_inode+0x26d/0x690
> >>  evict+0x112/0x2a0
> >>  __dentry_kill+0x71/0x180
> >>  dput+0xeb/0x1b0
> >>  ovl_stack_put+0x2e/0x50 [overlay]
> >>  ovl_destroy_inode+0x3a/0x60 [overlay]
> >>  destroy_inode+0x3b/0x70
> >>  __dentry_kill+0x71/0x180
> >>  shrink_dentry_list+0x6b/0xe0
> >>  prune_dcache_sb+0x56/0x80
> >>  super_cache_scan+0x12c/0x1e0
> >>  do_shrink_slab+0x13b/0x350
> >>  shrink_slab+0x278/0x3a0
> >>  shrink_node+0x328/0x880
> >>  balance_pgdat+0x36d/0x740
> >>  kswapd+0x1f0/0x380
> >>  kthread+0xd2/0x100
> >>  ret_from_fork+0x34/0x50
> >>  ret_from_fork_asm+0x1a/0x30
> >>  </TASK>
> >

--
Michal Hocko
SUSE Labs