On 7/8/25 00:10, Dave Chinner wrote: > On Wed, Jul 02, 2025 at 09:30:30AM +0200, Vlastimil Babka wrote: >> On 7/2/25 3:41 AM, Tetsuo Handa wrote: >> > By the way, why is xfs_init_fs_context() using __GFP_NOFAIL ? >> > >> > mp = kzalloc(sizeof(struct xfs_mount), GFP_KERNEL | __GFP_NOFAIL); >> > if (!mp) >> > return -ENOMEM; >> > >> > This looks an allocation attempt which can fail safely. > > It's irrelevant - it shouldn't fail regardless of __GFP_NOFAIL being > specified. If you mean the "too small to fail" behavior then it's generally true, except in some corner cases like being an oom victim, in which case the allocation can fail - the userspace process is doomed anyway. But a (small) kernel allocation not handling NULL would still need __GFP_NOFAIL to prevent that corner case. >> Indeed. Dave Chinner's commit f078d4ea82760 ("xfs: convert kmem_alloc() >> to kmalloc()") dropped the xfs wrapper. This allocation didn't use >> KM_MAYFAIL so it got __GFP_NOFAIL. The commit mentions this high-order >> nofail issue for another allocation site that had to use xlog_kvmalloc(). > > I don't see how high-order allocation behaviour is relevant here. > > Pahole says the struct xfs_mount is 4224 bytes in length. It is an > order-1 allocation and if we've fragmented memory so badly that slab > can't allocate an order-1 page then *lots* of other stuff is going > to be stalling. (e.g. slab pages for inodes are typically order-3, > same as the kmalloc-8kk slab). Elsewhere in this thread we figured it out since I wrote the quoted reply. 4224 bytes means kmalloc-8k where the fallback allocation (the one that passes on the __GFP_NOFAIL) order is 1 normally. But due to KASAN enabled its metadata means the per-object size goes above 8k and thus the fallback order will be 2. It's a corner case that wasn't anticipated and existed for years without known reports. We'll need to deal with it somehow. > Note that the size of the structure is largely because of the > embedded cpumask for inodegc: > > struct cpumask m_inodegc_cpumask; /* 3104 1024 */ > > This should probably be pulled out into a dynamically allocated > inodegc specific structure. Then the struct xfs_mount is only a > order-0 allocation and should never fail, regardless of > __GFP_NOFAIL being specified or not. > >> I think either this allocation really can fail as the code (return >> -ENOMEM) suggests and thus can drop __GFP_NOFAIL, or it can use >> kvmalloc() - I think the wrapper for that can be removed now too after >> the discussion in [1] resulted in commit 46459154f997 ("mm: kvmalloc: >> make kmalloc fast path real fast path"). > > I know about that - I have patches that I'm testing that replace > xlog_kvmalloc() with kvmalloc calls. Great, thanks! > -Dave.