On 4/9/25 9:35 AM, Michal Hocko wrote:
> On Thu 03-04-25 21:51:46, Michal Hocko wrote:
>> Add Andrew
>
> Andrew, do you want me to repost the patch or can you take it from this
> email thread?

I'll take it as it's now all in mm/slub.c.

>> Also, Dave do you want me to redirect xlog_cil_kvmalloc to kvmalloc or
>> do you prefer to do that yourself?
>>
>> On Thu 03-04-25 09:43:41, Michal Hocko wrote:
>>> There are users like xfs which need larger allocations with NOFAIL
>>> semantic. They are not using kvmalloc currently because the current
>>> implementation tries too hard to allocate through the kmalloc path
>>> which causes a lot of direct reclaim and compaction and that hurts
>>> performance a lot (see 8dc9384b7d75 ("xfs: reduce kvmalloc overhead for
>>> CIL shadow buffers") for more details).
>>>
>>> kvmalloc does support __GFP_RETRY_MAYFAIL semantic to express that
>>> kmalloc (physically contiguous) allocation is preferred and we should go
>>> more aggressive to make it happen. There is currently no way to express
>>> that kmalloc should be very lightweight and as it has been argued [1]
>>> this mode should be default to support kvmalloc(NOFAIL) with a
>>> lightweight kmalloc path which is currently impossible to express as
>>> __GFP_NOFAIL cannot be combined with any other reclaim modifiers.
>>>
>>> This patch makes all kmalloc allocations GFP_NOWAIT unless
>>> __GFP_RETRY_MAYFAIL is provided to kvmalloc. This allows to support both
>>> fail fast and retry hard on physically contiguous memory with vmalloc
>>> fallback.
>>>
>>> There is a potential downside that relatively small allocations (smaller
>>> than PAGE_ALLOC_COSTLY_ORDER) could fallback to vmalloc too easily and
>>> cause page block fragmentation. We cannot really rule that out but it
>>> seems that xlog_cil_kvmalloc use doesn't indicate this to be happening.
>>>
>>> [1] https://lore.kernel.org/all/Z-3i1wATGh6vI8x8@xxxxxxxxxxxxxxxxxxx/T/#u
>>> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
>>> ---
>>>  mm/slub.c | 8 +++++---
>>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/slub.c b/mm/slub.c
>>> index b46f87662e71..2da40c2f6478 100644
>>> --- a/mm/slub.c
>>> +++ b/mm/slub.c
>>> @@ -4972,14 +4972,16 @@ static gfp_t kmalloc_gfp_adjust(gfp_t flags, size_t size)
>>>  	 * We want to attempt a large physically contiguous block first because
>>>  	 * it is less likely to fragment multiple larger blocks and therefore
>>>  	 * contribute to a long term fragmentation less than vmalloc fallback.
>>> -	 * However make sure that larger requests are not too disruptive - no
>>> -	 * OOM killer and no allocation failure warnings as we have a fallback.
>>> +	 * However make sure that larger requests are not too disruptive - i.e.
>>> +	 * do not direct reclaim unless physically continuous memory is preferred
>>> +	 * (__GFP_RETRY_MAYFAIL mode). We still kick in kswapd/kcompactd to start
>>> +	 * working in the background but the allocation itself.
>>>  	 */
>>>  	if (size > PAGE_SIZE) {
>>>  		flags |= __GFP_NOWARN;
>>>
>>>  		if (!(flags & __GFP_RETRY_MAYFAIL))
>>> -			flags |= __GFP_NORETRY;
>>> +			flags &= ~__GFP_DIRECT_RECLAIM;
>>>
>>>  		/* nofail semantic is implemented by the vmalloc fallback */
>>>  		flags &= ~__GFP_NOFAIL;
>>> --
>>> 2.49.0
>>>
>>
>> --
>> Michal Hocko
>> SUSE Labs
>
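
For anyone following the thread without the tree at hand, here is a rough userspace model of what the hunk above changes in kmalloc_gfp_adjust(). It is only a sketch: the M_* bit values, gfp_model_t, MODEL_PAGE_SIZE and the adjust_old()/adjust_new() names are made up for illustration and are not the kernel's definitions, and M_GFP_KERNEL models only the two reclaim bits of GFP_KERNEL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's gfp bits; values are arbitrary. */
typedef unsigned int gfp_model_t;

#define MODEL_PAGE_SIZE   4096u		/* assumption: 4 KiB pages */
#define M_DIRECT_RECLAIM  (1u << 0)
#define M_KSWAPD_RECLAIM  (1u << 1)
#define M_NOWARN          (1u << 2)
#define M_NORETRY         (1u << 3)
#define M_RETRY_MAYFAIL   (1u << 4)
#define M_NOFAIL          (1u << 5)
/* Only the reclaim part of GFP_KERNEL is modelled here. */
#define M_GFP_KERNEL      (M_DIRECT_RECLAIM | M_KSWAPD_RECLAIM)

/* Before the patch: large requests may direct reclaim, but give up early. */
gfp_model_t adjust_old(gfp_model_t flags, size_t size)
{
	if (size > MODEL_PAGE_SIZE) {
		flags |= M_NOWARN;
		if (!(flags & M_RETRY_MAYFAIL))
			flags |= M_NORETRY;
		/* nofail is left to the vmalloc fallback */
		flags &= ~M_NOFAIL;
	}
	return flags;
}

/* After the patch: large requests skip direct reclaim, kswapd is still woken. */
gfp_model_t adjust_new(gfp_model_t flags, size_t size)
{
	if (size > MODEL_PAGE_SIZE) {
		flags |= M_NOWARN;
		if (!(flags & M_RETRY_MAYFAIL))
			flags &= ~M_DIRECT_RECLAIM;
		/* nofail is left to the vmalloc fallback */
		flags &= ~M_NOFAIL;
	}
	return flags;
}

/* Small self-check of the model for an 8 KiB NOFAIL request. */
void model_self_check(void)
{
	gfp_model_t in = M_GFP_KERNEL | M_NOFAIL;

	assert(adjust_old(in, 8192) == (M_GFP_KERNEL | M_NOWARN | M_NORETRY));
	assert(adjust_new(in, 8192) == (M_KSWAPD_RECLAIM | M_NOWARN));
}
```

In this model a caller that passes M_RETRY_MAYFAIL keeps M_DIRECT_RECLAIM in both versions, and requests of at most one page are returned unchanged, which matches the "fail fast by default, retry hard on request" behaviour the changelog describes.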