Re: [syzbot] [mm?] WARNING in xfs_init_fs_context

Vlastimil Babka <vbabka@xxxxxxx> · Mon, 7 Jul 2025 18:57:05 +0200

On 7/4/25 10:26, Harry Yoo wrote:
> On Wed, Jul 02, 2025 at 09:30:30AM +0200, Vlastimil Babka wrote:
>> +CC xfs and a few more
>> 
>> On 7/2/25 3:41 AM, Tetsuo Handa wrote:
>> > On 2025/07/02 0:01, Zi Yan wrote:
>> >>>  __alloc_frozen_pages_noprof+0x319/0x370 mm/page_alloc.c:4972
>> >>>  alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2419
>> >>>  alloc_slab_page mm/slub.c:2451 [inline]
>> >>>  allocate_slab+0xe2/0x3b0 mm/slub.c:2627
>> >>>  new_slab mm/slub.c:2673 [inline]
>> >>
>> >> new_slab() allows __GFP_NOFAIL, since GFP_RECLAIM_MASK has it.
>> >> In allocate_slab(), the first allocation without __GFP_NOFAIL
>> >> failed, the retry used __GFP_NOFAIL but kmem_cache order
>> >> was greater than 1, which led to the warning above.
>> >>
>> >> Maybe allocate_slab() should just fail when kmem_cache
>> >> order is too big and first trial fails? I am no expert,
>> >> so add Vlastimil for help.
>> 
>> Thanks Zi. Slab shouldn't fail with __GFP_NOFAIL, that would only lead
>> to subsystems like xfs to reintroduce their own forever retrying
>> wrappers again. I think it's doing the best it can for the fallback
>> attempt by using the minimum order, so the warning will never happen due
>> to the calculated optimal order being too large, but only if the
>> kmalloc()/kmem_cache_alloc() requested/object size is too large itself.
> 
> Right. The warning would trigger only if the object size is bigger
> than 8k (PAGE_SIZE * 2).
> 
>> Hm but perhaps enabling slab_debug can inflate it over the threshold, is
>> it the case here?
> 
> CONFIG_CMDLINE="earlyprintk=serial net.ifnames=0 sysctl.kernel.hung_task_all_cpu_backtrace=1 ima_policy=tcb nf-conntrack-ftp.ports=20000 nf-conntrack-tftp.ports=20000 nf-conntrack-sip.ports=20000 nf-conntrack-irc.ports=20000 nf-conntrack-sane.ports=20000 binder.debug_mask=0 rcupdate.rcu_expedited=1 rcupdate.rcu_cpu_stall_cputime=1 no_hash_pointers page_owner=on sysctl.vm.nr_hugepages=4 sysctl.vm.nr_overcommit_hugepages=4 secretmem.enable=1 sysctl.max_rcu_stall_to_panic=1 msr.allow_writes=off coredump_filter=0xffff root=/dev/sda console=ttyS0 vsyscall=native numa=fake=2 kvm-intel.nested=1 spec_store_bypass_disable=prctl nopcid vivid.n_devs=64 vivid.multiplanar=1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2 netrom.nr_ndevs=32 rose.rose_ndevs=32 smp.csd_lock_timeout=100000 watchdog_thresh=55 workqueue.watchdog_thresh=140 sysctl.net.core.netdev_unregister_timeout_secs=140 dummy_hcd.num=32 max_loop=32 nbds_max=32 panic_on_warn=1"
> 
> CONFIG_SLUB_DEBUG=y
> # CONFIG_SLUB_DEBUG_ON is not set
> 
> It seems no slab_debug is involved here.
> 
> I downloaded the config and built the kernel, and
> sizeof(struct xfs_mount) is 4480 bytes. It should have allocated using
> order 1?

So it should be the kmalloc-8k cache, whose min order should be
get_order(8k), thus 1. If the object was larger than 8k it would be a large
kmalloc anyway and would also trigger the __GFP_NOFAIL warning, but with a
different stacktrace.
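
To spell out that arithmetic, here is a rough userspace sketch. It is not
kernel code: get_order_sketch() and kmalloc_bucket_sketch() are made-up
stand-ins, and it assumes 4k pages and ignores the 96/192 byte kmalloc
caches.

```c
#include <stddef.h>

/* Assumption: 4k pages, as on the x86-64 config discussed here. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/*
 * Stand-in for get_order(): the smallest order such that
 * (PAGE_SIZE << order) is at least 'size' bytes.
 */
static unsigned int get_order_sketch(size_t size)
{
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/*
 * Hypothetical helper: round a request up to the next power-of-two
 * kmalloc bucket. The real kmalloc size classes also include 96 and
 * 192 byte caches, which are ignored here.
 */
static size_t kmalloc_bucket_sketch(size_t size)
{
	size_t bucket = 8;

	while (bucket < size)
		bucket <<= 1;
	return bucket;
}
```

With the 4480 byte struct xfs_mount, kmalloc_bucket_sketch(4480) gives 8192
and get_order_sketch(8192) gives 1, so a plain kmalloc-8k cache should not
need more than order 1.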

> Not sure why the min order was greater than 1?
> Not sure what I'm missing...

The only sane explanation is that slab debugging is enabled, not via
CONFIG_CMDLINE but via options passed to the qemu execution? But I don't see
those, nor the full dmesg (that would report them), in the syzbot dashboard.

Hm or actually it might be kasan_cache_create() bumping our size when called
from calculate_sizes(). KASAN seems enabled...
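
A rough userspace sketch of why any such bump would matter for an 8k object.
This is not the kernel code: the helper names and the extra-bytes parameter
are made up for illustration, I have not checked how much KASAN actually adds
here, and 4k pages are assumed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 4k pages. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical: minimum slab order for one object once 'extra' bytes
 * of per-object metadata (debugging, KASAN, ...) are added to it.
 */
static unsigned int min_order_sketch(size_t object_size, size_t extra)
{
	size_t size = object_size + extra;
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/*
 * The condition discussed in this thread: a __GFP_NOFAIL page
 * allocation with order greater than 1 triggers the warning.
 */
static bool nofail_would_warn_sketch(unsigned int order)
{
	return order > 1;
}
```

With no extra bytes an 8192 byte object fits in order 1 and the fallback
stays quiet. With any extra bytes at all (say a made-up 64) the minimum
order becomes 2, and the __GFP_NOFAIL fallback would hit the warning.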

>> I think in that rare case we could convert such
>> fallback allocations to large kmalloc to avoid adding the debugging
>> overhead - we can't easily create an individual slab page without the
>> debugging layout for a kmalloc cache with debugging enabled.
> 
> Yeah that can be doable when the size is exactly 8k or very close to 8k.
>