Hello,

I hit this issue only once, on a 6.14 kernel. I have tried to reproduce it multiple times, but with no success. Any idea how I can reproduce the issue in a stable way?

Mar 24 17:39:15 ceph-testing-0001 kernel: [  264.740815] run fstests ceph/001 at 2025-03-24 17:39:15
Mar 24 17:39:15 ceph-testing-0001 systemd[1]: Started /usr/bin/bash -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/ceph/001.
<skipped>
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179493] kworker/u48:6: page allocation failure: order:5, mode:0x40c40(GFP_NOFS|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179526] CPU: 7 UID: 0 PID: 1407 Comm: kworker/u48:6 Not tainted 6.14.0+ #9
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179531] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179535] Workqueue: writeback wb_workfn (flush-ceph-2)
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179547] Call Trace:
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179551]  <TASK>
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179555]  dump_stack_lvl+0x76/0xa0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179565]  dump_stack+0x10/0x20
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179568]  warn_alloc+0x22a/0x370
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179574]  ? __pfx_warn_alloc+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179578]  ? psi_task_change+0x1b6/0x230
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179586]  ? __pfx___alloc_pages_direct_compact+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179589]  ? psi_memstall_leave+0x15c/0x1a0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179594]  ? __pfx_psi_memstall_leave+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179602]  __alloc_frozen_pages_noprof+0xf27/0x2230
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179605]  ? mutex_unlock+0x80/0xe0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179613]  ? __kasan_check_write+0x14/0x30
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179620]  ? __pfx___alloc_frozen_pages_noprof+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179624]  ? send_request+0x1d3d/0x4870
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179631]  ? xas_find_marked+0x353/0xf40
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179638]  __alloc_pages_noprof+0x12/0x80
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179641]  ___kmalloc_large_node+0x99/0x160
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179647]  ? __ceph_allocate_page_array+0x27/0x120
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179655]  __kmalloc_large_node_noprof+0x21/0xc0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179661]  __kmalloc_noprof+0x412/0x5e0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179667]  __ceph_allocate_page_array+0x27/0x120
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179671]  ? __ceph_allocate_page_array+0x27/0x120
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179676]  ceph_writepages_start+0x273f/0x56c0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179685]  ? __pfx_ceph_writepages_start+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179690]  ? kasan_save_stack+0x3c/0x60
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179696]  ? kasan_save_track+0x14/0x40
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179701]  ? kasan_save_free_info+0x3b/0x60
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179704]  ? __kasan_slab_free+0x54/0x80
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179709]  ? ext4_es_free_extent+0x1fb/0x4b0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179717]  ? __es_remove_extent+0x2c0/0x1640
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179721]  ? ext4_es_insert_extent+0x40c/0xd20
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179730]  ? __kasan_check_write+0x14/0x30
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179734]  ? _raw_spin_lock+0x82/0xf0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179740]  do_writepages+0x17b/0x720
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179747]  ? __pfx_blk_mq_flush_plug_list+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179755]  ? __pfx_do_writepages+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179758]  ? __pfx_ceph_write_inode+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179764]  ? __kasan_check_write+0x14/0x30
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179767]  ? _raw_spin_lock+0x82/0xf0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179771]  ? __pfx__raw_spin_lock+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179775]  ? __pfx__raw_spin_lock+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179779]  __writeback_single_inode+0xaa/0x890
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179784]  writeback_sb_inodes+0x547/0xe90
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179789]  ? __pfx_writeback_sb_inodes+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179793]  ? __pfx_domain_dirty_avail+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179802]  ? __pfx_move_expired_inodes+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179809]  __writeback_inodes_wb+0xba/0x210
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179814]  wb_writeback+0x4ee/0x6c0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179818]  ? __pfx_wb_writeback+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179823]  ? __pfx__raw_spin_lock_irq+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179828]  wb_workfn+0x5af/0xbf0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179833]  ? __pfx_wb_workfn+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179837]  ? __pfx___schedule+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179841]  ? pwq_dec_nr_in_flight+0x227/0xba0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179846]  ? kick_pool+0x184/0x650
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179851]  process_one_work+0x5f7/0xfa0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179854]  ? __kasan_check_write+0x14/0x30
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179860]  worker_thread+0x779/0x1200
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179863]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179868]  ? __pfx_worker_thread+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179871]  kthread+0x395/0x890
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179877]  ? __pfx_kthread+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179881]  ? __kasan_check_write+0x14/0x30
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179884]  ? recalc_sigpending+0x141/0x1e0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179888]  ? _raw_spin_unlock_irq+0xe/0x50
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179892]  ? __pfx_kthread+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179897]  ret_from_fork+0x43/0x90
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179901]  ? __pfx_kthread+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179905]  ret_from_fork_asm+0x1a/0x30
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179912]  </TASK>
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179914] Mem-Info:
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179983] active_anon:899623 inactive_anon:73117 isolated_anon:0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179983]  active_file:79686 inactive_file:429756 isolated_file:0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179983]  unevictable:1152 dirty:52527 writeback:16547
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179983]  slab_reclaimable:23303 slab_unreclaimable:164687
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179983]  mapped:96125 shmem:7470 pagetables:6337
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179983]  sec_pagetables:0 bounce:0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179983]  kernel_misc_reclaimable:0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179983]  free:44288 free_pcp:441 free_cma:0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179994] Node 0 active_anon:3598500kB inactive_anon:292468kB active_file:318744kB inactive_file:1718972kB unevictable:4608kB isolated(anon):0kB isolated(file):0kB mapped:384500kB dirty:210260kB writeback:66188kB shmem:29880kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB kernel_stack:31456kB pagetables:25348kB sec_pagetables:0kB all_unreclaimable? no
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180005] Node 0 DMA free:14272kB boost:0kB min:148kB low:184kB high:220kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180020] lowmem_reserve[]: 0 2991 6810 6810 6810
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180032] Node 0 DMA32 free:63260kB boost:0kB min:29520kB low:36900kB high:44280kB reserved_highatomic:30720KB active_anon:1558080kB inactive_anon:6288kB active_file:27716kB inactive_file:1107204kB unevictable:0kB writepending:168236kB present:3129204kB managed:3063668kB mlocked:0kB bounce:0kB free_pcp:11076kB local_pcp:0kB free_cma:0kB
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180047] lowmem_reserve[]: 0 0 3818 3818 3818
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180059] Node 0 Normal free:97660kB boost:57344kB min:95256kB low:104732kB high:114208kB reserved_highatomic:30720KB active_anon:2040224kB inactive_anon:286180kB active_file:291028kB inactive_file:611180kB unevictable:4608kB writepending:107624kB present:5242880kB managed:3910476kB mlocked:0kB bounce:0kB free_pcp:760kB local_pcp:0kB free_cma:0kB
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180076] lowmem_reserve[]: 0 0 0 0 0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180087] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 1*1024kB (U) 2*2048kB (UM) 2*4096kB (M) = 14272kB
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180125] Node 0 DMA32: 75*4kB (UMEH) 696*8kB (UEH) 273*16kB (UMEH) 240*32kB (UMEH) 137*64kB (UMEH) 53*128kB (MEH) 22*256kB (UMEH) 12*512kB (MEH) 3*1024kB (M) 11*2048kB (MH) 0*4096kB = 70844kB
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180294] Node 0 Normal: 3025*4kB (UMEH) 1483*8kB (UMEH) 586*16kB (UMEH) 373*32kB (UMEH) 167*64kB (UMEH) 265*128kB (UMEH) 19*256kB (UMH) 1*512kB (H) 3*1024kB (H) 0*2048kB 0*4096kB = 98332kB
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180341] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180351] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180355] 516888 total pagecache pages
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180358] 20 pages in swap cache
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180360] Free swap = 3982588kB
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180363] Total swap = 3991548kB
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180366] 2097019 pages RAM
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180368] 0 pages HighMem/MovableOnly
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180371] 349643 pages reserved
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180373] 0 pages hwpoisoned

As far as I can see, the issue is here:

static inline
void __ceph_allocate_page_array(struct ceph_writeback_ctl *ceph_wbc,
				unsigned int max_pages)
{
	ceph_wbc->pages = kmalloc_array(max_pages,
					sizeof(*ceph_wbc->pages),
					GFP_NOFS);
			  ^^^^^^^^^^^^^
			  We try to allocate 16K pointers, i.e. 128K of memory, here.

	if (!ceph_wbc->pages) {
		ceph_wbc->from_pool = true;
		ceph_wbc->pages = mempool_alloc(ceph_wb_pagevec_pool, GFP_NOFS);
		BUG_ON(!ceph_wbc->pages);
	}
}

I assume this works in the majority of cases, as long as there is no significant memory fragmentation. Should we consider kvmalloc_array() here? And do we need to double-check how many max_pages we actually want to allocate?

Thanks,
Slava.