[RFC] page allocation failure in __ceph_allocate_page_array()

Hello,

I hit this issue only once on the 6.14 kernel. I have tried to reproduce it
multiple times, but with no success. Any idea how I can reproduce the issue
reliably?

Mar 24 17:39:15 ceph-testing-0001 kernel: [  264.740815] run fstests ceph/001 at 2025-03-24 17:39:15
Mar 24 17:39:15 ceph-testing-0001 systemd[1]: Started /usr/bin/bash -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/ceph/001.

<skipped>

Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179493] kworker/u48:6: page allocation failure: order:5, mode:0x40c40(GFP_NOFS|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179526] CPU: 7 UID: 0 PID: 1407 Comm: kworker/u48:6 Not tainted 6.14.0+ #9
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179531] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179535] Workqueue: writeback wb_workfn (flush-ceph-2)
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179547] Call Trace:
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179551]  <TASK>
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179555] 
dump_stack_lvl+0x76/0xa0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179565]  dump_stack+0x10/0x20
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179568]  warn_alloc+0x22a/0x370
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179574]  ?
__pfx_warn_alloc+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179578]  ?
psi_task_change+0x1b6/0x230
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179586]  ?
__pfx___alloc_pages_direct_compact+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179589]  ?
psi_memstall_leave+0x15c/0x1a0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179594]  ?
__pfx_psi_memstall_leave+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179602] 
__alloc_frozen_pages_noprof+0xf27/0x2230
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179605]  ?
mutex_unlock+0x80/0xe0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179613]  ?
__kasan_check_write+0x14/0x30
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179620]  ?
__pfx___alloc_frozen_pages_noprof+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179624]  ?
send_request+0x1d3d/0x4870
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179631]  ?
xas_find_marked+0x353/0xf40
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179638] 
__alloc_pages_noprof+0x12/0x80
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179641] 
___kmalloc_large_node+0x99/0x160
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179647]  ?
__ceph_allocate_page_array+0x27/0x120
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179655] 
__kmalloc_large_node_noprof+0x21/0xc0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179661] 
__kmalloc_noprof+0x412/0x5e0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179667] 
__ceph_allocate_page_array+0x27/0x120
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179671]  ?
__ceph_allocate_page_array+0x27/0x120
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179676] 
ceph_writepages_start+0x273f/0x56c0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179685]  ?
__pfx_ceph_writepages_start+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179690]  ?
kasan_save_stack+0x3c/0x60
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179696]  ?
kasan_save_track+0x14/0x40
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179701]  ?
kasan_save_free_info+0x3b/0x60
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179704]  ?
__kasan_slab_free+0x54/0x80
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179709]  ?
ext4_es_free_extent+0x1fb/0x4b0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179717]  ?
__es_remove_extent+0x2c0/0x1640
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179721]  ?
ext4_es_insert_extent+0x40c/0xd20
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179730]  ?
__kasan_check_write+0x14/0x30
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179734]  ?
_raw_spin_lock+0x82/0xf0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179740] 
do_writepages+0x17b/0x720
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179747]  ?
__pfx_blk_mq_flush_plug_list+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179755]  ?
__pfx_do_writepages+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179758]  ?
__pfx_ceph_write_inode+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179764]  ?
__kasan_check_write+0x14/0x30
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179767]  ?
_raw_spin_lock+0x82/0xf0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179771]  ?
__pfx__raw_spin_lock+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179775]  ?
__pfx__raw_spin_lock+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179779] 
__writeback_single_inode+0xaa/0x890
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179784] 
writeback_sb_inodes+0x547/0xe90
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179789]  ?
__pfx_writeback_sb_inodes+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179793]  ?
__pfx_domain_dirty_avail+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179802]  ?
__pfx_move_expired_inodes+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179809] 
__writeback_inodes_wb+0xba/0x210
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179814] 
wb_writeback+0x4ee/0x6c0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179818]  ?
__pfx_wb_writeback+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179823]  ?
__pfx__raw_spin_lock_irq+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179828]  wb_workfn+0x5af/0xbf0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179833]  ?
__pfx_wb_workfn+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179837]  ?
__pfx___schedule+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179841]  ?
pwq_dec_nr_in_flight+0x227/0xba0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179846]  ?
kick_pool+0x184/0x650
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179851] 
process_one_work+0x5f7/0xfa0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179854]  ?
__kasan_check_write+0x14/0x30
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179860] 
worker_thread+0x779/0x1200
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179863]  ?
__pfx__raw_spin_lock_irqsave+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179868]  ?
__pfx_worker_thread+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179871]  kthread+0x395/0x890
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179877]  ?
__pfx_kthread+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179881]  ?
__kasan_check_write+0x14/0x30
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179884]  ?
recalc_sigpending+0x141/0x1e0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179888]  ?
_raw_spin_unlock_irq+0xe/0x50
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179892]  ?
__pfx_kthread+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179897] 
ret_from_fork+0x43/0x90
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179901]  ?
__pfx_kthread+0x10/0x10
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179905] 
ret_from_fork_asm+0x1a/0x30
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179912]  </TASK>
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179914] Mem-Info:
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179983] active_anon:899623
inactive_anon:73117 isolated_anon:0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179983]  active_file:79686
inactive_file:429756 isolated_file:0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179983]  unevictable:1152
dirty:52527 writeback:16547
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179983]  slab_reclaimable:23303
slab_unreclaimable:164687
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179983]  mapped:96125
shmem:7470 pagetables:6337
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179983]  sec_pagetables:0
bounce:0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179983] 
kernel_misc_reclaimable:0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179983]  free:44288
free_pcp:441 free_cma:0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.179994] Node 0
active_anon:3598500kB inactive_anon:292468kB active_file:318744kB
inactive_file:1718972kB unevictable:4608kB isolated(anon):0kB isolated(file):0kB
mapped:384500kB dirty:210260kB writeback:66188kB shmem:29880kB shmem_thp:0kB
shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB kernel_stack:31456kB
pagetables:25348kB sec_pagetables:0kB all_unreclaimable? no
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180005] Node 0 DMA free:14272kB
boost:0kB min:148kB low:184kB high:220kB reserved_highatomic:0KB active_anon:0kB
inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB
writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB
free_pcp:0kB local_pcp:0kB free_cma:0kB
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180020] lowmem_reserve[]: 0
2991 6810 6810 6810
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180032] Node 0 DMA32
free:63260kB boost:0kB min:29520kB low:36900kB high:44280kB
reserved_highatomic:30720KB active_anon:1558080kB inactive_anon:6288kB
active_file:27716kB inactive_file:1107204kB unevictable:0kB
writepending:168236kB present:3129204kB managed:3063668kB mlocked:0kB bounce:0kB
free_pcp:11076kB local_pcp:0kB free_cma:0kB
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180047] lowmem_reserve[]: 0 0
3818 3818 3818
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180059] Node 0 Normal
free:97660kB boost:57344kB min:95256kB low:104732kB high:114208kB
reserved_highatomic:30720KB active_anon:2040224kB inactive_anon:286180kB
active_file:291028kB inactive_file:611180kB unevictable:4608kB
writepending:107624kB present:5242880kB managed:3910476kB mlocked:0kB bounce:0kB
free_pcp:760kB local_pcp:0kB free_cma:0kB
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180076] lowmem_reserve[]: 0 0 0
0 0
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180087] Node 0 DMA: 0*4kB 0*8kB
0*16kB 0*32kB 1*64kB (U) 1*128kB (U) 1*256kB (U) 1*512kB (U) 1*1024kB (U)
2*2048kB (UM) 2*4096kB (M) = 14272kB
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180125] Node 0 DMA32: 75*4kB
(UMEH) 696*8kB (UEH) 273*16kB (UMEH) 240*32kB (UMEH) 137*64kB (UMEH) 53*128kB
(MEH) 22*256kB (UMEH) 12*512kB (MEH) 3*1024kB (M) 11*2048kB (MH) 0*4096kB =
70844kB
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180294] Node 0 Normal: 3025*4kB
(UMEH) 1483*8kB (UMEH) 586*16kB (UMEH) 373*32kB (UMEH) 167*64kB (UMEH) 265*128kB
(UMEH) 19*256kB (UMH) 1*512kB (H) 3*1024kB (H) 0*2048kB 0*4096kB = 98332kB
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180341] Node 0
hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180351] Node 0
hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180355] 516888 total pagecache
pages
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180358] 20 pages in swap cache
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180360] Free swap  = 3982588kB
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180363] Total swap = 3991548kB
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180366] 2097019 pages RAM
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180368] 0 pages
HighMem/MovableOnly
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180371] 349643 pages reserved
Mar 24 17:40:27 ceph-testing-0001 kernel: [  336.180373] 0 pages hwpoisoned

As far as I can see, the issue is here:

static inline
void __ceph_allocate_page_array(struct ceph_writeback_ctl *ceph_wbc,
                                unsigned int max_pages)
{
        ceph_wbc->pages = kmalloc_array(max_pages,
                                        sizeof(*ceph_wbc->pages),
                                        GFP_NOFS);
                          ^^^^^^^^^^^^^
Here we try to allocate an array of 16K page pointers, i.e. 128K of
physically contiguous memory (an order-5 allocation).

        if (!ceph_wbc->pages) {
                ceph_wbc->from_pool = true;
                ceph_wbc->pages = mempool_alloc(ceph_wb_pagevec_pool, GFP_NOFS);
                BUG_ON(!ceph_wbc->pages);
        }
}

I assume this works in the majority of cases, when there is no significant
memory fragmentation. Should we consider kvmalloc_array() here? And should we
double-check how many max_pages we actually need to allocate?
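A possible direction, sketched under the assumption that every consumer of
ceph_wbc->pages can tolerate vmalloc'ed memory and that the matching release
path is switched to kvfree() when from_pool is false, would be:

```c
static inline
void __ceph_allocate_page_array(struct ceph_writeback_ctl *ceph_wbc,
                                unsigned int max_pages)
{
        /* kvmalloc_array() tries kmalloc() first and falls back to
         * vmalloc(), so a fragmented system no longer has to satisfy an
         * order-5 physically contiguous request. */
        ceph_wbc->pages = kvmalloc_array(max_pages,
                                         sizeof(*ceph_wbc->pages),
                                         GFP_NOFS);
        if (!ceph_wbc->pages) {
                ceph_wbc->from_pool = true;
                ceph_wbc->pages = mempool_alloc(ceph_wb_pagevec_pool, GFP_NOFS);
                BUG_ON(!ceph_wbc->pages);
        }
}
```

One caveat: older kernels did not honor non-GFP_KERNEL flags on the vmalloc
fallback path, so kvmalloc_array(GFP_NOFS) would behave like plain
kmalloc_array() there; recent kernels handle GFP_NOFS internally via the
scoped-allocation API, but this is worth verifying for the targeted stable
branches.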

Thanks,
Slava.
