+Qu, as I have seen a similar report from him.

On 4/21/2025 8:44 PM, Kent Overstreet wrote:
This just popped up in one of my test runs.

Given that it's buffer heads, it has to be the ext4 root filesystem, not
bcachefs.

00465 ========= TEST   lz4_buffered
00465
00465 WATCHDOG 360
00466 bcachefs (vdb): starting version 1.25: extent_flags opts=errors=panic,compression=lz4
00466 bcachefs (vdb): initializing new filesystem
00466 bcachefs (vdb): going read-write
00466 bcachefs (vdb): marking superblocks
00466 bcachefs (vdb): initializing freespace
00466 bcachefs (vdb): done initializing freespace
00466 bcachefs (vdb): reading snapshots table
00466 bcachefs (vdb): reading snapshots done
00466 bcachefs (vdb): done starting filesystem
00466 starting copy
00515 BUG: sleeping function called from invalid context at mm/util.c:743
00515 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 120, name: kcompactd0
00515 preempt_count: 1, expected: 0
00515 RCU nest depth: 0, expected: 0
00515 1 lock held by kcompactd0/120:
00515  #0: ffffff80c0c558f0 (&mapping->i_private_lock){+.+.}-{3:3}, at: __buffer_migrate_folio+0x114/0x298
00515 Preemption disabled at:
00515 [<ffffffc08025fa84>] __buffer_migrate_folio+0x114/0x298
00515 CPU: 11 UID: 0 PID: 120 Comm: kcompactd0 Not tainted 6.15.0-rc3-ktest-gb2a78fdf7d2f #20530 PREEMPT
00515 Hardware name: linux,dummy-virt (DT)
00515 Call trace:
00515  show_stack+0x1c/0x30 (C)
00515  dump_stack_lvl+0xb0/0xc0
00515  dump_stack+0x14/0x20
00515  __might_resched+0x180/0x288
00515  folio_mc_copy+0x54/0x98
00515  __migrate_folio.isra.0+0x68/0x168
00515  __buffer_migrate_folio+0x280/0x298
00515  buffer_migrate_folio_norefs+0x18/0x28
00515  migrate_pages_batch+0x94c/0xeb8
00515  migrate_pages_sync+0x84/0x240
00515  migrate_pages+0x284/0x698
00515  compact_zone+0xa40/0x10f8
00515  kcompactd_do_work+0x204/0x498
00515  kcompactd+0x3c4/0x400
00515  kthread+0x13c/0x208
00515  ret_from_fork+0x10/0x20
00518 starting sync
00519 starting rm
00520 ========= FAILED TIMEOUT lz4_buffered in 360s
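Reading the splat: the one lock held is i_private_lock, taken in
__buffer_migrate_folio(), and __might_resched() then fires underneath
folio_mc_copy(). Below is a minimal userspace sketch of that pattern, just
to make the sequence concrete. It is not kernel code: every model_* name is
a hypothetical stand-in, the spinlock is reduced to a preempt counter, and
the placement of the reschedule point after each page is an assumption.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only: simplified stand-ins for the kernel helpers. */
static int preempt_count;   /* non-zero means "atomic context" */
static int splats;          /* warnings raised by the last migrate call */

/* Taking a spinlock disables preemption; model that as a counter. */
static void model_spin_lock(void)   { preempt_count++; }
static void model_spin_unlock(void) { preempt_count--; }

/* Stand-in for the might-sleep debug check: complain if atomic. */
static void model_might_resched(const char *where)
{
    if (preempt_count != 0) {
        splats++;
        printf("BUG: sleeping function called from invalid context at %s "
               "(preempt_count: %d, expected: 0)\n", where, preempt_count);
    }
}

/* A voluntary reschedule point runs the debug check first. */
static void model_cond_resched(const char *where)
{
    model_might_resched(where);
}

/* Copy loop with a reschedule point after each page (assumed shape). */
static void model_folio_mc_copy(int nr_pages)
{
    for (int i = 0; i < nr_pages; i++) {
        /* the per-page copy would happen here */
        model_cond_resched("model_folio_mc_copy");
    }
}

/*
 * Migration path: when check_refs is set, the copy runs with the
 * spinlock held. Returns the number of warnings raised.
 */
static int model_buffer_migrate_folio(bool check_refs, int nr_pages)
{
    splats = 0;
    if (check_refs)
        model_spin_lock();
    model_folio_mc_copy(nr_pages);
    if (check_refs)
        model_spin_unlock();
    return splats;
}
```

In this model, model_buffer_migrate_folio(true, n) warns on every page
while model_buffer_migrate_folio(false, n) stays quiet, which matches the
fact that only the _norefs variant appears in the trace.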
I have also seen a similar stack with folio_mc_copy() while testing the
PTE A bit patches.

IIUC, it has something to do with cond_resched() being called from
folio_mc_copy(). (Thomas (tglx) mentioned a while back that cond_resched()
does not have scope awareness.)

I am not sure where the fix should go in these cases. (I mean, should the
caller of migrate_folio call it with a mutex held instead of a spinlock?)

Regards
- Raghu