Re: scheduling while atomic on rc3 - migration + buffer heads

Raghavendra K T <raghavendra.kt@xxxxxxx> · Mon, 21 Apr 2025 21:17:18 +0530

On 4/21/2025 8:44 PM, Kent Overstreet wrote:

+Qu, as I have seen a similar report from him.

This just popped up in one of my test runs.

Given that it's buffer heads, it has to be the ext4 root filesystem, not
bcachefs.

00465 ========= TEST   lz4_buffered
00465
00465 WATCHDOG 360
00466 bcachefs (vdb): starting version 1.25: extent_flags opts=errors=panic,compression=lz4
00466 bcachefs (vdb): initializing new filesystem
00466 bcachefs (vdb): going read-write
00466 bcachefs (vdb): marking superblocks
00466 bcachefs (vdb): initializing freespace
00466 bcachefs (vdb): done initializing freespace
00466 bcachefs (vdb): reading snapshots table
00466 bcachefs (vdb): reading snapshots done
00466 bcachefs (vdb): done starting filesystem
00466 starting copy
00515 BUG: sleeping function called from invalid context at mm/util.c:743
00515 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 120, name: kcompactd0
00515 preempt_count: 1, expected: 0
00515 RCU nest depth: 0, expected: 0
00515 1 lock held by kcompactd0/120:
00515  #0: ffffff80c0c558f0 (&mapping->i_private_lock){+.+.}-{3:3}, at: __buffer_migrate_folio+0x114/0x298
00515 Preemption disabled at:
00515 [<ffffffc08025fa84>] __buffer_migrate_folio+0x114/0x298
00515 CPU: 11 UID: 0 PID: 120 Comm: kcompactd0 Not tainted 6.15.0-rc3-ktest-gb2a78fdf7d2f #20530 PREEMPT
00515 Hardware name: linux,dummy-virt (DT)
00515 Call trace:
00515  show_stack+0x1c/0x30 (C)
00515  dump_stack_lvl+0xb0/0xc0
00515  dump_stack+0x14/0x20
00515  __might_resched+0x180/0x288
00515  folio_mc_copy+0x54/0x98
00515  __migrate_folio.isra.0+0x68/0x168
00515  __buffer_migrate_folio+0x280/0x298
00515  buffer_migrate_folio_norefs+0x18/0x28
00515  migrate_pages_batch+0x94c/0xeb8
00515  migrate_pages_sync+0x84/0x240
00515  migrate_pages+0x284/0x698
00515  compact_zone+0xa40/0x10f8
00515  kcompactd_do_work+0x204/0x498
00515  kcompactd+0x3c4/0x400
00515  kthread+0x13c/0x208
00515  ret_from_fork+0x10/0x20
00518 starting sync
00519 starting rm
00520 ========= FAILED TIMEOUT lz4_buffered in 360s

I have also seen a similar stack trace through folio_mc_copy() while
testing the PTE A-bit patches.

IIUC, it comes from cond_resched() being called from folio_mc_copy()
while __buffer_migrate_folio() still holds mapping->i_private_lock,
which is a spinlock.

(Thomas (tglx) mentioned a long time ago that cond_resched() has no
scope awareness.) I am not sure where the fix should go in these
cases.

(I mean, should the caller of migrate_folio call it with no spinlock
held, and use a mutex instead?)
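
A minimal userspace sketch of the pattern in the trace above, in case it
helps the discussion. This is not kernel code: every model_* name is a
hypothetical stand-in, the counter only imitates preempt_count, and
running the check once per page is a simplification rather than a claim
about how folio_mc_copy() is actually written.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins, NOT kernel APIs. */
static int model_preempt_count;	/* imitates preempt_count */
static int model_splats;	/* "invalid context" reports seen so far */

/* Taking a spinlock disables preemption, modelled as a counter bump. */
static void model_spin_lock(void)
{
	model_preempt_count++;
}

static void model_spin_unlock(void)
{
	assert(model_preempt_count > 0);
	model_preempt_count--;
}

/* Imitates the might-sleep check done on the cond_resched() path. */
static void model_cond_resched(const char *caller)
{
	if (model_preempt_count != 0) {
		model_splats++;
		fprintf(stderr,
			"BUG: sleeping function called from invalid context (%s), preempt_count: %d\n",
			caller, model_preempt_count);
	}
}

/* Imitates a folio copy that offers to reschedule (simplified: per page). */
static void model_folio_copy(int nr_pages)
{
	for (int i = 0; i < nr_pages; i++)
		model_cond_resched("model_folio_copy");
}

/* Shape of the reported path: the copy runs with the spinlock held. */
static int model_migrate_locked(int nr_pages)
{
	model_splats = 0;
	model_spin_lock();
	model_folio_copy(nr_pages);
	model_spin_unlock();
	return model_splats;
}

/* One possible alternative shape: drop the spinlock before copying. */
static int model_migrate_unlocked(int nr_pages)
{
	model_splats = 0;
	model_spin_lock();
	/* ... only the work that really needs the lock ... */
	model_spin_unlock();
	model_folio_copy(nr_pages);
	return model_splats;
}
```

In this model, model_migrate_locked() reports once per page and
model_migrate_unlocked() reports nothing. Whether the real copy can be
moved out from under i_private_lock, or the lock type has to change, is
exactly the open question above.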

Regards
- Raghu