On Thu, Jul 24 2025 at 23:24, Thomas Gleixner wrote: > On Thu, Jul 24 2025 at 13:55, Waiman Long wrote: > So lets look at that call chain further > > vt_console_print() > hide_cursor() > vc->vc_sw->con_cursor(vc, false); --> fbcon_cursor() > > fbcon_cursor() > if (vc->vc_cursor_type & CUR_SW) > fbcon_del_cursor_work(info) > cancel_delayed_work_sync(&ops->cursor_work); > > Here we are. It's actually very simple to reproduce in a VM. No wireless, no suspend/resume, no special hardware required. Enable framebuffer and framebuffer console. Add console=tty0 to the kernel command line. Log into FB console and do # echo -e '\033[?17;0;64c' which enables the software cursor, which in turn enables the above conditional invocation of fbcon_del_cursor_work(). Then force a printk # echo h >/proc/sysrq-trigger and watch the show. [ 1406.245343] BUG: sleeping function called from invalid context at kernel/workqueue.c:4359 [ 1406.245345] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1819, name: bash [ 1406.245346] preempt_count: 2, expected: 0 [ 1406.245347] RCU nest depth: 1, expected: 0 [ 1406.245348] 7 locks held by bash/1819: [ 1406.245349] #0: ffff9a730c4ba428 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x73/0xf0 [ 1406.245444] #1: ffffffffa7ade120 (rcu_read_lock){....}-{1:3}, at: __handle_sysrq+0x3d/0x270 [ 1406.245457] #2: ffffffffa7adb500 (console_lock){+.+.}-{0:0}, at: _printk+0x63/0x80 [ 1406.245464] #3: ffffffffa7adb550 (console_srcu){....}-{0:0}, at: console_flush_all+0x3c/0x530 [ 1406.245470] #4: ffffffffa7a6b140 (console_owner){....}-{0:0}, at: console_lock_spinning_enable+0x42/0x70 [ 1406.245475] #5: ffffffffa7a6b0c0 (printk_legacy_map-wait-type-override){....}-{4:4}, at: console_flush_all+0x2fe/0x530 [ 1406.245480] #6: ffffffffa7b8e4d8 (printing_lock){....}-{3:3}, at: vt_console_print+0x5b/0x490 [ 1406.245487] irq event stamp: 92690 [ 1406.245488] hardirqs last enabled at (92689): [<ffffffffa656edef>] console_flush_all+0x49f/0x530 [ 1406.245491] hardirqs last disabled at (92690): [<ffffffffa656edff>] console_flush_all+0x4af/0x530 [ 1406.245493] softirqs last enabled at (91822): [<ffffffffa6455166>] __fpu_restore_sig+0x206/0x6a0 [ 1406.245497] softirqs last disabled at (91820): [<ffffffffa6455088>] __fpu_restore_sig+0x128/0x6a0 [ 1406.245499] Preemption disabled at: [ 1406.245499] [<ffffffffa6570211>] vprintk_emit+0x1f1/0x430 [ 1406.245505] CPU: 46 UID: 0 PID: 1819 Comm: bash Not tainted 6.16.0-rc7+ #446 PREEMPT(voluntary) [ 1406.245508] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 1406.245510] Call Trace: [ 1406.245514] <TASK> [ 1406.245519] dump_stack_lvl+0x77/0xb0 [ 1406.245526] __might_resched+0x1b2/0x2d0 [ 1406.245539] cancel_delayed_work_sync+0x57/0x90 [ 1406.245548] fbcon_cursor+0xb2/0x140 [ 1406.245558] hide_cursor+0x24/0xb0 [ 1406.245562] vt_console_print+0x471/0x490 [ 1406.245565] ? __pfx_vt_console_print+0x10/0x10 [ 1406.245575] console_flush_all+0x32b/0x530 As I've enabled CONFIG_DEBUG_ATOMIC_SLEEP I get the splat immediately even when there is no work queued. (I misread that code yesterday with my tried brain and thought the might_sleep() would be too late). And clearly this is invoked with interrupts disabled: [ 1406.245345] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1819, name: bash Now if there would be work executed at that moment then the lockdep splat _AND_ the subsequent 'irqs already enabled' splat would materialize. I can't be bothered to figure out how to make the work executed because the above is already a clear prove of the theory :) It requires that the software cursor is enabled. That's probably rare, so that explains why this went unnoticed for so long. Contrary to all the halluzinations about the printk locking and lockdep correctness, this is a crystal clear bug in the framebuffer console code and has absolutely nothing to do with the printk() core code. The workqueue issue got introduced with commit 3b0fb6ab25dd ("fbcon: Use delayed work for cursor") in 5.19, but that replaced a del_timer_sync(), which had a similar problem because del_timer_sync() cannot be invoked in hard interrupt context, but printk() can ... That del_timer_sync() was conditional up to 4.2, where it became unconditional in commit 2a17d7e80f1d ("fbcon: unconditionally initialize cursor blink interval") just to be "fixed" shortly before 4.2-rc1 by moving the conditional back in commit a5edce421848 ("fbcon: Avoid deleting a timer in IRQ context"). That conditional did not fix anything as it just pushed the delete into the rarely executed path, therefore hiding the bug, which existed from the very beginning when this cursor blinking muck was introduced with commit acba9cd01974 ("fbcon: cursor blink control") in 2.6.23. So this software cursor code path was never tested with printk() and actually the splat which led to commit a5edce421848 ("fbcon: Avoid deleting a timer in IRQ context") should have made it crystal clear that this code path is broken: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at ../kernel/time/timer.c:1098 del_timer_sync+0x4c/0x54() Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.1.0-rc4-next-20150519 #1 Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [] (warn_slowpath_null) from [] (del_timer_sync+0x4c/0x54) [] (del_timer_sync) from [] (fbcon_del_cursor_timer+0x2c/0x40) [] (fbcon_del_cursor_timer) from [] (fbcon_cursor+0x9c/0x180) [] (fbcon_cursor) from [] (hide_cursor+0x30/0x98) [] (hide_cursor) from [] (vt_console_print+0x2a8/0x340) [] (vt_console_print) from [] (call_console_drivers.constprop.23+0xc8/0xec) [] (call_console_drivers.constprop.23) from [] (console_unlock+0x498/0x4f0) [] (console_unlock) from [] (vprintk_emit+0x1f0/0x508) [] (vprintk_emit) from [] (vprintk_default+0x24/0x2c) [] (vprintk_default) from [] (printk+0x70/0x88) But no, hiding it behind a rarely true conditional without actually testing it, is way better. Seriously? The changelog should have been: This commit papers over the underlying problem by restoring the condition... I leave it as an exercise to the framebuffer people how to fix this trainwreck for real. Thanks, tglx