Can you please not send HTML emails. One, they get blocked from all
kernel.org mailing lists, and two, they are impossible to read on a text
email client.

I'll keep the below to show you what your email looks like in text, and
also to make it get to some of the mailing lists.

Thanks,

-- Steve

On Tue, 20 May 2025 09:52:03 +0000
fengtian guo <fengtian_guo@xxxxxxxxxxx> wrote:

> I encountered a hard deadlock in the RT-patched kernel (v5.10) on an
> ARM64 32-core server, triggered by high-load stress testing with RT
> threads.
>
> Open Questions
>
> 1. RT patch specificity: Is this a new RT-specific issue, and is there
>    a better patch to fix it than the ones below?
>
> First Deadlock Root Cause Analysis
>
> The initial deadlock occurs due to unprotected spinlock access between
> an IRQ work thread and a hardware interrupt on the same CPU. Here is
> the critical path:
>
> Deadlock Sequence
>
> 1. IRQ work thread context (RT priority):
>
>    irq_work → rto_push_irq_work_func → raw_spin_lock(&rq->lock)
>    → push_rt_task
>
>    * The rto_push_irq_work_func thread acquires rq->lock without
>      disabling interrupts.
>
> 2. Hardware interrupt context (clock timer):
>
>    hrtimer_interrupt → __hrtimer_run_queues → __run_hrtimer
>    → hrtimer_wakeup → try_to_wake_up → ttwu_queue
>    → raw_spin_lock(&rq->lock)
>
>    * The clock interrupt preempts the IRQ work thread while it holds
>      rq->lock.
>    * The interrupt handler then attempts to acquire the same rq->lock
>      via ttwu_queue, causing a double-lock (self-)deadlock.
>
> In short, the deadlock arises from rq->lock contention between
> interrupt context (e.g. hrtimer_interrupt) and thread context (e.g.
> rto_push_irq_work_func) on the same CPU.
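>
> To make the window concrete, here is a minimal sketch of the pattern
> (illustration only, not our actual v5.10-rt code; the interleaving
> comments are my reading of the backtraces below):
>
>     /*
>      * Thread context: the irq_work/N kthread running
>      * rto_push_irq_work_func(). raw_spin_lock() disables preemption
>      * but NOT interrupts, so the timer IRQ can still fire on this CPU
>      * while rq->lock is held.
>      */
>     raw_spin_lock(&rq->lock);
>     while (push_rt_task(rq, true))
>             ;
>     /* <-- hrtimer interrupt fires here, on the same CPU */
>     raw_spin_unlock(&rq->lock);
>
>     /*
>      * Hard-IRQ context on the same CPU (hrtimer_interrupt ->
>      * hrtimer_wakeup -> try_to_wake_up -> ttwu_queue):
>      */
>     raw_spin_lock(&rq->lock);   /* spins forever: the owner is the
>                                  * kthread this interrupt preempted on
>                                  * this very CPU, which can never run
>                                  * again to release the lock, so the
>                                  * NMI watchdog reports a hard LOCKUP */
>
>     /*
>      * The fix (patch-v1 below) keeps interrupts off for the whole
>      * critical section in the thread context:
>      */
>     unsigned long flags;
>
>     raw_spin_lock_irqsave(&rq->lock, flags);
>     while (push_rt_task(rq, true))
>             ;
>     raw_spin_unlock_irqrestore(&rq->lock, flags);
>
> The watchdog backtraces below show the two halves of this chain on
> CPU 12 (rto_push_irq_work_func interrupted by hrtimer_interrupt), with
> CPUs 6 and 31 stuck spinning on rq locks in other wakeup paths at the
> same time: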
>
> [871.101301][ C6] NMI watchdog: Watchdog detected hard LOCKUP on cpu 6
> [871.101302][ C12] NMI watchdog: Watchdog detected hard LOCKUP on cpu 12
> [871.103274][ C31] NMI watchdog: Watchdog detected hard LOCKUP on cpu 31
>
> [871.101302][ C12] NMI watchdog: Watchdog detected hard LOCKUP on cpu 12
> [871.101408][ C12] CPU: 12 PID: 97 Comm: irq_work/12 Kdump: loaded Tainted: G
> [871.101414][ C12] pstate: 00400089 (nzcv daIf +PAN -UAO -TCO BTYPE=---)
> [871.101416][ C12] pc : native_queued_spin_lock_slowpath+0x188/0x354
> [871.101422][ C12] lr : _raw_spin_lock+0x7c/0x8c
> [871.101426][ C12] sp : ffff800010063cf0
> [871.101476][ C12] Call trace:
> [871.101477][ C12] native_queued_spin_lock_slowpath+0x188/0x354
> [871.101479][ C12] _raw_spin_lock+0x7c/0x8c
> [871.101482][ C12] ttwu_queue+0x58/0x154
> [871.101485][ C12] try_to_wake_up+0x1e4/0x4e0
> [871.101488][ C12] wake_up_process+0x20/0x30
> [871.101490][ C12] hrtimer_wakeup+0x28/0x40
> [871.101493][ C12] _run_hrtimer+0x84/0x304
> [871.101495][ C12] _hrtimer_run_queues+0xfb8/0x15c
> [871.101497][ C12] hrtimer_interrupt+0xfc/0x2d0
> [871.101500][ C12] arch_timer_handler_phys+0x3c/0x50
> [871.101504][ C12] handle_percpu_devid_irq+0xac/0x2b0
> [871.101508][ C12] _handle_domain_irq+0xb8/0x13c
> [871.101511][ C12] gic_handle_irq+0x78/0x2d0
> [871.101513][ C12] el1_irq+0xb8/0x180
> [871.101515][ C12] find_lowest_rq+0x0/0x26c
> [871.101517][ C12] rto_push_irq_work_func+0x180/0x1ec
> [871.101519][ C12] irq_work_single+0x38/0xb4
> [871.101522][ C12] irq_work_run_list+0x48/0x60
> [871.101525][ C12] run_irq_workd+0x30/0x40
> [871.101527][ C12] smpboot_thread_fn+0x278/0x300
> [871.101530][ C12] kthread+0x170/0x19c
> [871.101532][ C12] ret_from_fork+0x10/0x18
>
> [871.103274][ C31] NMI watchdog: Watchdog detected hard LOCKUP on cpu 31
> [871.103326][ C31] CPU: 31 PID: 8897 Comm: stress-ng-cpu Kdump: loaded Tainted: G
> [871.103330][ C31] pstate: 00400089 (nzcv daIf +PAN -UAO -TCO BTYPE=---)
> [871.103332][ C31] pc : native_queued_spin_lock_slowpath+0x254/0x354
> [871.103336][ C31] lr : native_queued_spin_lock_slowpath+0xc4/0x354
> [871.103338][ C31] sp : ffff8000100fbd00
> [871.103339][ C31] x29: ffff8000100fbd00 x28: ffff748fb7dbb6c0
> [871.103341][ C31] x27: 0000000000000000 x26: ffffbfc0c4eeee30
> [871.103343][ C31] x25: ffffbfc0c4ef6e38 x24: ffffbfc0c4ef7800
> [871.103345][ C31] x23: ffffbfc0c4eeee30 x22: 000000000800000
> [871.103347][ C31] x21: ffffbfc0c4bde2c0 x20: ffff748fb7ff62c0
> [871.103349][ C31] x19: ffff748fb7dbb480 x18: 0000000000000000
> [871.103350][ C31] x17: 0000000000000000 x16: 0000000000000000
> [871.103352][ C31] x15: 0000000000000000 x14: 0000000000000000
> [871.103354][ C31] x13: 0000000000000000 x12: 000000000000040
> [871.103355][ C31] x11: 0000002d03aa232c x10: 000000000011af2
> [871.103357][ C31] x9 : ffffbfc0c45d0de0 x8 : 00000000000002f
> [871.103359][ C31] x7 : 0000000000000000 x6 : 0000000000000000
> [871.103361][ C31] x5 : 0000000000000000 x4 : ffff748fb7ff62c0
> [871.103362][ C31] x3 : 0000000001c0101 x2 : ffffbfc0c4e10001
> [871.103364][ C31] x1 : 0000000000000000 x0 : 0000000000000000
> [871.103366][ C31] Call trace:
> [871.103367][ C31] native_queued_spin_lock_slowpath+0x254/0x354
> [871.103369][ C31] _raw_spin_lock+0x7c/0x8c
> [871.103372][ C31] do_sched_rt_period_timer+0x118/0x3a0
> [871.103374][ C31] sched_rt_period_timer+0x6c/0x150
> [871.103376][ C31] _run_hrtimer+0x84/0x304
> [871.103378][ C31] _hrtimer_run_queues+0xfb8/0x15c
> [871.103380][ C31] hrtimer_interrupt+0xfc/0x2d0
> [871.103382][ C31] arch_timer_handler_phys+0x3c/0x50
> [871.103385][ C31] handle_percpu_devid_irq+0xac/0x2b0
> [871.103388][ C31] _handle_domain_irq+0xb8/0x13c
> [871.103390][ C31] gic_handle_irq+0x78/0x2d0
> [871.103392][ C31] el0_irq_naked+0x50/0x58
>
> [871.101301][ C6] NMI watchdog: Watchdog detected hard LOCKUP on cpu 6
> [871.101408][ C6] CPU: 6 PID: 3170 Comm: dm_sched_thd Kdump: loaded Tainted: G
> [871.101414][ C6] pstate: 80400089 (Nzcv daIf +PAN -UAO -TCO BTYPE=---)
> [871.101416][ C6] pc : native_queued_spin_lock_slowpath+0x220/0x354
> [871.101422][ C6] lr : native_queued_spin_lock_slowpath+0xc4/0x354
> [871.101424][ C6] sp : ffff80001cfbbbf0
> [871.101474][ C6]
> [871.101475][ C6] Call trace:
> [871.101476][ C6] native_queued_spin_lock_slowpath+0x220/0x354
> [871.101479][ C6] _raw_spin_lock+0x7c/0x8c
> [871.101483][ C6] ttwu_queue+0x58/0x154
> [871.101486][ C6] try_to_wake_up+0x1e4/0x4e0
> [871.101488][ C6] _wake_up_q+0xb8/0xf0
> [871.101491][ C6] futex_wake+0x178/0x1bc
> [871.101493][ C6] do_futex+0x148/0x1cc
> [871.101496][ C6] _arm64_sys_futex+0x120/0x1a0
> [871.101498][ C6] el0_svc_common.constprop.0+0x7c/0x1b4
> [871.101501][ C6] do_el0_svc+0x2c/0x9c
> [871.101504][ C6] el0_svc+0x20/0x30
> [871.101506][ C6] el0_sync_handler+0xb0/0xb4
> [871.101508][ C6] el0_sync+0x160/0x180
> [871.101510][ C6] Kernel panic - not syncing:
> [871.101511][ C6] Hard LOCKUP
> [871.101512][ C6] CPU: 6 PID: 3170 Comm: dm_sched_thd Kdump: loaded Tainted: G
> 10.0-60.18.0.rt62.50.kl.aarch64 #1
> [871.101515][ C6] Call trace:
> [871.101516][ C6] dump_backtrace+0x0/0x1e4
> [871.101518][ C6] show_stack+0x20/0x2c
> [871.101519][ C6] dump_stack+0xd0/0x134
> [871.101522][ C6] panic+0xd4/0x3c4
> [871.101525][ C6] add_taint+0x0/0xbc
> [871.101528][ C6] watchdog_hardlockup_check+0x108/0x1a0
> [871.101531][ C6] sdei_watchdog_callback+0x94/0xd0
> [871.101534][ C6] sdei_event_handler+0x28/0x84
> [871.101538][ C6] _sdei_handler+0x88/0x150
> [871.101539][ C6] _sdei_handler+0x30/0x60
> [871.101541][ C6] _sdei_asm_handler+0xbc/0x16c
> [871.101541][ C6] native_queued_spin_lock_slowpath+0x220/0x354
> [871.101544][ C6] _raw_spin_lock+0x7c/0x8c
> [871.101547][ C6] ttwu_queue+0x58/0x154
> [871.101549][ C6] try_to_wake_up+0x1e4/0x4e0
> [871.101551][ C6] _wake_up_q+0xb8/0xf0
> [871.101553][ C6] futex_wake+0x178/0x1bc
> [871.101555][ C6] do_futex+0x148/0x1cc
> [871.101556][ C6] _arm64_sys_futex+0x120/0x1a0
> [871.101558][ C6] el0_svc_common.constprop.0+0x7c/0x1b4
>
> Draft version 1 patch (patch-v1)
>
>  kernel/sched/rt.c | 89 ++++++++++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 54 insertions(+), 35 deletions(-)
>
> @@ -2127,6 +2137,9 @@ void rto_push_irq_work_func(struct irq_work *work)
>  	struct rq *rq;
>  	int cpu;
> +	unsigned long flags;
> +	u64 start_ts, end_ts, spend = 0;
> +	int mycpu = raw_smp_processor_id();
>
>  	rq = this_rq();
> @@ -2135,19 +2148,24 @@ void rto_push_irq_work_func(struct irq_work *work)
>  	 * When it gets updated, a check is made if a push is possible.
>  	 */
>  	if (has_pushable_tasks(rq)) {
> -		raw_spin_lock(&rq->lock);
> +		raw_spin_lock_irqsave(&rq->lock, flags);
> +		start_ts = ktime_get_ns();
>  		while (push_rt_task(rq, true))
>  			;
> -		raw_spin_unlock(&rq->lock);
> +		end_ts = ktime_get_ns();
> +		raw_spin_unlock_irqrestore(&rq->lock, flags);
> +		spend = end_ts - start_ts;
>  	}
>
> Revised fix (v2)
>
> * Improvement: replace the remaining raw spinlock acquisitions in this
>   path with the interrupt-disabling variants as well (e.g.
>   root_domain->rto_lock); see the sketch at the end of this mail.
>
> * Attachments:
>   * second_patch.txt (revised fix) - sorry, this is not the original
>     patch but text converted from a screenshot, so some '-' or '+'
>     characters were not recognized correctly (if needed I will create a
>     proper patch).
>   * second_deadlock_dump_after_patch1.txt (updated backtrace) - I only
>     captured part of the backtrace when the panic happened.
>
> Test Results and Fix Progress
>
> 1. Pre-fix: severe system crashes
>
>    * Test parameters:
>
>      cyclictest -i 1000   # default priority testing
>      stress-ng --rt 0 --rt-period 1000000 --rt-runtime 500000   # baseline stress test
>
>    * Results:
>      * The system crashed or rebooted within 10-15 minutes.
>      * Crash call stacks indicated contention on rq->lock (see
>        attachment crash-log-v0.txt).
>
> 2. First patch (patch-v1): significant stability improvement
>
>    * Patch changes:
>      * Use raw_spin_lock_irqsave() in the critical paths (e.g.
>        rto_push_irq_work_func) so interrupts are disabled while
>        rq->lock is held.
>      * This resolves the lock contention between interrupt and thread
>        context on the same CPU.
>
>    * Test parameters:
>
>      cyclictest -i 10   # high-frequency testing
>      stress-ng --rt 4 --rt-period 100000 --rt-runtime 90000   # high-load RT tasks
>
>    * Results:
>      * No crashes for 3 hours under the normal stress tests.
>      * Crash after 2.7 hours under extreme testing (cyclictest -i 1),
>        see crash-log-v1.txt.
>      * Root cause of the remaining crash: global locks such as
>        root_domain->rto_lock are still taken in the cross-CPU
>        operations without disabling interrupts.
>
> 3. Second patch (patch-v2): full stability under extreme load
>
>    * Test parameters:
>
>      cyclictest -i 1   # extreme frequency testing (1μs interval)
>      stress-ng --rt 8 --rt-period 50000 --rt-runtime 45000   # maximum RT task load
>
>    * Results:
>      * The system remained stable for 18+ hours with zero crashes.
>      * All test cases passed (see test-log-v2.txt).
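>
> Since second_patch.txt is only a screenshot-to-text conversion, here is
> a rough sketch of the kind of change v2 makes for root_domain->rto_lock,
> written against the mainline-style rto_push_irq_work_func(). This is
> only an illustration of the idea, not the actual attached patch, and the
> context lines may not match our tree exactly:
>
> @@ rto_push_irq_work_func() (sketch only, reusing the flags variable
>    added by patch-v1) @@
> -	raw_spin_lock(&rd->rto_lock);
> +	raw_spin_lock_irqsave(&rd->rto_lock, flags);
>
>  	/* Pass the IPI to the next rt overloaded queue */
>  	cpu = rto_next_cpu(rd);
>
> -	raw_spin_unlock(&rd->rto_lock);
> +	raw_spin_unlock_irqrestore(&rd->rto_lock, flags);
>
> The same raw_spin_lock() -> raw_spin_lock_irqsave() conversion would
> cover the other rto_lock users (e.g. tell_cpu_to_push()), so that no raw
> spinlock on this path is ever held with interrupts enabled; the attached
> patch is authoritative for the exact hunks.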
>
> Best Regards,
> Fengtian