On 09.04.25 14:13, Aaron Lu wrote:
> On Wed, Apr 09, 2025 at 02:59:18PM +0530, K Prateek Nayak wrote:
>> (+ Aaron)
>
> Thank you Prateek for bringing me in.
>
>> Hello Jan,
>>
>> On 4/9/2025 12:11 PM, Jan Kiszka wrote:
>>> On 12.10.23 17:07, Valentin Schneider wrote:
>>>> Hi folks,
>>>>
>>>> We've had reports of stalls happening on our v6.0-ish frankenkernels, and while
>>>> we haven't been able to come out with a reproducer (yet), I don't see anything
>>>> upstream that would prevent them from happening.
>>>>
>>>> The setup involves eventpoll, CFS bandwidth controller and timer
>>>> expiry, and the sequence looks as follows (time-ordered):
>>>>
>>>> p_read (on CPUn, CFS with bandwidth controller active)
>>>> ======
>>>>
>>>> ep_poll_callback()
>>>>   read_lock_irqsave()
>>>>   ...
>>>>   try_to_wake_up() <- enqueue causes an update_curr() + sets need_resched
>>>>                       due to having no more runtime
>>>>   preempt_enable()
>>>>     preempt_schedule() <- switch out due to p_read being now throttled
>>>>
>>>> p_write
>>>> =======
>>>>
>>>> ep_poll()
>>>>   write_lock_irq() <- blocks due to having active readers (p_read)
>>>>
>>>> ktimers/n
>>>> =========
>>>>
>>>> timerfd_tmrproc()
>>>> `\
>>>>   ep_poll_callback()
>>>>   `\
>>>>     read_lock_irqsave() <- blocks due to having active writer (p_write)
>>>>
>>>>
>>>> From this point we have a circular dependency:
>>>>
>>>>   p_read -> ktimers/n (to replenish runtime of p_read)
>>>>   ktimers/n -> p_write (to let ktimers/n acquire the readlock)
>>>>   p_write -> p_read (to let p_write acquire the writelock)
>>>>
>>>> IIUC reverting
>>>>   286deb7ec03d ("locking/rwbase: Mitigate indefinite writer starvation")
>>>> should unblock this as the ktimers/n thread wouldn't block, but then we're back
>>>> to having the indefinite starvation so I wouldn't necessarily call this a win.
>>>>
>>>> Two options I'm seeing:
>>>> - Prevent p_read from being preempted when it's doing the wakeups under the
>>>>   readlock (icky)
>>>> - Prevent ktimers / ksoftirqd (*) from running the wakeups that have
>>>>   ep_poll_callback() as a wait_queue_entry callback. Punting that to e.g. a
>>>>   kworker /should/ do.
>>>>
>>>> (*) It's not just timerfd, I've also seen it via net::sock_def_readable -
>>>>     it should be anything that's pollable.
>>>>
>>>> I'm still scratching my head on this, so any suggestions/comments welcome!
>>>>
>>>
>>> We have been hunting sporadic lock-ups of RT systems for quite some time,
>>> first only in the field (sigh), now finally also in the lab. Those have
>>> a fairly high overlap with what was described here. Our baselines so
>>> far: 6.1-rt, Debian and vanilla. We are currently preparing experiments
>>> with latest mainline.
>>
>> Do the backtraces from these lockups show tasks (specifically ktimerd)
>> waiting on a rwsem? Throttle deferral helps if cfs bandwidth throttling
>> becomes the reason for the long delay / circular dependency. Is cfs bandwidth
>> throttling being used on the systems that run into these lockups?
>> Otherwise, your issue might be completely different.
>
> Agree.
>
>>>
>>> While this thread remained silent afterwards, we have found [1][2][3] as
>>> apparently related. But this means we are still stuck with this RT bug, even
>>> in latest 6.15-rc1?
>>
>> I'm pretty sure a bunch of locking related stuff has been reworked to
>> accommodate PREEMPT_RT since v6.1. Many rwsem based locking patterns
>> have been replaced with alternatives like RCU.
>> The recently introduced
>> dl_server infrastructure also helps prevent starvation of fair tasks,
>> which can allow progress and prevent lockups. I would recommend
>> checking whether the most recent -rt release can still reproduce your
>> issue:
>> https://lore.kernel.org/lkml/20250331095610.ulLtPP2C@xxxxxxxxxxxxx/
>>
>> Note: Aaron Lu is working on Valentin's approach of deferring cfs
>> throttling to the exit-to-user-mode boundary:
>> https://lore.kernel.org/lkml/20250313072030.1032893-1-ziqianlu@xxxxxxxxxxxxx/
>>
>> If you still run into lockups / long latencies on the latest
>> -rt release and your system is using cfs bandwidth controls, you can
>> perhaps try running with Valentin's or Aaron's series to check if
>> throttle deferral helps your scenario.
>
> I just sent out v2 :-)
> https://lore.kernel.org/all/20250409120746.635476-1-ziqianlu@xxxxxxxxxxxxx/
>
> Hi Jan,
>
> If you want to give it a try, please try v2.
>

Thanks, we are updating our setup right now.

BTW, does anyone already have a test case that produces the lockup issue
with one or two simple programs and some hectic CFS bandwidth settings?

Jan

--
Siemens AG, Foundational Technologies
Linux Expert Center
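[No ready-made reproducer surfaced in this thread. For reference, a minimal
sketch of the kind of test case Jan asks about could look like the program
below. It is hypothetical and untested: the cgroup path, the cpu.max quota,
the timer period, and the waiter count are all guesses, and it is not
confirmed to hit the race. The idea follows Valentin's trace: a consumer
under a tight CFS quota handles wakeups from a high-frequency timerfd
(driving ep_poll_callback() from ktimers/n on RT), while extra epoll_wait()
callers keep ep_poll()'s write_lock_irq() contended.]

/*
 * Hypothetical reproducer sketch (untested). Build with:
 *   gcc -O2 -pthread -o epstall epstall.c
 * Run on PREEMPT_RT with CONFIG_CFS_BANDWIDTH, inside a cgroup with a
 * tight quota (values are guesses), e.g.:
 *   mkdir /sys/fs/cgroup/epstall
 *   echo "1000 100000" > /sys/fs/cgroup/epstall/cpu.max
 *   echo $$ > /sys/fs/cgroup/epstall/cgroup.procs
 */
#include <pthread.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

static int epfd;

/* Extra waiters: each epoll_wait() call enters ep_poll(), which takes
 * the eventpoll lock for writing, keeping it contended. */
static void *waiter(void *arg)
{
	struct epoll_event ev;

	(void)arg;
	for (;;)
		epoll_wait(epfd, &ev, 1, -1);
	return NULL;
}

int main(void)
{
	/* 100us period: keeps the timer expiry path delivering wakeups
	 * via timerfd_tmrproc() -> ep_poll_callback(). */
	struct itimerspec its = {
		.it_interval = { .tv_nsec = 100000 },
		.it_value    = { .tv_nsec = 100000 },
	};
	struct epoll_event ev = { .events = EPOLLIN };
	pthread_t tid;
	int tfd, i;

	epfd = epoll_create1(0);
	tfd = timerfd_create(CLOCK_MONOTONIC, 0);
	ev.data.fd = tfd;
	epoll_ctl(epfd, EPOLL_CTL_ADD, tfd, &ev);
	timerfd_settime(tfd, 0, &its, NULL);

	for (i = 0; i < 4; i++)
		pthread_create(&tid, NULL, waiter, NULL);

	/* Throttled consumer: drains the timerfd so it keeps firing,
	 * burning the tiny CFS quota so it can be throttled while
	 * handling wakeups. */
	for (;;) {
		struct epoll_event out;
		uint64_t expirations;

		if (epoll_wait(epfd, &out, 1, -1) == 1)
			read(tfd, &expirations, sizeof(expirations));
	}
	return 0;
}

[Whether this wedges at all will depend on timing and CPU placement; as
Prateek notes above, it should only be relevant on kernels without
throttle deferral and with cfs bandwidth control actually in use.]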