On Wed, Apr 09, 2025 at 02:59:18PM +0530, K Prateek Nayak wrote:
> (+ Aaron)

Thank you Prateek for bringing me in.

> Hello Jan,
>
> On 4/9/2025 12:11 PM, Jan Kiszka wrote:
> > On 12.10.23 17:07, Valentin Schneider wrote:
> > > Hi folks,
> > >
> > > We've had reports of stalls happening on our v6.0-ish frankenkernels, and while
> > > we haven't been able to come out with a reproducer (yet), I don't see anything
> > > upstream that would prevent them from happening.
> > >
> > > The setup involves eventpoll, CFS bandwidth controller and timer
> > > expiry, and the sequence looks as follows (time-ordered):
> > >
> > > p_read (on CPUn, CFS with bandwidth controller active)
> > > ======
> > >
> > > ep_poll_callback()
> > >   read_lock_irqsave()
> > >   ...
> > >   try_to_wake_up() <- enqueue causes an update_curr() + sets need_resched
> > >                       due to having no more runtime
> > >   preempt_enable()
> > >     preempt_schedule() <- switch out due to p_read being now throttled
> > >
> > > p_write
> > > =======
> > >
> > > ep_poll()
> > >   write_lock_irq() <- blocks due to having active readers (p_read)
> > >
> > > ktimers/n
> > > =========
> > >
> > > timerfd_tmrproc()
> > > `\
> > >   ep_poll_callback()
> > >   `\
> > >     read_lock_irqsave() <- blocks due to having active writer (p_write)
> > >
> > >
> > > From this point we have a circular dependency:
> > >
> > >   p_read -> ktimers/n (to replenish runtime of p_read)
> > >   ktimers/n -> p_write (to let ktimers/n acquire the readlock)
> > >   p_write -> p_read (to let p_write acquire the writelock)
> > >
> > > IIUC reverting
> > >   286deb7ec03d ("locking/rwbase: Mitigate indefinite writer starvation")
> > > should unblock this as the ktimers/n thread wouldn't block, but then we're back
> > > to having the indefinite starvation so I wouldn't necessarily call this a win.
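
For anyone skimming the thread, the circular dependency quoted above can be
restated as a tiny wait-for graph. This is only an illustrative user-space
sketch, not kernel code: the WAITS_ON table and the find_cycle() helper are
hypothetical names made up for this note, and the edges are copied from the
three arrows in the report.

```python
# Hypothetical model of the wait-for edges listed in the report.
# Each key is a task; the value is the task it is waiting on.
WAITS_ON = {
    "p_read": "ktimers/n",   # needs the timer to replenish its CFS runtime
    "ktimers/n": "p_write",  # read lock is blocked behind the pending writer
    "p_write": "p_read",     # write lock is blocked behind the active reader
}


def find_cycle(waits_on, start):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    path = []   # tasks visited so far, in order
    seen = {}   # task -> index in path
    task = start
    while task is not None:
        if task in seen:
            # We came back to a task already on the path: that suffix is the cycle.
            return path[seen[task]:]
        seen[task] = len(path)
        path.append(task)
        task = waits_on.get(task)  # None when the task waits on nobody
    return None


if __name__ == "__main__":
    print(find_cycle(WAITS_ON, "p_read"))
```

Dropping the "ktimers/n" entry from the table models the case where the
ktimers/n thread does not block on the read lock (the revert mentioned above):
find_cycle() then returns None, at the cost of bringing back writer starvation.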
> > >
> > > Two options I'm seeing:
> > > - Prevent p_read from being preempted when it's doing the wakeups under the
> > >   readlock (icky)
> > > - Prevent ktimers / ksoftirqd (*) from running the wakeups that have
> > >   ep_poll_callback() as a wait_queue_entry callback. Punting that to e.g. a
> > >   kworker /should/ do.
> > >
> > > (*) It's not just timerfd, I've also seen it via net::sock_def_readable -
> > > it should be anything that's pollable.
> > >
> > > I'm still scratching my head on this, so any suggestions/comments welcome!
> > >
> >
> > We are hunting for quite some time sporadic lock-ups or RT systems,
> > first only in the field (sigh), now finally also in the lab. Those have
> > a fairly high overlap with what was described here. Our baselines so
> > far: 6.1-rt, Debian and vanilla. We are currently preparing experiments
> > with latest mainline.
>
> Do the backtrace from these lockups show tasks (specifically ktimerd)
> waiting on a rwsem? Throttle deferral helps if cfs bandwidth throttling
> becomes the reason for long delay / circular dependency. Is cfs bandwidth
> throttling being used on these systems that run into these lockups?
> Otherwise, your issue might be completely different.

Agree.

> >
> > While this thread remained silent afterwards, we have found [1][2][3] as
> > apparently related. But this means we are still with this RT bug, even
> > in latest 6.15-rc1?
>
> I'm pretty sure a bunch of locking related stuff has been reworked to
> accommodate PREEMPT_RT since v6.1. Many rwsem based locking patterns
> have been replaced with alternatives like RCU. Recently introduced
> dl_server infrastructure also helps prevent starvation of fair tasks
> which can allow progress and prevent lockups. I would recommend
> checking if the most recent -rt release can still reproduce your
> issue:
> https://lore.kernel.org/lkml/20250331095610.ulLtPP2C@xxxxxxxxxxxxx/
>
> Note: Aaron Lu is working on Valentin's approach of deferring cfs
> throttling to exit to user mode boundary
> https://lore.kernel.org/lkml/20250313072030.1032893-1-ziqianlu@xxxxxxxxxxxxx/
>
> If you still run into the issue of a lockup / long latencies on latest
> -rt release and your system is using cfs bandwidth controls, you can
> perhaps try running with Valentin's or Aaron's series to check if
> throttle deferral helps your scenario.

I just sent out v2 :-)
https://lore.kernel.org/all/20250409120746.635476-1-ziqianlu@xxxxxxxxxxxxx/

Hi Jan,

If you want to give it a try, please try v2.

Thanks.