Re: How can an RT application know if it is running in nohz_full mode?

"Ahmed S. Darwish" <darwi@xxxxxxxxxxxxx> · Mon, 28 Apr 2025 11:43:32 +0200

Hi Sheng,

On Sat, 26 Apr, Sheng Liu wrote:
>
> On Sat, Apr 26 PM Ahmed S. Darwish wrote:
> >
> > On Fri, 25 Apr, Sheng Liu wrote:
> > >
> > > Would it be feasible to add a variable in procfs, which can be read by
> > > other programs that then notify the RT process through a shared memory flag?
> > >
> >
> > This is over-engineered with no extra benefit.  A simple cgroup setup
> > and some basic /sys/fs/cgroup/ file reads should do the job just fine.
> >
...
> > Both "cat"s should print something like:
> >
> >     0::/rtisolated.slice/isolated-shell.service
> >
> > which can be easily done and verified from any C code as well, before
> > starting the RT loop (where multiple initiatory steps need to be done
> > anyway).
>
> According to the kernel documentation (e.g.
> Documentation/timers/NO_HZ.txt), the only runnable SCHED_FIFO task in
> adaptive-tick mode can run continuously until it voluntarily releases
> the CPU.
>
> However, in practice, just because a task is running on an isolated CPU
> doesn't mean the CPU is free of interrupts.  Even if only one task is
> running on an isolated CPU, it may still be interrupted by delayed
> work. User programs cannot know if there are future timed events on the
> CPU.
>

IMHO, you're conflating two issues here:

  (a) How an RT process can "know" it's running on the isolated CPUs
  (b) Proper kernel configuration for such isolated CPUs

I was merely answering (a).

That is, how to use cgroups so that your RT application is forced (in a
clean way) to run on the correct CPUs, and how the application can
(cleanly) verify that.  Such verification can be done in the preparation
code that is usually required anyway before invoking the RT event loop.
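
As a rough illustration of that verification step, here is a minimal
sketch in C.  It assumes cgroup v2 and that the file being read is
/proc/self/cgroup, whose single line has the "0::<path>" form shown in
the quoted output above.  The function names are hypothetical, and
"/rtisolated.slice" is just the slice name from that example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if a cgroup v2 line ("0::<path>") places the task in
 * `slice` itself or anywhere below it, 0 otherwise.
 */
static int cgroup_line_in_slice(const char *line, const char *slice)
{
    size_t n = strlen(slice);

    if (strncmp(line, "0::", 3) != 0)
        return 0;               /* not a cgroup v2 entry */
    line += 3;
    if (strncmp(line, slice, n) != 0)
        return 0;
    /* Reject partial matches such as "/rtisolated.slice2". */
    return line[n] == '/' || line[n] == '\n' || line[n] == '\0';
}

/*
 * Scan a /proc/<pid>/cgroup style file.
 * Returns 1 on match, 0 on no match, -1 if the file cannot be opened.
 */
static int task_in_slice(const char *cgroup_file, const char *slice)
{
    char line[4096];
    int found = 0;
    FILE *f = fopen(cgroup_file, "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (cgroup_line_in_slice(line, slice)) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

The RT setup code could then call something like
task_in_slice("/proc/self/cgroup", "/rtisolated.slice") and refuse to
enter the RT loop unless it returns 1.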

Regarding (b), kernel configuration for the isolated CPUs, it's a whole
different topic.  There's nothing stopping you from further configuring
the isolated CPUs, exclusive to that 'rtisolated' cgroup, through nohz
kernel parameters and other means, in any way you want.
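
If the application also wants to cross-check the boot-time nohz_full
setting, one possible sketch follows.  It assumes a kernel built with
CONFIG_NO_HZ_FULL that exposes the configured CPU list in
/sys/devices/system/cpu/nohz_full; the function names are hypothetical.
Note that this only reflects what was requested at boot.  It does not
prove that the tick is actually stopped at any given moment.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Return 1 if `cpu` appears in a kernel cpulist string such as
 * "2-5,8", 0 otherwise.  Strided ranges ("0-10:2") are not handled.
 */
static int cpu_in_list(const char *list, int cpu)
{
    const char *p = list;

    while (*p) {
        char *end;
        long lo = strtol(p, &end, 10);
        long hi = lo;

        if (end == p)
            break;              /* no number: empty list or "(null)" */
        if (*end == '-') {
            const char *q = end + 1;

            hi = strtol(q, &end, 10);
            if (end == q)
                break;          /* malformed range */
        }
        if (cpu >= lo && cpu <= hi)
            return 1;
        if (*end != ',')
            break;              /* end of list */
        p = end + 1;
    }
    return 0;
}

/*
 * Read a cpulist file and test `cpu` against it.
 * Returns 1 or 0, or -1 if the file cannot be opened.
 */
static int cpu_is_nohz_full(const char *path, int cpu)
{
    char buf[4096];
    int ret = 0;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (fgets(buf, sizeof(buf), f))
        ret = cpu_in_list(buf, cpu);
    fclose(f);
    return ret;
}
```

A caller would typically pass "/sys/devices/system/cpu/nohz_full" and
the CPU number it is pinned to, for example the value returned by
sched_getcpu().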

This, I believe, answers your subject line question:

    How can an RT application know if it is running in nohz_full mode?

Now, regarding:

>
> However, in practice, just because a task is running on an isolated CPU
> doesn't mean the CPU is free of interrupts.  Even if only one task is
> running on an isolated CPU, it may still be interrupted by delayed
> work. User programs cannot know if there are future timed events on the
> CPU.
>

The kernel controls the isolated CPU, not the SCHED_FIFO task.

So, beyond measuring latencies from within the application, the RT
application itself can /not/ "know" anything about the total kernel
behavior on the isolated CPUs.
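
For the "measuring latencies from within the application" part, a
minimal cyclictest-style sketch could look like the following.  It is
only an illustration: the function names, loop count, and period are
hypothetical, and it measures timer wake-up latency only, which is just
one of the things a real RT application may care about.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Return a - b in nanoseconds. */
static int64_t ts_diff_ns(struct timespec a, struct timespec b)
{
    return (int64_t)(a.tv_sec - b.tv_sec) * NSEC_PER_SEC
           + (a.tv_nsec - b.tv_nsec);
}

/*
 * Sleep until an absolute deadline `loops` times, `period_ns` apart,
 * and return the largest observed delay between the deadline and the
 * moment the task actually ran again.
 */
static int64_t max_wakeup_latency_ns(int loops, long period_ns)
{
    struct timespec next, now;
    int64_t max = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < loops; i++) {
        /* Advance the absolute deadline by one period. */
        next.tv_nsec += period_ns;
        while (next.tv_nsec >= NSEC_PER_SEC) {
            next.tv_nsec -= NSEC_PER_SEC;
            next.tv_sec++;
        }
        /* Return value ignored: an early, signal-interrupted wake-up
         * yields a negative sample, which never raises the maximum. */
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);

        int64_t lat = ts_diff_ns(now, next);
        if (lat > max)
            max = lat;
    }
    return max;
}
```

Such a number tells the application what it observed, not why.  The
cause of an outlier still has to be tracked down from the system side.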

Whoever configures the system needs to ensure that, and then verify the
sanity of the resulting RT configuration with the appropriate tools.  How
to configure the system properly for that is discussed in multiple
conference talks.  A quick "John Ogness real time checklist" search,
along with talks by Rostedt, can give some nice pointers to start with.

Hope this helps a bit.

Good luck!

--
Ahmed S. Darwish
Linutronix GmbH