Re: [RFC PATCH] cyclictest: Add skip_sec option

Leonardo Bras Soares Passos <leobras@xxxxxxxxxx> · Thu, 22 May 2025 01:01:54 -0300

On Wed, May 21, 2025 at 5:40 AM John Ogness <john.ogness@xxxxxxxxxxxxx> wrote:
>
> Hi Leo,
>
> On 2025-05-20, Leonardo Bras Soares Passos <leobras@xxxxxxxxxx> wrote:
> > On Tue, May 20, 2025 at 4:35 AM John Ogness <john.ogness@xxxxxxxxxxxxx> wrote:
> >>
> >> On 2025-05-20, Leonardo Bras <leobras@xxxxxxxxxx> wrote:
> >> > When running cyclictest with break trace (-b) option, wait for skip_sec
> >> > seconds before issuing the break.
> >> >
> >> > Cyclictest may present high latency on the initial cycles, which can be
> >> > caused by initial resource allocations, and may not represent the
> >> > expected latency for the running workload.
> >>
> >> If cyclictest is programmed correctly, this will not happen. Can you
> >> provide a technical explanation for high latencies on initial cycles?
> >
> > We are currently investigating the source of this latency, but I heard
> > from other team members that it is also happening on Intel Secure VMs.
> >
> > Scenario:
> > Host: ARM64, kernel-rt, 120 out of 128 cpu isolated, hugepages for guest
> > KVM Guest: kernel-rt, 120 vcpus pinned on above isolated cpus, 116
> > vcpus isolated on guest
> >
> > Cyclictest runs with trace-cmd attaching to a guest agent:
> >
> > trace-cmd record --poll -m 1000 -e csd -e sched/sched_switch -e timer
> > -e irq_handler_entry -e irq_handler_exit -e tick_stop -e
> > ipi_send_cpumask -e ipi_send_cpu -A 3 -e csd -e sched/sched_switch -e
> > timer -e irq_handler_entry -e irq_handler_exit -e tick_stop -e
> > ipi_send_cpumask -e ipi_send_cpu bash -c "ssh $my_guest 'cyclictest -m
> > -q -p95 --policy=fifo -D 2h -i 200 -h60 -t 116 -a 4-119 -b 50
> > --mainaffinity 0,1,2,3 --tracemark'"
> >
> > What we see is a peak of >50us in the first couple cycles, and then a
> > max peak of 15us in our tests.
>
> I wonder if this is related to cache misses and/or memory bandwidth. If you
> keep this patch but you increase the interval, do you see >50us during
> your tests? For example:
>
>     -i 10000
>
> If so, your "max peak" of 15us is only valid when cache hot.
>

That's a good idea. We will run this test as suggested and report the results.

> IMHO whatever situation is happening for the "first couple cycles" could
> happen later. In that case, the patch is just sweeping some of the bad
> numbers under the rug. It is really important to understand exactly what
> the problem is before changing cyclictest to ignore such problems.

That's a valid point.
We are currently investigating it, and I will report back when we have
results.

I have a v2 ready, but I think it makes sense to send it only once the
above investigation tells us whether it is worth proceeding with the
suggestion.
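
For reference, the behaviour being discussed (do not issue the break
until skip_sec seconds have passed) can be sketched roughly as below.
This is only a minimal standalone sketch and not the code from the
patch: the helper name should_break() and its parameters are
hypothetical, and it assumes the break fires when a sample exceeds the
-b threshold.

```c
#include <stdbool.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

/*
 * Hypothetical helper (not taken from the patch): decide whether a
 * latency sample should trigger the break.
 *
 * elapsed_us      - time since the measurement started, in microseconds
 * latency_us      - latency of the current cycle, in microseconds
 * break_thresh_us - threshold given with -b (0 means "no break trace")
 * skip_sec        - initial window, in seconds, during which samples
 *                   never trigger the break
 */
static bool should_break(uint64_t elapsed_us, uint64_t latency_us,
                         uint64_t break_thresh_us, uint64_t skip_sec)
{
    /* Break trace not requested: never break. */
    if (break_thresh_us == 0)
        return false;

    /* Still inside the initial skip window: ignore the sample. */
    if (elapsed_us < skip_sec * USEC_PER_SEC)
        return false;

    /* Assumption: the break fires when the latency exceeds the threshold. */
    return latency_us > break_thresh_us;
}
```

With skip_sec = 2 and a threshold of 50us, a 60us sample taken 0.5s
after start is ignored, while the same sample taken at 3s triggers the
break. This also shows John's concern: anything that happens inside the
window is never reported by the break.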

Thanks!
Leo