On Wed, May 21, 2025 at 5:40 AM John Ogness <john.ogness@xxxxxxxxxxxxx> wrote:
>
> Hi Leo,
>
> On 2025-05-20, Leonardo Bras Soares Passos <leobras@xxxxxxxxxx> wrote:
> > On Tue, May 20, 2025 at 4:35 AM John Ogness <john.ogness@xxxxxxxxxxxxx> wrote:
> >>
> >> On 2025-05-20, Leonardo Bras <leobras@xxxxxxxxxx> wrote:
> >> > When running cyclictest with break trace (-b) option, wait for skip_sec
> >> > seconds before issuing the break.
> >> >
> >> > Cyclictest may present high latency on the initial cycles, which can be
> >> > caused by initial resource allocations, and may not represent the
> >> > expected latency for the running workload.
> >>
> >> If cyclictest is programmed correctly, this will not happen. Can you
> >> provide a technical explanation for high latencies on initial cycles?
> >
> > We are currently investigating the source of this latency, but I heard
> > from other team members it's also happening on Intel Secure VMs as
> > well.
> >
> > Scenario:
> > Host: ARM64, kernel-rt, 120 out of 128 cpu isolated, hugepages for guest
> > KVM Guest: kernel-rt, 120 vcpus pinned on above isolated cpus, 116
> > vcpus isolated on guest
> >
> > Cyclictest runs with trace-cmd attaching to a guest agent:
> >
> > trace-cmd record --poll -m 1000 -e csd -e sched/sched_switch -e timer
> > -e irq_handler_entry -e irq_handler_exit -e tick_stop -e
> > ipi_send_cpumask -e ipi_send_cpu -A 3 -e csd -e sched/sched_switch -e
> > timer -e irq_handler_entry -e irq_handler_exit -e tick_stop -e
> > ipi_send_cpumask -e ipi_send_cpu bash -c "ssh $my_guest 'cyclictest -m
> > -q -p95 --policy=fifo -D 2h -i 200 -h60 -t 116 -a 4-119 -b 50
> > --mainaffinity 0,1,2,3 --tracemark'"
> >
> > What we see is a peak of >50us in the first couple cycles, and then a
> > max peak of 15 in our tests.
>
> I wonder if this related to cache misses and/or memory bandwidth. If you
> keep this patch but you increase the interval, do you see >50us during
> your tests?
> For example:
>
> -i 10000
>
> If so, your "max peak" of 15us is only valid when cache hot.
>

That's a smart suggestion; we will run this test and report back with the results.

> IMHO whatever situation is happening for the "first couple cycles" could
> happen later. In that case, the patch is just sweeping some of the bad
> numbers under the rug. It is really important to understand exactly what
> the problem is before changing cyclictest to ignore such problems.

That's a valid point. We are currently investigating it, and I will share the results once we have them.

I have a v2 ready, but I think it makes sense to hold it until the investigation above tells us whether it is worth proceeding with this approach.

Thanks!
Leo