Re: Latency spikes on V6.15.1 Preempt RT and maybe related to intel? IGB

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Sebastian,

i tried that in the past (rtla top auto analysis). But i do not really understand the result:

rtla timerlat hit stop tracing
## CPU 1 hit stop tracing, analyzing it ##
  IRQ handler delay:                                         0.00 us (0.00 %)
  IRQ latency:                                             709.25 us
  Blocking thread:
                              ip:3567
    Blocking thread stack trace
                -> timerlat_irq
                -> __hrtimer_run_queues
                -> hrtimer_interrupt
                -> __sysvec_apic_timer_interrupt
                -> sysvec_apic_timer_interrupt
                -> asm_sysvec_apic_timer_interrupt
                -> igb_update_mc_addr_list
                -> igb_set_rx_mode
                -> __dev_change_flags
                -> netif_change_flags
                -> do_setlink.constprop.0
                -> rtnl_newlink
                -> rtnetlink_rcv_msg
                -> netlink_rcv_skb
                -> netlink_unicast
                -> netlink_sendmsg
                -> ____sys_sendmsg
                -> ___sys_sendmsg
                -> __sys_sendmsg
                -> do_syscall_64
                -> entry_SYSCALL_64_after_hwframe
------------------------------------------------------------------------
     IRQ latency:                                          709.25 us (100%)


I do not really understand where the IRQ/Preemption disabling is happening. What would the next thing be to do? Function (graph?) tracing on all the functions visible in the backtrace?


I tried to look at the event race output starting with the call to igb_set_rx_mode. I have attached the trace with all events and a function filter on igb on only the cpu executing ip.  I cannot understand what is happening between timestasmp 700.149995 and the IRQ disable event on 700.150795....


Thanks for your help,

Marc



Am 13.06.2025 um 21:58 schrieb Sebastian Andrzej Siewior:
On 2025-06-13 17:26:15 [+0200], marc.straemke wrote:
Thanks Sebastian, I will do that tomorrow.To confirm: Just pure event
tracing without the function tracer? (After enabling the sched
events)Regards,Marc
The event tracing should narrow down which of the tasks cause the spike.
So if you say it is the ip comment then you should see ip.
Step two would be the function tracer to narrow it down further. You
could start right away with the function tracer to see where the big gap
is.
rtla could speed up the whole process (via the timerlat auto analysis).

Sebastian
              ip-4931    [001] ...1.   700.149994: preempt_disable: caller=__local_bh_disable_ip+0x76/0xe0 parent=__local_bh_disable_ip+0x76/0xe0
              ip-4931    [001] ...11   700.149994: preempt_enable: caller=__local_bh_disable_ip+0x76/0xe0 parent=__local_bh_disable_ip+0x76/0xe0
              ip-4931    [001] b...3   700.149994: igb_set_rx_mode <-__dev_change_flags
              ip-4931    [001] b...3   700.149995: kmalloc: call_site=igb_set_rx_mode+0x4f8/0x5a0 ptr=00000000b0b4e5c8 bytes_req=12 bytes_alloc=16 gfp_flags=GFP_ATOMIC|__GFP_ZERO node=-1 accounted=false
              ip-4931    [001] b...3   700.149995: igb_update_mc_addr_list <-igb_set_rx_mode
              ip-4931    [001] Dn..3   700.150795: irq_disable: caller=irqentry_enter+0x2a/0x60 parent=0x0
              ip-4931    [001] Dnh.3   700.150795: preempt_disable: caller=sysvec_reschedule_ipi+0x24/0x130 parent=sysvec_reschedule_ipi+0x24/0x130
              ip-4931    [001] Dnh.3   700.150795: reschedule_entry: vector=253
              ip-4931    [001] DNh.3   700.150795: reschedule_exit: vector=253
              ip-4931    [001] DNh.3   700.150796: preempt_enable: caller=sysvec_reschedule_ipi+0x8d/0x130 parent=sysvec_reschedule_ipi+0x8d/0x130
              ip-4931    [001] DN.13   700.150796: preempt_disable: caller=irqentry_exit+0x64/0x70 parent=irqentry_exit+0x64/0x70
              ip-4931    [001] DN.13   700.150796: irq_enable: caller=preempt_schedule_irq+0x3b/0xe0 parent=0x0
              ip-4931    [001] DN.13   700.150796: irq_disable: caller=irqentry_enter+0x2a/0x60 parent=0x0
              ip-4931    [001] DNh13   700.150797: call_function_single_entry: vector=251
              ip-4931    [001] DNh23   700.150798: ipi_send_cpu: cpu=1 callsite=irq_work_queue_on+0x109/0x120 callback=rto_push_irq_work_func+0x0/0xa0
              ip-4931    [001] DNh13   700.150798: call_function_single_exit: vector=251
              ip-4931    [001] DN.13   700.150798: irq_enable: caller=asm_sysvec_call_function_single+0x1a/0x20 parent=0x0
              ip-4931    [001] DN.13   700.150799: irq_disable: caller=irqentry_enter+0x2a/0x60 parent=0x0
              ip-4931    [001] DNh13   700.150799: irq_work_entry: vector=246
              ip-4931    [001] DNh13   700.150799: irq_work_exit: vector=246
              ip-4931    [001] DN.13   700.150800: irq_enable: caller=asm_sysvec_irq_work+0x1a/0x20 parent=0x0
              ip-4931    [001] DN.13   700.150800: irq_disable: caller=irqentry_enter+0x2a/0x60 parent=0x0
              ip-4931    [001] DNh13   700.150800: local_timer_entry: vector=236
              ip-4931    [001] DNh23   700.150801: hrtimer_cancel: hrtimer=000000003093a798
              ip-4931    [001] DNh13   700.150801: hrtimer_expire_entry: hrtimer=000000003093a798 function=hrtimer_wakeup now=700145368977
              ip-4931    [001] DNh33   700.150802: sched_waking: comm=cyclictest pid=4893 prio=9 target_cpu=001
              ip-4931    [001] DNh43   700.150802: sched_stat_sleep: comm=cyclictest pid=4893 delay=1223239 [ns]
              ip-4931    [001] DNh43   700.150803: sched_wakeup: comm=cyclictest pid=4893 prio=9 target_cpu=001
              ip-4931    [001] DNh13   700.150803: hrtimer_expire_exit: hrtimer=000000003093a798
              ip-4931    [001] DNh23   700.150803: hrtimer_cancel: hrtimer=000000001da73767
              ip-4931    [001] DNh13   700.150803: hrtimer_expire_entry: hrtimer=000000001da73767 function=tick_nohz_handler now=700145368977
              ip-4931    [001] DNh13   700.150804: softirq_raise: vec=1 [action=TIMER]
              ip-4931    [001] DNh13   700.150804: rcu_utilization: Start scheduler-tick
              ip-4931    [001] DNh13   700.150805: rcu_utilization: End scheduler-tick
              ip-4931    [001] DNh13   700.150805: read_msr: e8, value 1787ff7229c
              ip-4931    [001] DNh13   700.150805: read_msr: e7, value 177daa8fd55
              ip-4931    [001] DNh23   700.150805: sched_stat_runtime: comm=ip pid=4931 runtime=1226636 [ns]
              ip-4931    [001] DNh13   700.150808: hrtimer_expire_exit: hrtimer=000000001da73767
              ip-4931    [001] DNh23   700.150808: hrtimer_start: hrtimer=000000001da73767 function=tick_nohz_handler expires=700146000000 softexpires=700146000000 mode=ABS
              ip-4931    [001] DNh13   700.150809: local_timer_exit: vector=236
              ip-4931    [001] DN.33   700.150809: sched_waking: comm=ktimers/1 pid=31 prio=98 target_cpu=001
              ip-4931    [001] DN.43   700.150809: sched_stat_sleep: comm=ktimers/1 pid=31 delay=3349898 [ns]
              ip-4931    [001] DN.43   700.150810: sched_wakeup: comm=ktimers/1 pid=31 prio=98 target_cpu=001
              ip-4931    [001] DN.13   700.150810: irq_enable: caller=asm_sysvec_apic_timer_interrupt+0x1a/0x20 parent=0x0
              ip-4931    [001] DN.13   700.150810: irq_disable: caller=__schedule+0x67f/0x1080 parent=0x0
              ip-4931    [001] DN.13   700.150810: rcu_utilization: Start context switch
              ip-4931    [001] DN.23   700.150810: rcu_preempt_task: rcu_preempt 93628 4931
              ip-4931    [001] DN.13   700.150811: rcu_utilization: End context switch
              ip-4931    [001] DN.23   700.150811: sched_stat_runtime: comm=ip pid=4931 runtime=4037 [ns]
              ip-4931    [001] DN.23   700.150812: sched_stat_wait: comm=cyclictest pid=4893 delay=7434 [ns]
              ip-4931    [001] D..23   700.150813: sched_stat_wait: comm=ip pid=4931 delay=0 [ns]
              ip-4931    [001] D..33   700.150814: hrtimer_start: hrtimer=0000000087697d14 function=inactive_task_timer expires=700464978086 softexpires=700464978086 mode=REL|HARD
              ip-4931    [001] D..33   700.150814: hrtimer_cancel: hrtimer=0000000096921230
              ip-4931    [001] D..33   700.150815: hrtimer_cancel: hrtimer=0000000087697d14
              ip-4931    [001] D..33   700.150816: hrtimer_start: hrtimer=0000000096921230 function=dl_task_timer expires=701094857133 softexpires=701094857133 mode=ABS|HARD
              ip-4931    [001] Dp.23   700.150816: sched_switch: prev_comm=ip prev_pid=4931 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=4893 next_prio=9
              ip-4931    [001] Dp.23   700.150817: tlb_flush: pages:-1 reason:flush on task switch (0)
              ip-4931    [001] Dp.23   700.150817: x86_fpu_regs_deactivated: x86/fpu: 00000000dd84b5fb load: 0 xfeatures: 2 xcomp_bv: 8000000000000007
      cyclictest-4893    [001] dN.3.   700.150818: sched_stat_runtime: comm=cyclictest pid=4893 runtime=8747 [ns]
      cyclictest-4893    [001] dN.3.   700.150819: sched_migrate_task: comm=rcu_preempt pid=17 prio=98 orig_cpu=1 dest_cpu=0
      cyclictest-4893    [001] dN.3.   700.150819: sched_wake_idle_without_ipi: cpu=0
      cyclictest-4893    [001] dN.1.   700.150820: irq_enable: caller=finish_task_switch.isra.0+0xb1/0x300 parent=0x0
      cyclictest-4893    [001] dN.1.   700.150820: irq_disable: caller=irqentry_enter+0x2a/0x60 parent=0x0
      cyclictest-4893    [001] dNh1.   700.150820: call_function_single_entry: vector=251
      cyclictest-4893    [001] dNh2.   700.150821: csd_queue_cpu: cpu=0 callsite=irq_work_queue_on+0xc6/0x120 func=rto_push_irq_work_func csd=00000000e2a8d506
      cyclictest-4893    [001] dNh2.   700.150821: ipi_send_cpu: cpu=0 callsite=irq_work_queue_on+0xc6/0x120 callback=generic_smp_call_function_single_interrupt+0x0/0x20
      cyclictest-4893    [001] dNh1.   700.150821: call_function_single_exit: vector=251
      cyclictest-4893    [001] dN.1.   700.150822: irq_enable: caller=asm_sysvec_call_function_single+0x1a/0x20 parent=0x0
      cyclictest-4893    [001] .N.1.   700.150822: preempt_enable: caller=do_nanosleep+0x72/0x190 parent=do_nanosleep+0x72/0x190
      cyclictest-4893    [001] .N.1.   700.150822: preempt_disable: caller=do_nanosleep+0x72/0x190 parent=do_nanosleep+0x72/0x190
      cyclictest-4893    [001] dN.1.   700.150822: irq_disable: caller=__schedule+0x67f/0x1080 parent=0x0
      cyclictest-4893    [001] dN.1.   700.150822: rcu_utilization: Start context switch
      cyclictest-4893    [001] dN.1.   700.150822: rcu_utilization: End context switch
      cyclictest-4893    [001] d..1.   700.150823: irq_enable: caller=__schedule+0xb5e/0x1080 parent=0x0
      cyclictest-4893    [001] ...1.   700.150823: preempt_enable: caller=do_nanosleep+0x72/0x190 parent=do_nanosleep+0x72/0x190
      cyclictest-4893    [001] .....   700.150823: sys_exit: NR 230 = 0
      cyclictest-4893    [001] d....   700.150823: irq_disable: caller=syscall_exit_to_user_mode+0x1c4/0x240 parent=0x0
      cyclictest-4893    [001] d....   700.150824: irq_enable: caller=syscall_exit_to_user_mode+0xe4/0x240 parent=0x0
      cyclictest-4893    [001] .....   700.150824: rseq_update: cpu_id=1 node_id=0 mm_cid=0
      cyclictest-4893    [001] d....   700.150824: irq_disable: caller=syscall_exit_to_user_mode+0x203/0x240 parent=0x0
      cyclictest-4893    [001] d....   700.150825: x86_fpu_regs_activated: x86/fpu: 00000000ff492f85 load: 1 xfeatures: 2 xcomp_bv: 8000000000000007
      cyclictest-4893    [001] d....   700.150825: irq_enable: caller=syscall_exit_to_user_mode+0x7c/0x240 parent=0x0
      cyclictest-4893    [001] d....   700.150830: irq_disable: caller=do_syscall_64+0x45/0x1a0 parent=0x0
      cyclictest-4893    [001] d....   700.150830: irq_enable: caller=do_syscall_64+0x4a/0x1a0 parent=0x0
      cyclictest-4893    [001] .....   700.150831: sys_enter: NR 1 (4, 7f640320e2c0, 21, 7f640320d5b5, 0, 64)
      cyclictest-4893    [001] ...1.   700.150832: preempt_disable: caller=vfs_write+0x306/0x430 parent=vfs_write+0x306/0x430
      cyclictest-4893    [001] ...1.   700.150832: preempt_enable: caller=vfs_write+0x328/0x430 parent=vfs_write+0x328/0x430
      cyclictest-4893    [001] .....   700.150832: tracing_mark_write: hit latency threshold (759 > 700)
      cyclictest-4893    [001] ...1.   700.150833: preempt_disable: caller=vfs_write+0x12f/0x430 parent=vfs_write+0x12f/0x430
      cyclictest-4893    [001] ...1.   700.150833: preempt_enable: caller=vfs_write+0x151/0x430 parent=vfs_write+0x151/0x430
      cyclictest-4893    [001] ...1.   700.150833: preempt_disable: caller=fput+0x1f/0x90 parent=fput+0x1f/0x90
      cyclictest-4893    [001] ...1.   700.150833: preempt_enable: caller=fput+0x3f/0x90 parent=fput+0x3f/0x90
      cyclictest-4893    [001] .....   700.150833: sys_exit: NR 1 = 33
      cyclictest-4893    [001] d....   700.150834: irq_disable: caller=syscall_exit_to_user_mode+0x1c4/0x240 parent=0x0
      cyclictest-4893    [001] d....   700.150834: irq_enable: caller=syscall_exit_to_user_mode+0x7c/0x240 parent=0x0
      cyclictest-4893    [001] d....   700.150834: irq_disable: caller=do_syscall_64+0x45/0x1a0 parent=0x0
      cyclictest-4893    [001] d....   700.150834: irq_enable: caller=do_syscall_64+0x4a/0x1a0 parent=0x0
      cyclictest-4893    [001] .....   700.150834: sys_enter: NR 1 (5, 5556d858a461, 2, 7f640320d5b5, 0, 64)
      cyclictest-4893    [001] ...1.   700.150835: preempt_disable: caller=vfs_write+0x306/0x430 parent=vfs_write+0x306/0x430
      cyclictest-4893    [001] ...1.   700.150835: preempt_enable: caller=vfs_write+0x328/0x430 parent=vfs_write+0x328/0x430

[Index of Archives]     [RT Stable]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]

  Powered by Linux