On Thu, 10 Jul 2025 09:51:31 +0300 Tariq Toukan wrote:
> +   * - `pci_bw_inbound_high`
> +     - The number of times the device crossed the high inbound pcie bandwidth
> +       threshold. To be compared to pci_bw_inbound_low to check if the device
> +       is in a congested state.
> +       If pci_bw_inbound_high == pci_bw_inbound_low then the device is not congested.
> +       If pci_bw_inbound_high > pci_bw_inbound_low then the device is congested.
> +     - Informative

The metrics make sense, but utilization has to be averaged over some
period of time to be meaningful. Can you shed any light on what the
measurement period or algorithm is?

> +	changes = cong_event->state ^ new_cong_state;
> +	if (!changes)
> +		return;

Is there no risk of the high / low events coming so quickly that we'll
miss both? Should there be a counter for "mis-firing" of that sort?
You'd be surprised how long the scheduling latency for a kernel worker
can be on a busy server :(

> +	cong_event->state = new_cong_state;
> +
> +	if (changes & MLX5E_INBOUND_CONG) {
> +		if (new_cong_state & MLX5E_INBOUND_CONG)
> +			cong_event->stats.pci_bw_inbound_high++;
> +		else
> +			cong_event->stats.pci_bw_inbound_low++;
> +	}
> +
> +	if (changes & MLX5E_OUTBOUND_CONG) {
> +		if (new_cong_state & MLX5E_OUTBOUND_CONG)
> +			cong_event->stats.pci_bw_outbound_high++;
> +		else
> +			cong_event->stats.pci_bw_outbound_low++;
> +	}
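
To illustrate the missed-event concern, here is a rough userspace model
of the XOR check from the quoted hunk. It is only a sketch: the names
(INBOUND_CONG, cong_update(), replay()) are made up for the example, and
it assumes the worker only sees whatever state is current when it
finally runs, which may not match how the driver actually obtains
new_cong_state.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for MLX5E_INBOUND_CONG / MLX5E_OUTBOUND_CONG. */
#define INBOUND_CONG  (1u << 0)
#define OUTBOUND_CONG (1u << 1)

struct cong_stats {
	uint64_t inbound_high;
	uint64_t inbound_low;
	uint64_t outbound_high;
	uint64_t outbound_low;
};

struct cong_event {
	uint32_t state;			/* last state the worker observed */
	struct cong_stats stats;
};

/* Same XOR-based edge detection as in the quoted hunk. */
static void cong_update(struct cong_event *ev, uint32_t new_state)
{
	uint32_t changes = ev->state ^ new_state;

	if (!changes)
		return;

	ev->state = new_state;

	if (changes & INBOUND_CONG) {
		if (new_state & INBOUND_CONG)
			ev->stats.inbound_high++;
		else
			ev->stats.inbound_low++;
	}

	if (changes & OUTBOUND_CONG) {
		if (new_state & OUTBOUND_CONG)
			ev->stats.outbound_high++;
		else
			ev->stats.outbound_low++;
	}
}

/*
 * Replay a trace of device states, letting the worker observe only every
 * stride-th entry. stride == 1 models a worker that runs after each
 * transition; a larger stride models a worker that is scheduled late.
 */
static struct cong_stats replay(const uint32_t *trace, size_t n, size_t stride)
{
	struct cong_event ev = { 0 };
	size_t i;

	for (i = stride - 1; i < n; i += stride)
		cong_update(&ev, trace[i]);

	return ev.stats;
}
```

With a trace of { INBOUND_CONG, 0, INBOUND_CONG, 0 }, replay() with
stride 1 counts two high and two low crossings, while stride 2 counts
none at all, because every observed state equals the previous one. In
this model a "mis-firing" counter would be bumped in the !changes
branch.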