Re: [PATCH net-next V2 2/3] net/mlx5e: Add device PCIe congestion ethtool stats

Jakub Kicinski <kuba@xxxxxxxxxx> · Mon, 14 Jul 2025 08:26:00 -0700

On Sat, 12 Jul 2025 07:55:27 +0000 Dragos Tatulea wrote:
> > The metrics make sense, but utilization has to be averaged over some
> > period of time to be meaningful. Can you shed any light on what the
> > measurement period or algorithm is?
>
> The measurement period in FW is 200 ms.

SG, please include in the doc.

> > > +	changes = cong_event->state ^ new_cong_state;
> > > +	if (!changes)
> > > +		return;  
> > 
> > no risk of the high / low events coming so quickly we'll miss both?  
> Yes, it is possible, and it is fine because short bursts are not counted. The
> counters are for sustained high PCI BW usage.
> 
> > Should there be a counter for "mis-firing" of that sort?
> > You'd be surprised how long the scheduling latency for a kernel worker
> > can be on a busy server :(
> >  
> The event is just a notification to read the state from FW. If the
> read is issued later and the state has not changed then it will not be
> considered.

200 ms is within the range of normal scheduler latency on a busy server.
It's not a deal breaker, but I'd personally add a counter for wakeups
which did not result in any state change. Likely recent experience
with constant EEVDF regressions and sched_ext is coloring my judgment.