Hi Jon,
On 3/21/25 12:24 PM, Jon Pan-Doh wrote:
On Thu, Mar 20, 2025 at 6:00 PM Sathyanarayanan Kuppuswamy
<sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> wrote:
Should we exclude fatal errors from the rate limit? Fatal error logs
would be really useful for debug analysis, and they do not happen very
frequently.
The logs today only make the distinction between correctable vs.
uncorrectable so I thought it made sense to be consistent.
You're right. From a logging perspective, the current driver only
differentiates between correctable and uncorrectable errors. However,
the goal of your patch series is to reduce the spam of frequent errors.
While we are rate-limiting these frequent logs, we must ensure that we
don't miss important logs. I believe we did not rate-limit DPC logs for
this very reason.
Maybe this is something that could be deferred? The only fixed
component is the sysfs attribute names (which can be made to refer to
uncorrectable nonfatal vs. uncorrectable in doc/underlying
implementation).
Thanks,
Jon
I am fine with deferring. IIUC, if needed, a user can skip the rate
limit for uncorrectable errors through sysfs, right?
But is the change required to do this complex? Won't skipping the
rate-limit check for fatal errors solve the problem?
Bjorn, any comments? Do you think fatal errors should be
rate-limited?
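To make the "skip the rate-limit check for fatal errors" idea concrete,
here is a minimal userspace sketch. It is not the AER driver code and
not part of the patch series: the names err_severity, log_ratelimit,
ratelimit_ok() and should_log() are hypothetical, and the fixed-window
limiter only stands in for whatever rate-limit state the driver keeps.
For brevity it also uses a single limiter for both correctable and
nonfatal errors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical severity levels; not the kernel's AER definitions. */
enum err_severity { ERR_CORRECTABLE, ERR_NONFATAL, ERR_FATAL };

/* Hypothetical fixed-window limiter: at most `burst` logs per window. */
struct log_ratelimit {
	uint64_t window_start_ms;	/* start of the current window */
	uint64_t interval_ms;		/* window length */
	unsigned int burst;		/* logs allowed per window */
	unsigned int printed;		/* logs emitted in this window */
};

/* Return true if one more log fits in the current window. */
static bool ratelimit_ok(struct log_ratelimit *rl, uint64_t now_ms)
{
	/* Start a new window once the interval has elapsed. */
	if (now_ms - rl->window_start_ms >= rl->interval_ms) {
		rl->window_start_ms = now_ms;
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}

/* Fatal errors bypass the limiter; everything else is subject to it. */
static bool should_log(struct log_ratelimit *rl, enum err_severity sev,
		       uint64_t now_ms)
{
	if (sev == ERR_FATAL)
		return true;
	return ratelimit_ok(rl, now_ms);
}
```

In this sketch a fatal error is always logged and does not consume
budget from the limiter, so a burst of nonfatal errors cannot hide it.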
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer