On Thu, Mar 20, 2025 at 1:29 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Thu, Mar 20, 2025 at 12:53:53PM -0700, Jon Pan-Doh wrote:
> I think the struct aer_err_info is basically a per-interrupt thing, so
> maybe we could evaluate __ratelimit() once at the initial entry, save
> the result in aer_err_info, and use that saved value everywhere we
> print messages?

I like this approach. Another advantage is it removes the need for the
2x ratelimit logic. Updated for v5.

> - native AER: aer_isr_one_error() has RP pointer in rpc->rpd and
>   could save it (or pointer to the RP's ratelimit struct, or just
>   the result of __ratelimit()) in aer_err_info.

Similar to aer_err_info.dev[], I store the evaluated __ratelimit() in
aer_err_info.ratelimited[]. The main quirk is that for multiple errors,
you won't see the root port log if the first error is ratelimited, but
the subsequent errors are under the limit. I think this is fine, as the
log prints out the first error only, but can change
aer_print_port_info() to log if any of the errors is under the limit.

> - GHES AER: I'm not sure struct cper_sec_pcie contains the RP, might
>   have to search upwards from the device we know about?
>
> - native DPC: dpc_process_error() has DP pointer and could save it
>   in aer_err_info.
>
> - EDR DPC: passes DP pointer to dpc_process_error().

These are largely unchanged:
- GHES/CXL gated by aer_ratelimit() in pci_print_aer()
- DPC not ratelimited with the expectation that there won't be error
  storms

Thanks,
Jon
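
For illustration only, here is a minimal userspace sketch of the "evaluate the ratelimit once, save it next to dev[], reuse it at every print site" idea discussed above. It is not the v5 patch and not kernel code: every toy_* name, the burst-only limiter, and MAX_ERR_DEVS are assumptions made up for this example, and the real __ratelimit() is also time-based. The port-info helper reproduces the quirk mentioned above by looking only at the first recorded error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_ERR_DEVS 5	/* hypothetical cap on devices per interrupt */

/* Toy burst-only limiter; a stand-in for a real ratelimit state. */
struct toy_ratelimit {
	int burst;	/* messages allowed in the current window */
	int printed;	/* messages already allowed in this window */
};

/* Returns true while the caller is still under the limit. */
static bool toy_ratelimit_ok(struct toy_ratelimit *rl)
{
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}

struct toy_dev {
	const char *name;
	struct toy_ratelimit rl;
};

/* Per-interrupt record: the saved decision sits alongside dev[]. */
struct toy_err_info {
	struct toy_dev *dev[MAX_ERR_DEVS];
	bool ratelimited[MAX_ERR_DEVS];
	int error_dev_num;
};

/* Record a device and evaluate its limiter exactly once. */
static bool toy_add_error_device(struct toy_err_info *info, struct toy_dev *dev)
{
	int i = info->error_dev_num;

	if (i >= MAX_ERR_DEVS)
		return false;
	info->dev[i] = dev;
	info->ratelimited[i] = !toy_ratelimit_ok(&dev->rl);
	info->error_dev_num++;
	return true;
}

/* Print sites consult the saved value; returns the number of lines printed. */
static int toy_print_error(const struct toy_err_info *info, int i)
{
	assert(i >= 0 && i < info->error_dev_num);
	if (info->ratelimited[i])
		return 0;
	printf("%s: error detected\n", info->dev[i]->name);
	printf("%s: status details\n", info->dev[i]->name);
	return 2;
}

/* Port-level line keyed on the first error only (the quirk). */
static int toy_print_port_info(const struct toy_err_info *info)
{
	if (info->error_dev_num == 0 || info->ratelimited[0])
		return 0;
	printf("root port: error message received\n");
	return 1;
}
```

Because the limiter is consulted once per device per interrupt, both lines for a device are either printed or suppressed together, which is why no second ratelimit check is needed at the later print sites. Switching toy_print_port_info() to loop over ratelimited[] and print if any entry is false would correspond to the alternative mentioned above.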