Re: [PATCH v5 6/8] PCI/AER: Introduce ratelimit for error logs

"Paul E. McKenney" <paulmck@xxxxxxxxxx> · Thu, 3 Apr 2025 12:02:26 -0700

On Thu, Mar 27, 2025 at 03:49:12PM -0700, Jon Pan-Doh wrote:
> On Tue, Mar 25, 2025 at 10:17 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> > > --- a/drivers/pci/pci.h
> > > +++ b/drivers/pci/pci.h
> > > @@ -533,6 +533,7 @@ static inline bool pci_dev_test_and_set_removed(struct pci_dev *dev)
> > >
> > >  struct aer_err_info {
> > >       struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES];
> > > +     bool ratelimited[AER_MAX_MULTI_ERR_DEVICES];
> >
> > In my stop-the-bleeding patch, I pass this as an argument to the functions
> > needing it, but this works fine too.  Yes, this approach does consume
> > a bit more storage, but I doubt that it is enough to matter.
> 
> The reason for the boolean array is that we want to eval/use the same
> ratelimit for two log functions (aer_print_rp_info() and
> aer_print_error()). In past versions[1], I removed aer_print_rp_info()
> which simplified the ratelimit eval (i.e. directly eval within
> aer_print_error()). However, others found it helpful to keep the root
> port logs as it directly showed the linkage from interrupt on root
> port -> error source device(s).
> 
> > The main concern is that either all information for a given error is
> > printed or none of it is, to avoid confusion.  (There will of course be
> > the possibility of partial drops due to buffer overruns further down
> > the console-log pipeline, but no need for additional opportunities
> > for confusion.)
> >
> > For this boolean array to provide this property, the error path for a
> > given device must be single threaded, for example, only one interrupt
> > handler at a time.  Is this the case?
> 
> I believe so. AER uses threaded irq where interrupt processing is done
> by storing/reading info from a FIFO (i.e. serializes handling).

I traced out the call trees in v6.14 and got the following:

------------------------------------------------------------------------

Call Tree for aer_print_port_info().  (This is the old name, with Jon’s
patch applied this is aer_print_rp_info().)

	aer_isr() is a threaded interrupt handler.
		aer_isr() invokes aer_isr_one_error().
			aer_isr_one_error() invokes aer_print_port_info().

Call Tree for aer_print_error()

	aer_isr() is a threaded interrupt handler.
		aer_isr() invokes aer_isr_one_error().
			aer_isr_one_error() invokes aer_process_err_devices().
				aer_process_err_devices() invokes aer_print_error().

	dpc_handler() is a threaded interrupt handler.
		dpc_handler() invokes dpc_process_error().
			dpc_process_error() invokes aer_print_error()

	edr_handle_event() is an ACPI notifier.
		edr_handle_event() invokes dpc_process_error().
			dpc_process_error() invokes aer_print_error().

------------------------------------------------------------------------

So ___ratelimit() could be invoked in aer_isr_one_error() and the result
could be passed down directly to aer_print_port_info() on the one hand
and to aer_print_error() via aer_process_err_devices() on the other.

And ___ratelimit() could be invoked in dpc_process_error() and the
result passed directly to aer_print_error().

This would permit the ratelimit_state structures to be placed in
the portion of the pci_dev structure under #ifdef CONFIG_PCIEAER,
avoiding the need to search the enclosing structure.

Presumably using a helper function that invokes __ratelimit() for
CONFIG_PCIEAER=y kernels and just returns true otherwise.

Or am I missing something here?  (Quite possibly your root-port points
below...)

> > > @@ -722,21 +750,31 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info,
> > >  out:
> > >       if (info->id && info->error_dev_num > 1 && info->id == id)
> > >               pci_err(dev, "  Error of this Agent is reported first\n");
> > > -
> > > -     trace_aer_event(dev_name(&dev->dev), (info->status & ~info->mask),
> > > -                     info->severity, info->tlp_header_valid, &info->tlp);
> > >  }
> > >
> > >  static void aer_print_rp_info(struct pci_dev *rp, struct aer_err_info *info)
> > >  {
> > >       u8 bus = info->id >> 8;
> > >       u8 devfn = info->id & 0xff;
> > > +     struct pci_dev *dev;
> > > +     bool ratelimited = false;
> > > +     int i;
> > >
> > > -     pci_info(rp, "%s%s error message received from %04x:%02x:%02x.%d\n",
> > > -              info->multi_error_valid ? "Multiple " : "",
> > > -              aer_error_severity_string[info->severity],
> > > -              pci_domain_nr(rp->bus), bus, PCI_SLOT(devfn),
> > > -              PCI_FUNC(devfn));
> > > +     /* extract endpoint device ratelimit */
> > > +     for (i = 0; i < info->error_dev_num; i++) {
> > > +             dev = info->dev[i];
> > > +             if (info->id == pci_dev_id(dev)) {
> > > +                     ratelimited = info->ratelimited[i];
> > > +                     break;
> > > +             }
> > > +     }
> >
> > I cannot resist noting that passing a "ratelimited" argument to this
> > function would make it unnecessary to search this array.  This would
> > require doing rate-limiting checks in aer_isr_one_error(), which looks
> > like it should work.  Then again, I do not claim to be familiar with
> > this AER code.
> >
> > The "ratelimited" arguments would need to be added to
> > aer_print_port_info(), aer_process_err_devices(), and aer_print_error().
> > Maybe some that I have missed.  Or is there some handoff to
> > softirq or workqueues that I missed?
> 
> We are not ratelimiting the root port, but the source device that
> generated the interrupt on the root port. Thus, we have to search the
> array at some point. We could bake this into the topology traversal
> (link PCI ID with pci_dev) as another param to aer_error_info, but the
> array is <= 8 (i.e. small).
> 
> The root port itself can generate AER notifications. Ratelimiting by
> both its own errors as well as downstream devices will most likely
> mask its own errors.
> 
> [1] https://lore.kernel.org/linux-pci/20250214023543.992372-2-pandoh@xxxxxxxxxx/

OK, I see that aer_isr_one_error() invokes aer_print_port_info(),
and only then invokes find_source_device(), and only after that invokes
aer_process_err_devices().  And it appears that aer_process_err_devices()
iterates over the devices and chooses one, which it passes to
aer_print_error().

Is the logging by aer_print_port_info() and by the subsequent call
to aer_print_error() to be throttled as a group, but specific to the
non-root device passed to aer_print_error()?  Or should the logging by
aer_print_port_info() be throttled specific to the root device with the
subsequent call to aer_print_error() throttled separately specific to
the non-root device?

Or have I lost the thread completely?

							Thanx, Paul