On Thu, Mar 27, 2025 at 03:49:12PM -0700, Jon Pan-Doh wrote: > On Tue, Mar 25, 2025 at 10:17 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > > --- a/drivers/pci/pci.h > > > +++ b/drivers/pci/pci.h > > > @@ -533,6 +533,7 @@ static inline bool pci_dev_test_and_set_removed(struct pci_dev *dev) > > > > > > struct aer_err_info { > > > struct pci_dev *dev[AER_MAX_MULTI_ERR_DEVICES]; > > > + bool ratelimited[AER_MAX_MULTI_ERR_DEVICES]; > > > > In my stop-the-bleeding patch, I pass this as an argument to the functions > > needing it, but this works fine too. Yes, this approach does consume > > a bit more storage, but I doubt that it is enough to matter. > > The reason for the boolean array is that we want to eval/use the same > ratelimit for two log functions (aer_print_rp_info() and > aer_print_error()). In past versions[1], I removed aer_print_rp_info() > which simplified the ratelimit eval (i.e. directly eval within > aer_print_error()). However, others found it helpful to keep the root > port logs as it directly showed the linkage from interrupt on root > port -> error source device(s). > > > The main concern is that either all information for a given error is > > printed or none of it is, to avoid confusion. (There will of course be > > the possibility of partial drops due to buffer overruns further down > > the console-log pipeline, but no need for additional opportunities > > for confusion.) > > > > For this boolean array to provide this property, the error path for a > > given device must be single threaded, for example, only one interrupt > > handler at a time. Is this the case? > > I believe so. AER uses threaded irq where interrupt processing is done > by storing/reading info from a FIFO (i.e. serializes handling). I traced out the call trees in v6.14 and got the following: ------------------------------------------------------------------------ Call Tree for aer_print_port_info(). (This is the old name, with Jon’s patch applied this is aer_print_rp_info().) aer_isr() is a threaded interrupt handler. aer_isr() invokes aer_isr_one_error(). aer_isr_one_error() invokes aer_print_port_info(). Call Tree for aer_print_error() aer_isr() is a threaded interrupt handler. aer_isr() invokes aer_isr_one_error(). aer_isr_one_error() invokes aer_process_err_devices(). aer_process_err_devices() invokes aer_print_error(). dpc_handler() is a threaded interrupt handler. dpc_handler() invokes dpc_process_error(). dpc_process_error() invokes aer_print_error() edr_handle_event() is an ACPI notifier. edr_handle_event() invokes dpc_process_error(). dpc_process_error() invokes aer_print_error(). ------------------------------------------------------------------------ So ___ratelimit() could be invoked in aer_isr_one_error() and the result could be passed down directly to aer_print_port_info() on the one hand and to aer_print_error() via aer_process_err_devices() on the other. And ___ratelimit() could be invoked in dpc_process_error() and the result passed directly to aer_print_error(). This would permit the ratelimit_state structures to be placed in the portion of the pci_dev structure under #ifdef CONFIG_PCIEAER, avoiding the need to search the enclosing structure. Presumably using a helper function that invokes __ratelimit() for CONFIG_PCIEAER=y kernels and just returns true otherwise. Or am I missing something here? (Quite possibly your root-port points below...) > > > @@ -722,21 +750,31 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info, > > > out: > > > if (info->id && info->error_dev_num > 1 && info->id == id) > > > pci_err(dev, " Error of this Agent is reported first\n"); > > > - > > > - trace_aer_event(dev_name(&dev->dev), (info->status & ~info->mask), > > > - info->severity, info->tlp_header_valid, &info->tlp); > > > } > > > > > > static void aer_print_rp_info(struct pci_dev *rp, struct aer_err_info *info) > > > { > > > u8 bus = info->id >> 8; > > > u8 devfn = info->id & 0xff; > > > + struct pci_dev *dev; > > > + bool ratelimited = false; > > > + int i; > > > > > > - pci_info(rp, "%s%s error message received from %04x:%02x:%02x.%d\n", > > > - info->multi_error_valid ? "Multiple " : "", > > > - aer_error_severity_string[info->severity], > > > - pci_domain_nr(rp->bus), bus, PCI_SLOT(devfn), > > > - PCI_FUNC(devfn)); > > > + /* extract endpoint device ratelimit */ > > > + for (i = 0; i < info->error_dev_num; i++) { > > > + dev = info->dev[i]; > > > + if (info->id == pci_dev_id(dev)) { > > > + ratelimited = info->ratelimited[i]; > > > + break; > > > + } > > > + } > > > > I cannot resist noting that passing a "ratelimited" argument to this > > function would make it unnecessary to search this array. This would > > require doing rate-limiting checks in aer_isr_one_error(), which looks > > like it should work. Then again, I do not claim to be familiar with > > this AER code. > > > > The "ratelimited" arguments would need to be added to > > aer_print_port_info(), aer_process_err_devices(), and aer_print_error(). > > Maybe some that I have missed. Or is there some handoff to > > softirq or workqueues that I missed? > > We are not ratelimiting the root port, but the source device that > generated the interrupt on the root port. Thus, we have to search the > array at some point. We could bake this into the topology traversal > (link PCI ID with pci_dev) as another param to aer_error_info, but the > array is <= 8 (i.e. small). > > The root port itself can generate AER notifications. Ratelimiting by > both its own errors as well as downstream devices will most likely > mask its own errors. > > [1] https://lore.kernel.org/linux-pci/20250214023543.992372-2-pandoh@xxxxxxxxxx/ OK, I see that aer_isr_one_error() invokes aer_print_port_info(), and only then invokes find_source_device(), and only after that invokes aer_process_err_devices(). And it appears that aer_process_err_devices() iterates over the devices and chooses one, which it passes to aer_print_error(). Is the logging by aer_print_port_info() and by the subsequent call to aer_print_error() to be throttled as a group, but specific to the non-root device passed to aer_print_error()? Or should the logging by aer_print_port_info() be throttled specific to the root device with the subsequent call to aer_print_error() throttled separately specific to the non-root device? Or have I lost the thread completely? Thanx, Paul