On Thu, Jun 12, 2025 at 12:31 AM Akshay Jindal <akshayaj.lkd@xxxxxxxxx> wrote:
>
> When an error is detected at a PCIe device and the root port receives the
> error message, the threaded IRQ handler aer_isr traverses down the
> hierarchy from the root port and keeps adding the PCIe devices on
> which an error has been recorded into the e_info->dev[] array for the
> respective error handling and recovery. The e_info->dev[] array has size
> AER_MAX_MULTI_ERR_DEVICES, which is currently defined as 5.
> This change adds an error message in case this limit is hit.
>
> Signed-off-by: Akshay Jindal <akshayaj.lkd@xxxxxxxxx>
> ---
>
> Testing:
> ========
> Verified the log in dmesg on QEMU.
>
> 1. The following command created the required environment. As shown below, a
> pcie-root-port and a virtio-net-pci device are used on a Q35 machine model.
> ./qemu-system-x86_64 \
> -M q35,accel=kvm \
> -m 2G -cpu host -nographic \
> -serial mon:stdio \
> -kernel /home/akshayaj/pci/arch/x86/boot/bzImage \
> -initrd /home/akshayaj/Embedded_System_Using_QEMU/rootfs/rootfs.cpio.gz \
> -append "console=ttyS0 root=/ pci=pcie_scan_all" \
> -device pcie-root-port,id=rp0,chassis=1,slot=1 \
> -device virtio-net-pci,bus=rp0
>
> ~ # mylspci -t
> -[0000:00]-+-00.0
>            +-01.0
>            +-02.0
>            +-03.0-[01]----00.0
>            +-1f.0
>            +-1f.2
>            \-1f.3
> 00:03.0--> pcie-root-port
>
>
> 2. Kernel bzImage compiled with the following changes:
> 2.1 CONFIG_PCIEAER=y in config
> 2.2 AER_MAX_MULTI_ERR_DEVICES set to 0
> Since there is no pcie-testdev in QEMU, it is impossible to create
> a 5-level hierarchy of PCIe devices in QEMU. So we simulate the
> error scenario by changing the limit to 0.
> 2.3 Log added at the required place in aer.c.
>
> 3. Both correctable and uncorrectable errors were injected on
> pcie-root-port via HMP command (pcie_aer_inject_error) in QEMU.
> HMP commands used are as follows:
> 3.1 pcie_aer_inject_error -c rp0 0x1
> 3.2 pcie_aer_inject_error -c rp0 0x40
> 3.3 pcie_aer_inject_error rp0 0x10
>
> Resulting dmesg:
> ================
> [    0.380534] pcieport 0000:00:03.0: AER: enabled with IRQ 24
> [   55.729530] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
> [  225.484456] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
> [  356.976253] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
>
>  drivers/pci/pcie/aer.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 70ac66188367..3995a1db5699 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1039,7 +1039,8 @@ static int find_device_iter(struct pci_dev *dev, void *data)
>  	/* List this device */
>  	if (add_error_device(e_info, dev)) {
>  		/* We cannot handle more... Stop iteration */
> -		/* TODO: Should print error message here? */
> +		pci_err(dev, "Exceeded max allowed (%d) addition of PCIe "
> +			"devices for AER handling\n", AER_MAX_MULTI_ERR_DEVICES);
>  		return 1;
>  	}
>
> --
> 2.43.0
>

Gentle reminder.

Thanks,
Akshay.
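The bounded append that the patch description refers to can be modelled in user space roughly as follows. This is a hedged sketch, not the kernel code: struct fake_dev, struct err_info, MAX_ERR_DEVICES and record_or_stop() are hypothetical stand-ins for struct pci_dev, the e_info->dev[] bookkeeping, AER_MAX_MULTI_ERR_DEVICES and find_device_iter(), and fprintf() stands in for pci_err().

```c
#include <stdio.h>

/* Hypothetical stand-in for AER_MAX_MULTI_ERR_DEVICES (5 per the patch text). */
#define MAX_ERR_DEVICES 5

/* Hypothetical, simplified stand-in for struct pci_dev. */
struct fake_dev {
	const char *name;
};

/* Hypothetical, simplified stand-in for the e_info->dev[] bookkeeping. */
struct err_info {
	int error_dev_num;                           /* slots used so far */
	const struct fake_dev *dev[MAX_ERR_DEVICES]; /* recorded devices */
};

/* Append dev if there is room: 0 on success, -1 when the array is full. */
static int add_error_device(struct err_info *e_info, const struct fake_dev *dev)
{
	if (e_info->error_dev_num < MAX_ERR_DEVICES) {
		e_info->dev[e_info->error_dev_num++] = dev;
		return 0;
	}
	return -1;
}

/*
 * Models the patched branch of find_device_iter(): record the device, or
 * report the overflow and return 1 so the caller stops the walk.
 */
static int record_or_stop(struct err_info *e_info, const struct fake_dev *dev)
{
	if (add_error_device(e_info, dev)) {
		fprintf(stderr,
			"%s: Exceeded max allowed (%d) addition of PCIe devices for AER handling\n",
			dev->name, MAX_ERR_DEVICES);
		return 1; /* non-zero: stop iteration */
	}
	return 0;
}
```

With five slots, the first five calls succeed and the sixth prints the message and returns 1 without touching the array, which is the situation the new pci_err() line reports.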