Hi,

Is there any feedback on the patch?

Thanks,
Akshay

On Fri, Jun 20, 2025 at 12:21 AM Akshay Jindal <akshayaj.lkd@xxxxxxxxx> wrote:
>
> When a PCIe error is detected, the root port receives the error message
> and the threaded IRQ handler, aer_isr, traverses the hierarchy downward
> from the root port. It populates the e_info->dev[] array with the PCIe
> devices that have recorded error status, so that appropriate error
> handling and recovery can be performed.
>
> The e_info->dev[] array is limited in size by AER_MAX_MULTI_ERR_DEVICES,
> which is currently defined as 5. If more than five devices report errors
> in the same event, the array silently truncates the list, and those
> extra devices are not included in the recovery flow.
>
> Emit an error message when this limit is reached, fulfilling a TODO
> comment in drivers/pci/pcie/aer.c.
> /* TODO: Should print error message here? */
>
> Signed-off-by: Akshay Jindal <akshayaj.lkd@xxxxxxxxx>
> ---
>
> Changes since v1:
> - Reworded commit message in imperative mood (per Shuah’s feedback)
> - Mentioned and quoted related TODO in the message
> - Updated recipient list
>
> Testing:
> ========
> Verified log in dmesg on QEMU.
>
> 1. The following command created the required environment. As shown below, a
>    pcie-root-port and a virtio-net-pci device are used on a Q35 machine model.
>    ./qemu-system-x86_64 \
>         -M q35,accel=kvm \
>         -m 2G -cpu host -nographic \
>         -serial mon:stdio \
>         -kernel /home/akshayaj/pci/arch/x86/boot/bzImage \
>         -initrd /home/akshayaj/Embedded_System_Using_QEMU/rootfs/rootfs.cpio.gz \
>         -append "console=ttyS0 root=/ pci=pcie_scan_all" \
>         -device pcie-root-port,id=rp0,chassis=1,slot=1 \
>         -device virtio-net-pci,bus=rp0
>
> ~ # mylspci -t
> -[0000:00]-+-00.0
>            +-01.0
>            +-02.0
>            +-03.0-[01]----00.0
>            +-1f.0
>            +-1f.2
>            \-1f.3
> 00:03.0--> pcie-root-port
>
> 2. Kernel bzImage compiled with the following changes:
>    2.1 CONFIG_PCIEAER=y in config
>    2.2 AER_MAX_MULTI_ERR_DEVICES set to 0
>        Since there is no pcie-testdev in QEMU, it is impossible to create
>        a 5-level hierarchy of PCIe devices in QEMU. So we simulate the
>        error scenario by changing the limit to 0.
>    2.3 Log added at the required place in aer.c.
>
> 3. Both correctable and uncorrectable errors were injected on the
>    pcie-root-port via the HMP command (pcie_aer_inject_error) in QEMU.
>    The HMP commands used are as follows:
>    3.1 pcie_aer_inject_error -c rp0 0x1
>    3.2 pcie_aer_inject_error -c rp0 0x40
>    3.3 pcie_aer_inject_error rp0 0x10
>
> Resulting dmesg:
> ================
> [    0.380534] pcieport 0000:00:03.0: AER: enabled with IRQ 24
> [   55.729530] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
> [  225.484456] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
> [  356.976253] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
>
>  drivers/pci/pcie/aer.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 70ac66188367..3995a1db5699 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1039,7 +1039,8 @@ static int find_device_iter(struct pci_dev *dev, void *data)
>  		/* List this device */
>  		if (add_error_device(e_info, dev)) {
>  			/* We cannot handle more... Stop iteration */
> -			/* TODO: Should print error message here? */
> +			pci_err(dev, "Exceeded max allowed (%d) addition of PCIe "
> +				"devices for AER handling\n", AER_MAX_MULTI_ERR_DEVICES);
>  			return 1;
>  		}
>
> --
> 2.43.0
>
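
For anyone reading along without the source open, the truncation that the
quoted commit message describes can be sketched in user space roughly as
below. This is an illustration only, not the kernel code: struct fake_dev,
struct fake_err_info, collect_error_devices() and MAX_MULTI_ERR_DEVICES are
hypothetical stand-ins, and the body of add_error_device() here is an
assumption about how a fixed-size list rejects extra entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for AER_MAX_MULTI_ERR_DEVICES. */
#define MAX_MULTI_ERR_DEVICES 5

/* Hypothetical stand-in for struct pci_dev. */
struct fake_dev {
	const char *name;
	int has_error;		/* non-zero if the device recorded error status */
};

/* Hypothetical stand-in for the e_info structure with its dev[] array. */
struct fake_err_info {
	struct fake_dev *dev[MAX_MULTI_ERR_DEVICES];
	int error_dev_num;
};

/* Assumed behaviour: return 0 on success, -1 when the list is full. */
static int add_error_device(struct fake_err_info *info, struct fake_dev *dev)
{
	assert(info != NULL && dev != NULL);
	if (info->error_dev_num >= MAX_MULTI_ERR_DEVICES)
		return -1;
	info->dev[info->error_dev_num++] = dev;
	return 0;
}

/*
 * Walk the devices and record those with errors. Returns 1 if the walk
 * stopped early because the list was full, 0 otherwise. The message is
 * printed at the same point where the patch adds pci_err().
 */
static int collect_error_devices(struct fake_err_info *info,
				 struct fake_dev *devs, int count)
{
	for (int i = 0; i < count; i++) {
		if (!devs[i].has_error)
			continue;
		if (add_error_device(info, &devs[i])) {
			fprintf(stderr,
				"%s: Exceeded max allowed (%d) addition of PCIe devices for AER handling\n",
				devs[i].name, MAX_MULTI_ERR_DEVICES);
			return 1;	/* stop iteration */
		}
	}
	return 0;
}
```

With six erroring devices and a limit of five, the sixth is not recorded and
the walk stops. Before the patch that happened silently; the message makes
the dropped device visible in the log.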