On 2025-09-04 14:48:21 [+0200], Lukas Wunner wrote:
> Since v6.16, AER supports rate limiting. It's unclear which
> kernel version Crystal is using, but if it's older than v6.16,
> it may be worth retrying with a newer release to see if that
> solves the problem.

Where is this rate limiting coming from?

> > Another way would be to let the secondary handler run at a slightly lower
> > priority than the primary handler. In this case making the primary
> > non-threaded should not cause any harm.
>
> Why isn't the secondary handler always assigned a lower priority
> by default? I think a lot of drivers are built on the assumption
> that the primary handler is scheduled sooner than the secondary
> handler.

Well, that is the first time I have seen someone make that assumption.

> E.g. the native PCIe hotplug driver (drivers/pci/hotplug/pciehp_hpc.c)
> uses the primary handler to pick up Command Completed interrupts
> and will then wake the secondary handler, which is waiting in
> pcie_wait_cmd(). The secondary handler uses a timeout of 1 sec
> to ensure forward progress in case the hardware never signals
> Command Completed (e.g. if the hotplug port itself was hot-removed).

If it is waiting then everything is good. It would only be problematic
if it busy-polls.

> In extreme cases, the primary handler may not run within 1 sec
> to wake the secondary handler. The secondary handler will then
> run into the timeout and issue an error message (but should
> otherwise react gracefully).
>
> My point is that keeping both at the same priority by default
> provokes such situations more easily, so assigning a higher
> default priority to the primary handler would seem prudent.

Okay, but the secondary should be one less than the primary. The primary
is at the middle priority "MAX_RT_PRIO / 2". It should not be preferred
over other forced-threaded handlers just because it also has a secondary
handler. The secondary should run after all primary handlers are done.
This would also mirror the !RT case.

> > > +++ b/drivers/pci/pcie/aer.c
> > > @@ -1671,7 +1671,8 @@ static int aer_probe(struct pcie_device *dev)
> > >  	set_service_data(dev, rpc);
> > >
> > >  	status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr,
> > > -					   IRQF_SHARED, "aerdrv", dev);
> > > +					   IRQF_NO_THREAD | IRQF_SHARED,
> > > +					   "aerdrv", dev);
> >
> > I'm not sure if this works with IRQF_SHARED. Your primary handler is
> > IRQF_SHARED + IRQF_NO_THREAD and another shared handler which is
> > forced-threaded will have IRQF_SHARED + IRQF_ONESHOT.
> > If the core does not complain, all good. Worst case might be the shared
> > ONESHOT lets your primary handler starve. It would be nice if you could
> > check if you have shared handler here (I have no aer in three boxes I
> > checked).
>
> Yes, interrupt sharing can happen if the Root Port uses legacy INTx
> interrupts. In that case other port services such as hotplug,
> bandwidth control, PME or DPC may use the same interrupt.

So this sounds like it is not going to work then, or is it?

> Thanks,
>
> Lukas

Sebastian