Re: [PATCH] PCI/AER: Use IRQF_NO_THREAD on aer_irq

Lukas Wunner <lukas@xxxxxxxxx> · Thu, 4 Sep 2025 14:48:21 +0200

On Thu, Sep 04, 2025 at 09:30:24AM +0200, Sebastian Andrzej Siewior wrote:
> On 2025-09-02 17:44:41 [-0500], Crystal Wood wrote:
> > On PREEMPT_RT, currently both aer_irq and aer_isr run in separate threads,
> > at the same FIFO priority.  This can lead to the aer_isr thread starving
> > the aer_irq thread, particularly if multi_error_valid causes a scan of
> > all devices, and multiple errors are raised during the scan.
> > 
> > On !PREEMPT_RT, or if aer_irq runs at a higher priority than aer_isr, these
> > errors can be queued as single-error events as they happen.  But if aer_irq
> > can't run until aer_isr finishes, by that time the multi event bit will be
> > set again, causing a new scan and an infinite loop.
> 
> So if aer_irq is too slow we get new "work" piled up? Is it because
> there is a timing constraint on how long until the error needs to be
> acknowledged?

Since v6.16, AER supports rate limiting.  It's unclear which
kernel version Crystal is using, but if it's older than v6.16,
it may be worth retrying with a newer release to see if that
solves the problem.

> Another way would be to let the secondary handler run at a slightly lower
> priority than the primary handler. In this case making the primary
> non-threaded should not cause any harm.

Why isn't the secondary handler always assigned a lower priority
by default?  I think a lot of drivers are built on the assumption
that the primary handler is scheduled sooner than the secondary
handler.

E.g. the native PCIe hotplug driver (drivers/pci/hotplug/pciehp_hpc.c)
uses the primary handler to pick up Command Completed interrupts
and will then wake the secondary handler, which is waiting in
pcie_wait_cmd().  The secondary handler uses a timeout of 1 sec
to ensure forward progress in case the hardware never signals
Command Completed (e.g. if the hotplug port itself was hot-removed).

In extreme cases, the primary handler may not run within 1 sec
to wake the secondary handler.  The secondary handler will then
run into the timeout and issue an error message (but should
otherwise react gracefully).

My point is that keeping both at the same priority by default
provokes such situations more easily, so assigning a higher
default priority to the primary handler would seem prudent.
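
To illustrate the scheduling argument, here is a toy userspace model in
C (not kernel code).  The struct and function names, the priority values
and the 1500 ms of remaining runtime are made up for the example; only
the 1 sec timeout is taken from pcie_wait_cmd().  It assumes SCHED_FIFO
semantics, where a newly runnable thread preempts only threads of
strictly lower priority:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the wake-up race, not kernel code. */
struct wake_result {
	int wake_latency_ms;	/* delay until the primary handler runs */
	bool timed_out;		/* waiter gave up before being woken */
};

/*
 * An interrupt fires at t=0 while another FIFO thread ("hog") of
 * priority hog_prio still needs hog_remaining_ms of CPU time.
 * The threaded primary handler has priority primary_prio and must run
 * before it can wake a waiter that gives up after timeout_ms.
 */
struct wake_result model_wakeup(int primary_prio, int hog_prio,
				int hog_remaining_ms, int timeout_ms)
{
	struct wake_result r;

	/*
	 * SCHED_FIFO assumption: strictly higher priority preempts at
	 * once; equal or lower priority waits until the hog is done.
	 */
	r.wake_latency_ms = (primary_prio > hog_prio) ? 0 : hog_remaining_ms;
	r.timed_out = r.wake_latency_ms > timeout_ms;
	return r;
}

/* Self-check comparing equal priorities with a raised primary. */
void model_wakeup_selfcheck(void)
{
	/* Same priority: primary is delayed past the 1 sec timeout. */
	assert(model_wakeup(50, 50, 1500, 1000).timed_out);
	/* Primary one step higher: it preempts and wakes in time. */
	assert(!model_wakeup(51, 50, 1500, 1000).timed_out);
}
```

With equal priorities (50/50) the model reports a timeout; raising the
primary handler to 51 brings the wake latency down to 0.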

> > +++ b/drivers/pci/pcie/aer.c
> > @@ -1671,7 +1671,8 @@ static int aer_probe(struct pcie_device *dev)
> >  	set_service_data(dev, rpc);
> >  
> >  	status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr,
> > -					   IRQF_SHARED, "aerdrv", dev);
> > +					   IRQF_NO_THREAD | IRQF_SHARED,
> > +					   "aerdrv", dev);
> 
> I'm not sure if this works with IRQF_SHARED. Your primary handler is
> IRQF_SHARED + IRQF_NO_THREAD and another shared handler which is
> forced-threaded will have IRQF_SHARED + IRQF_ONESHOT.
> If the core does not complain, all good. Worst case might be the shared
> ONESHOT lets your primary handler starve. It would be nice if you could
> check if you have a shared handler here (I have no AER on the three
> boxes I checked).

Yes, interrupt sharing can happen if the Root Port uses legacy INTx
interrupts.  In that case other port services such as hotplug,
bandwidth control, PME or DPC may use the same interrupt.

Thanks,

Lukas