Re: [PATCH] PCI/AER: Use IRQF_NO_THREAD on aer_irq

On 2025-09-02 17:44:41 [-0500], Crystal Wood wrote:
> On PREEMPT_RT, currently both aer_irq and aer_isr run in separate threads,
> at the same FIFO priority.  This can lead to the aer_isr thread starving
> the aer_irq thread, particularly if multi_error_valid causes a scan of
> all devices, and multiple errors are raised during the scan.
> 
> On !PREEMPT_RT, or if aer_irq runs at a higher priority than aer_isr, these
> errors can be queued as single-error events as they happen.  But if aer_irq
> can't run until aer_isr finishes, by that time the multi event bit will be
> set again, causing a new scan and an infinite loop.

So if aer_irq is too slow, new "work" piles up? Is that because there is
a timing constraint on how long an error may wait before it has to be
acknowledged?

Another way would be to let the secondary handler run at a slightly lower
priority than the primary handler. In this case making the primary
non-threaded should not cause any harm.
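
For illustration only, the lower-priority alternative can be tried from
userspace without touching the kernel. The sketch below is not what the
IRQ core does. It assumes the forced-threaded handlers run at SCHED_FIFO
priority 50 and that the secondary thread shows up as irq/<nr>-s-aerdrv.
Both helper names are made up for this example, and the PID of the
thread has to be looked up separately.

```c
#include <sched.h>
#include <sys/types.h>

/*
 * Priority one step below `primary`, clamped to the lowest valid
 * SCHED_FIFO priority. Hypothetical helper, not a kernel interface.
 */
static int secondary_fifo_prio(int primary)
{
	int min = sched_get_priority_min(SCHED_FIFO);

	return (primary - 1 < min) ? min : primary - 1;
}

/*
 * Move thread `pid` to SCHED_FIFO at `prio`. Needs CAP_SYS_NICE.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int set_fifo_prio(pid_t pid, int prio)
{
	struct sched_param sp = { .sched_priority = prio };

	return sched_setscheduler(pid, SCHED_FIFO, &sp);
}
```

With the assumed default of 50, set_fifo_prio(pid, secondary_fifo_prio(50))
would put the secondary thread at 49, so the primary thread can preempt it.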

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>

> Signed-off-by: Crystal Wood <crwood@xxxxxxxxxx>
> ---
> I'm seeing this on a particular ARM server when using /sys/bus/pci/rescan,
> though the internal reporter sometimes saw it happen on boot as well.
> On !PREEMPT_RT, or with this patch, a finite number of errors are emitted
> and the scan completes.
> ---
>  drivers/pci/pcie/aer.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 15ed541d2fbe..6945a112a5cd 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1671,7 +1671,8 @@ static int aer_probe(struct pcie_device *dev)
>  	set_service_data(dev, rpc);
>  
>  	status = devm_request_threaded_irq(device, dev->irq, aer_irq, aer_isr,
> -					   IRQF_SHARED, "aerdrv", dev);
> +					   IRQF_NO_THREAD | IRQF_SHARED,
> +					   "aerdrv", dev);

I'm not sure if this works with IRQF_SHARED. Your primary handler is
IRQF_SHARED + IRQF_NO_THREAD, and another shared handler which is
forced-threaded will have IRQF_SHARED + IRQF_ONESHOT.
If the core does not complain, all good. The worst case might be that
the shared ONESHOT handler lets your primary handler starve. It would be
nice if you could check whether you have a shared handler here (I have
no AER on the three boxes I checked).
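
One way to check for sharing is to look at the aerdrv line in
/proc/interrupts, where the handlers on one IRQ are listed separated by
", ". A rough C sketch of that check follows. The helper name is made up
for this example, and it assumes that nothing else on the line (chip
name, hwirq) contains ", ".

```c
#include <string.h>

/*
 * Count the handlers listed on one /proc/interrupts line in addition
 * to `name`. Returns -1 if `name` does not appear on the line.
 * Heuristic: every ", " on the line separates two handler names.
 */
static int other_handlers_on_line(const char *line, const char *name)
{
	const char *p;
	int others = 0;

	if (!strstr(line, name))
		return -1;

	for (p = line; (p = strstr(p, ", ")) != NULL; p += 2)
		others++;

	return others;
}
```

A result greater than 0 for the line containing "aerdrv" means the IRQ
is shared with at least one other handler.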

>  	if (status) {
>  		pci_err(port, "request AER IRQ %d failed\n", dev->irq);
>  		return status;

Sebastian



