[PATCH AUTOSEL 6.16-6.6] PCI: xgene-msi: Resend an MSI racing with itself on a different CPU

From: Marc Zyngier <maz@xxxxxxxxxx>

[ Upstream commit 3cc8f625e4c6a0e9f936da6b94166e62e387fe1d ]

Since changing the affinity of an MSI really is about changing
the target address and that it isn't possible to mask an individual
MSI, it is completely possible for an interrupt to race with itself,
usually resulting in a lost interrupt.

Paper over the design blunder by informing the core code of this
sad state of affairs.

Signed-off-by: Marc Zyngier <maz@xxxxxxxxxx>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@xxxxxxxxxx>
Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
Link: https://lore.kernel.org/r/20250708173404.1278635-11-maz@xxxxxxxxxx
Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
---

LLM Generated explanations, may be completely bogus:

**Backport Status: YES**

This commit should be backported to stable kernel trees. Here's my
detailed analysis:

## Key Evidence for Backporting:

1. **Fixes a Real Bug - Lost Interrupts**: The commit addresses a race
   condition in which an MSI can be lost when its CPU affinity changes.
   The commit message explicitly states that this usually results in "a
   lost interrupt". A lost interrupt can leave a driver waiting for an
   event that never arrives, which may appear as a stalled device or a
   hang.

2. **Minimal and Contained Change**: The fix is a single-line
   addition:
   ```c
   irqd_set_resend_when_in_progress(irq_get_irq_data(virq));
   ```
   This small, targeted fix only affects the xgene-msi driver's
   interrupt allocation path (xgene_irq_domain_alloc()).

3. **Part of a Broader Fix Pattern**: This commit follows the same
   pattern established for other interrupt controllers:
   - Commit 9c15eeb5362c4 introduced this mechanism for fasteoi handlers
     in 2023
   - Commit 2c9e7f857400 extended it to handle_simple_irq() flow
   - This commit applies the same proven workaround to the xgene-msi
     driver

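4. **Addresses Hardware Design Limitation**: The commit message
   describes this as papering over a "design blunder": the hardware
   cannot mask an individual MSI, and changing an MSI's affinity only
   changes its target address. A new MSI can therefore be delivered to
   the new target CPU while the previous one is still being handled on
   the old CPU; the new CPU sees the interrupt as already in progress
   and drops it. This is not a new feature but a workaround for a
   limitation of existing hardware.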

5. **History of Race Conditions in This Driver**: The xgene-msi driver
   has had race condition issues before (commit a93c00e5f975 fixed a
   different race in 2021), suggesting that its interrupt handling is
   prone to subtle concurrency bugs.

6. **No Architecture Changes**: The fix uses existing kernel
   infrastructure (irqd_set_resend_when_in_progress) without introducing
   new APIs or changing kernel architecture.

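7. **Low Risk of Regression**: The change only affects the xgene-msi
   driver, and the flag only changes behavior when an MSI arrives while
   its handler is still running on another CPU: instead of being
   dropped, the interrupt is marked pending and resent once the running
   handler completes. The normal, uncontended handling path is
   unchanged.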

## Stable Tree Criteria Met:
- ✅ Fixes a real bug (lost interrupts)
- ✅ Small change (1 line)
- ✅ Not a new feature
- ✅ Isolated to specific driver
- ✅ Uses established kernel mechanisms
- ✅ Addresses hardware limitation that affects deployed systems

The fix prevents interrupt loss during CPU affinity changes on X-Gene
MSI controllers, which is exactly the type of bug fix that stable
kernels should receive to maintain system reliability.

 drivers/pci/controller/pci-xgene-msi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/controller/pci-xgene-msi.c b/drivers/pci/controller/pci-xgene-msi.c
index b05ec8b0bb93..50647fa14e69 100644
--- a/drivers/pci/controller/pci-xgene-msi.c
+++ b/drivers/pci/controller/pci-xgene-msi.c
@@ -200,6 +200,7 @@ static int xgene_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
 	irq_domain_set_info(domain, virq, msi_irq,
 			    &xgene_msi_bottom_irq_chip, domain->host_data,
 			    handle_simple_irq, NULL, NULL);
+	irqd_set_resend_when_in_progress(irq_get_irq_data(virq));
 
 	return 0;
 }
-- 
2.39.5




