On Thu, Mar 27 2025 at 16:36, Wen Xiong wrote: > When secondary adapter doing a reset, we use the same code path as > removing operation. We can’t free irqs for Secondary adapter since > kernel has assigned the irqs for Secondary adapter. > > Actually we discussed about "calling pci_free_irq_vectors()" before > doing bist reset when we trying to fix in device driver. > That might cause other problems. It is also not what a user would > expect. For example, if they disabled irq balance and manually setup irq > binding and affinity, if we go and free and reallocate the interrupts > across a reset, this would wipe out those changes, which would not be > expected. You are completely missing the point. This is not a problem restricted to PCI/MSI interrupts. You have interrupts, work, queues and whatever in fully operational state. Then you tell the firmware to reset the adapter and swap the secondary adapter in. This reset operation brings the hardware temporary into an undefined state as you have observed. How is any part of the kernel, which can access and operate on this adapter, supposed to deal with that correctly? You are now focussing solely on papering over the symptom in PCI/MSI because that's where you actually observed the fallout. But as you know curing the symptom is not fixing anything because the underlying root cause still persists and will roar its ugly head at some other place sooner than later. So no, instead of sprinkling horrible band-aids all over the place, you have to sit down and think about a properly orchestrated mechanism for this failover scenario. I understand that you want to preserve the state of the secondary adapter including interrupt numbers and related settings, but for that you have to properly freeze _all_ related kernel resources first, then let the firmware do the swap and once that's complete unfreeze everything again. That will need support in other subsystems, like the interrupt layer, but that needs to be properly designed and integrated so that it works with all possible PCI/MSI backend implementations and under all circumstances. Setting affinity is not the only way to make this go wrong, there are other operations which access the underlying hardware and config space. You just haven't triggered them yet, but they exist. And it's not a problem restricted to the pSeries MSI backend implementation either. That's just one way to expose it. Even if I put the secondary swap problem aside and just look at a single instance, __ipr_remove() is already broken in itself. As I pointed out to you before: __ipr_remove() ipr_initiate_ioa_bringdown() // resets device -> undefined state restore_config_space() .... ipr_free_all_resources() free_irqs() Up to the point where interrupts are freed, they are fully operational and exposed to user space. The related functionality can hit the undefined state not only via set_affinity() independent of the swap problem. The adapter swap is just a worse variant of the above because it affects the secondary side behind its back. This needs quite some work to resolve properly, but that's the only way to fix it once and forever. Proliferating the 'Plug and Pray' approach with a lot of handwaving is just helping you to close your ticket, but it's not a solution. Thanks, tglx