The percpu_devid class of interrupts represents interrupts that are
generally produced by CPUs or per-CPU devices (timer, PMU), and are
more or less directly wired to that same CPU. The core code has a
baked-in assumption that all CPUs have the same devices wired to the
same interrupt, making the whole system homogeneous in that respect.

Unfortunately, this is wishful thinking, and the big-little
contraption has proved more often than not that sanity was not
necessarily part of the design requirements.

The typical problem on ARM systems is that while all CPUs have a PMU,
not all PMUs are the same, and some require a slightly different
programming model. The driver therefore has to treat them differently.
Another problem is that some devices are only present on a subset of
the CPUs, and not others.

All in all, it means that these percpu_devid interrupts (which are
represented as PPIs on the GIC class of interrupt controllers) are not
ideally handled by the core code. At least two interrupt controllers
(GICv3 and Apple AIC) have invented their own schemes to paper over
these problems.

In the GICv3 case (the most over-engineered one), the low-level
irqchip creates a mux irqchip for each PPI (up to 64 of them), and
each possible affinity class represents an interrupt in this mux. Like
this (compressed for readability):

# grep pmu /proc/interrupts
 38: 0 0 0 0 0 0 GICv3-23   0 Level     arm-pmu
 39: 0 0 0 0 0 0 GICv3-23   1 Level     arm-pmu

Yes, this is nuts. But by hiding that the PPI is split between two
affinity groups, and exposing two interrupts that are apparently
system-wide, we trick the core code into not exploding.

While this solution has worked for the past 9 years, it creates its
own class of issues. The most glaring one is that we cannot configure
these interrupts as NMIs, as they are muxed interrupts and we don't
allow NMIs outside of the root interrupt controller (this would imply
taking locks that are not safe in this context).
As it turns out, some of the heaviest users of profiling are also
users of the most crazy systems out there (cough phones cough)...

It is also pretty bad that the core IRQ code cannot natively handle
what has become an extremely common HW setup, for better or worse.

Finally, I have no desire to replicate the same horror on the next
generation of ARM interrupt controller (GICv5), and this is the
opportunity to put things right.

This series therefore aims at teaching the core code about these
interrupts, removing all sorts of irqchip-specific handling, and
allowing NMIs to be naturally handled in these setups.

We want:

- a single irqdesc representing the system-wide percpu interrupt

- the irqaction to carry the affinity the handler applies to

For that, we have a few problems to solve:

- communicating the affinity of an interrupt from the firmware to the
  device driver. This is currently done via the
  irq_{g,s}et_percpu_devid_partition() helpers, but the partition is
  stored on a per-irqdesc basis, which wouldn't work anymore with the
  scheme described above.

  For that, we expose a new irqdomain op, implemented by the irqchips,
  that can retrieve FW-specific information including interrupt
  affinity, which then gets processed by ACPI and OF helpers to report
  the affinity to the platform layer. A new platform-level API can
  retrieve both the interrupt and its affinity, and the three drivers
  that make use of such affinity are updated to use this interface.

- teaching the core code about affinity-specific interrupts by
  allowing a percpu interrupt to be requested for a specific
  affinity. This involves tracking the affinity at the irqaction level
  and enforcing that it isn't possible to request the interrupt twice
  with overlapping affinities. A new flavour of request_irq() is
  provided for that purpose, and the drivers are updated to make use
  of it.

- killing the irqchip-specific infrastructure, allowing the normal
  flow of interrupt request to be restored.
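
To make the "no overlapping affinities" rule above a bit more
concrete, here is a rough userspace model of it. This is only a
sketch and not code from the series: every toy_* name is made up for
illustration, the affinity is a plain 64-bit mask rather than a
cpumask, and the -1 return value stands in for whatever error the
real request path reports. It assumes a 4+2 CPU split for the masks
mentioned in the note below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ACTIONS	4

/* Hypothetical stand-in for an irqaction that carries its own affinity. */
struct toy_action {
	const char *name;
	uint64_t affinity;	/* bit n set => handler applies to CPU n */
};

/* Hypothetical stand-in for the single, system-wide irqdesc. */
struct toy_desc {
	struct toy_action actions[TOY_MAX_ACTIONS];
	size_t nr_actions;
};

/*
 * Attach a handler for a subset of CPUs. Returns 0 on success, and -1
 * if the mask is empty, overlaps an already registered handler, or if
 * the descriptor has no free slot left.
 */
static int toy_request_percpu_affinity(struct toy_desc *desc,
					const char *name, uint64_t affinity)
{
	size_t i;

	assert(desc != NULL);

	if (!affinity || desc->nr_actions == TOY_MAX_ACTIONS)
		return -1;

	/* Two handlers may share the interrupt only on disjoint CPU sets. */
	for (i = 0; i < desc->nr_actions; i++) {
		if (desc->actions[i].affinity & affinity)
			return -1;
	}

	desc->actions[desc->nr_actions].name = name;
	desc->actions[desc->nr_actions].affinity = affinity;
	desc->nr_actions++;

	return 0;
}

/* Find the handler that applies to the CPU taking the interrupt. */
static const struct toy_action *toy_action_for_cpu(const struct toy_desc *desc,
						    unsigned int cpu)
{
	size_t i;

	assert(desc != NULL);

	if (cpu >= 64)
		return NULL;

	for (i = 0; i < desc->nr_actions; i++) {
		if (desc->actions[i].affinity & (UINT64_C(1) << cpu))
			return &desc->actions[i];
	}

	return NULL;
}
```

With masks 0x0f (CPUs 0-3) and 0x30 (CPUs 4-5), both requests succeed
against the same descriptor, while a third request for 0x18 is refused
because it overlaps both. Keeping the mask in the action rather than
in the descriptor is what lets a single descriptor serve both groups.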

With that, we end up with this on the same platform as above:

# grep pmu /proc/interrupts
 27: 0 0 0 0 0 0 GICv3  23 Level     arm-pmu, arm-pmu

which is much more satisfactory, and allows the use of NMIs. It
doesn't hurt that the whole thing ends up with a negative diffstat.

Patches on top of v6.17-rc5, tested on a bunch of asymmetric systems:

- RK3399 (GICv3, DT)
- O6 (GICv3, ACPI)
- M1 (AIC, DT)

as well as on a saner symmetric system (Synquacer).

Marc Zyngier (24):
  irqdomain: Add firmware info reporting interface
  ACPI: irq: Add IRQ affinity reporting interface
  of/irq: Add IRQ affinity reporting interface
  platform: Add firmware-agnostic irq and affinity retrieval interface
  irqchip/gic-v3: Add FW info retrieval support
  irqchip/apple-aic: Add FW info retrieval support
  coresight: trbe: Convert to new IRQ affinity retrieval API
  perf: arm_pmu: Convert to new IRQ affinity retrieval API
  perf: arm_spe_pmu: Convert to new IRQ affinity retrieval API
  perf: arm_pmu: Add PMU/CPU affinity tracking
  genirq: Merge irqaction::{dev_id,percpu_dev_id}
  genirq: Factor-in percpu irqaction creation
  genirq: Add affinity to percpu_devid interrupt requests
  genirq: Update request_percpu_nmi() to take an affinity
  genirq: Allow per-cpu interrupt sharing for non-overlapping affinities
  genirq: Add request_percpu_irq_affinity() helper
  perf: arm_pmu: Request specific affinities for percpu NMI/IRQ
  perf: arm_spe_pmu: Request specific affinities for percpu IRQ
  coresight: trbe: Request specific affinities for percpu IRQ
  irqchip/gic-v3: Drop support for custom PPI partitions
  irqchip/apple-aic: Drop support for custom PMU irq partitions
  irqchip: Kill irq-partition-percpu
  genirq: Kill irq_{g,s}et_percpu_devid_partition()
  perf: arm_pmu: Kill last use of per-CPU cpu_armpmu pointer

Will Deacon (1):
  perf: arm_pmu: Use a backend-specific pointer for percpu interrupt request

 arch/arm64/kernel/smp.c                      |   2 +-
 drivers/acpi/irq.c                           |  15 ++
 drivers/base/platform.c                      |  60 ++++-
 drivers/hwtracing/coresight/coresight-trbe.c |   9 +-
 drivers/irqchip/Kconfig                      |   4 -
 drivers/irqchip/Makefile                     |   1 -
 drivers/irqchip/irq-apple-aic.c              |  59 +++--
 drivers/irqchip/irq-gic-v3.c                 | 172 +++++-------
 drivers/irqchip/irq-partition-percpu.c       | 241 -------------------
 drivers/of/irq.c                             |  20 ++
 drivers/perf/arm_pmu.c                       |  61 +++--
 drivers/perf/arm_pmu_acpi.c                  |   7 +-
 drivers/perf/arm_pmu_platform.c              |  24 +-
 drivers/perf/arm_pmuv3.c                     |   2 +-
 drivers/perf/arm_spe_pmu.c                   |  13 +-
 include/linux/acpi.h                         |   7 +
 include/linux/interrupt.h                    |  24 +-
 include/linux/irq.h                          |   4 -
 include/linux/irqchip/irq-partition-percpu.h |  53 ----
 include/linux/irqdesc.h                      |   1 -
 include/linux/irqdomain.h                    |  30 +++
 include/linux/of_irq.h                       |   7 +
 include/linux/perf/arm_pmu.h                 |   7 +-
 include/linux/platform_device.h              |   2 +
 kernel/irq/chip.c                            |   8 +-
 kernel/irq/irqdesc.c                         |  24 +-
 kernel/irq/irqdomain.c                       |  33 ++-
 kernel/irq/manage.c                          | 128 +++++++---
 28 files changed, 449 insertions(+), 569 deletions(-)
 delete mode 100644 drivers/irqchip/irq-partition-percpu.c
 delete mode 100644 include/linux/irqchip/irq-partition-percpu.h

-- 
2.39.2