Introduction
============
NMI-source reporting with FRED [1] provides a new mechanism for identifying
the source of NMIs. As part of the FRED event delivery framework, a 16-bit
vector bitmap is provided that identifies one or more sources that caused
the NMI.

Using the source bitmap, the kernel can precisely run the relevant NMI
handlers instead of polling the entire NMI handler list. Additionally, the
source information would be invaluable for debugging misbehaving handlers
and unknown NMIs.

Changes since the last version
==============================
v4: https://lore.kernel.org/lkml/20240709143906.1040477-1-jacob.jun.pan@xxxxxxxxxxxxxxx/

Apart from the change of personnel, the patches include the following major
changes:

* Reorder the patches to have the infrastructure changes precede the
  feature addition. (Sean)

* Use a simplified encoding mechanism for NMI-source vectors. (Sean)

* Get rid of the alternate NMI vector priority scheme. (below)

* Simplify NMI handling logic with source bitmap. (below)

Existing NMI handling code already has a priority mechanism for the NMI
handlers, with CPU-specific (NMI_LOCAL) handlers executed first followed by
platform NMI handlers and unknown NMI (NMI_UNKNOWN) handlers being last.
Within each of these NMI types, the handlers registered with NMI_FLAG_FIRST
are given priority.

It is essential that new NMI-source handling follows the same scheme to
maintain consistent behavior with and without NMI-source. If there is a
need for a more granular priority scheme, it should be introduced at the
generic NMI handler level instead of assigning priorities to NMI-source
vectors.

This design choice leads to a simplification in the NMI handling logic as
well. It is now possible to get rid of the complexity introduced by a new
handler lookup table as well as the partial bitmap handling logic. The
updated code (patch 5) is significantly less intrusive and easier to
maintain.

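For illustration only, the scheme above (the bitmap filters which handlers
run, while the existing registration order decides when they run) can be
modeled in user-space C. This is a rough sketch and not the code in patch 5:
every model_* name is hypothetical, a single ordered list stands in for one
NMI type, and treating vector 0 and bit 0 of the bitmap as "no source
identified, poll everything" is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
#define MODEL_MAX_HANDLERS 8
#define MODEL_NO_VECTOR    0   /* assumption: 0 means "no source vector" */

struct model_handler {
	int (*fn)(void);   /* returns the number of events it handled */
	uint8_t vector;    /* NMI-source vector, 0..15 */
};

struct model_list {
	struct model_handler h[MODEL_MAX_HANDLERS];
	size_t count;
};

/* Insert at the head when 'first' is set (models NMI_FLAG_FIRST), else append. */
static void model_register(struct model_list *l, int (*fn)(void),
			   uint8_t vector, int first)
{
	size_t pos = first ? 0 : l->count;
	size_t i;

	assert(l->count < MODEL_MAX_HANDLERS && vector < 16);
	for (i = l->count; i > pos; i--)
		l->h[i] = l->h[i - 1];
	l->h[pos].fn = fn;
	l->h[pos].vector = vector;
	l->count++;
}

/*
 * Walk the list in its existing priority order. The bitmap only decides
 * whether a handler is called; it never changes the order of the walk.
 */
static int model_dispatch(const struct model_list *l, uint16_t bitmap)
{
	/* No source information (or bit 0 set, by assumption): poll all. */
	int poll_all = !bitmap || (bitmap & (1u << MODEL_NO_VECTOR));
	int handled = 0;
	size_t i;

	for (i = 0; i < l->count; i++) {
		const struct model_handler *h = &l->h[i];

		if (poll_all || h->vector == MODEL_NO_VECTOR ||
		    (bitmap & (1u << h->vector)))
			handled += h->fn();
	}
	return handled;
}

/* Two sample handlers with call counters, used only to exercise the model. */
static int model_calls_a, model_calls_b;
static int model_handler_a(void) { model_calls_a++; return 1; }
static int model_handler_b(void) { model_calls_b++; return 1; }
```

With the two sample handlers registered on vectors 1 and 2, calling
model_dispatch(&list, 1u << 2) invokes only the vector-2 handler, while a
zero bitmap falls back to polling both in list order.
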
Day in the life of an NMI-source vector
=======================================
A brief overview of how NMI-source vectors are used:

// Allocate a static source vector at compile time
#define NMIS_VECTOR_TEST	1

// Register an NMI handler with the vector
register_nmi_handler(NMI_LOCAL, test_handler, 0, "nmi_test", NMIS_VECTOR_TEST);

// Generate an NMI with the source vector using NMI encoded delivery
__apic_send_IPI_mask(cpumask, APIC_DM_NMI | NMIS_VECTOR_TEST);

// Handle an NMI with or without the source information (oversimplified)
source_bitmap = fred_event_data(regs);
if (!source_bitmap || (source_bitmap & BIT(NMIS_VECTOR_TEST)))
	test_handler();

// Unregister handler along with the vector
unregister_nmi_handler(NMI_LOCAL, "nmi_test");

Patch structure
===============
The patches are based on tip:x86/nmi because they depend on the NMI cleanup
series merged earlier [2].

Patch 1-2: Prepare FRED/KVM and enumerate NMI-source reporting
Patch 3-5: Register and handle NMI-source vectors
Patch 6-8: APIC changes to generate NMIs with vectors
Patch   9: Improve trace and debug with NMI-source information

Many thanks to Sean Christopherson, Xin Li, H. Peter Anvin, Andi Kleen,
Tony Luck, Kan Liang, Jacob Pan Jun, Zeng Guang and others for their
contributions, reviews and feedback.

Future work / Opens
===================
I am considering a few additional changes that would be valuable for
enhancing NMI handling support. Any feedback, preferences or suggestions on
the following would be helpful.

Assigning more NMI-source vectors
---------------------------------
The current patches assign NMI vectors to a limited number of sources. The
microcode rendezvous and crash reboot code use NMI but do not go through
the typical register_nmi_handler() path. Their handling is special-cased in
exc_nmi(). To isolate blame and improve debugging, it would be useful to
assign vectors to them, even if the vectors are ignored during handling.

Other NMI sources, such as GHES and Platform NMIs, can also be assigned
vectors to speed up their NMI handling and improve isolation. However, this
would require a software/hardware agreement on vector reservation and
usage. Such an endeavor is likely not worth the effort.

Explicitly enabling NMIs
------------------------
HPA brought up the idea [3] of explicitly enabling NMIs only when the
kernel is ready to take them. With FRED, if we enter the kernel with NMIs
disabled, they could remain disabled until returning to userspace.

DebugFS support
---------------
Currently, the kernel has counters for unknown NMIs, swallowed NMIs and
other NMI handling data. However, there is no easy way to access them. To
identify issues that happen over a longer timeframe, it might be useful to
add DebugFS support for NMI statistics.

KVM support
-----------
The NMI-source feature can be useful for perf users and other NMI use cases
in guest VMs. Exposing NMI-source to guests should be relatively easy once
FRED support is in place. The prototype code for this is under evaluation.

Links
=====
[1]: Chapter 9, https://www.intel.com/content/www/us/en/content-details/819481/flexible-return-and-event-delivery-fred-specification.html
[2]: https://lore.kernel.org/lkml/20250327234629.3953536-1-sohil.mehta@xxxxxxxxx/
[3]: https://lore.kernel.org/lkml/F5D36889-A868-46D2-A678-8EE26E28556D@xxxxxxxxx/

Jacob Pan (1):
  perf/x86: Enable NMI-source reporting for perfmon

Sohil Mehta (7):
  x86/cpufeatures: Add the CPUID feature bit for NMI-source reporting
  x86/nmi: Extend the registration interface to include the NMI-source vector
  x86/nmi: Assign and register NMI-source vectors
  x86/nmi: Add support to handle NMIs with source information
  x86/nmi: Prepare for the new NMI-source vector encoding
  x86/nmi: Enable NMI-source for IPIs delivered as NMIs
  x86/nmi: Include NMI-source information in tracepoint and debug prints

Zeng Guang (1):
  x86/fred, KVM: VMX: Pass event data to the FRED entry point from KVM

 arch/x86/entry/entry_64_fred.S      |  2 +-
 arch/x86/events/amd/ibs.c           |  2 +-
 arch/x86/events/core.c              |  6 ++--
 arch/x86/events/intel/core.c        |  6 ++--
 arch/x86/include/asm/apic.h         | 38 ++++++++++++++++++++++
 arch/x86/include/asm/apicdef.h      |  2 +-
 arch/x86/include/asm/cpufeatures.h  |  1 +
 arch/x86/include/asm/fred.h         |  9 +++---
 arch/x86/include/asm/nmi.h          | 37 ++++++++++++++++++++-
 arch/x86/kernel/apic/hw_nmi.c       |  5 ++-
 arch/x86/kernel/apic/ipi.c          |  4 +--
 arch/x86/kernel/apic/local.h        | 24 +++++++-------
 arch/x86/kernel/cpu/cpuid-deps.c    |  1 +
 arch/x86/kernel/cpu/mce/inject.c    |  4 +--
 arch/x86/kernel/cpu/mshyperv.c      |  3 +-
 arch/x86/kernel/kgdb.c              |  8 ++---
 arch/x86/kernel/kvm.c               |  9 +-----
 arch/x86/kernel/nmi.c               | 50 ++++++++++++++++++++++++++++-
 arch/x86/kernel/nmi_selftest.c      |  9 +++---
 arch/x86/kernel/smp.c               |  6 ++--
 arch/x86/kvm/vmx/vmx.c              |  5 +--
 arch/x86/platform/uv/uv_nmi.c       |  4 +--
 drivers/acpi/apei/ghes.c            |  2 +-
 drivers/char/ipmi/ipmi_watchdog.c   |  3 +-
 drivers/edac/igen6_edac.c           |  3 +-
 drivers/thermal/intel/therm_throt.c |  2 +-
 drivers/watchdog/hpwdt.c            |  6 ++--
 include/trace/events/nmi.h          | 13 +++++---
 28 files changed, 190 insertions(+), 74 deletions(-)


base-commit: f2e01dcf6df2d12e86c363ea9c37d53994d89dd6
-- 
2.43.0