On Mon, Jul 14, 2025 at 09:57:29AM -0700, Breno Leitao wrote:
> Add a global variable, ghes_recovered_erors, to count hardware errors
> classified as recoverable or corrected. This counter is exported and
> included in vmcoreinfo for post-crash diagnostics.
>
> Tracking this value helps operators potentially correlate hardware
> errors across system events and crash dumps, indicating that RAS logs
> might be useful while analyzing these crashes. This discussion and
> motivation could be found in [1].
>
> Atomic operations are deliberately omitted, as precise accuracy is not
> required for this metric.

[snip]

> @@ -1100,13 +1106,16 @@ static int ghes_proc(struct ghes *ghes)
>  {
>  	struct acpi_hest_generic_status *estatus = ghes->estatus;
>  	u64 buf_paddr;
> -	int rc;
> +	int rc, sev;
>
>  	rc = ghes_read_estatus(ghes, estatus, &buf_paddr, FIX_APEI_GHES_IRQ);
>  	if (rc)
>  		goto out;
>
> -	if (ghes_severity(estatus->error_severity) >= GHES_SEV_PANIC)
> +	sev = ghes_severity(estatus->error_severity);
> +	if (sev == GHES_SEV_RECOVERABLE || sev == GHES_SEV_CORRECTED)
> +		ghes_recovered_erors += 1;

ghes_recovered_erors++;

> +	else if (sev >= GHES_SEV_PANIC)
>  		__ghes_panic(ghes, estatus, buf_paddr, FIX_APEI_GHES_IRQ);
>
>  	if (!ghes_estatus_cached(estatus)) {
> @@ -1750,6 +1759,8 @@ void __init acpi_ghes_init(void)
>  		pr_info(GHES_PFX "APEI firmware first mode is enabled by APEI bit.\n");
>  	else
>  		pr_info(GHES_PFX "Failed to enable APEI firmware first mode.\n");
> +
> +	ghes_recovered_erors = 0;

Unnecessary. Global variables all start at zero unless otherwise initialized.

>  }

-Tony
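
For reference, the zero-initialization point can be checked with a small
user-space sketch. This is not code from the patch: recovered_errors,
enum sev, account() and demo() are hypothetical stand-ins, and the severity
values are illustrative rather than the kernel's GHES_SEV_* definitions. It
relies only on the C rule that objects with static storage duration start
at zero when they have no explicit initializer.

```c
#include <assert.h>

/* Hypothetical stand-in for the counter in the patch. A file-scope object
 * has static storage duration, so it is zero before any code runs. */
unsigned int recovered_errors;

/* Illustrative severity levels only; not the kernel's GHES_SEV_* values. */
enum sev { SEV_NO, SEV_CORRECTED, SEV_RECOVERABLE, SEV_PANIC };

/* Count only corrected or recoverable events, mirroring the quoted hunk. */
void account(enum sev s)
{
	if (s == SEV_RECOVERABLE || s == SEV_CORRECTED)
		recovered_errors++;
}

/* Feed three sample events and return the resulting count. */
unsigned int demo(void)
{
	/* No explicit "= 0" anywhere, yet the counter starts at zero. */
	assert(recovered_errors == 0);

	account(SEV_CORRECTED);
	account(SEV_RECOVERABLE);
	account(SEV_PANIC);	/* not counted */

	return recovered_errors;
}
```

Calling demo() once from a fresh process should return 2: two of the three
sample events are counted, and the counter needed no assignment in an init
function.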