This change introduces a new taint flag, bit 20 ('H'), to indicate when
the kernel has identified recoverable hardware failures during runtime.
The flag is documented in tainted-kernels.rst, defined in panic.h,
added to the taint_flags array in panic.c, and supported in the
kernel-chktaint debugging tool.

Marking kernels that have encountered recoverable hardware errors helps
correlate future issues with hardware events, improving diagnostics and
support for affected systems.

Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
---
 Documentation/admin-guide/tainted-kernels.rst | 7 ++++++-
 include/linux/panic.h                         | 3 ++-
 kernel/panic.c                                | 1 +
 tools/debugging/kernel-chktaint               | 8 ++++++++
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
index a0cc017e44246..28185e9c0e039 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -102,7 +102,8 @@ Bit  Log  Number  Reason that got the kernel tainted
  17  _/T  131072  kernel was built with the struct randomization plugin
  18  _/N  262144  an in-kernel test has been run
  19  _/J  524288  userspace used a mutating debug operation in fwctl
-===  ===  ======  ========================================================
+ 20  _/H  1048576 hardware recoverable failures identified
+===  ===  ======= ========================================================
 
 Note: The character ``_`` is representing a blank in this table to make reading
 easier.
@@ -189,3 +190,7 @@ More detailed explanation for tainting
  19) ``J`` if userpace opened /dev/fwctl/* and performed a FWTCL_RPC_DEBUG_WRITE
      to use the devices debugging features. Device debugging features could
      cause the device to malfunction in undefined ways.
+
+ 20) ``H`` if the kernel identified any recoverable hardware failure earlier
+     during its operation. This helps to correlate possible future issues to
+     the fact that the hardware got a recoverable error.
diff --git a/include/linux/panic.h b/include/linux/panic.h
index 4adc657669354..d8241a052d69a 100644
--- a/include/linux/panic.h
+++ b/include/linux/panic.h
@@ -73,7 +73,8 @@ static inline void set_arch_panic_timeout(int timeout, int arch_default_timeout)
 #define TAINT_RANDSTRUCT 17
 #define TAINT_TEST 18
 #define TAINT_FWCTL 19
-#define TAINT_FLAGS_COUNT 20
+#define TAINT_HW_ERROR_RECOVERED 20
+#define TAINT_FLAGS_COUNT 21
 #define TAINT_FLAGS_MAX ((1UL << TAINT_FLAGS_COUNT) - 1)
 
 struct taint_flag {
diff --git a/kernel/panic.c b/kernel/panic.c
index b0b9a8bf4560d..fd13baf5d94bc 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -540,6 +540,7 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
 	TAINT_FLAG(RANDSTRUCT, 'T', ' ', true),
 	TAINT_FLAG(TEST, 'N', ' ', true),
 	TAINT_FLAG(FWCTL, 'J', ' ', true),
+	TAINT_FLAG(HW_ERROR_RECOVERED, 'H', ' ', false),
 };
 
 #undef TAINT_FLAG
diff --git a/tools/debugging/kernel-chktaint b/tools/debugging/kernel-chktaint
index e7da0909d0970..b2099155a820c 100755
--- a/tools/debugging/kernel-chktaint
+++ b/tools/debugging/kernel-chktaint
@@ -212,6 +212,14 @@ else
 	echo " * fwctl's mutating debug interface was used (#19)"
 fi
 
+T=`expr $T / 2`
+if [ `expr $T % 2` -eq 0 ]; then
+	addout " "
+else
+	addout "H"
+	echo " * the kernel identified recoverable hardware errors (#20)"
+fi
+
 echo "For a more detailed explanation of the various taint flags see"
 echo " Documentation/admin-guide/tainted-kernels.rst in the Linux kernel sources"
 echo " or https://kernel.org/doc/html/latest/admin-guide/tainted-kernels.html"
-- 
2.47.1
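
For anyone who wants to check for the new bit without running the full
kernel-chktaint script, a minimal sketch follows. It is not part of the
patch. The helper name taint_bit_set and the variable HW_ERROR_BIT are
hypothetical, and the check is only meaningful on a kernel that carries this
change, so that bit 20 (value 1048576) means TAINT_HW_ERROR_RECOVERED.

```shell
#!/bin/sh
# Illustrative only: taint_bit_set and HW_ERROR_BIT are hypothetical names,
# not something this patch adds.

# Succeed (exit status 0) if bit number $2 is set in the taint value $1.
taint_bit_set() {
    [ $(( ($1 >> $2) & 1 )) -eq 1 ]
}

# Bit 20 is TAINT_HW_ERROR_RECOVERED as proposed in this patch.
HW_ERROR_BIT=20

# Read the running kernel's taint value; fall back to 0 if it is unavailable.
taint=$(cat /proc/sys/kernel/tainted 2>/dev/null || echo 0)

if taint_bit_set "$taint" "$HW_ERROR_BIT"; then
    echo "H: the kernel identified recoverable hardware errors"
else
    echo "bit $HW_ERROR_BIT (H) is not set"
fi
```

Unlike the divide-by-two walk in kernel-chktaint, this tests one bit directly
with a shift and mask, which is enough when only the 'H' flag is of interest.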