Hello Shuai,

On Thu, Jul 17, 2025 at 11:03:51AM +0800, Shuai Xue wrote:
> On 2025/7/16 20:42, Breno Leitao wrote:
> > That said, Tony proposed a more robust approach: categorizing and
> > tracking errors by their source. This would involve maintaining separate
> > counters for each source, using a counter per enum type:
> >
> > enum recovered_error_sources {
> >         ERR_GHES,
> >         ERR_MCE,
> >         ERR_AER,
> >         ...
> >         ERR_NUM_SOURCES
> > };
> >
> > See more at: https://lore.kernel.org/all/aHWC-J851eaHa_Au@agluck-desk3/
> >
> > Do you think this would help you by any chance?
>
> Personally, I think this approach would be more helpful. Additionally, I
> suggest not mixing CEs (Correctable Errors) and UEs (Uncorrectable
> Errors) together. This is especially important for memory errors, as CEs
> occur much more frequently than UEs, but their impact is much smaller.

Yes, I totally agree. This would be even better than my original solution.
Let me spend some time on it and see how far I can go.

Thanks for your opinions,
--breno
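A minimal sketch of what per-source counters with CEs and UEs kept apart might look like. It is written as userspace C with C11 atomics so it compiles on its own. Assumptions: only the three sources named in the thread are listed (the "..." in the quoted enum is left unfilled), and `enum recovered_error_severity`, `record_recovered_error()` and `recovered_error_count()` are hypothetical names introduced here for illustration, not an existing kernel API.

```c
#include <assert.h>
#include <stdatomic.h>

/* Error sources from the enum quoted in the thread (abbreviated). */
enum recovered_error_sources {
	ERR_GHES,
	ERR_MCE,
	ERR_AER,
	ERR_NUM_SOURCES
};

/* Hypothetical: a second dimension so CEs and UEs are never mixed. */
enum recovered_error_severity {
	ERR_SEV_CORRECTED,
	ERR_SEV_UNCORRECTED,
	ERR_NUM_SEVERITIES
};

/* One counter per (source, severity) pair; static storage starts at zero. */
static atomic_ulong recovered_errors[ERR_NUM_SOURCES][ERR_NUM_SEVERITIES];

/* Hypothetical helper a recovery path would call after handling an error. */
static void record_recovered_error(enum recovered_error_sources src,
				   enum recovered_error_severity sev)
{
	assert(src < ERR_NUM_SOURCES && sev < ERR_NUM_SEVERITIES);
	atomic_fetch_add(&recovered_errors[src][sev], 1);
}

/* Hypothetical reader, e.g. for exposing the counts to userspace. */
static unsigned long recovered_error_count(enum recovered_error_sources src,
					   enum recovered_error_severity sev)
{
	assert(src < ERR_NUM_SOURCES && sev < ERR_NUM_SEVERITIES);
	return atomic_load(&recovered_errors[src][sev]);
}
```

Indexing by both source and severity means a flood of memory CEs cannot hide a single UE in the same counter. Inside the kernel the same layout would more naturally use `atomic_long_t` with `atomic_long_inc()` and `atomic_long_read()` instead of C11 atomics.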