On 6/26/25 15:20, Huang, Kai wrote:
> But IMHO we may should just have a simple policy that when a page is marked
> as poisoned, it should never be touched again. It's only one page anyway
> (for one TD) so losing that doesn't seem bad to me. If we want to clear the
> poisoned page, then perhaps we should mark that page to be not-poisoned
> again.

The simplest policy is to do nothing.

The kernel only has 29 places that check PageHWPoison(). I'd guess that
roughly half of those are the memory-failure.c infrastructure and
bare-minimum code to handle poison, like not allowing pages to go back
into the allocator.

There are something like 5,000 lines of code in the kernel that deal
with a literal 'struct page'. 29 checks for ~5,000 sites is pretty
minuscule.

We obviously don't have a policy that every place that uses 'struct
page' needs to check for poison. We also don't even have a policy where
writes to or reads from a page check for poison. Why is this TDX code so
special that PageHWPoison() needs to be checked?

For instance:

$ grep -r PageHWPoison arch/x86/
arch/x86/kernel/cpu/mce/core.c:		SetPageHWPoison(p);
arch/x86/kernel/cpu/mce/core.c:		SetPageHWPoison(p);

In other words, this would be the *ONLY* arch/x86 site. Why?