On Tue, Aug 19, 2025 at 8:58 AM Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
>
> Avoid clearing reclaimed TDX private pages unless the platform is affected
> by the X86_BUG_TDX_PW_MCE erratum. This significantly reduces VM shutdown
> time on unaffected systems.
>
> Background
>
> KVM currently clears reclaimed TDX private pages using MOVDIR64B, which:
>
>  - Clears the TD Owner bit (which identifies TDX private memory) and
>    integrity metadata without triggering integrity violations.
>  - Clears poison from cache lines without consuming it, avoiding MCEs on
>    access (refer TDX Module Base spec. 1348549-006US section 6.5.
>    Handling Machine Check Events during Guest TD Operation).
>
> The TDX module also uses MOVDIR64B to initialize private pages before use.
> If cache flushing is needed, it sets TDX_FEATURES.CLFLUSH_BEFORE_ALLOC.
> However, KVM currently flushes unconditionally, refer commit 94c477a751c7b
> ("x86/virt/tdx: Add SEAMCALL wrappers to add TD private pages")
>
> In contrast, when private pages are reclaimed, the TDX Module handles
> flushing via the TDH.PHYMEM.CACHE.WB SEAMCALL.
>
> Problem
>
> Clearing all private pages during VM shutdown is costly. For guests
> with a large amount of memory it can take minutes.
>
> Solution
>
> TDX Module Base Architecture spec. documents that private pages reclaimed
> from a TD should be initialized using MOVDIR64B, in order to avoid
> integrity violation or TD bit mismatch detection when later being read
> using a shared HKID, refer April 2025 spec. "Page Initialization" in
> section "8.6.2. Platforms not Using ACT: Required Cache Flush and
> Initialization by the Host VMM"
>
> That is an overstatement and will be clarified in coming versions of the
> spec. In fact, as outlined in "Table 16.2: Non-ACT Platforms Checks on
> Memory" and "Table 16.3: Non-ACT Platforms Checks on Memory Reads in Li
> Mode" in the same spec, there is no issue accessing such reclaimed pages
> using a shared key that does not have integrity enabled. Linux always uses
> KeyID 0 which never has integrity enabled. KeyID 0 is also the TME KeyID
> which disallows integrity, refer "TME Policy/Encryption Algorithm" bit
> description in "Intel Architecture Memory Encryption Technologies" spec
> version 1.6 April 2025. So there is no need to clear pages to avoid
> integrity violations.
>
> There remains a risk of poison consumption. However, in the context of
> TDX, it is expected that there would be a machine check associated with the
> original poisoning. On some platforms that results in a panic. However
> platforms may support "SEAM_NR" Machine Check capability, in which case
> Linux machine check handler marks the page as poisoned, which prevents it
> from being allocated anymore, refer commit 7911f145de5fe ("x86/mce:
> Implement recovery for errors in TDX/SEAM non-root mode")
>
> Improvement
>
> By skipping the clearing step on unaffected platforms, shutdown time
> can improve by up to 40%.
>
> On platforms with the X86_BUG_TDX_PW_MCE erratum (SPR and EMR), continue
> clearing because these platforms may trigger poison on partial writes to
> previously-private pages, even with KeyID 0, refer commit 1e536e1068970
> ("x86/cpu: Detect TDX partial write machine check erratum")
>
> Reviewed-by: Kirill A. Shutemov <kas@xxxxxxxxxx>
> Acked-by: Kai Huang <kai.huang@xxxxxxxxx>
> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
> Reviewed-by: Xiaoyao Li <xiaoyao.li@xxxxxxxxx>
> Reviewed-by: Binbin Wu <binbin.wu@xxxxxxxxxxxxxxx>
> Signed-off-by: Adrian Hunter <adrian.hunter@xxxxxxxxx>

Acked-by: Vishal Annapurve <vannapurve@xxxxxxxxxx>
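
For anyone skimming the thread, the behaviour described in the commit
message can be sketched roughly as below. This is a userspace
illustration only, not the patch itself: platform_has_pw_mce_erratum is
a hypothetical stand-in for checking X86_BUG_TDX_PW_MCE, and
clear_private_range() uses memset() in 64-byte steps where the kernel
would use MOVDIR64B. The function names are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* MOVDIR64B operates on 64-byte chunks; the mock clears in the same unit. */
#define CLEAR_CHUNK_SIZE 64

/* Hypothetical stand-in for a check of the X86_BUG_TDX_PW_MCE erratum. */
bool platform_has_pw_mce_erratum;

/* Mock of the clearing step: memset() here, MOVDIR64B in the real thing. */
void clear_private_range(unsigned char *base, size_t size)
{
	assert(size % CLEAR_CHUNK_SIZE == 0);

	for (size_t off = 0; off < size; off += CLEAR_CHUNK_SIZE)
		memset(base + off, 0, CLEAR_CHUNK_SIZE);
}

/*
 * Clear a reclaimed page only on platforms with the erratum.
 * Returns true if the page was cleared, false if clearing was skipped.
 */
bool reset_reclaimed_page(unsigned char *page, size_t size)
{
	if (!platform_has_pw_mce_erratum)
		return false;	/* unaffected platform: skip the costly clear */

	clear_private_range(page, size);
	return true;
}
```

The point of the early return is that unaffected platforms pay nothing
per reclaimed page, which is where the shutdown-time saving quoted above
would come from.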