On Tue, Jul 29, 2025 at 12:28:37AM +1200, Kai Huang wrote:
>On TDX platforms, dirty cacheline aliases with and without encryption
>bits can coexist, and the CPU can flush them back to memory in random
>order. During kexec, the caches must be flushed before jumping to the
>new kernel, otherwise the dirty cachelines could silently corrupt the
>memory used by the new kernel due to the different encryption property.
>
>A percpu boolean is used to mark whether the cache of a given CPU may
>be in an incoherent state, and kexec performs WBINVD on the CPUs with
>that boolean turned on.
>
>For TDX, only the TDX module or the TDX guests can generate dirty
>cachelines of TDX private memory, i.e., they are only generated when
>the kernel does a SEAMCALL.
>
>Set that boolean when the kernel does a SEAMCALL so that kexec can
>flush the cache correctly.
>
>The kernel provides both the __seamcall*() assembly functions and the
>seamcall*() wrappers, which additionally handle the "running out of
>entropy" error in a loop. Most SEAMCALLs are made via seamcall*(),
>except TDH.VP.ENTER and TDH.PHYMEM.PAGE.RDMD, which use the
>__seamcall*() variant directly.
>
>To cover the two special cases, add a new helper do_seamcall() which
>only sets the percpu boolean and then calls __seamcall*(), and change
>the special cases to use do_seamcall(). To cover all other SEAMCALLs,
>change seamcall*() to call do_seamcall().
>
>The SEAMCALLs invoked via seamcall*() can be made from both task
>context and IRQ-disabled context. Given that a SEAMCALL is just a
>lengthy instruction (e.g., thousands of cycles) from the kernel's
>point of view, and preempt_{disable|enable}() is cheap compared to it,
>just unconditionally disable preemption while setting the boolean and
>making the SEAMCALL.
>
>Signed-off-by: Kai Huang <kai.huang@xxxxxxxxx>
>Tested-by: Farrah Chen <farrah.chen@xxxxxxxxx>

Reviewed-by: Chao Gao <chao.gao@xxxxxxxxx>

One small question below,

<snip>

>+static __always_inline u64 do_seamcall(sc_func_t func, u64 fn,
>+				       struct tdx_module_args *args)
>+{
>+	lockdep_assert_preemption_disabled();
>+
>+	/*
>+	 * SEAMCALLs are made to the TDX module and can generate dirty
>+	 * cachelines of TDX private memory. Mark cache state incoherent
>+	 * so that the cache can be flushed during kexec.
>+	 *
>+	 * This needs to be done before actually making the SEAMCALL, ...
>+	 * because kexec-ing CPU could send NMI to stop remote CPUs,
>+	 * in which case even disabling IRQ won't help here.
>+	 */
>+	this_cpu_write(cache_state_incoherent, true);

I'm not sure this write is guaranteed to occur before the SEAMCALL, as
I don't see any explicit barrier to prevent the compiler from
reordering them. Do you think this is an issue?

>+
>+	return func(fn, args);
>+}
>+