On Thu, Aug 14, 2025, Rick P Edgecombe wrote:
> On Thu, 2025-08-14 at 06:54 -0700, Sean Christopherson wrote:
> > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > > index 66744f5768c8..1bc6f52e0cd7 100644
> > > --- a/arch/x86/kvm/vmx/tdx.c
> > > +++ b/arch/x86/kvm/vmx/tdx.c
> > > @@ -442,6 +442,18 @@ void tdx_disable_virtualization_cpu(void)
> > >  		tdx_flush_vp(&arg);
> > >  	}
> > >  	local_irq_restore(flags);
> > > +
> > > +	/*
> > > +	 * No more TDX activity on this CPU from here. Flush cache to
> > > +	 * avoid having to do WBINVD in stop_this_cpu() during kexec.
> > > +	 *
> > > +	 * Kexec calls native_stop_other_cpus() to stop remote CPUs
> > > +	 * before booting to new kernel, but that code has a "race"
> > > +	 * when the normal REBOOT IPI times out and NMIs are sent to
> > > +	 * remote CPUs to stop them. Doing WBINVD in stop_this_cpu()
> > > +	 * could potentially increase the possibility of the "race".

Why is that race problematic? The changelog just says

 : However, the native_stop_other_cpus() and stop_this_cpu() have a "race"
 : which is extremely rare to happen but could cause the system to hang.
 :
 : Specifically, the native_stop_other_cpus() firstly sends normal reboot
 : IPI to remote CPUs and waits one second for them to stop. If that times
 : out, native_stop_other_cpus() then sends NMIs to remote CPUs to stop
 : them.

without explaining how that can cause a system hang.

> > > +	 */
> > > +	tdx_cpu_flush_cache();
> >
> > IIUC, this can be:
> >
> > 	if (IS_ENABLED(CONFIG_KEXEC))
> > 		tdx_cpu_flush_cache();
> >
>
> No strong objection, just 2 cents. I bet !CONFIG_KEXEC && CONFIG_INTEL_TDX_HOST
> kernels will be the minority. Seems like an opportunity to simplify the code.

Reducing the number of lines of code is not always a simplification.
IMO, not checking CONFIG_KEXEC adds "complexity" because anyone that reads the comment (and/or the massive changelog) will be left wondering why there's a bunch of documentation that talks about kexec, but no hint of kexec considerations in the code.
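
To make the suggested pattern concrete, here is a standalone userspace sketch, not the actual kernel code. It assumes a simplified stand-in for IS_ENABLED() that handles only the built-in (=y) case, a fake tdx_cpu_flush_cache() that merely counts calls, and a hypothetical wrapper named disable_virtualization_cpu_sketch(). CONFIG_KEXEC is defined by hand here, whereas in a real build Kconfig would provide it.

```c
/*
 * Simplified stand-in for the kernel's IS_ENABLED(): expands to 1 when
 * the option is defined to 1, and to 0 when it is not defined at all.
 * The helper macro names are illustrative, not the kernel's.
 */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_DEFINED_1(x) IS_DEFINED_2(x)
#define IS_ENABLED(option) IS_DEFINED_1(option)

/* Assumption: pretend Kconfig enabled kexec for this build. */
#define CONFIG_KEXEC 1

/* Counts flushes so the effect of the guard is observable. */
static int flush_count;

/* Stand-in for tdx_cpu_flush_cache(); the real one flushes the CPU cache. */
static void tdx_cpu_flush_cache(void)
{
	flush_count++;
}

/* Hypothetical wrapper showing only the tail of the disable path. */
static void disable_virtualization_cpu_sketch(void)
{
	/*
	 * The condition is a compile-time constant, so the compiler can
	 * drop the call when kexec is disabled, while the kexec dependency
	 * stays visible to anyone reading the code.
	 */
	if (IS_ENABLED(CONFIG_KEXEC))
		tdx_cpu_flush_cache();
}
```

With CONFIG_KEXEC defined as above, one call to the wrapper performs one flush. Removing the #define makes the condition a constant 0 and the call is never made, without any #ifdef in the function body.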