Re: [RFC PATCH 00/20] TD-Preserving updates

On Mon, Jul 14, 2025 at 05:21:47PM -0700, Paul E. McKenney wrote:
>On Fri, Jul 11, 2025 at 04:04:48PM +0800, Chao Gao wrote:
>> On Fri, May 23, 2025 at 02:52:23AM -0700, Chao Gao wrote:
>> >Hi Reviewers,
>> >
>> >This series adds support for runtime TDX module updates that preserve
>> >running TDX guests (a.k.a, TD-Preserving updates). The goal is to gather
>> >feedback on the feature design. Please pay attention to the following items:
>> >
>> >1. TD-Preserving updates are done in stop_machine() context. It copy-pastes
>> >   part of multi_cpu_stop() to guarantee step-locked progress on all CPUs.
>> >   But there are a few differences between them. I am wondering whether
>> >   these differences have reached a point where abstracting a common
>> >   function might do more harm than good. See more details in patch 10.
>
>Please note that multi_cpu_stop() is used by a number of functions,
>so it is a good example of common code.  But you are within your rights
>to create your own function to pass to stop_machine(), and quite a
>few call sites do just that.  Most of them expect this function to be
>executed on only one CPU, but these run on multiple CPUs:
>
>o	__apply_alternatives_multi_stop(), which has CPU 0 do the
>	work and the rest wait on it.
>
>o	cpu_enable_non_boot_scope_capabilities(), which works on
>	a per-CPU basis.
>
>o	do_join(), which is similar to your do_seamldr_install_module().
>	Somewhat similar, anyway.
>
>o	__ftrace_modify_code(), of which there are several, some of
>	which have some vague resemblance to your code.
>
>o	cache_rendezvous_handler(), which works on a per-CPU basis.
>
>o	panic_stop_irqoff_fn(), which is a simple barrier-wait, with
>	the last CPU to arrive doing the work.
>
>I strongly recommend looking at these functions.  They might
>suggest an improved way to do what you are trying to accomplish with
>do_seamldr_install_module().

Hi Paul,

Thanks for your feedback.

Let me clarify what do_seamldr_install_module() does. Patch 10 adds only the
skeleton (sorry for directing you to patch 10 alone). The actual steps are
added by subsequent patches. Specifically:

 * TDP_SHUTDOWN (Patch 12)
	Shut down the running TDX module on a single CPU while all other CPUs
	stay idle
 * TDP_CPU_INSTALL (Patch 14)
	Load the new TDX module on all CPUs, serially
 * TDP_CPU_INIT (Patch 16)
	Initialize the new module on all CPUs, in parallel
 * TDP_RUN_UPDATE (Patch 17)
	Import metadata from the old module on a single CPU while all other
	CPUs stay idle

And there are two requirements:
1. The steps must execute in lockstep: all CPUs must complete step X before
   any CPU proceeds to step X+1.
2. If any CPU encounters an error, all CPUs should bail out rather than
   proceed to the next step.
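To make these concrete, below is a minimal sketch of a per-step barrier that
satisfies both requirements. All names (tdp_step_barrier, tdp_arrived,
tdp_failed) are hypothetical, not from the actual series, and a real
implementation would also need memory barriers around the step work:

	/* Assumes <linux/atomic.h>; runs with IRQs disabled, as in
	 * multi_cpu_stop() context. */
	static atomic_t tdp_arrived = ATOMIC_INIT(0);
	static atomic_t tdp_failed = ATOMIC_INIT(0);

	/*
	 * Every CPU calls this after finishing step @step with result @err.
	 * No CPU returns until all @ncpus CPUs have arrived, and an error
	 * on any CPU makes every CPU bail out together.
	 */
	static int tdp_step_barrier(int step, int ncpus, int err)
	{
		if (err)
			atomic_set(&tdp_failed, 1);

		atomic_inc(&tdp_arrived);

		/*
		 * The counter is never reset; once it reaches
		 * (step + 1) * ncpus, every CPU has completed steps
		 * 0 .. step.
		 */
		while (atomic_read(&tdp_arrived) < (step + 1) * ncpus)
			cpu_relax();

		return atomic_read(&tdp_failed) ? -EIO : 0;
	}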

>
>> >2. P-SEAMLDR seamcalls (specifically, SEAMRET from P-SEAMLDR) clear current
>> >   VMCS pointers, which may disrupt KVM. To prevent VMX instructions in IRQ
>> >   context from encountering NULL current-VMCS pointers, P-SEAMLDR
>> >   seamcalls are made with IRQs disabled. I'm uncertain whether NMIs could
>> >   cause a problem, but I believe they won't. See more information in patch 3.
>> >
>> >3. Two helpers, cpu_vmcs_load() and cpu_vmcs_store(), are added in patch 3
>> >   to save and restore the current VMCS. KVM has a variant of cpu_vmcs_load(),
>> >   i.e., vmcs_load(). Extracting KVM's version would cause a lot of code
>> >   churn, and I don't think that can be justified for reducing ~16 LoC
>> >   duplication. Please let me know if you disagree.
>> 
>> Gentle ping!
>
>I do not believe that I was CCed on the original.  Just in case you
>were wondering why I did not respond.  ;-)

My bad :(. I forgot to CC you when posting the series.

Btw, it seems that stop_machine.c isn't listed under any entry in MAINTAINERS.
I found your name by checking who submitted pull requests related to
stop_machine.c to Linus.

>
>> There are three open issues: one regarding stop_machine() and two related to
>> interactions with KVM.
>> 
>> Sean and Paul, do you have any preferences or insights on these matters?
>
>Again, you are within your rights to create a new function and pass
>it to stop_machine().  But it seems quite likely that there is a much
>simpler way to get your job done.
>
>Either way, please add a header comment stating what your function
>is trying to do,

Sure. Will do.

>which appears to be to wait for all CPUs to enter
>do_seamldr_install_module() and then just leave?

Hmm, by the end of the series, do_seamldr_install_module() does more than just
a simple barrier-wait.

>Sort of like
>multi_cpu_stop(), except leaving interrupts enabled and not executing a
>"msdata->fn(msdata->data);", correct?
>
>If so, something like panic_stop_irqoff_fn() might be a simpler model,
>perhaps with the touch_nmi_watchdog() and rcu_momentary_eqs() added.

As said above, lockstep progress is a key requirement, so a simple
panic_stop_irqoff_fn()-like model cannot meet our needs here.

>
>Oh, and one bug:  You must have interrupts disabled when you call
>rcu_momentary_eqs().  Please fix this.

Actually, interrupts are already disabled in multi_cpu_stop() before it calls
msdata->fn (i.e., do_seamldr_install_module()).
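
For reference, multi_cpu_stop() walks each CPU through MULTI_STOP_DISABLE_IRQ
before MULTI_STOP_RUN, roughly like this (paraphrased from
kernel/stop_machine.c):

	switch (curstate) {
	case MULTI_STOP_DISABLE_IRQ:
		local_irq_disable();
		hard_irq_disable();
		break;
	case MULTI_STOP_RUN:
		if (is_active)
			err = msdata->fn(msdata->data);	/* IRQs already off */
		break;
	...
	}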

In this context, there are two state machines involved. The MULTI_STOP_RUN
state of the outer state machine contains an inner state machine with the
following stages:
 * TDP_START
 * TDP_SHUTDOWN
 * TDP_CPU_INSTALL
 * TDP_CPU_INIT
 * TDP_RUN_UPDATE
 * TDP_DONE
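
A rough sketch of how this inner state machine could nest inside
MULTI_STOP_RUN, reusing the hypothetical tdp_step_barrier() from above (again,
not the actual patch code; tdp_do_step() is made up for illustration):

	/* Runs as msdata->fn() from multi_cpu_stop(), i.e. inside
	 * MULTI_STOP_RUN with IRQs already disabled on every CPU. */
	static int do_seamldr_install_module(void *data)
	{
		int state, ret;

		for (state = TDP_START; state < TDP_DONE; state++) {
			ret = tdp_do_step(state); /* hypothetical handler */

			/*
			 * Lockstep with bailout: no CPU advances past a
			 * step until all CPUs complete it without error.
			 */
			if (tdp_step_barrier(state, num_online_cpus(), ret))
				return -EIO;
		}

		return 0;
	}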

I am concerned about the code duplication between do_seamldr_install_module()
and multi_cpu_stop(), but I don't see a good way to eliminate it without
adding more complexity. It seems you can also live with the duplication if
do_seamldr_install_module() truly requires another state machine, right?



