Re: [RFC v2 05/16] luo: luo_core: integrate with KHO

Pasha Tatashin <pasha.tatashin@xxxxxxxxxx> · Fri, 20 Jun 2025 12:03:59 -0400

On Fri, Jun 20, 2025 at 11:28 AM Pratyush Yadav <pratyush@xxxxxxxxxx> wrote:
>
> Hi Pasha,
>
> On Thu, Jun 19 2025, Pasha Tatashin wrote:
>
> [...]
> >> And it has to be done before kexec load, at least until we resolve this.
> >
> > The "before kexec load" constraint has been fixed. The only
> > "finalization" constraint we have is that it must happen before
> > reboot(LINUX_REBOOT_CMD_KEXEC), and only because memory allocations
> > during kernel shutdown are undesirable. Once KHO moves away from a
> > monolithic state machine, this constraint disappears. Kernel components
> > could preserve their resources at appropriate times, not necessarily
> > tied to shutdown time. For live update scenarios, LUO already
> > orchestrates this timing.
> >
> >> Currently this is triggered either by KHO debugfs or by LUO ioctls. If we
> >> completely drop KHO debugfs and notifiers, we still need something that
> >> would trigger the magic.
> >
> > An external "magic trigger" for KHO (like the current finalize
> > notifier or debugfs command) is necessary for scenarios like live
> > update, where userspace resources are being preserved in a coordinated
> > fashion just before kexec.
> >
> > For kernel-internal resources that are unrelated to such a
> > userspace-driven live update flow, the respective kernel components
> > should directly use KHO's primitive preservation APIs
> > (kho_preserve_folio, etc.) when they need to mark their resources for
> > handover. No separate state machine or external trigger should be
> > required for these individual, self-contained preservation acts.
>

Hi Pratyush,

> For kernel-internal components, I think this makes a lot of sense,
> especially now that we don't need to get everything done by kexec load
> time. I suppose the liveupdate_reboot() call at reboot time to prepare
> final things can be useful, but subsystems can just as well register
> reboot notifiers to get the same notification.

Correct. If subsystems unrelated to the userspace live update flow,
such as pstore, tracing, telemetry, debugging, or IMA, need to be
notified about a reboot, they can simply register their own reboot
notifier.

> >> I'm not saying we should keep KHO debugfs and notifiers, I'm saying that if
> >> we make LUO the only thing driving KHO, liveupdate is not an appropriate
> >> name.
> >
> > LUO drives KHO specifically for the purpose of live updates. If a
> > different userspace use case emerges with a distinct purpose (e.g.,
> > something other than preserving an FD or a device across a kernel
> > reboot, i.e., something for which LUO does not provide a uAPI), then it
> > would probably need its own uAPI, separate from LUO, rather than an
> > extension of the LUO uAPI.
>
> Outside of hypervisor live update, I have a very clear use case in mind:
> userspace memory handover (on guest side). Say a guest running an
> in-memory cache like memcached with many gigabytes of cache wants to
> reboot. It can just shove the cache into a memfd, give it to LUO, and
> restore it after reboot. Some services that suffer from long reboots are
> looking into using this to reduce downtime. Since it pretty much
> overlaps with the hypervisor work for now, I haven't been talking about
> it as much.
>
> Would you also call this use case "live update"? Does it also fit with
> your vision of where LUO should go?

Yes, absolutely. The use case you described (preserving a memcached
instance via memfd) is a perfect fit for LUO's vision.
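As a rough illustration of the userspace half of that flow, here is a minimal sketch of copying an in-memory cache image into a memfd and reading it back. It assumes Linux 3.17+ and glibc 2.27+ for memfd_create(). The helper names cache_to_memfd() and memfd_read_back() are hypothetical, and the step that actually hands the fd to LUO is deliberately not shown, since it would depend on the LUO uAPI proposed in this series.

```c
#define _GNU_SOURCE /* exposes memfd_create() in glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy a cache image into a new memfd.
 * Returns the fd on success, -1 on error. Handing the fd to LUO for
 * preservation across kexec is NOT shown here.
 */
int cache_to_memfd(const void *cache, size_t len)
{
	void *dst;
	int fd = memfd_create("cache-image", MFD_CLOEXEC);

	if (fd < 0)
		return -1;

	/* Size the memfd so it can hold the whole cache image. */
	if (ftruncate(fd, (off_t)len) < 0)
		goto err;
	if (len == 0)
		return fd; /* mmap() rejects zero-length mappings */

	/* MAP_SHARED so the copy lands in the memfd's own pages. */
	dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (dst == MAP_FAILED)
		goto err;

	memcpy(dst, cache, len);
	munmap(dst, len);
	return fd;

err:
	close(fd);
	return -1;
}

/* Hypothetical helper: read len bytes from offset 0 of fd into out. */
int memfd_read_back(int fd, void *out, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = pread(fd, (char *)out + done, len - done,
				  (off_t)done);
		if (n <= 0)
			return -1;
		done += (size_t)n;
	}
	return 0;
}
```

In the scenario above, the service would call something like cache_to_memfd() before the reboot, pass the fd to LUO, and after the new kernel boots, retrieve the preserved fd and map or read it back instead of rebuilding the cache from scratch.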

While the primary use case driving this work is supporting the
preservation of virtual machines on a hypervisor, the framework itself
is not restricted to that scenario. We define "live update" as the
process of updating the kernel from one version to another while
preserving FD-based resources and keeping selected devices
operational. The machine itself can be running storage, database,
networking, containers, or anything else.

A good parallel is Kernel Live Patching: we don't distinguish what
workload is running on a machine when applying a security patch; we
simply patch the running kernel. In the same way, Live Update is
designed to be workload-agnostic. Whether the system is running an
in-memory database, containers, or VMs, its primary goal is to enable
a full kernel update while preserving the userspace-requested state.

Thanks,
Pasha