On Fri, Jun 20, 2025 at 11:28 AM Pratyush Yadav <pratyush@xxxxxxxxxx> wrote:
>
> Hi Pasha,
>
> On Thu, Jun 19 2025, Pasha Tatashin wrote:
>
> [...]
> >> And it has to be done before kexec load, at least until we resolve this.
> >
> > The before kexec load constraint has been fixed. The only
> > "finalization" constraint we have is that it should be before
> > reboot(LINUX_REBOOT_CMD_KEXEC) and only because memory allocations
> > during kernel shutdown are undesirable. Once KHO moves away from a
> > monolithic state machine this constraint disappears. Kernel components
> > could preserve their resources at appropriate times, not necessarily
> > tied to a shutdown-time. For live update scenarios, LUO already
> > orchestrates this timing.
> >
> >> Currently this is triggered either by KHO debugfs or by LUO ioctls. If we
> >> completely drop KHO debugfs and notifiers, we still need something that
> >> would trigger the magic.
> >
> > An external "magic trigger" for KHO (like the current finalize
> > notifier or debugfs command) is necessary for scenarios like live
> > update, where userspace resources are being preserved in a coordinated
> > fashion just before kexec.
> >
> > For kernel-internal resources that are unrelated to such a
> > userspace-driven live update flow, the respective kernel components
> > should directly use KHO's primitive preservation APIs
> > (kho_preserve_folio, etc.) when they need to mark their resources for
> > handover. No separate state machine or external trigger should be
> > required for these individual, self-contained preservation acts.
>

Hi Pratyush,

> For kernel-internal components, I think this makes a lot of sense,
> especially now that we don't need to get everything done by kexec load
> time. I suppose the liveupdate_reboot() call at reboot time to prepare
> final things can be useful, but subsystems can just as well register
> reboot notifiers to get the same notification.

Correct.
If subsystems unrelated to the userspace live update flow, such as
pstore, tracing, telemetry, debugging, or IMA, need to be notified about
a reboot, they can simply register their own reboot notifier.

> >> I'm not saying we should keep KHO debugfs and notifiers, I'm saying that if
> >> we make LUO the only thing driving KHO, liveupdate is not an appropriate
> >> name.
> >
> > LUO drives KHO specifically for the purpose of live updates. If a
> > different userspace use-case emerges that needs another distinct
> > purpose (e.g., not to preserve an FD or a device across kernel reboot
> > (i.e. something for which LUO does not provide uAPI)), then that would
> > probably need a uAPI separate from LUO instead of extending the LUO
> > uAPI.
>
> Outside of hypervisor live update, I have a very clear use case in mind:
> userspace memory handover (on guest side). Say a guest running an
> in-memory cache like memcached with many gigabytes of cache wants to
> reboot. It can just shove the cache into a memfd, give it to LUO, and
> restore it after reboot. Some services that suffer from long reboots are
> looking into using this to reduce downtime. Since it pretty much
> overlaps with the hypervisor work for now, I haven't been talking about
> it as much.
>
> Would you also call this use case "live update"? Does it also fit with
> your vision of where LUO should go?

Yes, absolutely. The use case you described (preserving a memcached
instance via memfd) is a perfect fit for LUO's vision. While the primary
use case driving this work is supporting the preservation of virtual
machines on a hypervisor, the framework itself is not restricted to that
scenario. We define "live update" as the process of updating the kernel
from one version to another while preserving FD-based resources and
keeping selected devices operational. The machine itself can be running
storage, database, networking, containers, or anything else.
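
To make the memcached example above a bit more concrete, here is a
minimal userspace sketch of the first step only: moving cache data into
a memfd. memfd_create(), ftruncate(), mmap() and pread() are standard
Linux calls. The helper names cache_to_memfd() and memfd_read_back() and
the memfd name "cache-handover" are made up for illustration, and the
step that hands the fd to LUO is left as a comment because that uAPI is
not spelled out in this thread.

```c
#define _GNU_SOURCE		/* needed for memfd_create() in glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy `len` bytes of cache data into a new memfd.
 * Returns the fd on success, -1 on failure.
 */
static int cache_to_memfd(const void *data, size_t len)
{
	void *map;
	int fd;

	assert(data != NULL && len > 0);

	/* Anonymous, memory-backed file; the name is only a debug label. */
	fd = memfd_create("cache-handover", MFD_CLOEXEC);
	if (fd < 0)
		return -1;

	/* Size the file before mapping it. */
	if (ftruncate(fd, (off_t)len) < 0)
		goto err;

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto err;

	memcpy(map, data, len);
	munmap(map, len);

	/*
	 * ASSUMPTION: this is where the fd would be registered with LUO
	 * for preservation, and retrieved again after the kexec reboot.
	 * That interface is not shown here.
	 */
	return fd;

err:
	close(fd);
	return -1;
}

/*
 * Hypothetical helper: read the contents back from offset 0, as the
 * restarted service would after getting the fd back.
 */
static ssize_t memfd_read_back(int fd, void *buf, size_t len)
{
	return pread(fd, buf, len, 0);
}
```

A real service would more likely allocate its cache in the memfd mapping
from the start instead of copying into it at reboot time; the copy here
just keeps the sketch short.
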
A good parallel is Kernel Live Patching: we don't distinguish what
workload is running on a machine when applying a security patch; we
simply patch the running kernel. In the same way, Live Update is
designed to be workload-agnostic. Whether the system is running an
in-memory database, containers, or VMs, its primary goal is to enable a
full kernel update while preserving the userspace-requested state.

Thanks,
Pasha