Re: [PATCH v3 29/30] luo: allow preserving memfd

On Tue, Sep 9, 2025 at 10:53 AM Pratyush Yadav <me@xxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Sep 04 2025, Jason Gunthorpe wrote:
>
> > On Thu, Sep 04, 2025 at 02:57:35PM +0200, Pratyush Yadav wrote:
> >
> >> I don't think it matters if they are preserved or not. The serialization
> >> and deserialization is independent of that. You can very well create a
> >> KHO array that you don't KHO-preserve. On next boot, you can still use
> >> it, you just have to be careful of doing it while scratch-only. Same as
> >> we do now.
> >
> > The KHO array machinery itself can't preserve its own memory
> > either.
>
> It can. Maybe it couldn't in the version I showed you, but now it can.
> See kho_array_preserve() in
> https://lore.kernel.org/linux-mm/20250909144426.33274-2-pratyush@xxxxxxxxxx/
>
> >
> >> For the _hypervisor_ live update case, sure. Though even there, I have a
> >> feeling we will start seeing userspace components on the hypervisor use
> >> memfd for stashing some of their state.
> >
> > Sure, but don't make excessively sparse memfds for kexec use, why
> > should that be hard?
>
> Sure, I don't think they should be excessively sparse. But _some_ level
> of sparseness can be there.

This is right; support for moderately sparse memfds is needed. However,
preserving an excessively sparse memfd will be inefficient for live
update (LU), unless we change the backing to come from a separate pool
of physical pages that is always preserved. If we do that, it would
probably make sense only for guestmemfd, and only if we ever decide to
support overcommitted VMs. I suspect it is not something that we
currently need to worry about.

> >> applications. Think big storage nodes with memory in order of TiB. Those
> >> can use a memfd to back their caches so on a kernel upgrade the caches
> >> don't have to be re-fetched. Sparseness is to be expected for such use
> >> cases.
> >
> > Oh? I'm surprised you'd have sparseness there. Sparseness seems like
> > such a weird feature to want to rely on :\
> >
> >> But perhaps it might be a better idea to come up with a mechanism for
> >> the kernel to discover which formats the "next" kernel speaks so it can
> >> for one decide whether it can do the live update at all, and for another
> >> which formats it should use. Maybe we give a way for luod to choose
> >> formats, and give it the responsibility for doing these checks?
> >
> > I have felt that we should catalog the formats&versions the kernel can
> > read/write in some way during kbuild.
> >
> > Maybe this turns into a sysfs directory of all the data with an
> > 'enable_write' flag that luod could set to 0 to optimize.
> >
> > And maybe this could be a kbuild report that luod could parse to do
> > this optimization.
>
> Or maybe we put that information in an ELF section in the kernel image?
> Not sure how feasible it would be for tooling to read but I think that
> would very closely associate the versions info with the kernel. The
> other option might be to put it somewhere with modules I guess.

To me, all this sounds like hardening, which, while important, can be
added later. The pre-kexec check for compatibility can be defined and
implemented once we have all live update components ready
(KHO/LUO/PCI/IOMMU/VFIO/MEMFD), once we stabilize the versioning
story, and once we start discussing update stability.

Currently, we've agreed that there are no stability guarantees.
Sometime in the future, we may guarantee minor-to-minor stability, and
later, stable-to-stable. Once we start working on minor-to-minor
stability, it would be a good idea to also add hardening where a
pre-live update would check for compatibility.
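
To make the idea of a pre-live update check concrete, here is a rough
sketch of the logic a userspace agent such as luod might run, assuming
it can somehow learn which format versions the current kernel can
write and which the next kernel can read. How that information is
exported (sysfs, a kbuild report, an ELF section) is exactly the open
question in this thread, so the function names, the dictionary layout,
and the format names below are all hypothetical.

```python
from typing import Dict, Optional, Set


def pick_versions(
    writable: Dict[str, Set[int]],
    readable: Dict[str, Set[int]],
) -> Dict[str, Optional[int]]:
    """For each format the current kernel writes, pick the highest
    version the next kernel can also read, or None if there is none."""
    chosen: Dict[str, Optional[int]] = {}
    for fmt, versions in writable.items():
        common = versions & readable.get(fmt, set())
        chosen[fmt] = max(common) if common else None
    return chosen


def can_live_update(chosen: Dict[str, Optional[int]]) -> bool:
    """Live update is possible only if every format has a common version."""
    return all(v is not None for v in chosen.values())


# Hypothetical format names and version numbers, for illustration only.
current_writes = {"memfd": {1, 2}, "kho_array": {1}}
next_reads_ok = {"memfd": {1}, "kho_array": {1}}
next_reads_bad = {"memfd": {3}}

ok = pick_versions(current_writes, next_reads_ok)
bad = pick_versions(current_writes, next_reads_bad)
print(ok, can_live_update(ok))
print(bad, can_live_update(bad))
```

With a check of this shape, an incompatible pair is rejected before
the old kernel shuts down instead of at boot of the new one.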

In reality, this is not a high priority for cloud providers, because
these kinds of incompatibilities would be found during qualification:
the update would fail when the new kernel detects a version mismatch
during boot, rather than when the old kernel shuts down.

> > And maybe distro/csps use this information mechanically to check if
> > version pairs are kexec compatible.
> >
> > Which reinforces my feeling that the formats/version should be first
> > class concepts, every version should be registered and luo should
> > sequence calling the code for the right version at the right time.
> >
> > Jason
>
> --
> Regards,
> Pratyush Yadav
