Re: [PATCH v3 29/30] luo: allow preserving memfd

Jason Gunthorpe <jgg@xxxxxxxxxx> · Tue, 26 Aug 2025 13:20:19 -0300

On Thu, Aug 07, 2025 at 01:44:35AM +0000, Pasha Tatashin wrote:

> +	/*
> +	 * Most of the space should be taken by preserved folios. So take its
> +	 * size, plus a page for other properties.
> +	 */
> +	fdt = memfd_luo_create_fdt(PAGE_ALIGN(preserved_size) + PAGE_SIZE);
> +	if (!fdt) {
> +		err = -ENOMEM;
> +		goto err_unpin;
> +	}

This doesn't seem to have any versioning scheme, it really should..

> +	err = fdt_property_placeholder(fdt, "folios", preserved_size,
> +				       (void **)&preserved_folios);
> +	if (err) {
> +		pr_err("Failed to reserve folios property in FDT: %s\n",
> +		       fdt_strerror(err));
> +		err = -ENOMEM;
> +		goto err_free_fdt;
> +	}

Yuk.

This really wants some luo helper

'luo alloc array'
'luo restore array'
'luo free array'

Which would get a linearized list of pages in the vmap to hold the
array and then allocate some structure to record the page list and
return back the u64 of the phys_addr of the top of the structure to
store in whatever.

Getting fdt to allocate the array inside the fds is just not going to
work for anything of size.

> +	for (; i < nr_pfolios; i++) {
> +		const struct memfd_luo_preserved_folio *pfolio = &pfolios[i];
> +		phys_addr_t phys;
> +		u64 index;
> +		int flags;
> +
> +		if (!pfolio->foliodesc)
> +			continue;
> +
> +		phys = PFN_PHYS(PRESERVED_FOLIO_PFN(pfolio->foliodesc));
> +		folio = kho_restore_folio(phys);
> +		if (!folio) {
> +			pr_err("Unable to restore folio at physical address: %llx\n",
> +			       phys);
> +			goto put_file;
> +		}
> +		index = pfolio->index;
> +		flags = PRESERVED_FOLIO_FLAGS(pfolio->foliodesc);
> +
> +		/* Set up the folio for insertion. */
> +		/*
> +		 * TODO: Should find a way to unify this and
> +		 * shmem_alloc_and_add_folio().
> +		 */
> +		__folio_set_locked(folio);
> +		__folio_set_swapbacked(folio);
> 
> +		ret = mem_cgroup_charge(folio, NULL, mapping_gfp_mask(mapping));
> +		if (ret) {
> +			pr_err("shmem: failed to charge folio index %d: %d\n",
> +			       i, ret);
> +			goto unlock_folio;
> +		}

[..]

> +		folio_add_lru(folio);
> +		folio_unlock(folio);
> +		folio_put(folio);
> +	}

Probably some consolidation will be needed to make this less
duplicated..

But overall I think just using the memfd_luo_preserved_folio as the
serialization is entirely file, I don't think this needs anything more
complicated.

What it does need is an alternative to the FDT with versioning.

Which seems to me to be entirely fine as:

 struct memfd_luo_v0 {
    __aligned_u64 size;
    __aligned_u64 pos;
    __aligned_u64 folios;
 };

 struct memfd_luo_v0 memfd_luo_v0 = {.size = size, pos = file->f_pos, folios = folios};
 luo_store_object(&memfd_luo_v0, sizeof(memfd_luo_v0), <.. identifier for this fd..>, /*version=*/0);

Which also shows the actual data needing to be serialized comes from
more than one struct and has to be marshaled in code, somehow, to a
single struct.

Then I imagine a fairly simple forwards/backwards story. If something
new is needed that is non-optional, lets say you compress the folios
list to optimize holes:

 struct memfd_luo_v1 {
    __aligned_u64 size;
    __aligned_u64 pos;
    __aligned_u64 folios_list_with_holes;
 };

Obviously a v0 kernel cannot parse this, but in this case a v1 aware
kernel could optionally duplicate and write out the v0 format as well:

 luo_store_object(&memfd_luo_v0, sizeof(memfd_luo_v0), <.. identifier for this fd..>, /*version=*/0);
 luo_store_object(&memfd_luo_v1, sizeof(memfd_luo_v1), <.. identifier for this fd..>, /*version=*/1);

Then the rule is fairly simple, when the sucessor kernel goes to
deserialize it asks luo for the versions it supports:

 if (luo_restore_object(&memfd_luo_v1, sizeof(memfd_luo_v1), <.. identifier for this fd..>, /*version=*/1))
    restore_v1(&memfd_luo_v1)
 else if (luo_restore_object(&memfd_luo_v0, sizeof(memfd_luo_v0), <.. identifier for this fd..>, /*version=*/0))
    restore_v0(&memfd_luo_v0)
 else
    luo_failure("Do not understand this");

luo core just manages this list of versioned data per serialized
object. There is only one version per object.

Jason