On Mon, Mar 24, 2025 at 05:21:45PM -0700, Changyuan Lyu wrote:

> Thanks for the suggestions! I am a little bit concerned about assuming
> every FDT fragment is smaller than PAGE_SIZE. In case a child FDT is
> larger than PAGE_SIZE, I would like to turn the single u64 in the parent
> FDT into a u64 list to record all the underlying pages of the child FDT.

Maybe, but I'd suggest leaving some accommodation for this in the API
but not implementing it until we see proof it is needed. 4k is a lot of
space for a FDT, and if you are doing per-object FDTs I don't see
exceeding it. For instance the vfio, memfd, and iommufd object FDTs
would not get close.

> In this way we assume that most FDT fragment is smaller than 1 page so
> "kho,recursive-fdt" is usually just 1 u64, but we can also handle
> larger fragments if that really happens.

Yes, this is close to what I imagine.

You have to decide if the child FDT top pointers will be stored directly
in parent FDTs like you sketched above, or if they should be stored in
some dedicated allocated and preserved datastructure, the way memory
preservation works. There are some tradeoffs in each direction..

> I also allow KHO users to add sub nodes in-place, instead of forcing
> to create a new FDT fragment for every sub node, if the KHO user is
> confident that those subnodes are small enough to fit in the parent
> node's page. In this way we do not need to waste a full page for a small
> sub node. An example is the "memblock" node above.

Well, I think that sort of misses the bigger picture. What we want is to
run serialization of everything in parallel. So merging like you say
will complicate that.

Really, I think we will have on the order of 10's of objects to
serialize so I don't really care if they use partial pages if that makes
the serialization faster. As long as the memory is freed once the live
update is done, the waste doesn't matter.
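
As a rough illustration of the u64 list idea quoted at the top, here is
a minimal userspace sketch. It is not code from any posted series: the
function names are made up for the example, 4k pages are assumed, and
byte-order conversion of the property value is left out. The point is
only that the common case stays a single entry and a larger child FDT
simply gets more entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4k pages, matching the "4k" figure used in this thread. */
#define SKETCH_PAGE_SIZE 4096u

/* Pages needed to hold a child FDT of fdt_size bytes (rounds up). */
size_t sketch_child_fdt_pages(size_t fdt_size)
{
	return (fdt_size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Hypothetical helper: copy the physical addresses of the pages backing
 * a child FDT into the buffer that would become the value of the
 * parent's "kho,recursive-fdt" property.
 *
 * Returns the number of u64 entries written. A return of 0 means
 * nothing was written: either the child FDT is empty or the destination
 * buffer is too small.
 */
size_t sketch_fill_recursive_fdt(const uint64_t *page_phys, size_t fdt_size,
				 uint64_t *prop, size_t prop_cap)
{
	size_t i, n = sketch_child_fdt_pages(fdt_size);

	assert(page_phys && prop);
	if (n > prop_cap)
		return 0;
	for (i = 0; i < n; i++)
		prop[i] = page_phys[i];
	return n;
}
```

A child FDT of up to 4096 bytes produces one entry here, while a 5000
byte child backed by two pages produces two.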

> Finally, the KHO top level FDT may also be larger than 1 page, this can
> be handled using the anchor-page method discussed in the previous mails.

This is one of the trade offs I mentioned. If you inline the objects as
FDT nodes then you have to scale and multi-page a FDT. If you do a
binary-structure like memory preservation then you have to serialize to
something that is inherently scalable and 4k granular.

The 4k FDT limit really only works if you make liberal use of pointers
to binary data. Anything that is not of a predictable size limit would
be in some related binary structure.

So.. I'd probably suggest thinking about how to make multi-page FDT work
in the memory description, but not implementing it now. When we reach
the point where we know we need multi-page FDT then someone would have
to implement a growable FDT through vmap or something like that to make
it work.

Keep this initial step simple, we clearly don't need more than 4k FDT at
this point and we aren't doing stable kexec-ABI either. So simplify
simplify simplify to get a very thin minimal functionality merged to put
the fdbox step on top of.

Jason