On 17 Jul 2025, Jose E. Marchesi outgrape:

>> On Wed, 2025-07-16 at 16:15 +0100, Nick Alcock wrote:
>>
>> [...]
>>
>>> - So... a third option, which is probably the most BTFish because it's
>>> something BTF already does, in a sense: put everything in one section,
>>> call it .BTF or .BTFA or whatever, and make that section an archive of
>>> named BTF members, and then stuff however many BTF outputs the
>>> deduplication generates (or none, if we're just stuffing inputs into
>>> outputs without dedupping) into archive members.
>>>
>>> So, here's a possibility which seems to provide the latter option while
>>> still letting existing tools read the first member (likely vmlinux):
>>>
>>> The idea is that we add a *next member link field* in the BTF header, and a
>>> name (a strtab offset). The next member link field is an end-of-header-
>>> relative offset just like most of the other header fields, which chains BTF
>>> members together in a linked list:
>>>
>>>   parent BTF
>>>       |
>>>       v
>>>   children BTF -> BTF -> BTF -> ... -> BTF
>>>
>>> The parent is always first in the list.
>>
>> Hi Nick,
>>
>> You are talking about the BTF section embedded in a final vmlinux binary, right?
>
> More generally, a section embedded in any object which is the result of
> linking two or more objects having .BTF sections:
>
>   ld foo.o (.BTF) bar.o (.BTF) -> baz.o (.BTF)
>
> This covers the particular vmlinux case I think.

Yes, though I wasn't expecting to see this in vmlinux yet! It might happen eventually. What this is used for right now is *communicating with pahole*: the .btfa file pahole receives is one of these, containing deduplicated BTF for the entire kernel plus all modules, and it's then up to pahole what to do with it. In userspace links (and in intermediate links of multifile kernel modules, which are used only as input to the btfarchive deduplicator), we see this sort of thing a lot.
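To make the chained-header idea from the quoted proposal a bit more concrete, here is a rough C sketch of a reader walking such a section. The first eight header fields mirror the upstream `struct btf_header` from `<linux/btf.h>`; everything else is hypothetical: the name `struct btfa_header`, the two extra fields `member_name_off` and `next_member_off`, using 0 to mark the last member, and resolving each member's name in that member's own string section are illustrative assumptions, not an agreed format. The walker also insists on exactly this header size, which a real reader would not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BTF_MAGIC 0xeB9F

/* HYPOTHETICAL on-disk header for one archive member.  The first eight
 * fields mirror upstream struct btf_header; the last two are the fields
 * proposed in this thread, with made-up names. */
struct btfa_header {
	uint16_t magic;
	uint8_t version;
	uint8_t flags;
	uint32_t hdr_len;
	/* All offsets are in bytes, relative to the end of this header. */
	uint32_t type_off;
	uint32_t type_len;
	uint32_t str_off;
	uint32_t str_len;
	uint32_t member_name_off; /* member name: offset into this member's strtab */
	uint32_t next_member_off; /* next member's header; 0 = last (assumption) */
};

/* Follow the next-member chain from the parent (always first).  Stores up
 * to max member names and returns the member count, or -1 if malformed. */
static int walk_members(const uint8_t *sec, size_t sec_len,
			const char **names, int max)
{
	size_t pos = 0;
	int n = 0;

	for (;;) {
		struct btfa_header h;

		if (pos > sec_len || sec_len - pos < sizeof h)
			return -1;
		memcpy(&h, sec + pos, sizeof h);     /* avoids unaligned access */
		if (h.magic != BTF_MAGIC || h.hdr_len != sizeof h)
			return -1;

		size_t data = pos + h.hdr_len;       /* end of this header */
		if ((uint64_t)h.str_off + h.str_len > sec_len - data ||
		    h.member_name_off >= h.str_len ||
		    !memchr(sec + data + h.str_off + h.member_name_off, '\0',
			    h.str_len - h.member_name_off))
			return -1;

		if (n < max)
			names[n] = (const char *)(sec + data + h.str_off +
						  h.member_name_off);
		n++;

		if (h.next_member_off == 0)
			return n;
		pos = data + h.next_member_off;      /* always moves forward */
	}
}

/* Test helper: append a member with no types whose strtab is "\0<name>\0"
 * and whose next link points just past its own data unless last is set. */
static size_t append_member(uint8_t *buf, size_t pos, const char *name, int last)
{
	struct btfa_header h = {0};
	size_t name_len = strlen(name) + 1;

	h.magic = BTF_MAGIC;
	h.version = 1;
	h.hdr_len = sizeof h;
	h.str_len = (uint32_t)(1 + name_len);
	h.member_name_off = 1;
	h.next_member_off = last ? 0 : h.str_len;
	memcpy(buf + pos, &h, sizeof h);
	buf[pos + sizeof h] = '\0';
	memcpy(buf + pos + sizeof h + 1, name, name_len);
	return pos + sizeof h + h.str_len;
}
```

Chaining a "vmlinux" parent and two module children with `append_member` and walking the result yields three members in chain order. Because the parent is always first, a reader that ignores the link field still sees an ordinary-looking first member, which is the property the proposal is after.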
>> Could you please elaborate a bit on why you need multiple members
>> within this section (in the context of your third option)?
>> I re-read the email but don't get it :(
>
> As I understand it:
>
> The linker deduplicates types in the set of input .BTF sections. This
> means that when linking foo.o and bar.o, if both compilation units refer
> to a type 'quux', there are two possibilities:
>
> a) The type 'quux' is the same (using C type equivalence rules) in both
>    compilation units. Then the type is "shared" and the linker puts it
>    only once in the first output BTF member in baz.o .BTF, the "parent".
>
> b) The type 'quux' is different in both compilation units. These are
>    then conflicting types. Then two versions of the type, foo.quux and
>    bar.quux, are placed by the linker in the corresponding "children"
>    members in baz.o.

Yes. (We don't quite use C type equivalence rules: we're pickier, since types can be assignment-compatible but still different, and we want to preserve that difference. But that's nitpicking.)

This happens quite a lot in the kernel (I was surprised how often). It happens even more in userspace, sometimes to an almost pathological degree (hello, Ghostscript). LTO may make it less prevalent in the future, but I doubt this sort of thing will ever go away: it's still with us in C++ programs, and there it's outright undefined behaviour!

> Graphically, the .BTF section in a linked binary would contain a
> one-level tree of members, with as many children as input compilation
> units:
>
>   parent (common types)
>   |
>   +--- child1 (types only in child1)
>   +--- child2 (types only in child2)
>   .
>   +--- childN (types only in childN)
>
> Hope this makes sense. Nick should be able to explain it better than I
> do.

There are really two cases, because the purpose of "being a child" is somewhat overloaded. The kernel is, as ever, different...
- Kernel-style builds (the traditional BTF case):

    vmlinux (parent) (common types, any types shared by more than one module)
    |
    +--- child1.ko (types only in child1)
    +--- child2.ko (types only in child2)
    .
    +--- childN.ko (types only in childN)

  Notably, if a type differs (conflicts) across translation units, and all those translation units are in the core kernel, we can't put the variants in children: none of them are in modules, and children are reserved for modules. So we actually emit them as "hidden types" (a concept BTF doesn't have and that I am not currently proposing, which lets us say "this type is not visible in any namespace; here's the name of the translation unit it was found in"). The same applies if a type has conflicting definitions within a single module.

  If a type has conflicting definitions in two distinct modules, we can simply emit each definition into the child for its module. Also, if a type has one definition in a lot of modules and a different one in only one or two, we treat the first definition as the "most popular" and emit it into the parent, then emit the conflicting one into the few per-module children it is found in.

  Types that are used by only one module are placed in that module's child, both because that's what pahole has always done and because, for a loosely-coupled project like the kernel, it makes sense not to clutter vmlinux with thousands of types from huge modules like amdgpu that might never even be loaded.

  I am not expecting pahole to preserve hidden types, at least not yet (BTF has no way to encode them and no consumer understands them), but it can see them in its input, so it might use hiddenness as a flag meaning "hey, this type is conflicting, take care with everything with the same name", or something like that.
  The concept is not useless even if pahole largely ignores it: it at least preserves the type graph, ensuring that any type that refers to a conflicting type still refers to that same type after deduplication, rather than ending up pointing at some other type with the same name. For example, if we have these two TUs in the core kernel:

    a.c:
      struct foo { int a; };
      struct bar { struct foo baz; };

    b.c:
      struct foo { long a; };   /* Different! */
      struct bar { struct foo baz; };

  one struct foo (the less-referenced one) will wind up hidden, but the struct bar in that same TU will *still point at the hidden type*. Both struct foos are *still there*, and we don't end up pointing at the same struct foo from both struct bars.

- For normal ELF links outside the kernel, the model above doesn't really make sense. Most programs don't have a concept like kernel modules, and most programs are more tightly coupled, so you want as many types as possible to be visible together. So for those, the distribution is like this:

    parent (all types that are not conflicting)
    |
    +--- child1.c (conflicting types defined in child1.c)
    +--- child2.c (conflicting types defined in child2.c)
    .
    +--- childN.c (conflicting types defined in childN.c)

  i.e., conflicting types are placed into children named after the translation units they come from. Within those dictionaries there are no hidden types and no possibility of conflict; the shared parent corresponds to "all TUs together", and there can be no conflicts there either. In many ways this is a simpler model, but it just won't cut it for the kernel.

We could in the end combine the two schemes, producing a multilevel tree, so that each module, and the core kernel, could contain an archive like userspace links do, with each conflicting type hived off into a child named after the translation unit that defines it. This is *definitely* more work, and would probably require consumer changes too. I am not proposing it, at least not yet.
But it shows where we could end up:

    vmlinux (parent) (common types, any types shared by more than one module)
    |
    +--- core1a.c (conflicting types defined in core1a.c)
    .
    +--- child1.ko (types found only in child1)
    |    |
    |    +--- child1a.c (conflicting types defined in child1a.c)
    |    +--- child1b.c (conflicting types defined in child1b.c)
    +--- child2.ko (types only in child2)
    .
    +--- childN.ko (types only in childN)

The distinction between the two link types above is largely controlled via this linker option in GNU ld:

  --ctf-share-types=<method>
                              How to share CTF types between translation units.
                                <method> is: share-unconflicted (default),
                                             share-duplicated

The final stage of kernel deduplication (the btfarchive tool) uses share-duplicated mode (plus extra logic to smush multiple translation units together into modules).

(That's from current upstream master. Obviously I'll have to find some way to say "--ctf-or-btf" without making it too verbose :) Maybe I could just add --btf-share-types as a synonym?)
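As a rough illustration of the placement rules described earlier in this message, here is a C sketch of how the sightings of one named type might be distributed under the userspace and kernel models. It is not libctf's or btfarchive's code. The `occurrence` struct, the integer definition ids, counting sightings as a stand-in for "most popular" or "least referenced", and the first-wins handling of rival variants inside one module are all assumptions made for the example. The userspace function loosely follows the share-unconflicted behaviour described above, and the kernel function follows the per-module rules from the first bullet; the multilevel tree is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Where one sighting of a named type ends up. */
enum placement { IN_PARENT, IN_CHILD, HIDDEN };

/* One sighting of a type with a given name.  unit is the TU (userspace
 * model) or the module (kernel model, where NULL means the core kernel).
 * defn identifies the structural definition: equal ids, identical types. */
struct occurrence {
	const char *unit;
	int defn;
};

/* Sightings of definition d: a crude stand-in for "how referenced". */
static size_t popularity(const struct occurrence *occ, size_t n, int d)
{
	size_t count = 0;
	for (size_t i = 0; i < n; i++)
		count += (occ[i].defn == d);
	return count;
}

/* Most-sighted definition; ties go to the first one seen.  Needs n >= 1. */
static int most_popular_defn(const struct occurrence *occ, size_t n)
{
	int best = occ[0].defn;
	size_t best_count = popularity(occ, n, best);
	for (size_t i = 1; i < n; i++) {
		size_t c = popularity(occ, n, occ[i].defn);
		if (c > best_count) {
			best = occ[i].defn;
			best_count = c;
		}
	}
	return best;
}

/* Userspace model: if every sighting agrees, the type is shared in the
 * parent; otherwise each variant goes into the child for its TU. */
static void place_userspace(const struct occurrence *occ, size_t n,
			    enum placement *out)
{
	int conflict = 0;
	for (size_t i = 1; i < n; i++)
		if (occ[i].defn != occ[0].defn)
			conflict = 1;
	for (size_t i = 0; i < n; i++)
		out[i] = conflict ? IN_CHILD : IN_PARENT;
}

/* Is definition d seen in the core kernel, or in two or more modules? */
static int shared_defn(const struct occurrence *occ, size_t n, int d)
{
	const char *first_module = NULL;
	for (size_t i = 0; i < n; i++) {
		if (occ[i].defn != d)
			continue;
		if (occ[i].unit == NULL)
			return 1;
		if (first_module == NULL)
			first_module = occ[i].unit;
		else if (strcmp(first_module, occ[i].unit) != 0)
			return 1;
	}
	return 0;
}

/* Kernel model: the most popular variant goes into vmlinux if the core
 * kernel or several modules use it; other sightings go into their module's
 * child, or are hidden if in the core kernel.  A child holds one variant
 * per name, so later rivals in one module are hidden (first wins). */
static void place_kernel(const struct occurrence *occ, size_t n,
			 enum placement *out)
{
	if (n == 0)
		return;
	int winner = most_popular_defn(occ, n);
	int winner_shared = shared_defn(occ, n, winner);
	for (size_t i = 0; i < n; i++) {
		if (occ[i].defn == winner && winner_shared)
			out[i] = IN_PARENT;
		else if (occ[i].unit != NULL)
			out[i] = IN_CHILD;
		else
			out[i] = HIDDEN;
		for (size_t j = 0; j < i && out[i] == IN_CHILD; j++)
			if (out[j] == IN_CHILD && occ[j].defn != occ[i].defn &&
			    strcmp(occ[j].unit, occ[i].unit) == 0)
				out[i] = HIDDEN;
	}
}
```

For example, three core-kernel sightings of struct foo, two of one variant and one of another, come out of `place_kernel` as IN_PARENT, IN_PARENT, HIDDEN, while a type seen only in amdgpu.ko stays IN_CHILD and never reaches vmlinux.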