Re: [RFC 0/4] BTF archive with unmodified pahole+toolchain

Arnaldo Carvalho de Melo <arnaldo.melo@xxxxxxxxx> · Fri, 08 Aug 2025 00:25:46 -0300

On August 7, 2025 11:52:51 PM GMT-03:00, Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
>On Thu, Aug 7, 2025 at 7:36 PM Arnaldo Carvalho de Melo <arnaldo.melo@xxxxxxxxx> wrote:

>> On Thu, Aug 7, 2025, 11:09 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:

>>> On Thu, Aug 7, 2025 at 11:25 AM Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> wrote:

>>> > This is complementary to today's series from Alan Maguire, as we can use
>>> > the one liner for the kernel build process to test his series without
>>> > requiring installing a toolchain that generates BTF for each .o file
>>> > that will result in vmlinux.

>>> > Next steps on my side are to:

>>> > 1. change pahole so that, when it receives --format_path=btf, it checks
>>> > whether btf__is_archive(btf) is true and, if so, just replaces the current
>>> > vmlinux .BTF contents with the raw data in this just-loaded BTF, short
>>> > circuiting the whole process.

>>> > 2. the kernel build process should be changed to allow one to ask for
>>> > just BTF, not DWARF, and if so, using the above method, strip the DWARF
>>> > info after using it to generate BTF.

>>> > Then when compilers are producing BTF, we switch to that, falling back
>>> > to the above method when a compiler is known to generate buggy BTF.

>>> > And also to use in CIs, to compare the output generated by the various
>>> > methods in the various components.

>>> > 3. In 2 we can even use the same scheme we use for parallelizing DWARF
>>> > loading when loading all the BTF archive members concatenated in vmlinux
>>> > to dedup them.

>>> Before you jump into 1,2,3 let's discuss the end goal.
>>> I think the assumption here is that this btf-for-each-.o approach
>>> is supposed to speed up the build, right ?
>>>
>>> pahole step on vmlinux is noticeable, but it's still a fraction
>>> of three vmlinux linking steps.

>> I'll need to try Thunderbird on the smartphone to send from it; having said that:

Done, easier than expected, let's see if this gets thru vger...

>> I never looked at why we have those three linking steps, will try to educate myself about that.

>>> How much are we realistically thinking to shave off of that pahole dedup time?

>> Difficult to say, but given this comment I made:

>> "Also an observation: for distros the optimal way to produce BTF _and_ DWARF seems to be the one we have now, don't bother generating .BTF for all .o, just generate DWARF and at the end generate BTF from it 8-)"

>> I fear that most approaches to generate BTF for vmlinux by generating BTF by the compiler or pahole for every .o will only make the total vmlinux generation for the common case (distros) slower, not faster.

>Yes. My gut feel is the same.

:-)

>> Be it the compiler or pahole from DWARF, generating BTF _in addition to DWARF_ for each .o will double the space for the things being represented, as the major benefit from BTF is dedup, not per .o (it's more compact, but not by orders of magnitude as with dedup for the whole vmlinux).

>> Option 3 may end up being the best, i.e. generate BTF directly (compiler) or from DWARF (pahole) and immediately add it using btf__add_btf() via some BTF thread, _stripping_ it right away from the .o to avoid doubling the disk space needed (DWARF+BTF per .o). Then, in the end, just dedup, leaving DWARF (if asked for, which 99% of distros will do) and BTF, which again most distros will want (except things like raspberry pi distros, sigh).

>> The same technique, BTW, could be used to reduce the build disk space needed for DWARF, if we can live with completely stripped .o files (no BTF, no DWARF) having it only (dedup'ed: BTF, or not: DWARF) after we harvest it for use in the final vmlinux.

>I see where you're going, but disk space is cheap and modern
>build systems have fast drives. Spinning rust is a thing of the past.
>The total size of intermediate objects doesn't matter much.
>Stripping dwarf won't reduce .o by sizable amount, so I/O throughput
>won't budge.

I think this is worth measuring, to settle the doubt with numbers. I'll try to do it; I was already planning to.

>> But the changes in my series are so small that I think they merit consideration even so.

>Agree with that as well, but I'm just uneasy about "BTF archives" :)
>The name is too ambitious. Concatenated BTF sections is fine,
>but let's not make a big deal out of it.

Well, other proposals being discussed would add more metadata to traverse these archives, I was just tagging along on the jargon being created :-)

It was just convenient that an unmodified linker was concatenating everything and that, from the existing BTF headers, I could use a preexisting libbpf API, btf__add_btf(), to merge everything and then use another preexisting API, btf__dedup(), to get to the same end result.
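
To make the "walk the existing BTF headers" part concrete, here is a minimal sketch, not taken from the series, of how concatenated .BTF contents can be split back into their members using only the struct btf_header fields. It is in Python for brevity and does not use libbpf. The names split_concatenated_btf() and make_toy_blob() are hypothetical, and it assumes little-endian data, no linker padding between members, and that each member ends where its furthest section ends.

```python
import struct

BTF_MAGIC = 0xEB9F
# Mirrors struct btf_header: magic (u16), version (u8), flags (u8), hdr_len,
# type_off, type_len, str_off, str_len (all u32). Little-endian is assumed.
BTF_HEADER = struct.Struct("<HBBIIIII")


def split_concatenated_btf(data):
    """Split back-to-back raw BTF blobs using only their headers."""
    blobs = []
    pos = 0
    while pos < len(data):
        if len(data) - pos < BTF_HEADER.size:
            raise ValueError(f"truncated BTF header at offset {pos}")
        (magic, _version, _flags, hdr_len,
         type_off, type_len, str_off, str_len) = BTF_HEADER.unpack_from(data, pos)
        if magic != BTF_MAGIC:
            raise ValueError(f"bad BTF magic at offset {pos}")
        if hdr_len < BTF_HEADER.size:
            raise ValueError(f"bad hdr_len at offset {pos}")
        # Section offsets are relative to the end of the header, so the blob
        # ends where the furthest section (normally the strings) ends.
        size = hdr_len + max(type_off + type_len, str_off + str_len)
        if pos + size > len(data):
            raise ValueError(f"truncated BTF blob at offset {pos}")
        blobs.append(data[pos:pos + size])
        pos += size
    return blobs


def make_toy_blob(types, strings):
    """Build a header plus opaque payload; NOT valid BTF type data."""
    header = BTF_HEADER.pack(BTF_MAGIC, 1, 0, BTF_HEADER.size,
                             0, len(types), len(types), len(strings))
    return header + types + strings


# Two toy members concatenated the way a linker would lay out input sections.
first = make_toy_blob(b"\x00" * 12, b"\x00int\x00")
second = make_toy_blob(b"\x00" * 24, b"\x00long\x00")
members = split_concatenated_btf(first + second)
print(len(members), [len(m) for m in members])
```

With libbpf, each member found this way would presumably be loaded with btf__new(), merged into one struct btf with btf__add_btf() and the result passed to btf__dedup(); the sketch covers only the splitting step.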

I don't see, so far, any other use for a "BTF archive", only as a happy intermediate step from a one line change to the kernel to get the linker to have the BTF "Compile Units" put together in the same order as the DWARF ones for the final merge+dedup.

- Arnaldo