Re: [RFC 0/4] BTF archive with unmodified pahole+toolchain

On 8 Aug 2025, Arnaldo Carvalho de Melo told this:

> On August 8, 2025 3:28:13 PM GMT-03:00, Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>>On Thu, 2025-08-07 at 19:09 -0700, Alexei Starovoitov wrote:
>>
>>> Before you jump into 1,2,3 let's discuss the end goal.
>>> I think the assumption here is that this btf-for-each-.o approach
>>> is supposed to speed up the build, right ?

Generating BTF directly in the compiler certainly does, in situations
where we can avoid DWARF. We reduce the amount of data written out by
something like 11GiB (!) in my tests.

>>I'd like to second Alexei's question.
>>In the cover letter Arnaldo points out that un-deduplicated BTF
>>amounts for 325Mb, while total DWARF size is 365Mb.

That very much depends on the kernels you build. In my tests of
enterprise kernels (including modules) with the GCC+btfarchive toolchain
(not feeding it to pahole yet), I found total DWARF of 11.2GiB,
undeduplicated BTF of 550MiB (counting raw .o compiler output alone),
and a final deduplicated BTF size (including all modules) of about 38MiB
(which I'm sure I can reduce).

>>The size of DWARF sections in the final vmlinux is comparable to yours: 307Mb.
>>The total size of the generated binaries is 905Mb.
>>So, unless the above calculations are messed up, the total gain here is:
>>- save ~500Mb generated during build

For me, 11GiB :)

>>- save some time on pahole not needing to parse/convert DWARF

In my tests, a *lot*. I think Arnaldo has recently improved this, but
back in April when I was comparing things, I had to kill pahole when it
was dedupping an allmodconfig kernel-plus-modules because it ate more
than 70GiB of RAM and was still chewing on all 20 cores of my machine
after two hours. btfdedup (which uses the libctf deduplicator used by
GNU ld), despite being single-threaded and doing things like ambiguous
type detection as well, used 12GiB and took 19 minutes. (Multithreading
it is in progress, too). allyesconfig is faster. Anything sane is faster
yet. Enterprise kernels take about four minutes, which is not too
different from pahole.

I was shocked by this: I thought libctf would be slower than pahole, and
instead it turned out to be faster, sometimes much faster. I suspect
much of this frankly ridiculous difference was DWARF conversion, and so
would be improved by doing it in parallel (as here), but... still. Not
having to generate and consume all that DWARF is bound to help! It's
like 95% less work...

>>So, I see several drawbacks:
>>- As you note, there would be two avenues to generate BTF now:
>>  - DWARF + pahole
>>  - BTF + pahole (replaced by BTF + ld at some point?)

The code exists... BTF + ld + dedupping the resulting ld-dedupped output
together.

Note that the code used to deduplicate BTF with libctf (as used by ld)
is not large. Look:
https://github.com/nickalcock/linux/blob/nix/btfa/scripts/btf/btfarchive.c
(and of those functions, you don't need transform_module_names(),
suck_in_modules(), or suck_in_lines(): it's really no more code than is
needed to tell it which inputs map to which modules, then a couple of
lines to trigger dedup and emit the resulting BTF archive).

It's entirely reasonable for pahole in future to simply call libctf's
deduplicator to dedup BTF if it sees that the linker hasn't done it, or
to do what btfarchive does here itself to dedup the linker-deduplicated
per-module output and the vmlinux BTF against each other (and then we
don't need btfarchive at all, which means fewer build system changes).

This would let pahole dedup BTF if needed while not wasting time on it
if the linker already did it, *and* let you ditch the pahole
deduplicator so you don't need to maintain it any more, even when clang
et al are being used. (Obviously, you'd only do this once libctf's dedup
is up to scratch and once it's in a release binutils, since I'm sure
there will be bugs I need to fix!)

>>  This is a potential source of bugs.

That's not a very good argument. *Everything* is a potential source of
bugs. I will of course prioritize fixing any bugs in libctf that affect
pahole's operation: not breaking pahole matters!

>>  Is the goal to forgo DWARF+pahole at some point in the future?
>
> I think the goal is to allow DWARF less builds, which can probably save time even if we do use pahole to convert DWARF generated from the compiler into BTF and right away strip DWARF.
>
> This is for use cases where DWARF isn't needed and we want to for example have CI systems running faster.

Yep! Also this means that you can get new features like type and decl
tags into BTF faster, because it's much quicker to get them into GCC and
libctf (at least for recent compiler releases) than it is to get them
into DWARF just so you can get them out of DWARF again and translate
them into BTF. DWARF simply has many more consumers to think about,
while the kernel is obviously a critical consumer of GCC's and libctf's
generated BTF (we do need to consider userspace, but we don't need to be
as conservative as a giant behemoth like DWARF must be). I'm confident
enough in my testing to be willing to backport things to binutils
release branches as needed, though probably not to points before the
first release where BTF support is added to libctf, because that change
is pretty massive.

> My initial interest was to do minimal changes to pave the way for BTF
> generated for vmlinux directly from the compiler, but the realization
> that DWARF still has a lot of mileage, meaning distros will continue
> to enable it for the foreseeable future makes me think that maybe
> doing nothing and continue to use the current method is the sensible
> thing to do.

Speaking purely selfishly, I would be... unhappy to have spent all this
effort on a BTF-capable deduplicator only to find you didn't want to use
it no matter how good it ended up being :( This seems like a rather
sudden change of heart...

>>- I assume that it is much faster to land changes in pahole compared
>>  to changes in gcc, so future btf modifications/features might be a
>>  bit harder to execute. Wdyt?

As noted, I think this is not really true, at least once the core BTF
dedup stuff has landed: I can backport stuff on top of them without
doing releases, and distros usually pick it up within a few days. The
principal delay is testing...

> Right, that too, even if we enable generation of BTF for native .o
> files by the compiler we would still want to use pahole to augment it
> with new features or to fixup compiler BTF generation bugs. And maybe
> for generating tags that are only possible to have the necessary info
> at the last moment.

Well, yes. I thought it was always the plan for pahole to keep consuming
and augmenting BTF! Among other things, the kernel uses a bunch of
additional sections that reference BTF types that GNU ld has no idea how
to generate, and which nobody is planning to use outside the kernel.
That's also where a lot of the innovation is happening, and GCC and GNU
ld don't need to get involved in that at all (unless and until you want
them to).

I can say that changing libctf to support *every difference from CTF
that BTF has got* and teaching GNU ld to handle that took about two
months, so implementing single changes in future doesn't seem like an
insurmountable burden (and much of that two months was spent on
infrastructural adjustments to allow easier changes in future -- the
hardest single BTF feature to support was probably datasecs and vars,
and that took about a week including deduplication). Obviously there
will be bugs, but when they show up I'll fix them.

I am not worried about the maintenance burden of supporting new BTF
stuff in binutils libctf and I don't think Jose is worried about it in
GCC either.

I mean, it's not like it's going to be an extra burden for long: the
medium-term goal is to replace CTF with BTF entirely, even for userspace
consumption. There are surprisingly few new features needed before we
can consign CTF to history and converge on one type format to rule them
all. (I think they're all entirely nondisruptive too.)

> Now if we could have hooks in the linker associated with a given ELF
> section name (.BTF) to use instead of just concatenating, and then at
> the end have another hook that would finish the process by doing the
> dedup, just like I do in this series, that would save one of those
> linker calls.

Yeah, we looked at that, but GNU ld's plugin support is totally focused
on the needs of LTO and can't really handle what dedup needs at all:
fixing that would likely be a substantial and fiddly change. As part of
the CTF and BTF work there *are* internal hooks in ld and libbfd that do
what is needed, but they're not exported outside the linker, and
exporting them looks to be... painful. (But it seems unnecessary for GNU
ld, since it will after all be able to dedup BTF with no plugins at all,
and already can in my proof-of-concept branch on binutils-gdb git.)

-- 
NULL && (void)
