Re: [RFC 0/4] BTF archive with unmodified pahole+toolchain

On Thu, Aug 21, 2025 at 2:35 PM Nick Alcock <nick.alcock@xxxxxxxxxx> wrote:
>
> On 8 Aug 2025, Arnaldo Carvalho de Melo told this:
>
> > On August 8, 2025 3:28:13 PM GMT-03:00, Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> >>On Thu, 2025-08-07 at 19:09 -0700, Alexei Starovoitov wrote:
> >>
> >>> Before you jump into 1,2,3 let's discuss the end goal.
> >>> I think the assumption here is that this btf-for-each-.o approach
> >>> is supposed to speed up the build, right ?
>
> Generating BTF directly in the compiler certainly does, in situations
> where we can avoid DWARF. We reduce the amount of data written out by
> something like 11GiB (!) in my tests.
>
> >>I'd like to second Alexei's question.
> >>In the cover letter Arnaldo points out that un-deduplicated BTF
> >>amounts to 325Mb, while total DWARF size is 365Mb.
>
> That very much depends on the kernels you build. In my tests of
> enterprise kernels (including modules) with the GCC+btfarchive toolchain
> (not feeding it to pahole yet), I found total DWARF of 11.2GiB,
> undeduplicated BTF of 550MiB (counting raw .o compiler output alone),
> and a final deduplicated BTF size (including all modules) of about 38MiB
> (which I'm sure I can reduce).

11.2GiB doesn't match Arnaldo's 365Mb.
Frankly, I've never seen such huge DWARF objects.
I'm guessing you're using some ultra-verbose DWARF compilation
mode. If so, it's not a realistic comparison, since a typical
kernel build is what Arnaldo reported.
That's what I observe as well.

> >>The size of DWARF sections in the final vmlinux is comparable to yours: 307Mb.
> >>The total size of the generated binaries is 905Mb.
> >>So, unless the above calculations are messed up, the total gain here is:
> >>- save ~500Mb generated during build
>
> For me, 11GiB :)
>
> >>- save some time on pahole not needing to parse/convert DWARF
>
> In my tests, a *lot*. I think Arnaldo has recently improved this, but
> back in April when I was comparing things, I had to kill pahole when it
> was dedupping an allmodconfig kernel-plus-modules because it ate more
> than 70GiB of RAM and was still chewing on all 20 cores of my machine
> after two hours. btfdedup (which uses the libctf deduplicator used by
> GNU ld), despite being single-threaded and doing things like ambiguous
> type detection as well, used 12GiB and took 19 minutes. (Multithreading
> it is in progress, too). allyesconfig is faster. Anything sane is faster
> yet. Enterprise kernels take about four minutes, which is not too
> different from pahole.
>
> I was shocked by this: I thought libctf would be slower than pahole, and
> instead it turned out to be faster, sometimes much faster. I suspect
> much of this frankly ridiculous difference was DWARF conversion, and so
> would be improved by doing it in parallel (as here), but... still. Not
> having to generate and consume all that DWARF is bound to help! It's
> like 95% less work...

Something doesn't add up here.
Everyone is using pahole, and lots of people are doing allmodconfig builds
with pahole. No one has reported that pahole consumes 70GiB and runs for hours.
Something is really not right in your setup.
I suspect the root cause is your 11GiB of DWARF.
Please use typical kernel build configs; then we can have an apples-to-apples
comparison and reason about libctf pros/cons.
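
To make the size figures comparable across setups, one option is to total the
DWARF and BTF section sizes over the same set of object files on each side.
The following is only a rough sketch, not something from this series: it
assumes binutils `size -A` (sysv format, decimal sizes) is available, the
function names are made up for illustration, and it ignores compressed or
split DWARF, which would change the numbers.

```python
import subprocess
from collections import Counter


def classify(section):
    """Bucket a section name as 'dwarf', 'btf', or 'other' (assumed prefixes)."""
    if section.startswith((".debug_", ".rela.debug_", ".rel.debug_")):
        return "dwarf"
    if section.startswith((".BTF", ".rela.BTF", ".rel.BTF")):
        return "btf"
    return "other"


def tally_sections(size_output):
    """Sum section sizes, in bytes, from the text printed by `size -A`."""
    totals = Counter()
    for line in size_output.splitlines():
        fields = line.split()
        # Section rows are "<name> <size> <addr>" with numeric size and addr;
        # the file header, column header, and "Total" rows do not match.
        if len(fields) != 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        totals[classify(fields[0])] += int(fields[1])
    return totals


def tally_files(paths):
    """Run `size -A` on each path and accumulate the per-bucket totals."""
    totals = Counter()
    for path in paths:
        result = subprocess.run(
            ["size", "-A", path], check=True, capture_output=True, text=True
        )
        totals.update(tally_sections(result.stdout))
    return totals
```

Calling `tally_files()` on the list of .o files from each build (or on vmlinux
alone) gives byte totals per bucket, so both sides are counting the same thing.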




