Re: [PATCH RFC 0/3] list inline expansions in .BTF.inline

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Thu, 22 May 2025 13:03:00 -0700

On Thu, May 22, 2025 at 10:56 AM Thierry Treyer <ttreyer@xxxxxxxx> wrote:
>
> Hello everyone,
>
> Here are the estimates for the different encoding schemes we discussed:
> - parameters' location takes ~1MB without de-duplication,
> - parameters' location shrinks to ~14kB when de-duplicated,
> - instead of de-duplicating the individual locations,
>   de-duplicating functions' parameter lists yields 187kB of locations data.
>
> We also need to take into account the size of the corresponding funcsec
> table, which starts at 3.6MB. The full details follow:
>
>   1) // params_offset points to the first parameter's location
>      struct fn_info { u32 type_id, offset, params_offset; };
>   2) // param_offsets point to each parameter's location
>      struct fn_info { u32 type_id, offset; u16 param_offsets[proto.arglen]; };
>   3) // locations are stored inline, in the funcsec table
>      struct fn_info { u32 type_id, offset; loc inline_locs[proto.arglen]; };
>
>   Params encoding             Locations Size   Funcsec Size   Total Size
>   ======================================================================
>   (1) param list, no dedup         1,017,654      5,467,824    6,485,478
>   (1) param list, w/ dedup           187,379      5,467,824    5,655,203
>   (2) param offsets, w/ dedup         14,526      4,808,838    4,823,364

This one is almost as good as (3) below, but fits better into the
existing kind+vlen model where there is a variable number of fixed
sized elements (but locations can still be variable-sized and keep
evolving much more easily). I'd go with this one, unless I'm missing
some important benefit of other representations.
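
To make the kind+vlen shape of (2) concrete, here is a rough sketch of
how such a record could be laid out and sized. Everything below is an
assumption for illustration: the names fn_info_hdr, fn_info_size and
fn_info_params are hypothetical, the parameter count is assumed to come
from the function's prototype, and padding between records is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed part of an encoding (2) record. */
struct fn_info_hdr {
	uint32_t type_id;	/* BTF type id of the function */
	uint32_t offset;	/* address/offset of the function site */
	/* followed by nr_params u16 offsets into the locations data */
};

/* Size of one record: fixed header plus one u16 per parameter.
 * Assumes no padding; a real format may want 4-byte alignment. */
static size_t fn_info_size(uint16_t nr_params)
{
	return sizeof(struct fn_info_hdr) +
	       (size_t)nr_params * sizeof(uint16_t);
}

/* The variable part starts right after the fixed header. */
static const uint16_t *fn_info_params(const struct fn_info_hdr *fi)
{
	return (const uint16_t *)(fi + 1);
}
```

As a sanity check on the table: 5,467,824 / 12 and 3,645,216 / 8 both
give 455,652, which is consistent with unpadded 12-byte and 8-byte
records, and (4,808,838 - 3,645,216) / 2 gives 581,811 u16 offsets.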

>   (3) param list inline            1,017,654      3,645,216    4,662,870
>
>   Estimated size in bytes of the new .BTF.func_aux section, from a
>   production kernel v6.9. It includes both partially and fully inlined
>   functions in the funcsec tables, with all their parameters, either inline
>   or in their own sub-section. It does not include type information that
>   would be required to handle fully inlined functions, functions with
>   conflicting names, and functions with conflicting prototypes.
>
>   The deduplicated locations in 2) are small enough to be indexed by a u16.
>
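
For reference, de-duplicating individual locations into a blob that a
u16 can index could look roughly like the sketch below. This is not the
pahole prototype: the [length][payload] entry format, the loc_blob
struct and the loc_intern() helper are all made up for illustration,
and the linear scan stands in for whatever hashing a real
implementation would use.

```c
#include <stdint.h>
#include <string.h>

#define LOC_BLOB_MAX 65536u	/* every offset fits in a u16 */

struct loc_blob {
	uint8_t data[LOC_BLOB_MAX];
	uint32_t len;		/* bytes used so far */
};

/* Intern one location, stored as [1-byte length][payload].
 * Returns the offset of the (possibly pre-existing) entry,
 * or -1 if the blob would outgrow the u16-addressable range. */
static int32_t loc_intern(struct loc_blob *b, const uint8_t *loc,
			  uint8_t loc_len)
{
	uint32_t off = 0;

	/* Linear scan for an identical entry (hypothetical; O(n)). */
	while (off < b->len) {
		uint8_t cur_len = b->data[off];

		if (cur_len == loc_len &&
		    memcmp(&b->data[off + 1], loc, loc_len) == 0)
			return (int32_t)off;
		off += 1u + cur_len;
	}

	if (b->len + 1u + loc_len > LOC_BLOB_MAX)
		return -1;

	/* Not found: append a new entry at the end of the blob. */
	off = b->len;
	b->data[off] = loc_len;
	memcpy(&b->data[off + 1], loc, loc_len);
	b->len += 1u + loc_len;
	return (int32_t)off;
}
```

Each param_offsets[] slot in (2) would then hold the returned offset,
so identical locations share a single entry in the blob.
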
> Storing the locations inline uses the least amount of space, followed by
> storing inline a list of offsets to the locations. Neither of these
> approaches has fixed-size records in funcsec. "param list, w/ dedup" is
> ~1MB larger than inlined locations, but has fixed-size records.
>
> In all cases, the funcsec table uses the most space, compared to the
> locations. The size of the `type` sub-section will also grow when we add
> the missing type information for fully inlined functions, functions with
> conflicting names, and functions with conflicting prototypes.
>
> With fixed size records in the funcsec table, we'd get faster lookup by
> sorting by `type_id` or `offset`.  bpftrace could efficiently search the
> lower bound of a `type_id` to instrument all its inline instances.
> Symbolication tools could efficiently search for inline functions at a
> given offset.
>
> However, it would rule out the most efficient encoding.
> How do we want to approach this tradeoff?
>
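
The lookup described above, with fixed-size records sorted by type_id,
amounts to a plain lower-bound binary search. A minimal sketch, assuming
the layout of encoding (1); fn_info_lower_bound() and
fn_info_count_sites() are hypothetical helpers, not existing libbpf
APIs:

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-size record, as in encoding (1). */
struct fn_info {
	uint32_t type_id;
	uint32_t offset;
	uint32_t params_offset;
};

/* Index of the first record with type_id >= the requested one,
 * or n if there is none. recs[] must be sorted by type_id. */
static size_t fn_info_lower_bound(const struct fn_info *recs, size_t n,
				  uint32_t type_id)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (recs[mid].type_id < type_id)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Number of sites recorded for one function type. */
static size_t fn_info_count_sites(const struct fn_info *recs, size_t n,
				  uint32_t type_id)
{
	size_t first = fn_info_lower_bound(recs, n, type_id);
	size_t cnt = 0;

	while (first + cnt < n && recs[first + cnt].type_id == type_id)
		cnt++;
	return cnt;
}
```

A tracer would start at the lower bound and walk forward while type_id
matches to reach every inline instance. With variable-size records, as
in (2) and (3), the same search would need a separate index or a linear
walk.
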
> > 2. refine the representation of inline info, exploring adding new
> > kind(s) to UAPI btf.h if needed. This would likely mean new APIs in
> > libbpf to add locations and function site info.
>
>
> I currently have a pahole prototype to emit "param list, no dedup" and am
> close to a patch adding FUNCSEC to libbpf. I was wondering if it would make
> sense for FUNCSEC to be a DATASEC with its `kind_flag` set?

Why abuse DATASEC if we are extending BTF with new types anyways? I'd
go with a dedicated FUNCSEC (or FUNCSET, maybe?..)

BTW, Alan, you've been working on self-describing BTF (size per fixed
part of kind + size per vlen items). Any update on that one? Did you
get blocked on it somewhere?

>
> Let me know if you have any questions or have new ideas for the encoding!
>
> Have a great day,
> Thierry
>
>
> > On 19 May 2025, at 13:02, Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
> >
> > hi folks
> >
> > I just wanted to try and capture some of the discussion from last week's
> > BPF office hours where we talked about this and hopefully we can
> > together plot a path forward that supports inline representation and
> > helps us fix some other long-standing issues with more complex function
> > representation. If I've missed anything important or if anything looks
> > wrong, please do chime in!
> >
> > In discussing this, we concluded that
> >
> > - separating the complex function representations into a separate .BTF
> > section (.BTF.func_aux or something like it) would be valuable since it
> > means tracers can continue to interact with existing function
> > representations that have a straightforward relationship between their
> > parameters and calling conventions stored in the .BTF section, and can
> > optionally also utilize the auxiliary function information in .BTF.func_aux
> >
> > - this gives us a bit more freedom to add new kinds etc to that
> > auxiliary function info, and also to control unauthorized access that
> > might be able to retrieve a function address or other potentially
> > sensitive info from the aux function data
> >
> > - it also means that the only kernel support we would likely initially
> > need to add would be to allow reading of
> > /sys/kernel/btf/vmlinux.func_aux , likely via a dummy module supporting
> > sysfs read.
> >
> > - for modules, we would need to support multi-split BTF, i.e. split BTF
> > in .BTF.func_aux in the module that sits atop the .BTF section of the
> > module which in turn sits atop the vmlinux BTF.  Again only userspace
> > and tooling support would likely be needed as a first step. I'm looking
> > at this now and it may require no or minimal code changes to libbpf,
> > just testing of the feature.  bpftool and pahole would need to support a
> > means of specifying multiple base BTFs in order, but that seems doable too.
> >
> > We were less conclusive on the final form of the representation, but it
> > would ideally help support fully and partially inlined representations
> > and other situations we have today where the calling
> > convention-specified registers and the function parameters do not
> > cleanly line up. Today we leave such representations out of BTF but a
> > location representation would allow us to add them back in. Similarly
> > for functions with the same name but different signatures, having a
> > function address to clarify which signature goes with which site will help.
> >
> > Again we don't have to solve all these problems at once but having them
> > in mind as we figure out the right form of the representation will help.
> >
> > Something along the lines of the variable section where we have triples
> > of <function type id, site address, location BTF id> for each function
> > site will play a role. Again the exact form of the location data is TBD,
> > but we can experiment here to maximize compactness. Andrii pointed out a
> > BTF kind representation may waste bytes; for example a location will
> > likely not require a name offset string representation. Could be an
> > index into an array of location descriptions perhaps. Would be nice to
> > make use of dedup for locations too, likely within pahole rather than
> > BTF dedup proper. An empirical question is how much dedup will help,
> > likely we will just have to try and see.
> >
> > So based on this I think our next steps are:
> >
> > 1. add address info to pahole; I'm working on a proof-of-concept for this
> > and hope to have a newer version out this week. Address info would be needed
> > for functions that we wish to represent in the aux section as a way of
> > associating a function site with a location representation.
> > 2. refine the representation of inline info, exploring adding new
> > kind(s) to UAPI btf.h if needed. This would likely mean new APIs in
> > libbpf to add locations and function site info.
> > 3. explore multi-split BTF, adding libbpf-related tests for
> > creation/manipulation of split BTF where the base is another split BTF.
> > Multi-split BTF would be needed for module function aux info
> >
> > I'm hoping we can remove any blocks to further progress; task 3 above
> > can be tackled in parallel while we explore vmlinux inline
> > representation (multi-split is only needed for the module case where the
> > aux info is created atop the module split BTF). I'm hoping to have a bit
> > more done on task 1 later this week. So hopefully there's nothing here
> > that impedes making progress on the inline problem.
> >
> > Again if there's anything I've missed above or that seems unclear,
> > please do follow up. It's really positive that we're tackling this issue
> > so I want to make sure that nothing gets in the way of progressing this.
> > Thanks again!
> >
> > Alan
>