Re: [PATCH RFC 0/3] list inline expansions in .BTF.inline

hi folks

I wanted to capture some of the discussion from last week's BPF office
hours, where we talked about this. Hopefully we can together plot a path
forward that supports inline representation and helps us fix some other
long-standing issues with more complex function representation. If I've
missed anything important or if anything looks wrong, please do chime in!

In discussing this, we concluded that

- separating the complex function representations into a separate .BTF
section (.BTF.func_aux or something like it) would be valuable since it
means tracers can continue to interact with existing function
representations that have a straightforward relationship between their
parameters and calling conventions stored in the .BTF section, and can
optionally also utilize the auxiliary function information in .BTF.func_aux.

- this gives us a bit more freedom to add new kinds etc to that
auxiliary function info, and also to control unauthorized access that
might be able to retrieve a function address or other potentially
sensitive info from the aux function data.

- it also means that the only kernel support we would likely initially
need to add would be to allow reading of
/sys/kernel/btf/vmlinux.func_aux , likely via a dummy module supporting
sysfs read.

- for modules, we would need to support multi-split BTF, i.e. split BTF
in .BTF.func_aux in the module that sits atop the .BTF section of the
module, which in turn sits atop the vmlinux BTF. Again, only userspace
and tooling support would likely be needed as a first step. I'm looking
at this now and it may require no or minimal code changes to libbpf,
just testing of the feature. bpftool and pahole would need to support a
means of specifying multiple base BTFs in order, but that seems doable too.

We were less conclusive on the final form of the representation, but it
would ideally help support fully and partially inlined representations
and other situations we have today where the calling
convention-specified registers and the function parameters do not
cleanly line up. Today we leave such representations out of BTF but a
location representation would allow us to add them back in. Similarly
for functions with the same name but different signatures, having a
function address to clarify which signature goes with which site will help.

Again we don't have to solve all these problems at once but having them
in mind as we figure out the right form of the representation will help.

Something along the lines of the variable section where we have triples
of <function type id, site address, location BTF id> for each function
site will play a role. Again the exact form of the location data is TBD,
but we can experiment here to maximize compactness. Andrii pointed out a
BTF kind representation may waste bytes; for example, a location will
likely not require a name offset string representation. It could perhaps
be an index into an array of location descriptions. It would be nice to
make use of dedup for locations too, likely within pahole rather than
BTF dedup proper. How much dedup will help is an empirical question;
likely we will just have to try and see.
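
To make the triple idea a bit more concrete, here is a rough sketch in C.
It is not a proposed format: the struct name, field names and field widths
are all hypothetical, and the location reference is modelled as an index
into a deduplicated array of location descriptions, which is only one of
the options mentioned above. It assumes the records are kept sorted by
site address so a tracer can find a site with a binary search.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record; names and widths are illustrative only. */
struct func_site {
	uint32_t func_type_id;	/* BTF id of the function's type */
	uint64_t site_addr;	/* address of the function site */
	uint32_t loc_idx;	/* index into a deduplicated location array */
};

/* Binary search over records sorted by site_addr; NULL if not found. */
static const struct func_site *
func_site_find(const struct func_site *sites, size_t n, uint64_t addr)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (sites[mid].site_addr == addr)
			return &sites[mid];
		if (sites[mid].site_addr < addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}
```

Two sites of the same function can share one loc_idx when their
parameters live in the same places, which is where dedup would pay off.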

So based on this I think our next steps are:

1. add address info to pahole; I'm working on a proof-of-concept for this
and hope to have a newer version out this week. Address info would be
needed for functions that we wish to represent in the aux section as a
way of associating a function site with a location representation.
2. refine the representation of inline info, exploring adding new
kind(s) to UAPI btf.h if needed. This would likely mean new APIs in
libbpf to add locations and function site info.
3. explore multi-split BTF, adding libbpf-related tests for
creation/manipulation of split BTF where the base is another split BTF.
Multi-split BTF would be needed for module function aux info.

I'm hoping we can remove any blocks to further progress; task 3 above
can be tackled in parallel while we explore vmlinux inline
representation (multi-split is only needed for the module case where the
aux info is created atop the module split BTF). I'm hoping to have a bit
more done on task 1 later this week. So hopefully there's nothing here
that impedes making progress on the inline problem.

Again if there's anything I've missed above or that seems unclear,
please do follow up. It's really positive that we're tackling this issue
so I want to make sure that nothing gets in the way of progressing this.
Thanks again!

Alan


On 16/04/2025 20:20, Thierry Treyer via B4 Relay wrote:
> This proposal extends BTF to list the locations of inlined functions and
> their arguments in a new '.BTF.inline` section.
> 
> == Background ==
> 
> Inline functions are often a blind spot for profiling and tracing tools:
> * They cannot probe fully inlined functions.
>   The BTF contains no data about them.
> * They miss calls to partially inlined functions,
>   where a function has a symbol, but is also inlined in some callers.
> * They cannot account for time spent in inlined calls.
>   Instead, they report the time to the caller.
> * They don't provide a way to access the arguments of an inlined call.
> 
> The issue is exacerbated by Link-Time Optimization, which enables more
> inlining across object files. One workaround is to disable inlining for
> the profiled functions, but that requires a whole kernel compilation and
> doesn't allow for iterative exploration.
> 
> The information required to solve the above problems is not easily
> accessible. It requires parsing most of the DWARF's '.debug_info` section,
> which is time consuming and not trivial.
> Instead, this proposal leverages and extends the existing information
> contained in '.BTF` (for typing) and '.BTF.ext` (for caller location),
> with information from a new section called '.BTF.inline`,
> listing inlined instances.
> 
> == .BTF.inline Section ==
> 
> The new '.BTF.inline` section has a layout similar to '.BTF`.
> 
>  off |0-bit      |16-bits  |24-bits  |32-bits                           |
> -----+-----------+---------+---------+----------------------------------+
> 0x00 |   magic   | version |  flags  |          header length           |
> 0x08 |      inline info offset       |        inline info length        |
> 0x10 |        location offset        |          location length         |
> -----+------------------------------------------------------------------+
>      ~                        inline info section                       ~
> -----+------------------------------------------------------------------+
>      ~                          location section                        ~
> -----+------------------------------------------------------------------+
> 
> It starts with a header (see 'struct btf_inline_header`),
> followed by two subsections:
> 1. The 'Inline Info' section contains an entry for each inlined function.
>    Each entry describes the instance's location in its caller and is
>    followed by the offsets in the 'Location' section of the parameters'
>    location expressions. See 'struct btf_inline_instance`.
> 2. The 'Location' section contains location expressions describing how
>    to retrieve the value of a parameter. The expressions are NULL-
>    terminated and are addressed similarly to '.BTF`'s string table.
> 
> struct btf_inline_header {
>   uint16_t magic;
>   uint8_t version, flags;
>   uint32_t header_length;
>   uint32_t inline_info_offset, inline_info_length;
>   uint32_t location_offset, location_length;
> };
> 
> struct btf_inline_instance {
>   type_id_t callee_id;     // BTF id of the inlined function
>   type_id_t caller_id;     // BTF id of the caller
>   uint32_t caller_offset;  // offset of the callee within the caller
>   uint16_t nr_parms;       // number of parameters
> //uint32_t parm_location[nr_parms];  // offset of the location expression
> };                                   // in 'Location' for each parameter
> 
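
Since each entry is followed by a variable number of parameter offsets,
a reader has to walk the 'Inline Info' subsection entry by entry. The
sketch below shows one way that could look. It rests on assumptions the
RFC does not state: that the fixed part is stored packed (3 x u32 plus
1 x u16 = 14 bytes, no padding), that fields are in host byte order, and
that type_id_t is a u32. The names inline_entry, inline_entry_next and
inline_entry_parm_loc are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed packed size of the fixed part: 3 * u32 + 1 * u16. */
#define INLINE_FIXED_SZ 14

struct inline_entry {
	uint32_t callee_id, caller_id, caller_offset;
	uint16_t nr_parms;
	const uint8_t *parm_locs;	/* nr_parms unaligned u32 offsets */
};

/* Read the entry at *pos and advance *pos; 0 on success, -1 if truncated. */
static int inline_entry_next(const uint8_t *buf, size_t len, size_t *pos,
			     struct inline_entry *e)
{
	size_t p = *pos, parms_sz;

	if (p > len || len - p < INLINE_FIXED_SZ)
		return -1;
	/* memcpy avoids unaligned loads; host byte order is assumed. */
	memcpy(&e->callee_id, buf + p, 4);
	memcpy(&e->caller_id, buf + p + 4, 4);
	memcpy(&e->caller_offset, buf + p + 8, 4);
	memcpy(&e->nr_parms, buf + p + 12, 2);
	p += INLINE_FIXED_SZ;
	parms_sz = (size_t)e->nr_parms * 4;
	if (len - p < parms_sz)
		return -1;
	e->parm_locs = buf + p;
	*pos = p + parms_sz;
	return 0;
}

/* Offset into the 'Location' subsection for the i-th parameter. */
static uint32_t inline_entry_parm_loc(const struct inline_entry *e,
				      unsigned int i)
{
	uint32_t off;

	memcpy(&off, e->parm_locs + 4 * (size_t)i, 4);
	return off;
}
```

If the struct were instead written out with natural C padding (16 bytes),
the fixed size above would need to change, so the on-disk layout is
worth pinning down explicitly.
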
> == Location Expressions ==
> 
> We looked at the DWARF location expressions for the arguments of inlined
> instances having <= 100 instances, on a production kernel v6.9.0. This
> yielded 176,800 instances with 269,327 arguments. We learned that most
> expressions are simple register access, perhaps with an offset. We would
> get access to 87% of the arguments by implementing literal and register.
> 
> Op. Category      Expr. Count    Expr. %
> ----------------------------------------
> literal                 10714      3.98%
> register+above         234698     87.14%
> arithmetic+above       239444     88.90%
> composite+above        240394     89.26%
> stack+above            242075     89.88%
> empty                   27252     10.12%
> 
> We propose to re-encode DWARF location expressions into a custom BTF
> location expression format. It operates on a stack of values, similar to
> DWARF's location expressions, but is stripped of unused operators,
> while allowing future expansions.
> 
> A location expression is composed of a series of operations, terminated
> by a NULL-byte/LOC_END_OF_EXPR operator. The very first expression in the
> 'Location' subsection must be an empty expression consisting only of
> LOC_END_OF_EXPR.
> 
> An operator is a tagged union: the tag describes the operation to carry
> out and the union contains the operands.
>  
>  ID | Operator Name        | Operands[...]
> ----+----------------------+-------------------------------------------
>   0 | LOC_END_OF_EXPR      | _none_
>   1 | LOC_SIGNED_CONST_1   |  s8: constant's value
>   2 | LOC_SIGNED_CONST_2   | s16: constant's value
>   3 | LOC_SIGNED_CONST_4   | s32: constant's value
>   4 | LOC_SIGNED_CONST_8   | s64: constant's value
>   5 | LOC_UNSIGNED_CONST_1 |  u8: constant's value
>   6 | LOC_UNSIGNED_CONST_2 | u16: constant's value
>   7 | LOC_UNSIGNED_CONST_4 | u32: constant's value
>   8 | LOC_UNSIGNED_CONST_8 | u64: constant's value
>   9 | LOC_REGISTER         |  u8: DWARF register number from the ABI
>  10 | LOC_REGISTER_OFFSET  |  u8: DWARF register number from the ABI
>                            | s64: offset added to the register's value
>  11 | LOC_DEREF            |  u8: size of the deref'd type
> 
> This list should be further expanded to include arithmetic operations.
> 
> Example: accessing a field at offset 12B from a struct whose address is
>          in the '%rdi` register, on amd64, has the following encoding:
> 
> [0x0a 0x05 0x000000000000000c] [0x0b 0x04] [0x00]
>  |    |    ` Offset Added       |    |      ` LOC_END_OF_EXPR
>  |    ` Register Number         |    ` Size of Deref.
>  ` LOC_REGISTER_OFFSET          ` LOC_DEREF
> 
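
The encoding above can be walked with a small operand-size table. The
following is a minimal sketch, assuming operands follow the tag byte
directly with no padding (as the example bytes suggest). The enum mirrors
the operator table from the proposal; loc_operand_size and loc_expr_len
are hypothetical helper names, not part of the patch series.

```c
#include <stddef.h>
#include <stdint.h>

enum loc_op {
	LOC_END_OF_EXPR = 0,
	LOC_SIGNED_CONST_1 = 1,
	LOC_SIGNED_CONST_2 = 2,
	LOC_SIGNED_CONST_4 = 3,
	LOC_SIGNED_CONST_8 = 4,
	LOC_UNSIGNED_CONST_1 = 5,
	LOC_UNSIGNED_CONST_2 = 6,
	LOC_UNSIGNED_CONST_4 = 7,
	LOC_UNSIGNED_CONST_8 = 8,
	LOC_REGISTER = 9,
	LOC_REGISTER_OFFSET = 10,
	LOC_DEREF = 11,
	LOC_NR_OPS
};

/* Operand bytes following each tag; REGISTER_OFFSET is u8 + s64. */
static const uint8_t loc_operand_size[LOC_NR_OPS] = {
	0, 1, 2, 4, 8, 1, 2, 4, 8, 1, 9, 1
};

/*
 * Return the length in bytes of the expression starting at expr,
 * including the LOC_END_OF_EXPR terminator, or -1 if the expression
 * is truncated or contains an unknown operator.
 */
static long loc_expr_len(const uint8_t *expr, size_t avail)
{
	size_t pos = 0;

	while (pos < avail) {
		uint8_t op = expr[pos++];

		if (op >= LOC_NR_OPS)
			return -1;
		if (op == LOC_END_OF_EXPR)
			return (long)pos;
		if (avail - pos < loc_operand_size[op])
			return -1;
		pos += loc_operand_size[op];
	}
	return -1;
}
```

For the %rdi example this gives 13 bytes (10 + 2 + 1). Note that the
s64 offset itself contains zero bytes, so the end of an expression cannot
be found with a strlen()-style scan; a reader has to decode operator by
operator, as above.
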
> == Summary ==
> 
> Combining the new information from '.BTF.inline` with the existing data
> from '.BTF` and '.BTF.ext`, tools will be able to locate inline functions
> and their arguments. Symbolizers can also use the data to display the
> functions inlined at a given address.
> 
> Fully inlined functions are not part of the BTF and thus are not covered
> by this proposal. Adding them to the BTF would enable their coverage and
> should be considered.
> 
> Signed-off-by: Thierry Treyer <ttreyer@xxxxxxxx>
> ---
> Thierry Treyer (3):
>       dwarf_loader: Add parameters list to inlined expansion
>       dwarf_loader: Add name to inline expansion
>       inline_encoder: Introduce inline encoder to emit BTF.inline
> 
>  CMakeLists.txt   |   3 +-
>  btf_encoder.c    |   5 +
>  btf_encoder.h    |   2 +
>  btf_inline.pk    |  55 ++++++
>  dwarf_loader.c   | 176 ++++++++++++--------
>  dwarves.c        |  26 +++
>  dwarves.h        |   7 +
>  inline_encoder.c | 496 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  inline_encoder.h |  25 +++
>  pahole.c         |  40 ++++-
>  10 files changed, 765 insertions(+), 70 deletions(-)
> ---
> base-commit: 4ef47f84324e925051a55de10f9a4f44ef1da844
> change-id: 20250416-btf_inline-e5047eea9b6f
> 
> Best regards,