Re: [PATCH RFC 0/3] list inline expansions in .BTF.inline

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 27 May 2025 14:41:17 -0700

On Mon, May 26, 2025 at 7:30 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
>
> On 23/05/2025 19:57, Thierry Treyer wrote:
> >>>>  2) // param_offsets point to each parameters' location
> >>>>     struct fn_info { u32 type_id, offset; u16 param_offsets[proto.arglen]; };
> >>>>  [...]
> >>>>  (2) param offsets, w/ dedup         14,526      4,808,838    4,823,364
> >>>
> >>> This one is almost as good as (3) below, but fits better into the
> >>> existing kind+vlen model where there is a variable number of fixed
> >>> sized elements (but locations can still be variable-sized and keep
> >>> evolving much more easily). I'd go with this one, unless I'm missing
> >>> some important benefit of other representations.
> >>
> >> Thierry, could you please provide some details for the representation
> >> of both fn_info and parameters for this case?
> >
> > The locations are stored in their own sub-section, like strings, using the
> > encoding described previously. A location is a tagged union of an operation
> > and its operands describing how to find to parameter’s value.
> >
> > The locations for nil, ’%rdi’ and ’*(%rdi + 32)’ are encoded as follow:
> >
> >   [0x00] [0x09 0x05] [0x0a 0x05 0x00000020]
> > #  `NIL   `REG   #5   |    `Reg#5        `Offset added to Reg’s value
> > #                     `ADDR_REG_OFF
> >
> > The funcsec table starts with a `struct btf_type` of type FUNCSEC, followed by
> > vlen `struct btf_func_secinfo` (referred previously as fn_info):
> >
> >   .align(4)
> >   struct btf_func_secinfo {
> >     __u32 type_id;                       // Type ID of FUNC
> >     __u32 offset;                        // Offset in section
> >     __u16 parameter_offsets[proto.vlen]; // Offsets to params’ location
> >   };
> >
> > To know how many parameters a function has, you’d use its type_id to retrieve
> > its FUNC, then its FUNC_PROTO to finally get the FUNC_PROTO vlen.
> > Optimized out parameters won’t have a location, so we need a NIL to skip them.
> >
> >
> > Given a function with arg0 optimized out, arg1 at *(%rdi + 32) and arg2 in %rdi.
> > You’d get the following encoding:
> >
> >   [1] FUNC_PROTO, vlen=3
> >       ...args
> >   [2] FUNC 'foo' type_id=1
> >   [3] FUNCSEC '.text', vlen=1           # ,NIL   ,*(%rdi + 32)
> >       - type_id=n, offset=0x1234, params=[0x0, 0x3, 0x1]
> >                                         #             `%rdi
> >
> > # Regular BTF encoding for 1 and 2
> >   ...
> > # ,FUNCSEC ’.text’, vlen=1
> >   [0x000001 0x14000001 0x00000000]
> > # ,btf_func_secinfo      ,params=[0x0, 0x3, 0x1] + extra nil for alignment
> >   [0x00000002 0x00001234 0x0000 0x0003 0x0001 0x0000]
> >
> > Note: I didn’t take into account the 4-bytes padding requirement of BTF.
> >       I’ve sent the correct numbers when responding to Alexei.
> >
> >> I'm curious how far this version is from exhausting u16 limit.
> >
> >
> > We’re already using 22% of the 64 kiB addressable by u16.
> >
> >> Why abuse DATASEC if we are extending BTF with new types anyways? I'd
> >> go with a dedicated FUNCSEC (or FUNCSET, maybe?..)
> >
> > I'm not sure that a 'set' describes the table best, since a function
> > can have multiple entries in the table.
> > FUNCSEC is ugly, but it conveys that the offsets are from a section’s base.
>
>
> I totally agree that we have more freedom to define new representations
> here, so don't feel too constrained by existing representations like
> DATASEC if they are not helpful.
>
> One thing I hadn't really thought about before you suggested it is
> having the locations in a separate section from types as we have for
> strings. Do we need that? Or could we have a BTF_KIND_LOC_SEC that is
> associated with the FUNC_SEC via a type id (loc sec points at the type
> of the associated func sec) and contains the packed location info?
>
> In other words
>
> [3] FUNCSEC '.text', vlen= ...
> <func_id, offset, param_location_offsets[]>
> ...
> [4] LOCSEC '.text', type_id=3
> <packed locations>

LOCSEC pointing to FUNCSEC isn't that useful, no? You'd want to go
from FUNCSEC to LOCSEC quickly, not the other way around, no? But I
also don't see the need to have a per-ELF-section set of locations,
tbh... One set ought to be enough across all FUNCSECs?

> ...
>