Re: [PATCH RFC 0/3] list inline expansions in .BTF.inline

Alan Maguire <alan.maguire@xxxxxxxxxx> · Mon, 26 May 2025 15:30:05 +0100

On 23/05/2025 19:57, Thierry Treyer wrote:
>>>>  2) // param_offsets point to each parameters' location
>>>>     struct fn_info { u32 type_id, offset; u16 param_offsets[proto.arglen]; };
>>>>  [...]
>>>>  (2) param offsets, w/ dedup         14,526      4,808,838    4,823,364
>>>
>>> This one is almost as good as (3) below, but fits better into the
>>> existing kind+vlen model where there is a variable number of fixed
>>> sized elements (but locations can still be variable-sized and keep
>>> evolving much more easily). I'd go with this one, unless I'm missing
>>> some important benefit of other representations.
>>
>> Thierry, could you please provide some details for the representation
>> of both fn_info and parameters for this case?
> 
> The locations are stored in their own sub-section, like strings, using the
> encoding described previously. A location is a tagged union of an operation
> and its operands describing how to find to parameter’s value.
> 
> The locations for nil, ’%rdi’ and ’*(%rdi + 32)’ are encoded as follow:
> 
>   [0x00] [0x09 0x05] [0x0a 0x05 0x00000020]
> #  `NIL   `REG   #5   |    `Reg#5        `Offset added to Reg’s value
> #                     `ADDR_REG_OFF
> 
> The funcsec table starts with a `struct btf_type` of type FUNCSEC, followed by
> vlen `struct btf_func_secinfo` (referred previously as fn_info):
> 
>   .align(4)
>   struct btf_func_secinfo {
>     __u32 type_id;                       // Type ID of FUNC
>     __u32 offset;                        // Offset in section
>     __u16 parameter_offsets[proto.vlen]; // Offsets to params’ location
>   };
> 
> To know how many parameters a function has, you’d use its type_id to retrieve
> its FUNC, then its FUNC_PROTO to finally get the FUNC_PROTO vlen.
> Optimized out parameters won’t have a location, so we need a NIL to skip them.
> 
> 
> Given a function with arg0 optimized out, arg1 at *(%rdi + 32) and arg2 in %rdi.
> You’d get the following encoding:
> 
>   [1] FUNC_PROTO, vlen=3
>       ...args
>   [2] FUNC 'foo' type_id=1
>   [3] FUNCSEC '.text', vlen=1           # ,NIL   ,*(%rdi + 32)
>       - type_id=n, offset=0x1234, params=[0x0, 0x3, 0x1]
>                                         #             `%rdi
> 
> # Regular BTF encoding for 1 and 2
>   ...
> # ,FUNCSEC ’.text’, vlen=1
>   [0x000001 0x14000001 0x00000000]
> # ,btf_func_secinfo      ,params=[0x0, 0x3, 0x1] + extra nil for alignment
>   [0x00000002 0x00001234 0x0000 0x0003 0x0001 0x0000]
> 
> Note: I didn’t take into account the 4-bytes padding requirement of BTF.
>       I’ve sent the correct numbers when responding to Alexei.
> 
>> I'm curious how far this version is from exhausting u16 limit.
> 
> 
> We’re already using 22% of the 64 kiB addressable by u16.
> 
>> Why abuse DATASEC if we are extending BTF with new types anyways? I'd
>> go with a dedicated FUNCSEC (or FUNCSET, maybe?..)
> 
> I'm not sure that a 'set' describes the table best, since a function
> can have multiple entries in the table.
> FUNCSEC is ugly, but it conveys that the offsets are from a section’s base.

I totally agree that we have more freedom to define new representations
here, so don't feel too constrained by existing representations like
DATASEC if they are not helpful.

One thing I hadn't really thought about before you suggested it is
having the locations in a separate section from types as we have for
strings. Do we need that? Or could we have a BTF_KIND_LOC_SEC that is
associated with the FUNC_SEC via a type id (loc sec points at the type
of the associated func sec) and contains the packed location info?

In other words

[3] FUNCSEC '.text', vlen= ...
<func_id, offset, param_location_offsets[]>
...
[4] LOCSEC '.text', type_id=3
<packed locations>
...