On Mon, May 26, 2025 at 7:30 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
>
> On 23/05/2025 19:57, Thierry Treyer wrote:
> >>>>   2) // param_offsets point to each parameters' location
> >>>>      struct fn_info { u32 type_id, offset; u16 param_offsets[proto.arglen]; };
> >>>>   [...]
> >>>>   (2) param offsets, w/ dedup    14,526    4,808,838    4,823,364
> >>>
> >>> This one is almost as good as (3) below, but fits better into the
> >>> existing kind+vlen model where there is a variable number of fixed
> >>> sized elements (but locations can still be variable-sized and keep
> >>> evolving much more easily). I'd go with this one, unless I'm missing
> >>> some important benefit of other representations.
> >>
> >> Thierry, could you please provide some details for the representation
> >> of both fn_info and parameters for this case?
> >
> > The locations are stored in their own sub-section, like strings, using the
> > encoding described previously. A location is a tagged union of an operation
> > and its operands describing how to find the parameter’s value.
> >
> > The locations for nil, ’%rdi’ and ’*(%rdi + 32)’ are encoded as follows:
> >
> >   [0x00] [0x09 0x05] [0x0a 0x05 0x00000020]
> > #  `NIL   `REG #5     |    `Reg#5 `Offset added to Reg’s value
> > #                     `ADDR_REG_OFF
> >
> > The funcsec table starts with a `struct btf_type` of type FUNCSEC, followed by
> > vlen `struct btf_func_secinfo` (referred previously as fn_info):
> >
> >   .align(4)
> >   struct btf_func_secinfo {
> >     __u32 type_id;                        // Type ID of FUNC
> >     __u32 offset;                         // Offset in section
> >     __u16 parameter_offsets[proto.vlen];  // Offsets to params’ location
> >   };
> >
> > To know how many parameters a function has, you’d use its type_id to retrieve
> > its FUNC, then its FUNC_PROTO to finally get the FUNC_PROTO vlen.
> > Optimized out parameters won’t have a location, so we need a NIL to skip them.
> >
> >
> > Given a function with arg0 optimized out, arg1 at *(%rdi + 32) and arg2 in %rdi.
> > You’d get the following encoding:
> >
> >   [1] FUNC_PROTO, vlen=3
> >       ...args
> >   [2] FUNC 'foo' type_id=1
> >   [3] FUNCSEC '.text', vlen=1         # ,NIL ,*(%rdi + 32)
> >     - type_id=n, offset=0x1234, params=[0x0, 0x3, 0x1]
> >                                       #           `%rdi
> >
> >   # Regular BTF encoding for 1 and 2
> >   ...
> >   # ,FUNCSEC ’.text’, vlen=1
> >   [0x000001 0x14000001 0x00000000]
> >   # ,btf_func_secinfo      ,params=[0x0, 0x3, 0x1] + extra nil for alignment
> >   [0x00000002 0x00001234 0x0000 0x0003 0x0001 0x0000]
> >
> > Note: I didn’t take into account the 4-byte padding requirement of BTF.
> > I’ve sent the correct numbers when responding to Alexei.
> >
> >> I'm curious how far this version is from exhausting u16 limit.
> >
> >
> > We’re already using 22% of the 64 kiB addressable by u16.
> >
> >> Why abuse DATASEC if we are extending BTF with new types anyways? I'd
> >> go with a dedicated FUNCSEC (or FUNCSET, maybe?..)
> >
> > I'm not sure that a 'set' describes the table best, since a function
> > can have multiple entries in the table.
> > FUNCSEC is ugly, but it conveys that the offsets are from a section’s base.
> >
>
> I totally agree that we have more freedom to define new representations
> here, so don't feel too constrained by existing representations like
> DATASEC if they are not helpful.
>
> One thing I hadn't really thought about before you suggested it is
> having the locations in a separate section from types as we have for
> strings. Do we need that? Or could we have a BTF_KIND_LOC_SEC that is
> associated with the FUNC_SEC via a type id (loc sec points at the type
> of the associated func sec) and contains the packed location info?
>
> In other words
>
> [3] FUNCSEC '.text', vlen= ...
>     <func_id, offset, param_location_offsets[]>
>     ...
> [4] LOCSEC '.text', type_id=3
>     <packed locations>

LOCSEC pointing to FUNCSEC isn't that useful, no? You'd want to go from
FUNCSEC to LOCSEC quickly, not the other way around, no?
But I also don't see the need to have a per-ELF-section set of
locations, tbh... One set ought to be enough across all FUNCSECs?

> ...
>
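
To make the "one shared set of locations" idea concrete, here is a minimal
sketch of how a consumer might resolve the u16 parameter offsets of a funcsec
entry against a single location blob. The tag bytes (0x00, 0x09, 0x0a) and the
offsets [0x0, 0x3, 0x1] come from the 'foo' example quoted above. Everything
else is an assumption for illustration: the names `loc_tag`, `struct loc`,
`loc_decode` and `check_foo_example` are hypothetical, and the 32-bit operand
is assumed to be stored little-endian, which the thread does not settle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Tag values copied from the quoted example bytes; the names are hypothetical. */
enum loc_tag {
	LOC_NIL          = 0x00, /* parameter optimized out, no operands */
	LOC_REG          = 0x09, /* value is in a register: [tag reg] */
	LOC_ADDR_REG_OFF = 0x0a, /* value is at *(reg + off): [tag reg off32] */
};

struct loc {
	uint8_t  tag;
	uint8_t  reg;
	uint32_t off;
};

/*
 * Decode the location starting at byte offset `pos` of the shared blob.
 * Returns the number of bytes consumed, or -1 on a truncated or unknown entry.
 */
static int loc_decode(const uint8_t *blob, size_t len, size_t pos,
		      struct loc *out)
{
	if (pos >= len)
		return -1;

	memset(out, 0, sizeof(*out));
	out->tag = blob[pos];

	switch (out->tag) {
	case LOC_NIL:
		return 1;
	case LOC_REG:
		if (len - pos < 2)
			return -1;
		out->reg = blob[pos + 1];
		return 2;
	case LOC_ADDR_REG_OFF:
		if (len - pos < 6)
			return -1;
		out->reg = blob[pos + 1];
		/* Assumption: the operand is a little-endian u32. */
		out->off = (uint32_t)blob[pos + 2] |
			   (uint32_t)blob[pos + 3] << 8 |
			   (uint32_t)blob[pos + 4] << 16 |
			   (uint32_t)blob[pos + 5] << 24;
		return 6;
	default:
		return -1;
	}
}

/* Walk the three parameters of 'foo' from the quoted example. */
static void check_foo_example(void)
{
	/* NIL, REG #5 and ADDR_REG_OFF(#5, 0x20), packed back to back. */
	static const uint8_t locs[] = {
		0x00,
		0x09, 0x05,
		0x0a, 0x05, 0x20, 0x00, 0x00, 0x00,
	};
	/* params=[0x0, 0x3, 0x1] from the funcsec entry. */
	static const uint16_t params[] = { 0x0, 0x3, 0x1 };
	struct loc l;

	/* arg0: optimized out */
	assert(loc_decode(locs, sizeof(locs), params[0], &l) == 1);
	assert(l.tag == LOC_NIL);

	/* arg1: *(%rdi + 32) */
	assert(loc_decode(locs, sizeof(locs), params[1], &l) == 6);
	assert(l.tag == LOC_ADDR_REG_OFF && l.reg == 5 && l.off == 32);

	/* arg2: %rdi */
	assert(loc_decode(locs, sizeof(locs), params[2], &l) == 2);
	assert(l.tag == LOC_REG && l.reg == 5);
}
```

Nothing in the lookup depends on which ELF section the function lives in: the
funcsec entry only carries offsets, so several FUNCSECs could index the same
blob, and deduplicated locations would be shared across them.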