On Thu, 2025-05-01 at 15:56 +0100, Alan Maguire wrote:
> Currently we use function names (or prefixes in the case of
> foo.isra.0) to match between ELF symtab entries and DWARF
> representations. This can lead to wrong matches, especially
> where optimized function representations are concerned. Instead
> sort and search ELF functions by address, and use the retrieved
> "struct function" address to carry out DWARF->ELF matches.
>
> Note this is work-in-progress and many functions are missing as
> many functions do not have - or at least we have not retrieved -
> address info associated with their DWARF representations.
>
> As things stand, there are exactly 1000 functions missing from
> BTF encoded using the address-based approach, since we skip functions
> for which we have no address info. This approach actually adds
> 63 functions, so there are effectively 1063 missing functions.
>
> 485 of these missing functions are __probestub functions we do not need, i.e.
>
> [66116] FUNC '__probestub_xhci_setup_device' type_id=61452 linkage=static
> [61452] FUNC_PROTO '(anon)' ret_type_id=0 vlen=2
>         '__data' type_id=108
>         'vdev' type_id=44186
>
> The real function is:
>
> [147543] FUNC 'xhci_setup_device' type_id=147542 linkage=static
> [147542] FUNC_PROTO '(anon)' ret_type_id=21 vlen=4
>         'hcd' type_id=37057
>         'udev' type_id=37029
>         'setup' type_id=44209
>         'timeout_ms' type_id=9
>
> This leaves us with a mismatch of 578 functions. These include
> 140 missing __bpf_trace_ functions, which are definitely needed.
>
> So perhaps we can fix up our DWARF representation to find associated
> addresses for some/all of these, but we may end up having to fall
> back to name-based association for some cases.
>
> Signed-off-by: Alan Maguire <alan.maguire@xxxxxxxxxx>
> ---

Hi Alan,

The change makes sense to me, and the code updates look reasonable.

Interestingly enough, I observe much smaller discrepancies when using
LLVM (version 19) for kernel compilation:

a. functions detected by dwarves/next but not detected with this patch: 56
b. functions detected with this patch but not detected by dwarves/next: 70

I only investigated group (b) and noticed two oddities; there are
probably others.

- Function "kmem_cache_release" is discarded from BTF by dwarves/next
  with the following log:

    kmem_cache_release (kmem_cache_release): skipping BTF encoding of function due to param type mismatch for param#1 s != k

  while it is present with this patch. Debugging a bit, I can see that
  btf_encoder__save_func() is called for this function only once with
  this patch, but twice by dwarves/next.

  I suspect this happens because of how btf_encoder__encode_cu() looks
  after this patch:

    int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu,
                               struct conf_load *conf_load)
    {
      ...
      cu__for_each_function(cu, core_id, fn) {
        ...
        if (...) {
          ...
          func = btf_encoder__find_function(encoder, addr);
          ...
        } else {
          if (!fn->external)
            continue;
        }
        if (!func)
          continue;

        err = btf_encoder__save_func(encoder, fn, func);
        if (err)
          goto out;
      }
      ...
    }

  Previously the find-function call used the name:
  `btf_encoder__find_function(encoder, name, strlen(name))`.
  Because it now uses the address specified in DWARF, I suspect that:
  - The function is inlined or something and has different addresses
    encoded in DWARF, but only one address encoded in the ELF symbol
    table. (There is an inlined instance of `kmem_cache_release`
    in DWARF.)
  - `func` is NULL for one of the two DWARF instances of this function,
    so `btf_encoder__save_func` is not called for it.

- Another oddity is about functions with aliases. Here is an example
  from `thermal_netlink.c`:

    static int thermal_genl_event_threshold_up(struct param *p)
    {
      ...
    }
    ...
    int thermal_genl_event_threshold_down(struct param *p)
        __attribute__((alias("thermal_genl_event_threshold_up")));

  In the symbol table it is encoded as:

    238180: ffffffff82b2d590   611 FUNC    GLOBAL DEFAULT    1 thermal_genl_event_threshold_down

  While in DWARF it is encoded as:

    DW_TAG_subprogram
      DW_AT_low_pc          (0xffffffff82b2d590)
      DW_AT_high_pc         (0xffffffff82b2d7f3)
      DW_AT_frame_base      (DW_OP_reg6 RBP)
      DW_AT_call_all_calls  (true)
      DW_AT_name            ("thermal_genl_event_threshold_up")
      DW_AT_decl_file       ("/home/eddy/work/bpf-next/drivers/thermal/thermal_netlink.c")
      DW_AT_decl_line       (263)
      DW_AT_prototyped      (true)
      DW_AT_type            (0x059ec28b "int")

  And I assume that it is not in the BTF generated by dwarves/next
  because of the same `btf_encoder__find_function` check.

Thanks,
Eduard