Re: [RFC dwarves 3/3] btf_encoder: use function address to match ELF -> DWARF

On Thu, 2025-05-01 at 15:56 +0100, Alan Maguire wrote:
> Currently we use function names (or prefixes in the case of
> foo.isra.0) to match between ELF symtab entries and DWARF
> representations.  This can lead to wrong matches, especially
> where optimized function representations are concerned.  Instead
> sort and search ELF functions by address, and use the retrieved
> "struct function" address to carry out DWARF->ELF matches.
> 
> Note this is work-in-progress and many functions are missing as
> many functions do not have - or at least we have not retrieved -
> address info associated with their DWARF representations.
> 
> As things stand, there are exactly 1000 functions missing from
> BTF encoded using the address-based approach, since we skip functions
> for which we have no address info.  This approach actually adds
> 63 functions, so there are effectively 1063 missing functions.
> 
> 485 of these missing functions are __probestub functions we do not need, i.e.
> 
> [66116] FUNC '__probestub_xhci_setup_device' type_id=61452 linkage=static
> [61452] FUNC_PROTO '(anon)' ret_type_id=0 vlen=2
>         '__data' type_id=108
>         'vdev' type_id=44186
> 
> The real function is:
> 
> [147543] FUNC 'xhci_setup_device' type_id=147542 linkage=static
> [147542] FUNC_PROTO '(anon)' ret_type_id=21 vlen=4
>         'hcd' type_id=37057
>         'udev' type_id=37029
>         'setup' type_id=44209
>         'timeout_ms' type_id=9
> 
> This leaves us with a mismatch of 578 functions.  These include
> 140 missing __bpf_trace_ functions, which are definitely needed.
> 
> So perhaps we can fix up our DWARF representation to find associated
> addresses for some/all of these, but we may end up having to fall
> back to name-based association for some cases.
> 
> Signed-off-by: Alan Maguire <alan.maguire@xxxxxxxxxx>
> ---

Hi Alan,

The change makes sense to me, and the code updates look reasonable.
Interestingly, I observe much smaller discrepancies when using
LLVM (version 19) for kernel compilation:
a. functions detected by dwarves/next but not detected with this patch: 56
b. functions detected with this patch but not detected by dwarves/next: 70

I only investigated group (b) and noticed two oddities; there are probably others.
- function "kmem_cache_release" is discarded from BTF by dwarves/next
  with the following log:

    kmem_cache_release (kmem_cache_release): skipping BTF encoding of function due to param type mismatch for param#1 s != k
  
  while it is present with this patch.
  Debugging a bit, I can see that btf_encoder__save_func() is called
  for this function only once with the patch but twice by dwarves/next.
  I suspect this happens because of how btf_encoder__encode_cu() looks after this patch:

    int btf_encoder__encode_cu(struct btf_encoder *encoder, struct cu *cu, struct conf_load *conf_load)
    {
        ...
        cu__for_each_function(cu, core_id, fn) {
            ...
            if (...) {
                ...
                func = btf_encoder__find_function(encoder, addr);
                ...
            } else {
                if (!fn->external)
                    continue;
            }
            if (!func)
                continue;
    
            err = btf_encoder__save_func(encoder, fn, func);
            if (err)
                goto out;
        }
        ...
    }

  Previously the find-function call used the name: `btf_encoder__find_function(encoder, name, strlen(name))`.
  Because it now uses the address specified in DWARF, I suspect that:
  - The function is inlined (or something similar) and has different
    addresses encoded in DWARF but only one address encoded in the ELF
    symbol table.
    (There is an inlined instance of `kmem_cache_release` in DWARF.)
  - `func` is NULL for one of the two DWARF instances
    of this function, so `btf_encoder__save_func` is not called for it.

- The other oddity concerns functions with aliases. Here is an example
  from `thermal_netlink.c`:
  
    static int thermal_genl_event_threshold_up(struct param *p) { ... }
    ...
    int thermal_genl_event_threshold_down(struct param *p)
        __attribute__((alias("thermal_genl_event_threshold_up")));

  In the symbol table it is encoded as:
  
    238180: ffffffff82b2d590   611 FUNC    GLOBAL DEFAULT     1 thermal_genl_event_threshold_down

  While in DWARF it is encoded as:

    DW_TAG_subprogram
      DW_AT_low_pc    (0xffffffff82b2d590)
      DW_AT_high_pc   (0xffffffff82b2d7f3)
      DW_AT_frame_base        (DW_OP_reg6 RBP)
      DW_AT_call_all_calls    (true)
      DW_AT_name      ("thermal_genl_event_threshold_up")
      DW_AT_decl_file ("/home/eddy/work/bpf-next/drivers/thermal/thermal_netlink.c")
      DW_AT_decl_line (263)
      DW_AT_prototyped        (true)
      DW_AT_type      (0x059ec28b "int")

  I assume it is absent from the BTF generated by dwarves/next
  because of the same `btf_encoder__find_function` check, which there
  matches by name.
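
To make the two cases above concrete, here is a minimal sketch of
name-based versus address-based lookup. It is not pahole code:
`struct elf_func`, `sort_by_addr`, `find_by_addr`, `find_by_name` and
the two `hypothetical_func_*` entries are made up for illustration, and
it assumes the table holds only the alias name for that address. Only
the `thermal_genl_event_threshold_down` entry is taken from the symtab
line quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an ELF symtab function entry; this is
 * NOT pahole's real data structure. */
struct elf_func {
	uint64_t addr;
	const char *name;
};

/* Only the thermal_genl_event_threshold_down entry comes from the
 * symtab line quoted in this mail; the other two are invented. */
static struct elf_func sample_tab[] = {
	{ 0xffffffff82b2e000ULL, "hypothetical_func_b" },
	{ 0xffffffff82b2d590ULL, "thermal_genl_event_threshold_down" },
	{ 0xffffffff82b2d000ULL, "hypothetical_func_a" },
};
#define SAMPLE_CNT (sizeof(sample_tab) / sizeof(sample_tab[0]))

/* Order entries by start address (overflow-safe comparison). */
static int elf_func_cmp(const void *a, const void *b)
{
	const struct elf_func *fa = a, *fb = b;

	return (fa->addr > fb->addr) - (fa->addr < fb->addr);
}

/* Sort the table once so that bsearch() can be used afterwards. */
static void sort_by_addr(struct elf_func *tab, size_t cnt)
{
	assert(tab != NULL || cnt == 0);
	qsort(tab, cnt, sizeof(*tab), elf_func_cmp);
}

/* Exact-address lookup; returns NULL when no symbol starts at addr. */
static const struct elf_func *find_by_addr(const struct elf_func *tab,
					   size_t cnt, uint64_t addr)
{
	struct elf_func key = { .addr = addr, .name = NULL };

	return bsearch(&key, tab, cnt, sizeof(*tab), elf_func_cmp);
}

/* Name lookup; a linear scan is enough for a sketch. */
static const struct elf_func *find_by_name(const struct elf_func *tab,
					   size_t cnt, const char *name)
{
	for (size_t i = 0; i < cnt; i++)
		if (strcmp(tab[i].name, name) == 0)
			return &tab[i];
	return NULL;
}
```

After `sort_by_addr(sample_tab, SAMPLE_CNT)`, looking up the DWARF
low_pc 0xffffffff82b2d590 with `find_by_addr` returns the
`thermal_genl_event_threshold_down` entry, whereas `find_by_name` with
the DWARF name "thermal_genl_event_threshold_up" returns NULL. In the
same way, a DWARF instance whose address is not the start of any symbol
gets NULL from `find_by_addr`, which is the situation I suspect for the
second `kmem_cache_release` instance.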
  
Thanks,
Eduard





