On Thu, Apr 10, 2025 at 01:20:45PM +0100, Alan Maguire wrote: > On 10/04/2025 09:33, Tony Ambardar wrote: > > While doing JIT development on armhf BTF kernels, I hit a strange issue > > where some functions were missing in BTF data. This required considerable > > debugging but can be reproduced simply: > > > > $ bpftool --version > > bpftool v7.6.0 > > using libbpf v1.6 > > features: llvm, skeletons > > > > $ pahole --version > > v1.29 > > > > $ pahole -J -j --btf_features=decl_tag,consistent_func,decl_tag_kfuncs .tmp_vmlinux_armhf > > btf_encoder__tag_kfunc: failed to find kfunc 'scx_bpf_select_cpu_dfl' in BTF > > btf_encoder__tag_kfuncs: failed to tag kfunc 'scx_bpf_select_cpu_dfl' > > > > $ pfunct -Fbtf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf > > <nothing> > > > > $ pfunct -Fdwarf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf > > s32 scx_bpf_select_cpu_dfl(struct task_struct * p, s32 prev_cpu, u64 wake_flags, bool * is_idle); > > > > $ pahole -J -j --btf_features=decl_tag,decl_tag_kfuncs .tmp_vmlinux_armhf > > > > $ pfunct -Fbtf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf > > bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct * p, s32 prev_cpu, u64 wake_flags, bool * is_idle); > > > > The key things to note are the pahole 'consistent_func' feature and the u64 > > 'wake_flags' parameter vs. arm 32-bit registers. These point to existing > > code handling arguments larger than register-size, but only structs. > > > > Generalize the code for any type of argument exceeding register size (i.e. > > cu->addr_size). This should work for integral or aggregate types, and also > > avoids a bug in the current code where a register-sized struct could be > > mistaken for larger. > > > > Fixes: a53c58158b76 ("dwarf_loader: Mark functions that do not use expected registers for params") > > Signed-off-by: Tony Ambardar <tony.ambardar@xxxxxxxxx> > > Thanks for investigating this! I've tested this versus baseline on > x86_64 and aarch64. I'm seeing some small divergence in functions > encoded; for example on aarch64 we don't get a representation for > > static int __io_run_local_work(struct io_ring_ctx *ctx, io_tw_token_t > tw, int min_events, int max_events); > > The reason for that is the second argument is a typedef io_tw_token_t, > which is in turn a typedef for: > > struct io_tw_state { > }; > > i.e. an empty struct. > > The reason is with your patch we've moved from type-centric to > size-centric criteria used to allow functions into BTF that have > unexpected register usage; because the above function uses unexpected > registers _and_ does not exceed the address size, the function is marked > as having an inconsistent reg mapping. In this case, that seems > reasonable since it is true; there is no register needed to represent > the second argument. > > The deeper rationale here in allowing functions that have structs that > may be represented by multiple registers is that we can handle this > outcome; the BPF_PROG2() macro was added to handle such cases and seems > to handle multi-register representation but _not_ representations where > a register is not needed at all. I'm basing that on the > ___bpf_union_arg() macro in bpf_tracing.h so please correct me if I'm > wrong (we could potentially add a sizeof(t) == 0 clause here perhaps). > > So in other words, though we see small divergences in representation I > _think_ they are consistent with our expectations. > > I'd really like to see wider testing of this patch before it lands > however so we can shake out other problematic cases if any. If folks > could try this and compare BTF representations to baseline that would be > great! In particular comparing raw BTF is necessary since vmlinux.h > representations don't include functions (aside from kfuncs). Now that we > have always-reproducible BTF a simple diff of "bpftool btf dump file > vmlinux" can be used to make such comparisons. > > However perhaps we could also think about enhancing the bpf_tracing.h > macro to handle zero-sized parameters like empty structs such that later > parameters are mapped to registers correctly (presuming that's > possible)? Yonghong, what do you think? Hi Alan, Thanks so much for the additional context. I pressed pause to consider this while waiting for further testing news or feedback, but haven't seen anything since. Have you heard anything OOB? I also understood dwarves could have CI working now, so wondering how those tests with the patch might have gone. In fact, it would be great to have a regular arm32 CI running if that's possible. Could you share how the CI changes are being managed? I've recently been trying to update the arm32 JIT and test_progs in tandem, with the goal of having a working 32-bit target for kernel-patches/bpf CI, but some baby-steps with dwarves or libbpf could be very helpful. As far as type-based vs size-based criteria, I'm not wedded to either, and did look at the type-based route as currently exists. I needed to add cases for DW_TAG_base_type (for ints), DW_TAG_volatile_type (recursive), DW_TAG_union_type (same issues as structs), and then we still need size tests anyway. Sticking with size-based (and a zero-test as you suggested) seemed the simplest and preserved the functions you noticed missing. Cheers, Tony > > Thanks! > > Alan > > > --- > > dwarf_loader.c | 37 ++++++++++++------------------------- > > 1 file changed, 12 insertions(+), 25 deletions(-) > > > > diff --git a/dwarf_loader.c b/dwarf_loader.c > > index e1ba7bc..22abfdb 100644 > > --- a/dwarf_loader.c > > +++ b/dwarf_loader.c > > @@ -2914,23 +2914,9 @@ out: > > return 0; > > } > > > > -static bool param__is_struct(struct cu *cu, struct tag *tag) > > +static bool param__is_wide(struct cu *cu, struct tag *tag) > > { > > - struct tag *type = cu__type(cu, tag->type); > > - > > - if (!type) > > - return false; > > - > > - switch (type->tag) { > > - case DW_TAG_structure_type: > > - return true; > > - case DW_TAG_const_type: > > - case DW_TAG_typedef: > > - /* handle "typedef struct", const parameter */ > > - return param__is_struct(cu, type); > > - default: > > - return false; > > - } > > + return tag__size(tag, cu) > cu->addr_size; > > } > > > > static int cu__resolve_func_ret_types_optimized(struct cu *cu) > > @@ -2942,9 +2928,9 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu) > > struct tag *tag = pt->entries[i]; > > struct parameter *pos; > > struct function *fn = tag__function(tag); > > - bool has_unexpected_reg = false, has_struct_param = false; > > + bool has_unexpected_reg = false, has_wide_param = false; > > > > - /* mark function as optimized if parameter is, or > > + /* Mark function as optimized if parameter is, or > > * if parameter does not have a location; at this > > * point location presence has been marked in > > * abstract origins for cases where a parameter > > @@ -2953,10 +2939,11 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu) > > * > > * Also mark functions which, due to optimization, > > * use an unexpected register for a parameter. > > - * Exception is functions which have a struct > > - * as a parameter, as multiple registers may > > - * be used to represent it, throwing off register > > - * to parameter mapping. > > + * Exception is functions which have a wide > > + * parameter, as multiple registers may be used > > + * to represent it, throwing off register to > > + * parameter mapping. Examples could include > > + * structs or 64-bit types on a 32-bit arch. > > */ > > ftype__for_each_parameter(&fn->proto, pos) { > > if (pos->optimized || !pos->has_loc) > > @@ -2967,11 +2954,11 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu) > > } > > if (has_unexpected_reg) { > > ftype__for_each_parameter(&fn->proto, pos) { > > - has_struct_param = param__is_struct(cu, &pos->tag); > > - if (has_struct_param) > > + has_wide_param = param__is_wide(cu, &pos->tag); > > + if (has_wide_param) > > break; > > } > > - if (!has_struct_param) > > + if (!has_wide_param) > > fn->proto.unexpected_reg = 1; > > } > > >