Re: [PATCH dwarves v1] dwarf_loader: Fix skipped encoding of function BTF on 32-bit systems

Tony Ambardar <tony.ambardar@xxxxxxxxx> · Wed, 16 Apr 2025 03:33:11 -0700

On Thu, Apr 10, 2025 at 01:20:45PM +0100, Alan Maguire wrote:
> On 10/04/2025 09:33, Tony Ambardar wrote:
> > While doing JIT development on armhf BTF kernels, I hit a strange issue
> > where some functions were missing in BTF data. This required considerable
> > debugging but can be reproduced simply:
> > 
> > $ bpftool --version
> > bpftool v7.6.0
> > using libbpf v1.6
> > features: llvm, skeletons
> > 
> > $ pahole --version
> > v1.29
> > 
> > $ pahole -J -j --btf_features=decl_tag,consistent_func,decl_tag_kfuncs .tmp_vmlinux_armhf
> > btf_encoder__tag_kfunc: failed to find kfunc 'scx_bpf_select_cpu_dfl' in BTF
> > btf_encoder__tag_kfuncs: failed to tag kfunc 'scx_bpf_select_cpu_dfl'
> > 
> > $ pfunct -Fbtf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf
> > <nothing>
> > 
> > $ pfunct -Fdwarf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf
> > s32 scx_bpf_select_cpu_dfl(struct task_struct * p, s32 prev_cpu, u64 wake_flags, bool * is_idle);
> > 
> > $ pahole -J -j --btf_features=decl_tag,decl_tag_kfuncs .tmp_vmlinux_armhf
> > 
> > $ pfunct -Fbtf -E -f scx_bpf_select_cpu_dfl .tmp_vmlinux_armhf
> > bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct * p, s32 prev_cpu, u64 wake_flags, bool * is_idle);
> > 
> > The key things to note are the pahole 'consistent_func' feature and the u64
> > 'wake_flags' parameter vs. arm 32-bit registers. These point to existing
> > code handling arguments larger than register-size, but only structs.
> > 
> > Generalize the code for any type of argument exceeding register size (i.e.
> > cu->addr_size). This should work for integral or aggregate types, and also
> > avoids a bug in the current code where a register-sized struct could be
> > mistaken for larger.
> > 
> > Fixes: a53c58158b76 ("dwarf_loader: Mark functions that do not use expected registers for params")
> > Signed-off-by: Tony Ambardar <tony.ambardar@xxxxxxxxx>
> 
> Thanks for investigating this! I've tested this versus baseline on
> x86_64 and aarch64. I'm seeing some small divergence in functions
> encoded; for example on aarch64 we don't get a representation for
> 
> static int __io_run_local_work(struct io_ring_ctx *ctx, io_tw_token_t
> tw, int min_events, int max_events);
> 
> The reason for that is the second argument is a typedef io_tw_token_t,
> which is in turn a typedef for:
> 
> struct io_tw_state {
> };
> 
> i.e. an empty struct.
> 
> The reason is with your patch we've moved from type-centric to
> size-centric criteria used to allow functions into BTF that have
> unexpected register usage; because the above function uses unexpected
> registers _and_ does not exceed the address size, the function is marked
> as having an inconsistent reg mapping. In this case, that seems
> reasonable since it is true; there is no register needed to represent
> the second argument.
> 
> The deeper rationale here in allowing functions that have structs that
> may be represented by multiple registers is that we can handle this
> outcome; the BPF_PROG2() macro was added to handle such cases and seems
> to handle multi-register representation but _not_ representations where
> a register is not needed at all. I'm basing that on the
> ___bpf_union_arg() macro in bpf_tracing.h so please correct me if I'm
> wrong (we could potentially add a sizeof(t) == 0 clause here perhaps).
> 
> So in other words, though we see small divergences in representation I
> _think_ they are consistent with our expectations.
> 
> I'd really like to see wider testing of this patch before it lands
> however so we can shake out other problematic cases if any. If folks
> could try this and compare BTF representations to baseline that would be
> great! In particular comparing raw BTF is necessary since vmlinux.h
> representations don't include functions (aside from kfuncs). Now that we
> have always-reproducible BTF a simple diff of "bpftool btf dump file
> vmlinux" can be used to make such comparisons.
> 
> However perhaps we could also think about enhancing the bpf_tracing.h
> macro to handle zero-sized parameters like empty structs such that later
> parameters are mapped to registers correctly (presuming that's
> possible)? Yonghong, what do you think?

Hi Alan,

Thanks so much for the additional context. I pressed pause to consider
this while waiting for further testing news or feedback, but haven't seen
anything since. Have you heard anything OOB?

I also understood dwarves could have CI working now, so wondering how
those tests with the patch might have gone. In fact, it would be great to
have a regular arm32 CI running if that's possible. Could you share how
the CI changes are being managed? I've recently been trying to update
the arm32 JIT and test_progs in tandem, with the goal of having a working
32-bit target for kernel-patches/bpf CI, but some baby-steps with dwarves
or libbpf could be very helpful.

As far as type-based vs size-based criteria, I'm not wedded to either, and
did look at the type-based route as currently exists. I needed to add
cases for DW_TAG_base_type (for ints), DW_TAG_volatile_type (recursive),
DW_TAG_union_type (same issues as structs), and then we still need size
tests anyway. Sticking with size-based (and a zero-test as you suggested)
seemed the simplest and preserved the functions you noticed missing.

Cheers,
Tony

> 
> Thanks!
> 
> Alan
> 
> > ---
> >  dwarf_loader.c | 37 ++++++++++++-------------------------
> >  1 file changed, 12 insertions(+), 25 deletions(-)
> > 
> > diff --git a/dwarf_loader.c b/dwarf_loader.c
> > index e1ba7bc..22abfdb 100644
> > --- a/dwarf_loader.c
> > +++ b/dwarf_loader.c
> > @@ -2914,23 +2914,9 @@ out:
> >  	return 0;
> >  }
> >  
> > -static bool param__is_struct(struct cu *cu, struct tag *tag)
> > +static bool param__is_wide(struct cu *cu, struct tag *tag)
> >  {
> > -	struct tag *type = cu__type(cu, tag->type);
> > -
> > -	if (!type)
> > -		return false;
> > -
> > -	switch (type->tag) {
> > -	case DW_TAG_structure_type:
> > -		return true;
> > -	case DW_TAG_const_type:
> > -	case DW_TAG_typedef:
> > -		/* handle "typedef struct", const parameter */
> > -		return param__is_struct(cu, type);
> > -	default:
> > -		return false;
> > -	}
> > +	return tag__size(tag, cu) > cu->addr_size;
> >  }
> >  
> >  static int cu__resolve_func_ret_types_optimized(struct cu *cu)
> > @@ -2942,9 +2928,9 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu)
> >  		struct tag *tag = pt->entries[i];
> >  		struct parameter *pos;
> >  		struct function *fn = tag__function(tag);
> > -		bool has_unexpected_reg = false, has_struct_param = false;
> > +		bool has_unexpected_reg = false, has_wide_param = false;
> >  
> > -		/* mark function as optimized if parameter is, or
> > +		/* Mark function as optimized if parameter is, or
> >  		 * if parameter does not have a location; at this
> >  		 * point location presence has been marked in
> >  		 * abstract origins for cases where a parameter
> > @@ -2953,10 +2939,11 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu)
> >  		 *
> >  		 * Also mark functions which, due to optimization,
> >  		 * use an unexpected register for a parameter.
> > -		 * Exception is functions which have a struct
> > -		 * as a parameter, as multiple registers may
> > -		 * be used to represent it, throwing off register
> > -		 * to parameter mapping.
> > +		 * Exception is functions which have a wide
> > +		 * parameter, as multiple registers may be used
> > +		 * to represent it, throwing off register to
> > +		 * parameter mapping. Examples could include
> > +		 * structs or 64-bit types on a 32-bit arch.
> >  		 */
> >  		ftype__for_each_parameter(&fn->proto, pos) {
> >  			if (pos->optimized || !pos->has_loc)
> > @@ -2967,11 +2954,11 @@ static int cu__resolve_func_ret_types_optimized(struct cu *cu)
> >  		}
> >  		if (has_unexpected_reg) {
> >  			ftype__for_each_parameter(&fn->proto, pos) {
> > -				has_struct_param = param__is_struct(cu, &pos->tag);
> > -				if (has_struct_param)
> > +				has_wide_param = param__is_wide(cu, &pos->tag);
> > +				if (has_wide_param)
> >  					break;
> >  			}
> > -			if (!has_struct_param)
> > +			if (!has_wide_param)
> >  				fn->proto.unexpected_reg = 1;
> >  		}
> >  
>