On Fri, Apr 25, 2025 at 1:36 PM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
>
> On 25/04/2025 18:58, Andrii Nakryiko wrote:
> > On Fri, Apr 25, 2025 at 10:50 AM Alan Maguire <alan.maguire@xxxxxxxxxx> wrote:
> >>
> >> On 25/04/2025 15:50, Alexei Starovoitov wrote:
> >>> Hi All,
> >>>
> >>> Looks like pahole fails to deduplicate BTF when kernel and
> >>> kernel module are built with gcc-14.
> >>> I see this issue with various kernel .config-s on bpf and
> >>> bpf-next trees.
> >>> I tried pahole 1.28 and the latest master. Same issues.
> >>>
> >>> BTF in bpf_testmod.ko built with gcc-14 has 2849 types.
> >>> When built with gcc-13 it has 454 types.
> >>> So something is confusing dedup logic.
> >>> Would be great if dedup experts can take a look,
> >>> since this dedup issue is breaking a lot of selftests/bpf.
> >>>
> >>> Also vmlinux.h generated out of the kernel compiled with gcc-13
> >>> and out of the kernel compiled with gcc-14 shows these differences:
> >>>
> >>> --- vmlinux13.h 2025-04-24 21:33:50.556884372 -0700
> >>> +++ vmlinux14.h 2025-04-24 21:39:10.310488992 -0700
> >>> @@ -148815,7 +148815,6 @@
> >>> extern int hid_bpf_input_report(struct hid_bpf_ctx *ctx, enum
> >>> hid_report_type type, u8 *buf, const size_t buf__sz) __weak __ksym;
> >>> extern void hid_bpf_release_context(struct hid_bpf_ctx *ctx) __weak __ksym;
> >>> extern int hid_bpf_try_input_report(struct hid_bpf_ctx *ctx, enum
> >>> hid_report_type type, u8 *buf, const size_t buf__sz) __weak __ksym;
> >>> -extern bool scx_bpf_consume(u64 dsq_id) __weak __ksym;
> >>> extern int scx_bpf_cpu_node(s32 cpu) __weak __ksym;
> >>> extern struct rq *scx_bpf_cpu_rq(s32 cpu) __weak __ksym;
> >>> extern u32 scx_bpf_cpuperf_cap(s32 cpu) __weak __ksym;
> >>> @@ -148825,12 +148824,8 @@
> >>> extern void scx_bpf_destroy_dsq(u64 dsq_id) __weak __ksym;
> >>> extern void scx_bpf_dispatch(struct task_struct *p, u64 dsq_id, u64
> >>> slice, u64 enq_flags) __weak __ksym;
> >>> extern void scx_bpf_dispatch_cancel(void) __weak __ksym;
> >>> -extern bool scx_bpf_dispatch_from_dsq(struct bpf_iter_scx_dsq
> >>> *it__iter, struct task_struct *p, u64 dsq_id, u64 enq_flags) __weak
> >>> __ksym;
> >>> -extern void scx_bpf_dispatch_from_dsq_set_slice(struct
> >>> bpf_iter_scx_dsq *it__iter, u64 slice) __weak __ksym;
> >>> extern void scx_bpf_dispatch_from_dsq_set_vtime(struct
> >>> bpf_iter_scx_dsq *it__iter, u64 vtime) __weak __ksym;
> >>> extern u32 scx_bpf_dispatch_nr_slots(void) __weak __ksym;
> >>> -extern void scx_bpf_dispatch_vtime(struct task_struct *p, u64 dsq_id,
> >>> u64 slice, u64 vtime, u64 enq_flags) __weak __ksym;
> >>> -extern bool scx_bpf_dispatch_vtime_from_dsq(struct bpf_iter_scx_dsq
> >>> *it__iter, struct task_struct *p, u64 dsq_id, u64 enq_flags) __weak
> >>> __ksym;
> >>> extern void scx_bpf_dsq_insert(struct task_struct *p, u64 dsq_id, u64
> >>> slice, u64 enq_flags) __weak __ksym;
> >>> extern void scx_bpf_dsq_insert_vtime(struct task_struct *p, u64
> >>> dsq_id, u64 slice, u64 vtime, u64 enq_flags) __weak __ksym;
> >>> extern bool scx_bpf_dsq_move(struct bpf_iter_scx_dsq *it__iter,
> >>> struct task_struct *p, u64 dsq_id, u64 enq_flags) __weak __ksym;
> >>>
> >>> gcc-14's kernel is clearly wrong.
> >>> These 5 kfuncs still exist in the kernel.
> >>> I manually checked there is no if __GNUC__ > 13 in the code.
> >>> Also:
> >>> nm bld/vmlinux|grep -w scx_bpf_consume
> >>> ffffffff8159d4b0 T scx_bpf_consume
> >>> ffffffff8120ea81 t scx_bpf_consume.cold
> >>>
> >>> I suspect the second issue is not related to the dedup problem.
> >>> All 5 missing kfuncs have ".cold" optimized bodies.
> >>> But ".cold" may be a red herring, since
> >>> nm bld/vmlinux|grep -w scx_bpf_dispatch
> >>> ffffffff8159d020 T scx_bpf_dispatch
> >>> ffffffff8120ea0f t scx_bpf_dispatch.cold
> >>> but this kfunc is present in vmlinux14.h
> >>>
> >>> If it makes a difference I have these configs:
> >>> # CONFIG_DEBUG_INFO_DWARF4 is not set
> >>> # CONFIG_DEBUG_INFO_DWARF5 is not set
> >>> # CONFIG_DEBUG_INFO_REDUCED is not set
> >>> CONFIG_DEBUG_INFO_COMPRESSED_NONE=y
> >>> # CONFIG_DEBUG_INFO_COMPRESSED_ZLIB is not set
> >>> # CONFIG_DEBUG_INFO_SPLIT is not set
> >>> CONFIG_DEBUG_INFO_BTF=y
> >>> CONFIG_PAHOLE_HAS_SPLIT_BTF=y
> >>> CONFIG_DEBUG_INFO_BTF_MODULES=y
> >>
> >> thanks for the report! I've just reproduced this now with gcc 14; my
> >> initial theory was it might be DWARF5-related, but dedup issues occur
> >> for modules with CONFIG_DEBUG_INFO_DWARF4=y also. I'm seeing task_struct
> >> duplicates in module BTF among other things, so I will try and dig
> >> further and report back when I find something. Like you I suspect the
> >
> > This is a bizarre case. I have a custom small tool that recursively
> > traverses two parallel subgraphs of BTF types and prints anything that
> > differs between them ([0]). (I had to disable distilled BTF to make
> > use of this, the issue is present both with distilled BTF and
> > without).
> >
> > I see that struct sock both in vmlinux and bpf_testmod.ko are
> > *IDENTICAL*. There is no difference I could detect. So very weird. I'm
> > thinking of bisecting, as this didn't happen before with exactly the
> > same compiler and pahole, so this must be a kernel-side change.
> >
> > [0] https://github.com/anakryiko/libbpf-bootstrap/tree/btfdiff-hack
> >
>
> thanks for the pointer to this! My initial suspicion was that we had
> some sort of dups of slightly-differently-defined primitive types that
> bubbled up through multiple structs in the module case since the level
> of duplication is so high; a colleague ran across something like this
> recently and indeed if I dump vmlinux BTF in C format I see:
>
> typedef unsigned char u8___2;
>
> ...along with the original u8 definition:
>
> typedef unsigned char __u8;
> typedef __u8 u8;

Are you sure you are not dumping distilled BTF?

>
> However on checking I didn't find any references to the "wrong" u8, so I
> don't think it is the cause (the definition comes from
> crypto/jitterentropy.c so as a .c redefinition it's less likely to cause
> chaos across multiple CUs).
>
> Perhaps we should be thinking of cases where "#ifdef MODULE" leads to
> different structure content, maybe something changed that results in
> that leaking into core kernel structures like task_struct. Haven't had
> any luck finding a common culprit across duplicated structures yet..
>
> Alan
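
For anyone following along, the lockstep comparison described above for the btfdiff tool ([0]) can be pictured with a small sketch. This is not the actual tool and does not use libbpf: the `diff_types` helper, the dictionary layout, and the `struct foo` graphs are all hypothetical, invented only to illustrate how a differently-defined typedef deep in the graph would make two otherwise matching structs look different.

```python
def diff_types(base, base_id, mod, mod_id, path="<root>", seen=None):
    """Walk two toy type graphs in lockstep and return a list of differences.

    Each graph maps a type id to a dict with "kind", "name", optional "size",
    and optional "refs": a list of (label, referenced_type_id) pairs.
    This layout is an assumption made for the sketch, not the real BTF format.
    """
    if seen is None:
        seen = set()
    # A pair that was already visited is skipped; this also breaks cycles
    # such as struct -> pointer -> same struct.
    if (base_id, mod_id) in seen:
        return []
    seen.add((base_id, mod_id))

    a, b = base[base_id], mod[mod_id]
    diffs = [
        f"{path}: {key} {a.get(key)!r} != {b.get(key)!r}"
        for key in ("kind", "name", "size")
        if a.get(key) != b.get(key)
    ]
    if diffs:
        return diffs  # the nodes themselves disagree, so stop descending here

    refs_a, refs_b = a.get("refs", []), b.get("refs", [])
    if len(refs_a) != len(refs_b):
        return [f"{path}: {len(refs_a)} references != {len(refs_b)}"]

    for (label_a, id_a), (label_b, id_b) in zip(refs_a, refs_b):
        if label_a != label_b:
            diffs.append(f"{path}: member {label_a!r} != {label_b!r}")
        else:
            diffs += diff_types(base, id_a, mod, id_b, f"{path}.{label_a}", seen)
    return diffs


# Hypothetical graphs: "struct foo" is identical at the top level, but the
# module copy reaches "unsigned char" through a differently-defined u8.
vmlinux = {
    1: {"kind": "int", "name": "unsigned char", "size": 1},
    2: {"kind": "typedef", "name": "__u8", "refs": [("type", 1)]},
    3: {"kind": "typedef", "name": "u8", "refs": [("type", 2)]},
    4: {"kind": "struct", "name": "foo", "size": 16,
        "refs": [("flags", 3), ("next", 5)]},
    5: {"kind": "ptr", "name": None, "refs": [("type", 4)]},
}
module = {
    1: {"kind": "int", "name": "unsigned char", "size": 1},
    2: {"kind": "typedef", "name": "u8", "refs": [("type", 1)]},
    3: {"kind": "struct", "name": "foo", "size": 16,
        "refs": [("flags", 2), ("next", 4)]},
    4: {"kind": "ptr", "name": None, "refs": [("type", 3)]},
}

result = diff_types(vmlinux, 4, module, 3, "struct foo")
for line in result:
    print(line)
```

An empty result for a pair of types, as reported above for struct sock, means a walk of this kind found nothing to distinguish them, which is why the duplication is so puzzling.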