On 07/05/2025 18:17, Kumar Kartikeya Dwivedi wrote: > Begin reporting may_goto timeouts to BPF program's stderr stream. > Make sure that we don't end up spamming too many errors if the > program keeps failing repeatedly and filling up the stream, hence > emit at most 512 error messages from the kernel for a given stream. > > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> this series is great, having runtime error reporting like this is hugely valuable! One question below... > --- > include/linux/bpf.h | 21 ++++++++++++++------- > kernel/bpf/core.c | 17 ++++++++++++++++- > kernel/bpf/stream.c | 5 +++++ > 3 files changed, 35 insertions(+), 8 deletions(-) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index 46ce05aad0ed..daf95333be78 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -1667,6 +1667,7 @@ struct bpf_prog_aux { > struct rcu_head rcu; > }; > struct bpf_stream stream[2]; > + atomic_t stream_error_cnt; > }; > > struct bpf_prog { > @@ -3589,6 +3590,8 @@ void bpf_bprintf_cleanup(struct bpf_bprintf_data *data); > int bpf_try_get_buffers(struct bpf_bprintf_buffers **bufs); > void bpf_put_buffers(void); > > +#define BPF_PROG_STREAM_ERROR_CNT 512 > + > void bpf_prog_stream_init(struct bpf_prog *prog); > void bpf_prog_stream_free(struct bpf_prog *prog); > > @@ -3600,16 +3603,20 @@ int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog, > enum bpf_stream_id stream_id); > int bpf_stream_stage_dump_stack(struct bpf_stream_stage *ss); > > +bool bpf_prog_stream_error_limit(struct bpf_prog *prog); > + > #define bpf_stream_printk(...) bpf_stream_stage_printk(&__ss, __VA_ARGS__) > #define bpf_stream_dump_stack() bpf_stream_stage_dump_stack(&__ss) > > -#define bpf_stream_stage(prog, stream_id, expr) \ > - ({ \ > - struct bpf_stream_stage __ss; \ > - bpf_stream_stage_init(&__ss); \ > - (expr); \ > - bpf_stream_stage_commit(&__ss, prog, stream_id); \ > - bpf_stream_stage_free(&__ss); \ > +#define bpf_stream_stage(prog, stream_id, expr) \ > + ({ \ > + struct bpf_stream_stage __ss; \ > + if (!bpf_prog_stream_error_limit(prog)) { \ > + bpf_stream_stage_init(&__ss); \ > + (expr); \ > + bpf_stream_stage_commit(&__ss, prog, stream_id); \ > + bpf_stream_stage_free(&__ss); \ > + } \ > }) > > #ifdef CONFIG_BPF_LSM > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c > index dcb665bff22f..d21c304fe829 100644 > --- a/kernel/bpf/core.c > +++ b/kernel/bpf/core.c > @@ -3156,6 +3156,19 @@ u64 __weak arch_bpf_timed_may_goto(void) > return 0; > } > > +static noinline void bpf_prog_report_may_goto_violation(void) > +{ > + struct bpf_prog *prog; > + > + prog = bpf_prog_find_from_stack(); > + if (!prog) > + return; > + bpf_stream_stage(prog, BPF_STDERR, ({ > + bpf_stream_printk("ERROR: Timeout detected for may_goto instruction\n"); > + bpf_stream_dump_stack(); > + })); > +} > + Given that we can hit a stream stage error limit, and that some users might want a high-level picture before diving into stream output, is there any scope here for adding error stats covering situations like this? I can imagine some users (perhaps users of bpftool) might not want to see the full error stream but rather get a summary of runtime error stats first, so recording runtime error counts (perhaps contingent on bpf_stats_enabled?) might be worthwhile too? Doesn't have to be this series of course, but just wondering if others perceive a need here too. A tracepoint for BPF runtime errors that is passed a bpf prog + an enum representing the error encountered would be pretty handy for tracers I suspect; that would allow them to tailor their output based upon their needs when runtime errors are hit, with later dumping of the whole error stream if required. Thanks! Alan