Re: [RFC PATCH bpf-next/net v1 07/13] bpf: Introduce per-prog stdout/stderr streams

Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> · Tue, 22 Apr 2025 04:30:37 +0200

On Tue, 22 Apr 2025 at 03:42, Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> > > > +BTF_KFUNCS_START(stream_consumer_kfunc_set)
> > > > +BTF_ID_FLAGS(func, bpf_stream_next_elem_batch, KF_ACQUIRE | KF_RET_NULL | KF_TRUSTED_ARGS)
> > > > +BTF_ID_FLAGS(func, bpf_stream_free_elem_batch, KF_RELEASE)
> > > > +BTF_ID_FLAGS(func, bpf_stream_next_elem, KF_ACQUIRE | KF_RET_NULL | KF_TRUSTED_ARGS)
> > > > +BTF_ID_FLAGS(func, bpf_stream_free_elem, KF_RELEASE)
> > > > +BTF_ID_FLAGS(func, bpf_prog_stream_get, KF_ACQUIRE | KF_RET_NULL)
> > > > +BTF_ID_FLAGS(func, bpf_prog_stream_put, KF_RELEASE)
> > > > +BTF_KFUNCS_END(stream_consumer_kfunc_set)
> > >
> > > This is a complicated API.
> > > If we anticipate that users intend to write this info to ring buffers
> > > maybe just provide a function doing that and do not expose complete API?
> >
> > I don't think anyone will use these functions directly, though they
> > can if they want to.
> > It's meant to be hidden behind bpftool, and macros to print stuff like
> > bpf_printk().
>
> I have to agree with Eduard here.
> The api feels overdesigned. I don't see how it can be reused
> anywhere outside of bpftool.

Can you explain why?
Apart from syscall prog requirement which is easy to lift (but didn't
bother for RFC).

> Instead of introducing mem_slice concept and above kfuncs,
> let's have one special kfunc that will take prog_id, stream_id
> and will move things into dynptr returning EAGAIN/ENOENT when necessary.
> EFAULT shouldn't be possible when the whole
>   SEC("syscall")
>   int bpftool_dump_prog_stream(void *ctx) {..}
> will be one kfunc.
> bpf_stream_dtor_ids won't be needed either.
> Hard coding such a big logic into one kfunc is not pretty,
> and smaller kfuncs/building blocks would be preferred any day,
> but these stream_consumer_kfunc_set just don't look reusable.

I don't follow. How is "shove all messages into ringbuf" more reusable?
I can sympathize with the complexity argument Eduard made, ideally
we'd have one pop() operation to pull out things.
We can discuss how to make things simpler.

With some documentation, it is not hard to follow:
Everytime you consume something from a stream, you splice out a batch
of N messages, you can then splice out each message individually from
a batch.
Both the batch and individual elements need to be freed. You can keep
processing the batch until it's finished, but the better idea is to
pace oneself and do N at a time.

In user space, you usually fill up your buffer's length (say
PAGE_SIZE) when reading something, then delimit and parse and draw
message boundaries when parsing.

You can flip the API in this patch around and use it to stream things
into the program.
We just need to add one more element in prog->aux->stream[BPF_STDIN = 0].
Even without that, you can make it work on stdout. The writing is bidirectional.

User space side can obtain the stream of a prog and printk to it.
Kernel side can pop and parse messages.
Probably not a bright idea to do so when in the fast path of the
kernel to read strings, but in other cases, why not?
It can certainly make sense for self contained programs that want to
consume options from the cmdline generically.
The generic loader can just forward the string when invoked in user
space into the stream, and call some init/main function for the
program before attaching it.
Your BPF init/main gets either raw stream, or prepared argv, argc
blocks that the loader's infra can set up for you.

At that point all logic in the program can be self-contained, you can
probably just write everything in BPF C, including setting config
options etc.
There would be no need to put these strings into a ringbuf, it would
just be for the BPF side to parse and make sense of.

But anyway, I will not push back too much. If everyone thinks "stream
data to ringbuf kfunc" is the way to go, I'll make the change in v2.