Re: [PATCH bpf-next v4 02/12] bpf: Introduce BPF standard streams

On Wed, 2 Jul 2025 at 18:10, Emil Tsalapatis <emil@xxxxxxxxxxxxxxx> wrote:
>
> Looks good overall, some nits:
>
> On Tue, Jul 1, 2025 at 11:17 PM Kumar Kartikeya Dwivedi
> <memxor@xxxxxxxxx> wrote:
> >
> > Add support for a stream API to the kernel and expose related kfuncs to
> > BPF programs. Two streams are exposed, BPF_STDOUT and BPF_STDERR. These
> > can be used for printing messages that can be consumed from user space,
> > thus it is similar in spirit to the existing trace_pipe interface.
> >
> > The kernel will use the BPF_STDERR stream to notify the program of any
> > errors encountered at runtime. BPF programs themselves may use both
> > streams for writing debug messages. BPF library-like code may use
> > BPF_STDERR to print warnings or errors on misuse at runtime.
> >
> > The implementation of a stream is as follows. Every time a message is
> > emitted from the kernel (directly, or through a BPF program), a record
> > is allocated by bump allocating from a per-CPU region backed by a page
> > obtained using alloc_pages_nolock(). This ensures that we can allocate
> > memory from any context. The eventual plan is to discard this scheme in
> > favor of Alexei's kmalloc_nolock() [0].
> >
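[Editorial illustration, not part of the patch: the bump-allocation scheme described above can be modeled in userspace C. All `sketch_*` names below are hypothetical; the kernel code uses refcount_t, per-CPU variables, and alloc_pages_nolock() instead.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace sketch of the refcounted bump-allocation page: a small header
 * (refcount + consumed offset) followed by variable-length records. */
#define SKETCH_PAGE_SZ 4096u

struct sketch_page {
	uint32_t ref;       /* one reference per live record, plus a base ref */
	uint32_t consumed;  /* bytes handed out from buf[] so far */
	char buf[SKETCH_PAGE_SZ - 2 * sizeof(uint32_t)];
};

/* Reserve len bytes (rounded up to 8) from the page; NULL when full.
 * When this fails, the real code replaces the per-CPU page with a
 * freshly allocated one. */
static char *sketch_bump_alloc(struct sketch_page *p, uint32_t len)
{
	uint32_t need = (len + 7u) & ~7u;
	char *rec;

	if (need > sizeof(p->buf) - p->consumed)
		return NULL;
	rec = &p->buf[p->consumed];
	p->consumed += need;
	p->ref++;
	return rec;
}
```

Freeing is then just a matter of dropping one reference per consumed record; once the count reaches zero the whole page can go back to the allocator.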
> > This record is then locklessly inserted into a list (llist_add()) so
> > that the printing side doesn't require holding any locks, and works in
> > any context. Each stream has a maximum capacity of 100k bytes of text
> > (BPF_STREAM_MAX_CAPACITY), and each printed message is accounted
> > against this limit.
> >
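[Editorial illustration: the capacity accounting can be sketched in userspace C11 atomics. This mirrors bpf_stream_consume_capacity() from the patch (which uses the kernel's atomic_t API); the `sketch_*` names are hypothetical.]

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_MAX_CAPACITY 100000

/* Optimistically add the message length, then back out if the limit was
 * crossed; concurrent writers may transiently overshoot but always undo. */
static int sketch_consume_capacity(atomic_int *capacity, int len)
{
	if (atomic_load(capacity) >= SKETCH_MAX_CAPACITY)
		return -1;  /* stands in for -ENOSPC */
	if (atomic_fetch_add(capacity, len) + len >= SKETCH_MAX_CAPACITY) {
		atomic_fetch_sub(capacity, len);
		return -1;
	}
	return 0;
}
```

The release path is the inverse: subtract the element's total length when it is freed after being fully read.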
> > Messages from a program are emitted using the bpf_stream_vprintk kfunc,
> > which takes a stream_id argument in addition to the arguments of
> > bpf_trace_vprintk, and otherwise works similarly to it.
> >
> > The bprintf buffer helpers are extracted out so they can be reused to
> > format the string before it is copied into the stream; this way we can
> > (within the defined max limit) format a string and know its true length
> > before allocating the stream element.
> >
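[Editorial illustration: the "format into a scratch buffer first, then allocate exactly" pattern described above, sketched in userspace C with hypothetical names; the kernel uses the shared bpf_bprintf buffers rather than a stack buffer.]

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format into a fixed scratch buffer first so the true length (including
 * the trailing NUL) is known before any allocation is attempted. */
static char *sketch_fmt_alloc(const char *fmt, ...)
{
	char scratch[512];  /* stand-in for a bprintf buffer */
	va_list ap;
	int len;

	va_start(ap, fmt);
	len = vsnprintf(scratch, sizeof(scratch), fmt, ap);
	va_end(ap);
	if (len < 0 || len >= (int)sizeof(scratch))
		return NULL;  /* refuse what the scratch buffer can't hold */

	/* Exact-size "stream element" allocation, no over-reservation. */
	char *elem = malloc(len + 1);
	if (elem)
		memcpy(elem, scratch, len + 1);
	return elem;
}
```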
> > For consuming elements from a stream, we expose a bpf(2) syscall command
> > named BPF_PROG_STREAM_READ_BY_FD, which allows reading data from the
> > stream of a given prog_fd into a user space buffer. The main logic is
> > implemented in bpf_stream_read(). The log messages are queued in
> > bpf_stream::log by the bpf_stream_vprintk kfunc, and then pulled and
> > ordered correctly in the stream backlog.
> >
> > For this purpose, we hold a lock around bpf_stream_backlog_peek(), as
> > llist_del_first() (if we maintained a second lockless list for the
> > backlog) wouldn't be safe from multiple threads anyway. Then, if we
> > fail to find something in the backlog, we splice out everything from
> > the lockless log, append it to the backlog, and return the head of the
> > backlog. Once the full length of an element is consumed, we pop it and
> > free it.
> >
> > The lockless list bpf_stream::log is a LIFO stack. Elements obtained
> > via an llist_del_all() operation are in LIFO order, and would thus break
> > the chronological ordering if printed directly. Hence, this batch of
> > messages is first reversed. Then, it is stashed into a separate list in
> > the stream, i.e. the backlog_log. The head of this list is the actual
> > message that should always be returned to the caller. All of this is
> > done in bpf_stream_backlog_fill().
> >
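[Editorial illustration: the LIFO-to-FIFO reversal described above is exactly what llist_reverse_order() does on a chain detached by llist_del_all(). A userspace model, with hypothetical names:]

```c
#include <stddef.h>

/* Minimal singly linked node, mirroring struct llist_node plus a tag. */
struct node {
	struct node *next;
	int seq;  /* chronological order in which the message was pushed */
};

/* Reverse a detached LIFO chain in place so its head becomes the oldest
 * (i.e. first-to-print) element, restoring chronological order. */
static struct node *sketch_reverse(struct node *head)
{
	struct node *prev = NULL;

	while (head) {
		struct node *next = head->next;

		head->next = prev;
		prev = head;
		head = next;
	}
	return prev;
}
```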
> > From the kernel side, the writing into the stream will be a bit more
> > involved than the typical printk. First, the kernel typically may print
> > a collection of messages into the stream, and parallel writers into the
> > stream may suffer from interleaving of messages. To ensure each group of
> > messages is visible atomically, we take advantage of the fact that
> > messages are pushed onto a lockless list.
> >
> > To enable this, we add a bpf_stream_stage() macro, and require kernel
> > users to use bpf_stream_printk statements for the passed expression to
> > write into the stream. Underneath the macro, we have a message staging
> > API, where a bpf_stream_stage object on the stack accumulates the
> > messages being printed into a local llist_head, and then a commit
> > operation splices the whole batch into the stream's lockless log list.
> >
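[Editorial illustration: the staging-then-commit idea can be modeled in userspace C. This is a sketch with hypothetical names, single-pointer "shared list" standing in for the stream's llist_head; the kernel side accumulates into a local llist_head and splices the whole batch on commit, as the commit message says.]

```c
#include <stddef.h>

struct snode {
	struct snode *next;
	const char *msg;
};

/* A "stage" accumulates messages locally; nothing is visible to readers
 * until commit splices the whole chain onto the shared list at once, so
 * concurrent writers can interleave only whole batches, never lines. */
struct stage {
	struct snode *head;
};

static void stage_add(struct stage *st, struct snode *n)
{
	n->next = st->head;
	st->head = n;
}

/* Splice the staged chain onto the shared list with one head update;
 * in the kernel the equivalent splice is a lockless llist operation. */
static void stage_commit(struct stage *st, struct snode **shared)
{
	struct snode *tail = st->head;

	if (!tail)
		return;
	while (tail->next)
		tail = tail->next;
	tail->next = *shared;
	*shared = st->head;
	st->head = NULL;
}
```

In the patch this corresponds to wrapping bpf_stream_printk() statements inside the bpf_stream_stage() macro, whose commit step publishes the batch.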
> > This is especially pertinent for rqspinlock deadlock messages printed to
> > program streams. After this change, we see each deadlock invocation as a
> > non-interleaving contiguous message without any confusion on the
> > reader's part, improving their user experience in debugging the fault.
> >
> > While programs cannot benefit from this staged stream writing API, they
> > could just as well hold an rqspinlock around their print statements to
> > serialize messages, hence this is kept kernel-internal for now.
> >
> > Overall, this infrastructure provides NMI-safe, any-context printing of
> > messages to two dedicated streams.
> >
> > Later patches will add support for printing splats in case of BPF arena
> > page faults, rqspinlock deadlocks, and cond_break timeouts, and
> > integration of this facility into bpftool for dumping messages to user
> > space.
> >
> >   [0]: https://lore.kernel.org/bpf/20250501032718.65476-1-alexei.starovoitov@xxxxxxxxx
> >
> > Reviewed-by: Eduard Zingerman <eddyz87@xxxxxxxxx>
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx>
> > ---
> >  include/linux/bpf.h            |  52 ++++
> >  include/uapi/linux/bpf.h       |  24 ++
> >  kernel/bpf/Makefile            |   2 +-
> >  kernel/bpf/core.c              |   5 +
> >  kernel/bpf/helpers.c           |   1 +
> >  kernel/bpf/stream.c            | 478 +++++++++++++++++++++++++++++++++
> >  kernel/bpf/syscall.c           |  25 ++
> >  kernel/bpf/verifier.c          |   1 +
> >  tools/include/uapi/linux/bpf.h |  24 ++
> >  9 files changed, 611 insertions(+), 1 deletion(-)
> >  create mode 100644 kernel/bpf/stream.c
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 4fff0cee8622..85b1cbe494f5 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1538,6 +1538,37 @@ struct btf_mod_pair {
> >
> >  struct bpf_kfunc_desc_tab;
> >
> > +enum bpf_stream_id {
> > +       BPF_STDOUT = 1,
> > +       BPF_STDERR = 2,
> > +};
> > +
> > +struct bpf_stream_elem {
> > +       struct llist_node node;
> > +       int total_len;
> > +       int consumed_len;
> > +       char str[];
> > +};
> > +
> > +enum {
> > +       /* 100k bytes */
> > +       BPF_STREAM_MAX_CAPACITY = 100000ULL,
> > +};
> > +
> > +struct bpf_stream {
> > +       atomic_t capacity;
> > +       struct llist_head log;  /* list of in-flight stream elements in LIFO order */
> > +
> > +       struct mutex lock;  /* lock protecting backlog_{head,tail} */
> > +       struct llist_node *backlog_head; /* list of in-flight stream elements in FIFO order */
> > +       struct llist_node *backlog_tail; /* tail of the list above */
> > +};
> > +
> > +struct bpf_stream_stage {
> > +       struct llist_head log;
> > +       int len;
> > +};
> > +
> >  struct bpf_prog_aux {
> >         atomic64_t refcnt;
> >         u32 used_map_cnt;
> > @@ -1646,6 +1677,7 @@ struct bpf_prog_aux {
> >                 struct work_struct work;
> >                 struct rcu_head rcu;
> >         };
> > +       struct bpf_stream stream[2];
> >  };
> >
> >  struct bpf_prog {
> > @@ -2408,6 +2440,7 @@ int  generic_map_delete_batch(struct bpf_map *map,
> >  struct bpf_map *bpf_map_get_curr_or_next(u32 *id);
> >  struct bpf_prog *bpf_prog_get_curr_or_next(u32 *id);
> >
> > +
> >  int bpf_map_alloc_pages(const struct bpf_map *map, int nid,
> >                         unsigned long nr_pages, struct page **page_array);
> >  #ifdef CONFIG_MEMCG
> > @@ -3573,6 +3606,25 @@ void bpf_bprintf_cleanup(struct bpf_bprintf_data *data);
> >  int bpf_try_get_buffers(struct bpf_bprintf_buffers **bufs);
> >  void bpf_put_buffers(void);
> >
> > +void bpf_prog_stream_init(struct bpf_prog *prog);
> > +void bpf_prog_stream_free(struct bpf_prog *prog);
> > +int bpf_prog_stream_read(struct bpf_prog *prog, enum bpf_stream_id stream_id, void __user *buf, int len);
> > +void bpf_stream_stage_init(struct bpf_stream_stage *ss);
> > +void bpf_stream_stage_free(struct bpf_stream_stage *ss);
> > +__printf(2, 3)
> > +int bpf_stream_stage_printk(struct bpf_stream_stage *ss, const char *fmt, ...);
> > +int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
> > +                           enum bpf_stream_id stream_id);
> > +
> > +#define bpf_stream_printk(ss, ...) bpf_stream_stage_printk(&ss, __VA_ARGS__)
> > +
> > +#define bpf_stream_stage(ss, prog, stream_id, expr)            \
> > +       ({                                                     \
> > +               bpf_stream_stage_init(&ss);                    \
> > +               (expr);                                        \
> > +               bpf_stream_stage_commit(&ss, prog, stream_id); \
> > +               bpf_stream_stage_free(&ss);                    \
> > +       })
> >
> >  #ifdef CONFIG_BPF_LSM
> >  void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 719ba230032f..0670e15a6100 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -906,6 +906,17 @@ union bpf_iter_link_info {
> >   *             A new file descriptor (a nonnegative integer), or -1 if an
> >   *             error occurred (in which case, *errno* is set appropriately).
> >   *
> > + * BPF_PROG_STREAM_READ_BY_FD
> > + *     Description
> > + *             Read data of a program's BPF stream. The program is identified
> > + *             by *prog_fd*, and the stream is identified by the *stream_id*.
> > + *             The data is copied to a buffer pointed to by *stream_buf*,
> > + *             which is filled with at most *stream_buf_len* bytes.
> > + *
> > + *     Return
> > + *             Number of bytes read from the stream on success, or -1 if an
> > + *             error occurred (in which case, *errno* is set appropriately).
> > + *
> >   * NOTES
> >   *     eBPF objects (maps and programs) can be shared between processes.
> >   *
> > @@ -961,6 +972,7 @@ enum bpf_cmd {
> >         BPF_LINK_DETACH,
> >         BPF_PROG_BIND_MAP,
> >         BPF_TOKEN_CREATE,
> > +       BPF_PROG_STREAM_READ_BY_FD,
> >         __MAX_BPF_CMD,
> >  };
> >
> > @@ -1463,6 +1475,11 @@ struct bpf_stack_build_id {
> >
> >  #define BPF_OBJ_NAME_LEN 16U
> >
> > +enum {
> > +       BPF_STREAM_STDOUT = 1,
> > +       BPF_STREAM_STDERR = 2,
> > +};
> > +
> >  union bpf_attr {
> >         struct { /* anonymous struct used by BPF_MAP_CREATE command */
> >                 __u32   map_type;       /* one of enum bpf_map_type */
> > @@ -1849,6 +1866,13 @@ union bpf_attr {
> >                 __u32           bpffs_fd;
> >         } token_create;
> >
> > +       struct {
> > +               __aligned_u64   stream_buf;
> > +               __u32           stream_buf_len;
> > +               __u32           stream_id;
> > +               __u32           prog_fd;
> > +       } prog_stream_read;
> > +
> >  } __attribute__((aligned(8)));
> >
> >  /* The description below is an attempt at providing documentation to eBPF
> > diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> > index 3a335c50e6e3..269c04a24664 100644
> > --- a/kernel/bpf/Makefile
> > +++ b/kernel/bpf/Makefile
> > @@ -14,7 +14,7 @@ obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
> >  obj-${CONFIG_BPF_LSM}    += bpf_inode_storage.o
> >  obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
> >  obj-$(CONFIG_BPF_JIT) += trampoline.o
> > -obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o rqspinlock.o
> > +obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o rqspinlock.o stream.o
> >  ifeq ($(CONFIG_MMU)$(CONFIG_64BIT),yy)
> >  obj-$(CONFIG_BPF_SYSCALL) += arena.o range_tree.o
> >  endif
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index e536a34a32c8..f0def24573ae 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -134,6 +134,10 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
> >         mutex_init(&fp->aux->ext_mutex);
> >         mutex_init(&fp->aux->dst_mutex);
> >
> > +#ifdef CONFIG_BPF_SYSCALL
> > +       bpf_prog_stream_init(fp);
> > +#endif
> > +
> >         return fp;
> >  }
> >
> > @@ -2862,6 +2866,7 @@ static void bpf_prog_free_deferred(struct work_struct *work)
> >         aux = container_of(work, struct bpf_prog_aux, work);
> >  #ifdef CONFIG_BPF_SYSCALL
> >         bpf_free_kfunc_btf_tab(aux->kfunc_btf_tab);
> > +       bpf_prog_stream_free(aux->prog);
> >  #endif
> >  #ifdef CONFIG_CGROUP_BPF
> >         if (aux->cgroup_atype != CGROUP_BPF_ATTACH_TYPE_INVALID)
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 8f1cc1d525db..61fdd343d6f5 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -3778,6 +3778,7 @@ BTF_ID_FLAGS(func, bpf_strnstr);
> >  #if defined(CONFIG_BPF_LSM) && defined(CONFIG_CGROUPS)
> >  BTF_ID_FLAGS(func, bpf_cgroup_read_xattr, KF_RCU)
> >  #endif
> > +BTF_ID_FLAGS(func, bpf_stream_vprintk, KF_TRUSTED_ARGS)
> >  BTF_KFUNCS_END(common_btf_ids)
> >
> >  static const struct btf_kfunc_id_set common_kfunc_set = {
> > diff --git a/kernel/bpf/stream.c b/kernel/bpf/stream.c
> > new file mode 100644
> > index 000000000000..c4925f8d275f
> > --- /dev/null
> > +++ b/kernel/bpf/stream.c
> > @@ -0,0 +1,478 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/* Copyright (c) 2025 Meta Platforms, Inc. and affiliates. */
> > +
> > +#include <linux/bpf.h>
> > +#include <linux/bpf_mem_alloc.h>
> > +#include <linux/percpu.h>
> > +#include <linux/refcount.h>
> > +#include <linux/gfp.h>
> > +#include <linux/memory.h>
> > +#include <linux/local_lock.h>
> > +#include <linux/mutex.h>
> > +
> > +/*
> > + * Simple per-CPU NMI-safe bump allocation mechanism, backed by the NMI-safe
> > + * alloc_pages_nolock()/free_pages_nolock() primitives. We allocate a page and
> > + * stash it in a local per-CPU variable, and bump allocate from the page
> > + * whenever items need to be printed to a stream. Each page holds a global
> > + * atomic refcount in its first 4 bytes, and then records of variable length
> > + * that describe the printed messages. Once the global refcount has dropped to
> > + * zero, it is a signal to free the page back to the kernel's page allocator,
> > + * given all the individual records in it have been consumed.
> > + *
> > + * It is possible the same page is used to serve allocations across different
> > + * programs, which may be consumed at different times individually, hence
> > + * maintaining a reference count per-page is critical for correct lifetime
> > + * tracking.
> > + *
> > + * The bpf_stream_page code will be replaced to use kmalloc_nolock() once it
> > + * lands.
> > + */
> > +struct bpf_stream_page {
> > +       refcount_t ref;
> > +       u32 consumed;
> > +       char buf[];
> > +};
> > +
> > +/* Available room to add data to a refcounted page. */
> > +#define BPF_STREAM_PAGE_SZ (PAGE_SIZE - offsetofend(struct bpf_stream_page, consumed))
> > +
> > +static DEFINE_PER_CPU(local_trylock_t, stream_local_lock) = INIT_LOCAL_TRYLOCK(stream_local_lock);
> > +static DEFINE_PER_CPU(struct bpf_stream_page *, stream_pcpu_page);
> > +
> > +static bool bpf_stream_page_local_lock(unsigned long *flags)
> > +{
> > +       return local_trylock_irqsave(&stream_local_lock, *flags);
> > +}
> > +
> > +static void bpf_stream_page_local_unlock(unsigned long *flags)
> > +{
> > +       local_unlock_irqrestore(&stream_local_lock, *flags);
> > +}
> > +
> > +static void bpf_stream_page_free(struct bpf_stream_page *stream_page)
> > +{
> > +       struct page *p;
> > +
> > +       if (!stream_page)
> > +               return;
> > +       p = virt_to_page(stream_page);
> > +       free_pages_nolock(p, 0);
> > +}
> > +
> > +static void bpf_stream_page_get(struct bpf_stream_page *stream_page)
> > +{
> > +       refcount_inc(&stream_page->ref);
> > +}
> > +
> > +static void bpf_stream_page_put(struct bpf_stream_page *stream_page)
> > +{
> > +       if (refcount_dec_and_test(&stream_page->ref))
> > +               bpf_stream_page_free(stream_page);
> > +}
> > +
> > +static void bpf_stream_page_init(struct bpf_stream_page *stream_page)
> > +{
> > +       refcount_set(&stream_page->ref, 1);
> > +       stream_page->consumed = 0;
> > +}
> > +
> > +static struct bpf_stream_page *bpf_stream_page_replace(void)
> > +{
> > +       struct bpf_stream_page *stream_page, *old_stream_page;
> > +       struct page *page;
> > +
> > +       page = alloc_pages_nolock(NUMA_NO_NODE, 0);
> > +       if (!page)
> > +               return NULL;
> > +       stream_page = page_address(page);
> > +       bpf_stream_page_init(stream_page);
> > +
> > +       old_stream_page = this_cpu_read(stream_pcpu_page);
> > +       if (old_stream_page)
> > +               bpf_stream_page_put(old_stream_page);
> > +       this_cpu_write(stream_pcpu_page, stream_page);
> > +       return stream_page;
> > +}
> > +
> > +static int bpf_stream_page_check_room(struct bpf_stream_page *stream_page, int len)
> > +{
> > +       int min = offsetof(struct bpf_stream_elem, str[0]);
> > +       int consumed = stream_page->consumed;
> > +       int total = BPF_STREAM_PAGE_SZ;
> > +       int rem = max(0, total - consumed - min);
> > +
> > +       /* Let's give room of at least 8 bytes. */
> > +       WARN_ON_ONCE(rem % 8 != 0);
> > +       rem = rem < 8 ? 0 : rem;
> > +       return min(len, rem);
> > +}
> > +
> > +static void bpf_stream_elem_init(struct bpf_stream_elem *elem, int len)
> > +{
> > +       init_llist_node(&elem->node);
> > +       elem->total_len = len;
> > +       elem->consumed_len = 0;
> > +}
> > +
> > +static struct bpf_stream_page *bpf_stream_page_from_elem(struct bpf_stream_elem *elem)
> > +{
> > +       unsigned long addr = (unsigned long)elem;
> > +
> > +       return (struct bpf_stream_page *)PAGE_ALIGN_DOWN(addr);
> > +}
> > +
> > +static struct bpf_stream_elem *bpf_stream_page_push_elem(struct bpf_stream_page *stream_page, int len)
> > +{
> > +       u32 consumed = stream_page->consumed;
> > +
> > +       stream_page->consumed += round_up(offsetof(struct bpf_stream_elem, str[len]), 8);
> > +       return (struct bpf_stream_elem *)&stream_page->buf[consumed];
> > +}
> > +
> > +static noinline struct bpf_stream_elem *bpf_stream_page_reserve_elem(int len)
>
> Why noinline?

Ack, should be dropped.

>
> > +{
> > +       struct bpf_stream_elem *elem = NULL;
> > +       struct bpf_stream_page *page;
> > +       int room = 0;
> > +
> > +       page = this_cpu_read(stream_pcpu_page);
> > +       if (!page)
> > +               page = bpf_stream_page_replace();
> > +       if (!page)
> > +               return NULL;
> > +
> > +       room = bpf_stream_page_check_room(page, len);
> > +       if (room != len)
> > +               page = bpf_stream_page_replace();
> > +       if (!page)
> > +               return NULL;
> > +       bpf_stream_page_get(page);
> > +       room = bpf_stream_page_check_room(page, len);
> > +       WARN_ON_ONCE(room != len);
> > +
> > +       elem = bpf_stream_page_push_elem(page, room);
> > +       bpf_stream_elem_init(elem, room);
> > +       return elem;
> > +}
> > +
> > +static struct bpf_stream_elem *bpf_stream_elem_alloc(int len)
> > +{
> > +       const int max_len = ARRAY_SIZE((struct bpf_bprintf_buffers){}.buf);
> > +       struct bpf_stream_elem *elem;
> > +       unsigned long flags;
> > +
> > +       BUILD_BUG_ON(max_len > BPF_STREAM_PAGE_SZ);
> > +       /*
> > +        * Length denotes the amount of data to be written as part of stream element,
> > +        * thus includes '\0' byte. We're capped by how much bpf_bprintf_buffers can
> > +        * accommodate, therefore deny allocations that won't fit into them.
> > +        */
> > +       if (len < 0 || len > max_len)
> > +               return NULL;
> > +
> > +       if (!bpf_stream_page_local_lock(&flags))
> > +               return NULL;
> > +       elem = bpf_stream_page_reserve_elem(len);
> > +       bpf_stream_page_local_unlock(&flags);
> > +       return elem;
> > +}
> > +
> > +static int __bpf_stream_push_str(struct llist_head *log, const char *str, int len)
> > +{
> > +       struct bpf_stream_elem *elem = NULL;
> > +
> > +       /*
> > +        * Allocate a bpf_stream_elem and push it to the stream's log;
> > +        * elements will be popped all at once and reversed to print the log.
> > +        */
> > +       elem = bpf_stream_elem_alloc(len);
> > +       if (!elem)
> > +               return -ENOMEM;
> > +
> > +       memcpy(elem->str, str, len);
> > +       llist_add(&elem->node, log);
> > +
> > +       return 0;
> > +}
> > +
> > +static int bpf_stream_consume_capacity(struct bpf_stream *stream, int len)
> > +{
> > +       if (atomic_read(&stream->capacity) >= BPF_STREAM_MAX_CAPACITY)
> > +               return -ENOSPC;
> > +       if (atomic_add_return(len, &stream->capacity) >= BPF_STREAM_MAX_CAPACITY) {
> > +               atomic_sub(len, &stream->capacity);
> > +               return -ENOSPC;
> > +       }
> > +       return 0;
> > +}
> > +
> > +static void bpf_stream_release_capacity(struct bpf_stream *stream, struct bpf_stream_elem *elem)
> > +{
> > +       int len = elem->total_len;
> > +
> > +       atomic_sub(len, &stream->capacity);
> > +}
> > +
> > +static int bpf_stream_push_str(struct bpf_stream *stream, const char *str, int len)
> > +{
> > +       int ret = bpf_stream_consume_capacity(stream, len);
> > +
> > +       return ret ?: __bpf_stream_push_str(&stream->log, str, len);
> > +}
> > +
> > +static struct bpf_stream *bpf_stream_get(enum bpf_stream_id stream_id, struct bpf_prog_aux *aux)
> > +{
> > +       if (stream_id != BPF_STDOUT && stream_id != BPF_STDERR)
> > +               return NULL;
> > +       return &aux->stream[stream_id - 1];
> > +}
> > +
> > +static void bpf_stream_free_elem(struct bpf_stream_elem *elem)
> > +{
> > +       struct bpf_stream_page *p;
> > +
> > +       p = bpf_stream_page_from_elem(elem);
> > +       bpf_stream_page_put(p);
> > +}
> > +
> > +static void bpf_stream_free_list(struct llist_node *list)
> > +{
> > +       struct bpf_stream_elem *elem, *tmp;
> > +
> > +       llist_for_each_entry_safe(elem, tmp, list, node)
> > +               bpf_stream_free_elem(elem);
> > +}
> > +
> > +static struct llist_node *bpf_stream_backlog_peek(struct bpf_stream *stream)
> > +{
> > +       return stream->backlog_head;
> > +}
> > +
> > +static struct llist_node *bpf_stream_backlog_pop(struct bpf_stream *stream)
> > +{
> > +       struct llist_node *node;
> > +
> > +       node = stream->backlog_head;
> > +       if (stream->backlog_head == stream->backlog_tail)
> > +               stream->backlog_head = stream->backlog_tail = NULL;
> > +       else
> > +               stream->backlog_head = node->next;
> > +       return node;
> > +}
> > +
> > +static void bpf_stream_backlog_fill(struct bpf_stream *stream)
> > +{
> > +       struct llist_node *head, *tail;
> > +
> > +       if (llist_empty(&stream->log))
> > +               return;
> > +       tail = llist_del_all(&stream->log);
> > +       if (!tail)
> > +               return;
> > +       head = llist_reverse_order(tail);
> > +
> > +       if (!stream->backlog_head) {
> > +               stream->backlog_head = head;
> > +               stream->backlog_tail = tail;
> > +       } else {
> > +               stream->backlog_tail->next = head;
> > +               stream->backlog_tail = tail;
> > +       }
> > +
> > +       return;
> > +}
> > +
> > +static bool bpf_stream_consume_elem(struct bpf_stream_elem *elem, int *len)
> > +{
> > +       int rem = elem->total_len - elem->consumed_len;
> > +       int used = min(rem, *len);
> > +
> > +       elem->consumed_len += used;
> > +       *len -= used;
> > +
> > +       return elem->consumed_len == elem->total_len;
> > +}
> > +
> > +static int bpf_stream_read(struct bpf_stream *stream, void __user *buf, int len)
> > +{
> > +       int rem_len = len, cons_len, ret = 0;
> > +       struct bpf_stream_elem *elem = NULL;
> > +       struct llist_node *node;
> > +
> > +       mutex_lock(&stream->lock);
> > +
> > +       while (rem_len) {
> > +               int pos = len - rem_len;
> > +               bool cont;
> > +
> > +               node = bpf_stream_backlog_peek(stream);
> > +               if (!node) {
> > +                       bpf_stream_backlog_fill(stream);
> > +                       node = bpf_stream_backlog_peek(stream);
> > +               }
> > +               if (!node)
> > +                       break;
> > +               elem = container_of(node, typeof(*elem), node);
> > +
> > +               cons_len = elem->consumed_len;
> > +               cont = bpf_stream_consume_elem(elem, &rem_len) == false;
> > +
> > +               ret = copy_to_user(buf + pos, elem->str + cons_len,
> > +                                  elem->consumed_len - cons_len);
> > +               /* Restore in case of error. */
> > +               if (ret) {
> > +                       ret = -EFAULT;
> > +                       elem->consumed_len = cons_len;
> > +                       break;
> > +               }
> > +
> > +               if (cont)
> > +                       continue;
> > +               bpf_stream_backlog_pop(stream);
> > +               bpf_stream_release_capacity(stream, elem);
> > +               bpf_stream_free_elem(elem);
> > +       }
> > +
> > +       mutex_unlock(&stream->lock);
> > +       return ret ? ret : len - rem_len;
> > +}
> > +
> > +int bpf_prog_stream_read(struct bpf_prog *prog, enum bpf_stream_id stream_id, void __user *buf, int len)
> > +{
> > +       struct bpf_stream *stream;
> > +
> > +       stream = bpf_stream_get(stream_id, prog->aux);
> > +       if (!stream)
> > +               return -ENOENT;
> > +       return bpf_stream_read(stream, buf, len);
> > +}
> > +
> > +__bpf_kfunc_start_defs();
> > +
> > +/*
> > + * Avoid using enum bpf_stream_id so that kfunc users don't have to pull in the
> > + * enum in headers.
> > + */
> > +__bpf_kfunc int bpf_stream_vprintk(int stream_id, const char *fmt__str, const void *args, u32 len__sz, void *aux__prog)
> > +{
> > +       struct bpf_bprintf_data data = {
> > +               .get_bin_args   = true,
> > +               .get_buf        = true,
> > +       };
> > +       struct bpf_prog_aux *aux = aux__prog;
> > +       u32 fmt_size = strlen(fmt__str) + 1;
> > +       struct bpf_stream *stream;
> > +       u32 data_len = len__sz;
> > +       int ret, num_args;
> > +
> > +       stream = bpf_stream_get(stream_id, aux);
> > +       if (!stream)
> > +               return -ENOENT;
> > +
> > +       if (data_len & 7 || data_len > MAX_BPRINTF_VARARGS * 8 ||
>
> Maybe rename data_len to vararg/argarr_len or something else? Right
> now it looks like it's the length of the actual data instead of the
> vararg array.
>

Kept it the same as trace_printk for consistency.
