Re: [PATCH bpf-next v3 2/3] selftests/bpf: support array presets in veristat

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Fri, 13 Jun 2025 10:21:26 -0700

On Fri, Jun 13, 2025 at 10:10 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Fri, 2025-06-13 at 09:59 -0700, Andrii Nakryiko wrote:
> > On Fri, Jun 13, 2025 at 9:48 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> > >
> > > On Fri, 2025-06-13 at 09:34 -0700, Andrii Nakryiko wrote:
> > > > On Wed, Jun 11, 2025 at 5:21 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
> > > > >
> > > > > What do you think about a more recursive representation for presets?
> > > > > E.g. as follows:
> > > > >
> > > > >   struct rvalue {
> > > > >     long long i; /* use find_enum_value() at parse time to avoid union */
> > > > >   };
> > > > >
> > > > >   struct lvalue {
> > > > >     enum { VAR, FIELD, ARRAY } type;
> > > > >     union {
> > > > >       struct {
> > > > >         char *name;
> > > > >       } var;
> > > > >       struct {
> > > > >         struct lvalue *base;
> > > > >         char *name;
> > > > >       } field;
> > > > >       struct {
> > > > >         struct lvalue *base;
> > > > >         struct rvalue index;
> > > > >       } array;
> > > > >     };
> > > > >   };
> > > > >
> > > > >   struct preset {
> > > > >     struct lvalue *lv;
> > > > >     struct rvalue rv;
> > > > >   };
> > > > >
> > > > > It can handle matrices ("a[2][3]") and offset/type computation would
> > > > > be a simple recursive function.
> > > > >
> > > >
> > > > Why do we need recursion, though? All we should need is an array specs
> > > > of one of the following types:
> > > >
> > > >   a) field access by name
> > > >     a.1) we might want to distinguish array field vs non-array field
> > > > to better catch unintended user errors
> > > >   b) indexing by immediate integer
> > > >   c) indexing by symbolic enum name (or we can combine b and c,
> > > > whatever works better).
> > > >
> > > > And that's all. And it will support multi-dimensional arrays.
> > > >
> > > > We then process this array one at a time. Each *step* itself might be
> > > > recursive: finding a field by name in C struct is necessarily
> > > > recursive due to anonymous embedded struct/union fields. But other
> > > > than that, it's a simple sequential operation.
> > > >
> > > > So unless I'm missing something, let's not add data recursion if it's
> > > > not needed.
> > >
> > > Recursive representation I simpler in a sense that you need only one
> > > data type. W/o recursion you need to distinguish between container
> > > data type that links to each constituent.
> >
> > So you have this tagged union of three different types some of which
> > are self-referential (struct lvalue *). Is that what is simpler than
> > an array of a similarly tagged union, but with no `struct lvalue *`
> > recursive data pointers? How so?
>
> The following:
>
>   struct rvalue {
>     long long i;
>     const char *enum;
>   };
>
>   struct field_access {
>     enum { FIELD, ARRAY } type;
>     union {
>       struct {
>         char *name;
>       } field;
>       struct {
>         struct rvalue index;
>       } array;
>     };
>   };
>
>   struct preset {
>     struct field_access *atoms;
>     int atoms_cnt;
>     const char *var;
>     struct rvalue rv;
>   };
>
> Is logically more complex data structure, because it has two levels
> for expression computation: to compute single offset and to drive a
> loop for offset accumulation.

Perhaps it's subjective? I find an array of field_access structs
simpler. Maybe because we have a similar approach with resolving CO-RE
relocation spec strings (I think it's in bpf_core_spec_match()).

I do prefer simpler non-recursive data structures.

>
> > > Plus in a computation function you need to distinguish first step from
> > > all others.  Recursive organization simplifies both data types and
> > > computation function.
> >
> > Not sure I follow. You have two situations, and it doesn't matter
> > whether it's a first or not first element. It's either lookup by field
> > name or indexing using integer or enum. In the latter case we should
> > have an active "current type" of array kind we are working with (so
> > yes, we need error checking, but just like with a recursive approach).
> >
> > Maybe I'm slow, but I don't see how recursivity is making anything
> > simpler, sorry.