Re: [PATCH bpf-next v3 15/16] selftests/bpf: Add test cases for hash map with dynptr key

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Mon, 7 Apr 2025 09:05:45 -0700

On Sun, Apr 6, 2025 at 7:47 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 4/5/2025 1:58 AM, Andrii Nakryiko wrote:
> > On Thu, Mar 27, 2025 at 1:23 AM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
> >> From: Hou Tao <houtao1@xxxxxxxxxx>
> >>
> >> Add three positive test cases to test the basic operations on the
> >> dynptr-keyed hash map. The basic operations include lookup, update,
> >> delete and get_next_key. These operations are exercised both through
> >> bpf syscall and bpf program. These three test cases use different map
> >> keys. The first test case uses both bpf_dynptr and a struct with only
> >> bpf_dynptr as map key, the second one uses a struct with an integer and
> >> a bpf_dynptr as map key, and the last one use a struct with two
> >> bpf_dynptr as map key: one in the struct itself and another is nested in
> >> another struct.
> >>
> >> Also add multiple negative test cases for dynptr-keyed hash map. These
> >> test cases mainly check whether the layout of dynptr and non-dynptr in
> >> the stack is matched with the definition of map->key_record.
> >>
> >> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
> >> ---
> >>  .../bpf/prog_tests/htab_dynkey_test.c         | 446 ++++++++++++++++++
> >>  .../bpf/progs/htab_dynkey_test_failure.c      | 266 +++++++++++
> >>  .../bpf/progs/htab_dynkey_test_success.c      | 382 +++++++++++++++
> >>  3 files changed, 1094 insertions(+)
> >>  create mode 100644 tools/testing/selftests/bpf/prog_tests/htab_dynkey_test.c
> >>  create mode 100644 tools/testing/selftests/bpf/progs/htab_dynkey_test_failure.c
> >>  create mode 100644 tools/testing/selftests/bpf/progs/htab_dynkey_test_success.c
> >>
> > [...]
> >
> >> diff --git a/tools/testing/selftests/bpf/progs/htab_dynkey_test_success.c b/tools/testing/selftests/bpf/progs/htab_dynkey_test_success.c
> >> new file mode 100644
> >> index 0000000000000..84e6931cc19c0
> >> --- /dev/null
> >> +++ b/tools/testing/selftests/bpf/progs/htab_dynkey_test_success.c
> >> @@ -0,0 +1,382 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +/* Copyright (C) 2025. Huawei Technologies Co., Ltd */
> >> +#include <linux/types.h>
> >> +#include <linux/bpf.h>
> >> +#include <bpf/bpf_helpers.h>
> >> +#include <bpf/bpf_tracing.h>
> >> +#include <errno.h>
> >> +
> >> +#include "bpf_misc.h"
> >> +
> >> +char _license[] SEC("license") = "GPL";
> >> +
> >> +struct pure_dynptr_key {
> >> +       struct bpf_dynptr name;
> >> +};
> >> +
> >> +struct mixed_dynptr_key {
> >> +       int id;
> >> +       struct bpf_dynptr name;
> >> +};
> >> +
> >> +struct multiple_dynptr_key {
> >> +       struct pure_dynptr_key f_1;
> >> +       unsigned long f_2;
> >> +       struct mixed_dynptr_key f_3;
> >> +       unsigned long f_4;
> >> +};
> >> +
> > [...]
> >
> >> +       /* Delete the newly-inserted key */
> >> +       bpf_ringbuf_reserve_dynptr(&ringbuf, sizeof(systemd_name), 0, &key.f_3.name);
> >> +       err = bpf_dynptr_write(&key.f_3.name, 0, (void *)systemd_name, sizeof(systemd_name), 0);
> >> +       if (err) {
> >> +               bpf_ringbuf_discard_dynptr(&key.f_3.name, 0);
> >> +               err = 10;
> >> +               goto out;
> >> +       }
> >> +       err = bpf_map_delete_elem(htab, &key);
> >> +       if (err) {
> >> +               bpf_ringbuf_discard_dynptr(&key.f_3.name, 0);
> >> +               err = 11;
> >> +               goto out;
> >> +       }
> >> +
> >> +       /* Lookup it again */
> >> +       value = bpf_map_lookup_elem(htab, &key);
> >> +       bpf_ringbuf_discard_dynptr(&key.f_3.name, 0);
> >> +       if (value) {
> >> +               err = 12;
> >> +               goto out;
> >> +       }
> >> +out:
> >> +       return err;
> >> +}
> >
> > So, I'm not a big fan of this approach of literally embedding struct
> > bpf_dynptr into map key type and actually initializing and working
> > with it directly, like you do here with
> > bpf_ringbuf_reserve_dynptr(..., &key.f_3.name).
> >
> > Here's why. This approach only works for *map keys* (not map values)
> > and only when **the copy of the key** is on the stack (i.e., for map
> > lookups/updates/deletes). This approach won't work for having dynptrs
> > inside map value (for variable sized map values), nor does it really
> > work when you get a direct pointer to map key in
> > bpf_for_each_map_elem().
>
> Yes. The reason why the key should be on the stack is due to the
> limitation (or the design) of bpf_dynptr. However I didn't understand
> why it doesn't work for map value just like other special field in the
> map value (e.g., bpf_timer) ?

bpf_timer and other special things that go into map_value have to
painfully and carefully handle simultaneous access and modification of
map value. So they either do locking (and thus are not compatible or
reliable under NMI), or would need to be implemented locklessly.

Dynptr is by design assumed to not be dealing with concurrent
modifications, so bpf_dynptr_adjust(), for instance, can just update
offset in place without any locking. Reliably and quickly.

> >
> > Curiously, you do have bpf_for_each_map_elem() "example" in patch #16
> > in benchmarks, but you are carefully avoiding actually touching the
> > `void *key` passed to your callback. Instead you create a local key,
> > do lookup, and then compare the pointers to value to know that you
> > "guessed" the key right.
> >
> > This doesn't seem to be how bpf_for_each_map_elem() is really meant to
> > work: you'd want to be able to work with that key for real, get its
> > data, etc. Not guess and confirm, like you do.
>
> Er, bpf_for_each_map_elem() for dynptr-keyed hash map has not been
> implemented yet (as said in the cover letter), so I used the values in
> the array map as the lookup key for the hash map.

It would be interesting to see an example on how you were thinking to
implement dynptr inside map key. Can you provide a hypothetical
example on how you were thinking to approach this?

> >
> > And in case it's not obvious why this approach won't work when dynptrs
> > are stored inside map value. Dynptr itself relies on not being
> > modified concurrently. We achieve that through *always* keeping it on
> > BPF programs stack, guaranteeing that no concurrently running BPF
> > program (BPF program sharing the map, or same program on different
> > CPU) can touch the dynptr. This is pretty fundamental. And I don't
> > think we should add more locking to dynptr itself just to enable this.
>
> I didn't follow that. Even dynptr is kept in map value, how will it be
> modified concurrently ? When there are special fields in the map value,
> the update of the map value will be out-of-place update and the old
> dynptr will be kept as intact.

Easy. bpf_map_lookup_elem() for the same key from two concurrent CPUs.
You get a pointer to the same map value, which BPF programs can modify
without any locking absolutely concurrently and in parallel.

So you don't even have to do bpf_map_update_elem() to run into troubles.

>
> >
> > So I have an alternative proposal that will extend to map values and
> > real map keys (not they local copy on the stack).
> >
> > I say, we stop pretending that it's an actual dynptr that is stored in
> > the key. It should be some sort of "dynptr impression" (I don't want
> > to bikeshed right now), and user would have to put it into map key for
> > lookup/update/delete through a special kfunc (let's call this
> > "bpf_dynptr_stash" for now). When working with an existing map key
> > (and map value in the future), we need to create a local real dynptr
> > from its map key/value "impression", say, with "bpf_dynptr_unstash".
> >
> > bpf_dynptr_stash() is effectively bpf_dynptr_clone() (so all the
> > mechanics is already supported by verifier). bpf_dynptr_unstash() is
> > effectively bpf_dynptr_from_mem(). But they might need a slight change
> > to accommodate a different actual struct type we'll use for that
> > stashed dynptr.
> >
> > So just to show what I mean on pseudo example:
> >
> >
> > struct bpf_stashed_dynptr {
> >    __bpf_md_ptr(void *, data);
> >    __u32 size;
> >    __u32 reserved;
> > }
>
> It will be an ABI for both bpf program and bpf syscall just like
> bpf_dynptr, right ? Therefore, when bpf_stashed_dynptr is used in the
> bpf program, we need to implement something similar for the struct just
> like dynptr, because we need to ensure both ->data and ->size are valid,

Yes, direct BPF program (or user space) access to this
bpf_stashed_dynptr has to be restricted, of course, just like for any
other embedded special struct (timer, wq, lock). Only the kernel and
stashing/unstashing API should be able to access this data directly
(and very carefully, of course).

> right ? If it may be not safe to keep dynptr in the map key/value, how
> will it be safe to keep bpf_stashed_dynptr in the map key/value ?

Because you'll have a carefully written two APIs to stash/unstash BPF
dynptr into/out of map value. Those two will do this operation
atomically in the face of concurrent map value modifications. But once
you have a local dynptr, all existing dynptr APIs (including
bpf_dynptr_adjust) will deal with local dynptr that is safe to modify.

You can't really achieve the same with dynptr even if you restrict
what kind of API can be called on dynptr-in-map-value. Because even
read-only APIs like bpf_dynptr_slice() assume that underlying dynptr
can be accessed without locking and won't be concurrently modified.
This is not true at least for per-CPU maps, isn't it? So user space
can update per-CPU map value while it is being accessed from the BPF
program. This will inevitably lead to problems when working with
dynptr inside map value directly.

> >
> > struct id_dname_key {
> >        int id;
> >        struct bpf_stashed_dynptr name;
> > };
> >

[...]

> > /* FOR_EACH_MAP_ELEM_KEY READING */
> > static int cb(void *map, void *key, void *value, void *ctx)
> > {
> >     struct id_dname_key *k = key;
> >     struct bpf_dynptr dptr;
> >     const void *name;
> >
> >     /* create local real dynptr from stashed one in the key in the map */
> >     bpf_dynptr_unstash(&k->name, &dptr);
> >
> >     /* get direct memory access to the data stored in the key, NO COPIES! */
> >     name = bpf_dynptr_slice(&dptr, ....);
> >     if (name)
> >         bpf_printk("my_key.name: %s", name);
> > }
>
> The point here is to avoid keeping bpf_dynptr in the map key and to save
> it in the stack instead, right ?

yes

> >
> > ...
> >
> > bpf_for_each_map_elem(&htab, cb, NULL, 0); /* iterate */
> >
> >
> >
> > And I'm too lazy to write this for hypothetical map value use case.
> > Map value has an extra challenge of making sure stashing/unstashing
> > handle racy updates from other CPUs, which I believe you can do with
> > seqcount-like approach (no heavy-weight locking).
> >
> > BTW, this dedicated `struct bpf_stashed_dynptr` completely avoids that
> > double-defined `struct bpf_dynptr` you do in patch #6. Kernel will
> > know it's something like a real dynptr when doing update/lookup/delete
> > from on-the-stack key copy, and that it's a completely different thing
> > when it's actually stored inside the map in the key (and, eventually,
> > in the value). And in user space it will be a still different
> > definition, which kernel will provide when doing lookups from user
> > space.
> >
> > Hope this makes sense.
>
> Thanks for the suggestion.
> > [...]
>