On Fri, May 16, 2025 at 3:22 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Fri, May 16, 2025 at 1:41 PM Amery Hung <ameryhung@xxxxxxxxx> wrote: > > > > > > + size = round_up(size, 8); > > > > > > why roundup ? and to 8 in particular? > > > If user space wants byte size keys, why not let it? > > > > I will remove it. This was to prevent breaking using TLD in atomic > > operations, but it should be very unlikely as they are thread > > specific. > > You mean for a case where one part of the app (like a shared library) > is using u32, but the other is using u64 and doing atomic ops on it? > > Make sense to align the off set by tld_create_key(), > but it can be done without rounding up all previous keys to 8. > 63 bytes in the header are wasted. So use 2 as an offset. > A single cmpxchg 4 byte can update cnt+offset. > Actually why store size in each tld_metadata ? > Won't the logic will be simpler if it's an offset ? > bpf side tld_fetch_key() wouldn't need to count. > I changed to size since metadata is initialized to 0 and size == 0 can be used to signal pending metadata update, while 0 is a valid offset. I will save offset in metadata in the next respin. Tejun suggested a run-length key encoding, and there are other fields in the metadata that can be used for the signaling. > > > > + if (i < cnt) { > > > > + /* > > > > + * Pending tld_create_key() uses size to signal if the metadata has > > > > + * been fully updated. > > > > + */ > > > > + while (!(sz = __atomic_load_n(&tld_metadata_p->metadata[i].size, > > > > + __ATOMIC_ACQUIRE))) > > > > + sched_yield(); > > > > + > > > > + if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN)) > > > > + return (tld_key_t) {.off = -EEXIST}; > > > > + > > > > + off += sz; > > > > + continue; > > > > + } > > > > + > > > > + if (off + size > TLD_DATA_SIZE) > > > > + return (tld_key_t) {.off = -E2BIG}; > > > > + > > > > + /* > > > > + * Only one tld_create_key() can increase the current cnt by one and > > > > + * takes the latest available slot. Other threads will check again if a new > > > > + * TLD can still be added, and then compete for the new slot after the > > > > + * succeeding thread update the size. > > > > + */ > > > > + if (!__atomic_compare_exchange_n(&tld_metadata_p->cnt, &cnt, cnt + 1, true, > > > > + __ATOMIC_RELAXED, __ATOMIC_RELAXED)) > > > > > > weak and relaxed/relaxed ? > > > > I can't see reordering issue with cnt so I choose to use relax. I can > > change to strong acq/rel just to be safe. > > > > > That's unusual. > > > I don't know what it is supposed to do. > > > I think weak=false and __ATOMIC_ACQUIRE, __ATOMIC_RELAXED > > > would look as expected. > > > > > > > Do you mean weak=false and __ATOMIC_RELAXED, __ATOMIC_ACQUIRE? > > no idea. I just grepped the kernel and saw: > TEST_KERNEL_LOCKED(atomic_builtin_with_memorder, > __atomic_compare_exchange_n(flag, &v, 1, 0, > __ATOMIC_ACQUIRE, __ATOMIC_RELAXED), > __atomic_store_n(flag, 0, __ATOMIC_RELEASE)); > TEST_KERNEL_LOCKED(atomic_builtin_wrong_memorder, > __atomic_compare_exchange_n(flag, &v, 1, 0, > __ATOMIC_RELAXED, __ATOMIC_RELAXED), > __atomic_store_n(flag, 0, __ATOMIC_RELAXED)); > > I'd just use __ATOMIC_SEQ_CST everywhere. > Speed is not important here. Make sense. I will use __ATOMIC_SEQ_CST. Thanks for the suggestion. > > > > > > > + goto retry; > > > > + > > > > + strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN); > > > > + __atomic_store_n(&tld_metadata_p->metadata[i].size, size, __ATOMIC_RELEASE); > > > > + return (tld_key_t) {.off = off}; > > > > + } > > > > + > > > > + return (tld_key_t) {.off = -ENOSPC}; > > > > +} > > > > + > > > > +__attribute__((unused)) > > > > +static inline bool tld_key_is_err(tld_key_t key) > > > > +{ > > > > + return key.off < 0; > > > > +} > > > > + > > > > +__attribute__((unused)) > > > > +static inline int tld_key_err_or_zero(tld_key_t key) > > > > +{ > > > > + return tld_key_is_err(key) ? key.off : 0; > > > > +} > > > > + > > > > +/** > > > > + * tld_get_data() - Gets a pointer to the TLD associated with the key. > > > > + * > > > > + * @map_fd: A file descriptor of the underlying task local storage map, > > > > + * tld_data_map > > > > + * @key: A key object returned by tld_create_key(). > > > > + * > > > > + * Returns a pointer to the TLD if the key is valid; NULL if no key has been > > > > + * added, not enough memory for TLD for this thread, or the key is invalid. > > > > + * > > > > + * Threads that call tld_get_data() must call tld_free() on exit to prevent > > > > + * memory leak. > > > > + */ > > > > +__attribute__((unused)) > > > > +static void *tld_get_data(int map_fd, tld_key_t key) > > > > +{ > > > > + if (!READ_ONCE(tld_metadata_p)) > > > > + return NULL; > > > > + > > > > + if (!tld_data_p && __tld_init_data(map_fd)) > > > > + return NULL; > > > > > > Why call it again? > > > tld_create_key() should have done it, no? > > > > > > > A TLD is created by calling tld_create_key() once. Then, threads may > > call tld_get_data() to get their thread-specific TLD. So it is > > possible for a thread to call tld_get_data() with tld_data_p==NULL. > > I see. Please add a comment. I will explain it in the comment. > > > > > + > > > > + return tld_data_p->data + key.off; > > > > +} > > > > + > > > > +/** > > > > + * tld_free() - Frees task local data memory of the calling thread > > > > + */ > > > > +__attribute__((unused)) > > > > +static void tld_free(void) > > > > +{ > > > > + if (tld_data_p) > > > > + free(tld_data_p); > > > > +} > > > > > > Since this .h allocates tld_metadata_p, it probably needs > > > a helper to free it too. > > > > > > > + > > > > +#endif /* __TASK_LOCAL_DATA_H */ > > > > diff --git a/tools/testing/selftests/bpf/progs/task_local_data.bpf.h b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h > > > > new file mode 100644 > > > > index 000000000000..5f48e408a5e5 > > > > --- /dev/null > > > > +++ b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h > > > > @@ -0,0 +1,220 @@ > > > > +/* SPDX-License-Identifier: GPL-2.0 */ > > > > +#ifndef __TASK_LOCAL_DATA_BPF_H > > > > +#define __TASK_LOCAL_DATA_BPF_H > > > > + > > > > +/* > > > > + * Task local data is a library that facilitates sharing per-task data > > > > + * between user space and bpf programs. > > > > + * > > > > + * > > > > + * PREREQUISITE > > > > + * > > > > + * A TLD, an entry of data in task local data, first needs to be created by the > > > > + * user space. This is done by calling user space API, tld_create_key(), with > > > > + * the name of the TLD and the size. > > > > + * > > > > + * tld_key_t prio, in_cs; > > > > + * > > > > + * prio = tld_create_key("priority", sizeof(int)); > > > > + * in_cs = tld_create_key("in_critical_section", sizeof(bool)); > > > > + * > > > > + * A key associated with the TLD, which has an opaque type tld_key_t, will be > > > > + * returned. It can be used to get a pointer to the TLD in the user space by > > > > + * calling tld_get_data(). > > > > + * > > > > + * > > > > + * USAGE > > > > + * > > > > + * Similar to user space, bpf programs locate a TLD using the same key. > > > > + * tld_fetch_key() allows bpf programs to retrieve a key created in the user > > > > + * space by name, which is specified in the second argument of tld_create_key(). > > > > + * tld_fetch_key() additionally will cache the key in a task local storage map, > > > > + * tld_key_map, to avoid performing costly string comparisons every time when > > > > + * accessing a TLD. This requires the developer to define the map value type of > > > > + * tld_key_map, struct tld_keys. It only needs to contain keys used by bpf > > > > + * programs in the compilation unit. > > > > + * > > > > + * struct tld_keys { > > > > + * tld_key_t prio; > > > > + * tld_key_t in_cs; > > > > + * }; > > > > + * > > > > + * Then, for every new task, a bpf program will fetch and cache keys once and > > > > + * for all. This should be done ideally in a non-critical path (e.g., in > > > > + * sched_ext_ops::init_task). > > > > + * > > > > + * struct tld_object tld_obj; > > > > + * > > > > + * err = tld_object_init(task, &tld); > > > > + * if (err) > > > > + * return 0; > > > > + * > > > > + * tld_fetch_key(&tld_obj, "priority", prio); > > > > + * tld_fetch_key(&tld_obj, "in_critical_section", in_cs); > > > > + * > > > > + * Note that, the first argument of tld_fetch_key() is a pointer to tld_object. > > > > + * It should be declared as a stack variable and initialized via tld_object_init(). > > > > + * > > > > + * Finally, just like user space programs, bpf programs can get a pointer to a > > > > + * TLD by calling tld_get_data(), with cached keys. > > > > + * > > > > + * int *p; > > > > + * > > > > + * p = tld_get_data(&tld_obj, prio, sizeof(int)); > > > > + * if (p) > > > > + * // do something depending on *p > > > > + */ > > > > +#include <errno.h> > > > > +#include <bpf/bpf_helpers.h> > > > > + > > > > +#define TLD_DATA_SIZE __PAGE_SIZE > > > > +#define TLD_DATA_CNT 63 > > > > +#define TLD_NAME_LEN 62 > > > > + > > > > +typedef struct { > > > > + __s16 off; > > > > +} tld_key_t; > > > > + > > > > +struct u_tld_data *dummy_data; > > > > +struct u_tld_metadata *dummy_metadata; > > > > + > > > > +struct tld_metadata { > > > > + char name[TLD_NAME_LEN]; > > > > + __u16 size; > > > > +}; > > > > + > > > > +struct u_tld_metadata { > > > > + __u8 cnt; > > > > + __u8 padding[63]; > > > > + struct tld_metadata metadata[TLD_DATA_CNT]; > > > > +}; > > > > + > > > > +struct u_tld_data { > > > > + char data[TLD_DATA_SIZE]; > > > > +}; > > > > + > > > > +struct tld_map_value { > > > > + struct u_tld_data __uptr *data; > > > > + struct u_tld_metadata __uptr *metadata; > > > > +}; > > > > + > > > > +struct tld_object { > > > > + struct tld_map_value *data_map; > > > > + struct tld_keys *key_map; > > > > +}; > > > > + > > > > +/* > > > > + * Map value of tld_key_map for caching keys. Must be defined by the developer. > > > > + * Members should be tld_key_t and passed to the 3rd argument of tld_fetch_key(). > > > > + */ > > > > +struct tld_keys; > > > > + > > > > +struct { > > > > + __uint(type, BPF_MAP_TYPE_TASK_STORAGE); > > > > + __uint(map_flags, BPF_F_NO_PREALLOC); > > > > + __type(key, int); > > > > + __type(value, struct tld_map_value); > > > > +} tld_data_map SEC(".maps"); > > > > + > > > > +struct { > > > > + __uint(type, BPF_MAP_TYPE_TASK_STORAGE); > > > > + __uint(map_flags, BPF_F_NO_PREALLOC); > > > > + __type(key, int); > > > > + __type(value, struct tld_keys); > > > > +} tld_key_map SEC(".maps"); > > > > + > > > > +/** > > > > + * tld_object_init() - Initializes a tld_object. > > > > + * > > > > + * @task: The task_struct of the target task > > > > + * @tld_obj: A pointer to a tld_object to be initialized > > > > + * > > > > + * Returns 0 on success; -ENODATA if the task has no TLD; -ENOMEM if the creation > > > > + * of tld_key_map fails > > > > + */ > > > > +__attribute__((unused)) > > > > +static int tld_object_init(struct task_struct *task, struct tld_object *tld_obj) > > > > +{ > > > > + tld_obj->data_map = bpf_task_storage_get(&tld_data_map, task, 0, 0); > > > > + if (!tld_obj->data_map) > > > > + return -ENODATA; > > > > + > > > > + tld_obj->key_map = bpf_task_storage_get(&tld_key_map, task, 0, > > > > + BPF_LOCAL_STORAGE_GET_F_CREATE); > > > > + if (!tld_obj->key_map) > > > > + return -ENOMEM; > > > > + > > > > + return 0; > > > > +} > > > > + > > > > +/** > > > > + * tld_fetch_key() - Fetches the key to a TLD by name. The key has to be created > > > > + * by user space first with tld_create_key(). > > > > + * > > > > + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init() > > > > + * @name: The name of the key associated with a TLD > > > > + * @key: The key in struct tld_keys to be saved to > > > > + * > > > > + * Returns a positive integer on success; 0 otherwise > > > > + */ > > > > +#define tld_fetch_key(tld_obj, name, key) \ > > > > + ({ \ > > > > + (tld_obj)->key_map->key.off = __tld_fetch_key(tld_obj, name); \ > > > > + }) > > > > + > > > > +__attribute__((unused)) > > > > +static u16 __tld_fetch_key(struct tld_object *tld_obj, const char *name) > > > > +{ > > > > + int i, meta_off, cnt; > > > > + void *metadata, *nm, *sz; > > > > + u16 off = 0; > > > > + > > > > + if (!tld_obj->data_map || !tld_obj->data_map->metadata) > > > > + return 0; > > > > + > > > > + cnt = tld_obj->data_map->metadata->cnt; > > > > + metadata = tld_obj->data_map->metadata->metadata; > > > > + > > > > + bpf_for(i, 0, cnt) { > > > > + meta_off = i * sizeof(struct tld_metadata); > > > > + if (meta_off > TLD_DATA_SIZE - offsetof(struct u_tld_metadata, metadata) > > > > + - sizeof(struct tld_metadata)) > > > > + break; > > > > + > > > > + nm = metadata + meta_off + offsetof(struct tld_metadata, name); > > > > + sz = metadata + meta_off + offsetof(struct tld_metadata, size); > > > > + > > > > + /* > > > > + * Reserve 0 for uninitialized keys. Increase the offset of a valid key > > > > + * by one and subtract it later in tld_get_data(). > > > > + */ > > > > + if (!bpf_strncmp(nm, TLD_NAME_LEN, name)) > > > > + return off + 1; > > > > > > I think all this +1, -1 dance is doing is helping to catch an > > > error when tld_get_data() is called without tld_fetch_key(). > > > I feel this is too defensive. > > > > > > Let tld_fetch_key() return proper negative error code. > > > > > > > I can certainly return negative error code. > > > > But for the +1, -1 logic, I think is a simpler way to differentiate an > > uninitialized key in tld_key_map from the first TLD (both key.off == > > 0). After all, bpf programs can call tld_get_data() without > > tld_fetch_key(). > > I'm saying we don't need to catch this case. > progs should not call tld_get_data() without tld_fetch_key(). > If they do, it's a bug. > Got it. I will document this in the comment of tld_get_data(). > > > > > > + > > > > + off += *(u16 *)sz; > > > > + } > > > > + > > > > + return 0; > > > > +} > > > > + > > > > +/** > > > > + * tld_get_data() - Retrieves a pointer to the TLD associated with the key. > > > > + * > > > > + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init() > > > > + * @key: The key of a TLD saved in tld_maps > > > > + * @size: The size of the TLD. Must be a known constant value > > > > + * > > > > + * Returns a pointer to the TLD data associated with the key; NULL if the key > > > > + * is not valid or the size is too big > > > > + */ > > > > +#define tld_get_data(tld_obj, key, size) \ > > > > + __tld_get_data(tld_obj, (tld_obj)->key_map->key.off - 1, size) > > > > + > > > > +__attribute__((unused)) > > > > +__always_inline void *__tld_get_data(struct tld_object *tld_obj, u32 off, u32 size) > > > > +{ > > > > + return (tld_obj->data_map->data && off >= 0 && off < TLD_DATA_SIZE - size) ? > > > > + (void *)tld_obj->data_map->data + off : NULL; > > > > > > If we require users to call tld_fetch_key() first and check for errors > > > tld_get_data() can be faster: > > > #define tld_get_data(tld_obj, key) > > > (void *)tld_obj->data_map->data + tld_obj->key_map->key.off > > > > > > > tld_get_data() can be called in a bpf program without tld_fetch_key(), > > so checking tld_obj->data_map->data is needed as the first load from > > tld_obj->data_map->data yields a "mem_or_null". > > > > I did try to save this uptr "mem" after null check to stack (e.g., in > > a tld_object) so that we can save subsequent checks. However, the > > compiler sometime will do a fresh load from map_ptr when reading > > tld_obj->data_map->data. Might need some work or trick to make it > > happen. > > likely because you do tld_obj->data_map->data twice. > > > > I wouldn't bother with extra checks, especially for size. > > > > > > > It needs to be bound-checked. If tld_get_data() doesn't do it, bpf > > programs have to do it before acceess. Otherwise: > > > > ; return (tld_obj->data_map->data && off >= 0) ? @ task_local_data.bpf.h:218 > > 25: (bf) r3 = r1 ; R1_w=mem(sz=4096) R3_w=mem(sz=4096) > > 26: (0f) r3 += r2 ; > > R2_w=scalar(smin=0,smax=umax=0xffffffff,smin32=0xffff7fff,smax32=32766,var_off=(0x0; > > 0xffffffff)) R3_w=mem(sz=4096,smin=0,smax=umax=0xffffffff,var_off=(0x0; > > 0xfffffff) > > ; test_value1 = *int_p; @ test_task_local_data.c:63 > > 27: (61) r2 = *(u32 *)(r3 +0) > > R3 unbounded memory access, make sure to bounds check any such access > > That's easy to fix. > Then something like: > #define tld_get_data(tld_obj, key) \ > ({ > void * data = tld_obj->data_map->data; > if (data) > data += tld_obj->key_map->key.off & (PAGE_SIZE - 1); > data; > }) > > size is really not needed. The verifier sees it as one page. > Bad bpf prog can write into the wrong key and the verifier cannot stop it. > key.off is a variable offset, so the verifier may assume key.off == PAGE_SIZE - 1. If a bpf program tries to dereference a pointer returned by the proposed tld_get_data() as an int * without bound check, the verifier will still consider this a potential out-of-bound access: invalid access to memory, mem_size=4096 off=4095 size=4 I think if there needs to be a bound check anyways, hiding it tld_get_data() makes the user written part less complex. > > > Bigger question.. can we cache the whole pointer for each key ? > > > and then > > > #define tld_get_data(tld_obj, key) ld_obj->key_map->key > > > > Maybe define the member type of tld_key_map as __uptr and allow bpf > > programs to update a uptr field with a valid uptr? > > yeah. That indeed gets complicated. Maybe it's possible with some > verifier changes, but let's not go there yet. > The tld_get_data() proposed above is speedy enough.