On Mon, May 19, 2025 at 1:04 PM Tejun Heo <tj@xxxxxxxxxx> wrote:
>
> Hello,
>
> On Thu, May 15, 2025 at 02:16:00PM -0700, Amery Hung wrote:
> ...
> > +#define PAGE_SIZE 4096
>
> This might conflict with other definitions. Looks like non-4k page sizes are
> a lot more popular on arm. Would this be a problem?
>
> > +static int __tld_init_metadata(int map_fd)
> > +{
> > +        struct u_tld_metadata *new_metadata;
> > +        struct tld_map_value map_val;
> > +        int task_fd = 0, err;
> > +
> > +        task_fd = syscall(SYS_pidfd_open, getpid(), 0);
> > +        if (task_fd < 0) {
> > +                err = -errno;
> > +                goto out;
> > +        }
> > +
> > +        new_metadata = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
>
> Is 4k size limit from UPTR? Is it still 4k on machines with >4k pages? If
> this isn't a hard limit from UPTR, would it make sense to encode the size in
> the header part of the metadata?
>

The UPTR size limit is one page; I will make PAGE_SIZE arch dependent.

For the metadata, since all threads of a process share one metadata page,
I think it is okay to keep it a fixed size. For the data, I think it makes
sense to encode the size in the header of the metadata.
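
Concretely, I am thinking of something along these lines (just a sketch;
tld_page_size() is a made-up helper name, and it assumes the page size is
only needed at allocation time rather than as a compile-time constant):

#include <unistd.h>

/* Sketch: resolve the page size at runtime instead of hardcoding 4096. */
static inline size_t tld_page_size(void)
{
        static size_t page_size;

        if (!page_size)
                page_size = sysconf(_SC_PAGESIZE);
        return page_size;
}

The aligned_alloc() calls in __tld_init_metadata()/__tld_init_data() would
then use tld_page_size() in place of PAGE_SIZE.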

> > +static int __tld_init_data(int map_fd)
> > +{
> > +        struct u_tld_data *new_data = NULL;
> > +        struct tld_map_value map_val;
> > +        int err, task_fd = 0;
> > +
> > +        task_fd = syscall(SYS_pidfd_open, gettid(), PIDFD_THREAD);
> > +        if (task_fd < 0) {
> > +                err = -errno;
> > +                goto out;
> > +        }
> > +
> > +        new_data = aligned_alloc(PAGE_SIZE, TLD_DATA_SIZE);
>
> Ditto.
>
> Noob question. Does this means that each thread will map a 4k page no matter
> how much data it actually uses?

Unfortunately, this is the case currently, but maybe we can make the data
size dynamic.

>
> > +__attribute__((unused))
> > +static tld_key_t tld_create_key(int map_fd, const char *name, size_t size)
> > +{
> > +        int err, i, cnt, sz, off = 0;
> > +
> > +        if (!READ_ONCE(tld_metadata_p)) {
> > +                err = __tld_init_metadata(map_fd);
> > +                if (err)
> > +                        return (tld_key_t) {.off = err};
> > +        }
> > +
> > +        if (!tld_data_p) {
> > +                err = __tld_init_data(map_fd);
> > +                if (err)
> > +                        return (tld_key_t) {.off = err};
> > +        }
> > +
> > +        size = round_up(size, 8);
> > +
> > +        for (i = 0; i < TLD_DATA_CNT; i++) {
> > +retry:
> > +                cnt = __atomic_load_n(&tld_metadata_p->cnt, __ATOMIC_RELAXED);
> > +                if (i < cnt) {
> > +                        /*
> > +                         * Pending tld_create_key() uses size to signal if the metadata has
> > +                         * been fully updated.
> > +                         */
> > +                        while (!(sz = __atomic_load_n(&tld_metadata_p->metadata[i].size,
> > +                                                      __ATOMIC_ACQUIRE)))
> > +                                sched_yield();
> > +
> > +                        if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN))
> > +                                return (tld_key_t) {.off = -EEXIST};
> > +
> > +                        off += sz;
> > +                        continue;
> > +                }
> > +
> > +                if (off + size > TLD_DATA_SIZE)
> > +                        return (tld_key_t) {.off = -E2BIG};
> > +
> > +                /*
> > +                 * Only one tld_create_key() can increase the current cnt by one and
> > +                 * takes the latest available slot. Other threads will check again if a new
> > +                 * TLD can still be added, and then compete for the new slot after the
> > +                 * succeeding thread update the size.
> > +                 */
> > +                if (!__atomic_compare_exchange_n(&tld_metadata_p->cnt, &cnt, cnt + 1, true,
> > +                                                 __ATOMIC_RELAXED, __ATOMIC_RELAXED))
> > +                        goto retry;
> > +
> > +                strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN);
> > +                __atomic_store_n(&tld_metadata_p->metadata[i].size, size, __ATOMIC_RELEASE);
> > +                return (tld_key_t) {.off = off};
> > +        }
> > +
> > +        return (tld_key_t) {.off = -ENOSPC};
> > +}
>
> This looks fine to me but I wonder whether run-length encoding the key
> strings would be more efficient and less restrictive in terms of key length.
> e.g.:
>
> struct key {
>         u32 data_len;
>         u16 key_off;
>         u16 key_len;
> };
>
> struct metadata {
>         struct key keys[MAX_KEYS];
>         char key_strs[SOME_SIZE];
> };
>
> The logic can be mostly the same. The only difference would be that key
> string is not inline. Determine winner in the creation path by compxchg'ing
> on data_len, but set key_off and key_len only after key string is updated.
> Losing on cmpxhcg or seeing an entry where key_len is zero means that that
> one lost and should relax and retry. It can still use the same 4k metadata
> page but will likely be able to allow more keys while also relaxing
> restrictions on key length.
>
> Hmm... maybe making the key string variably sized makes things difficult for
> the BPF code. If so (or for any other reasons), please feel free to ignore
> the above.

I think this is a great suggestion. The current implementation may waste
space in the metadata if a key does not use all 62 bytes. I don't see an
obvious obstacle in the BPF code. I will try to incorporate this in the
next respin.

> > +#endif /* __TASK_LOCAL_DATA_H */
> > diff --git a/tools/testing/selftests/bpf/progs/task_local_data.bpf.h b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> > new file mode 100644
> > index 000000000000..5f48e408a5e5
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/task_local_data.bpf.h
> ...
> > +/**
> > + * tld_get_data() - Retrieves a pointer to the TLD associated with the key.
> > + *
> > + * @tld_obj: A pointer to a valid tld_object initialized by tld_object_init()
> > + * @key: The key of a TLD saved in tld_maps
> > + * @size: The size of the TLD. Must be a known constant value
> > + *
> > + * Returns a pointer to the TLD data associated with the key; NULL if the key
> > + * is not valid or the size is too big
> > + */
> > +#define tld_get_data(tld_obj, key, size) \
> > +        __tld_get_data(tld_obj, (tld_obj)->key_map->key.off - 1, size)
> > +
> > +__attribute__((unused))
> > +__always_inline void *__tld_get_data(struct tld_object *tld_obj, u32 off, u32 size)
> > +{
> > +        return (tld_obj->data_map->data && off >= 0 && off < TLD_DATA_SIZE - size) ?
> > +                (void *)tld_obj->data_map->data + off : NULL;
> > +}
>
> Neat.
>
> Generally looks great to me. The only thing I wonder is whether the data
> area sizing can be determined at init time rather than fixed to 4k.
>

I think we can achieve it by first limiting tld_create_key() to the init
phase (i.e., only calling it from C/C++ constructors). Then, tld_create_key()
will not allocate memory for the data. Instead, on the first call to
tld_get_data(), we freeze the size of the data area and allocate just enough
memory, or round it up to a power of two.

For C, we can define a new macro API (e.g., tld_define_key()) that generates
a __attribute__((constructor)) function that in turn calls tld_create_key().

> Thanks.
>
> --
> tejun
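
To make the tld_define_key() idea above a bit more concrete, it could look
roughly like the following (only a sketch: it assumes a tld_create_key()
variant that needs just the name and size at constructor time, and the
generated symbol names are placeholders):

/*
 * Sketch: generate a constructor that registers the key before main()
 * runs. Assumes a tld_create_key(name, size) variant that does not need
 * the map fd at this point.
 */
#define tld_define_key(key_var, name, size)                            \
        static tld_key_t key_var;                                       \
        __attribute__((constructor))                                    \
        static void __tld_define_##key_var(void)                        \
        {                                                               \
                key_var = tld_create_key(name, size);                   \
        }

/* usage: tld_define_key(prio_key, "priority", sizeof(int)); */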