Hi,

This is a respin of the UPTR KV store. It has been renamed to task local data (TLD) because both the problem statement and the solution have changed, and it now bears more similarity to pthread thread-specific data.

* Overview *

This patchset is a continuation of the original UPTR work[0], which aims to provide a fast way for user space programs to pass per-task hints to sched_ext schedulers. UPTR built the foundation by supporting the sharing of user pages with bpf programs through task local storage maps.

Additionally, sched_ext would like to allow multiple developers to share a storage without needing to explicitly agree on its layout. This simplifies code base management and makes experimenting easier. While a centralized storage layout definition would have worked, the friction of synchronizing it across different repos is not desirable.

This patchset contains the user space plumbing so that user space and bpf program developers can exchange per-task hints easily through simple interfaces.

* Design *

BPF task local data is a simple API for sharing task-specific data between user space and bpf programs, where data are referred to by string keys.

As shown in the following figure, user space programs can define a task local data using bpf_tld_type_var(). The data is effectively a variable declared with __thread, of which every thread owns an independent copy that can be accessed directly.

On the bpf side, a task local data first needs to be initialized once for every new task (e.g., in sched_ext_ops::init_task) using bpf_tld_init_var(). Then, other bpf programs can get a pointer to the data using bpf_tld_lookup(). Because the task local data APIs refer to data using string keys, developers do not need to deal with the addresses of data in a shared storage.
┌─ Application ─────────────────────────────────────────┐
│                          ┌─ library A ──────────────┐ │
│ bpf_tld_type_var(int, X) │ bpf_tld_type_var(int, Y) │ │
│                          └┬─────────────────────────┘ │
└───────┬───────────────────│───────────────────────────┘
        │ X = 123;          │ Y = true;
        V                   V
+ ─ Task local data ─ ─ ─ ─ ─ ─ +
| ┌─ task_kvs_map ────────────┐ |  ┌─ sched_ext_ops::init_task ──────┐
| │ BPF Task local storage    │ |  │ bpf_tld_init_var(&kvs, X);      │
| │  ┌───────────────────┐    │ |<─┤ bpf_tld_init_var(&kvs, Y);      │
| │  │ __uptr *udata     │    │ |  └─────────────────────────────────┘
| │  └───────────────────┘    │ |
| │  ┌───────────────────┐    │ |  ┌─ Other sched_ext_ops op ────────┐
| │  │ __uptr *umetadata │    │ |  │ int *y;                         ├┐
| │  └───────────────────┘    │ |<─┤ y = bpf_tld_lookup(&kvs, Y, 1); ││
| └───────────────────────────┘ |  │ if (y)                          ││
| ┌─ task_kvs_off_map ────────┐ |  │     /* do something */          ││
| │ BPF Task local storage    │ |  └┬────────────────────────────────┘│
| └───────────────────────────┘ |   └─────────────────────────────────┘
+ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ +

* Implementation *

The task local data API hides memory management from developers. Internally, it shares user data with bpf programs through udata UPTRs. The declaration API, bpf_tld_type_var(), places task local data from different compilation units into a custom "udata" section so that they are contiguous in memory. User space needs to call bpf_tld_thread_init() for every new thread to pin the udata pages to the kernel.

The metadata used to address udata is stored in the umetadata UPTR. It is generated by constructors inserted by bpf_tld_type_var() and bpf_tld_thread_init(). umetadata is an array of 64 metadata entries, one per task local data, each containing the key and the offset of the data in udata. During initialization, bpf_tld_init_var() searches umetadata for a matching key and caches its offset in task_kvs_off_map. Later, bpf_tld_lookup() uses the cached offset to retrieve a pointer into udata.
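To make the user space side more concrete, here is a minimal sketch of how a declaration macro could pair a __thread variable with a constructor that records its string key before main() runs. TLD_TYPE_VAR(), tld_register_key() and tld_key_size() are hypothetical names, not the macros from this patchset; the sketch also does not place the variables in a "udata" section, compute offsets, or pin anything to the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TLD_MAX_DATA 64 /* mirrors the 64-entry limit of this series */
#define TLD_KEY_LEN 32  /* assumed maximum key length, including NUL */

/* One registry entry: the string key and the size of its value. */
struct tld_key_entry {
	char key[TLD_KEY_LEN];
	size_t size;
};

/* Process-wide registry, filled by constructors before main() runs. */
static struct tld_key_entry tld_keys[TLD_MAX_DATA];
static int tld_nr_keys;

static void tld_register_key(const char *key, size_t size)
{
	assert(tld_nr_keys < TLD_MAX_DATA);
	assert(strlen(key) < TLD_KEY_LEN);
	strcpy(tld_keys[tld_nr_keys].key, key);
	tld_keys[tld_nr_keys].size = size;
	tld_nr_keys++;
}

/* Hypothetical stand-in for bpf_tld_type_var(): declares a __thread
 * variable (one independent copy per thread) and a constructor that
 * publishes the variable's name as its key. */
#define TLD_TYPE_VAR(type, name)                                       \
	__thread type name;                                            \
	__attribute__((constructor)) static void tld_ctor_##name(void) \
	{                                                              \
		tld_register_key(#name, sizeof(type));                 \
	}

/* Two independent users declare data without agreeing on a layout. */
TLD_TYPE_VAR(int, X)
TLD_TYPE_VAR(bool, Y)

/* Return the registered size of a key, or 0 if the key is unknown. */
static size_t tld_key_size(const char *key)
{
	for (int i = 0; i < tld_nr_keys; i++)
		if (strcmp(tld_keys[i].key, key) == 0)
			return tld_keys[i].size;
	return 0;
}
```

Each declaration only names its own key, so two libraries never have to agree on where X or Y live; the registry is what a consumer would later search by key.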
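The key-to-offset indirection described above can be modeled in plain user space C. In this sketch, struct tld_metadata, struct tld_umetadata, struct tld_off_cache, tld_init_var() and tld_lookup() are hypothetical: the real umetadata layout, the task_kvs_off_map value type and the exact arguments of bpf_tld_init_var()/bpf_tld_lookup() are defined by the patches, not here. The slot index idx is an assumed way to name a cached entry. What the sketch shows is the split between a one-time string search at init and a cheap offset-based lookup afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TLD_MAX_DATA 64
#define TLD_KEY_LEN 32
#define TLD_PAGE_SIZE 4096 /* total task local data fits in one page */

/* Assumed layout of one umetadata entry: a key and its offset in udata. */
struct tld_metadata {
	char key[TLD_KEY_LEN];
	uint16_t off;
};

/* Assumed layout of the umetadata region: entry count plus 64 entries. */
struct tld_umetadata {
	uint8_t cnt;
	struct tld_metadata meta[TLD_MAX_DATA];
};

/* Stand-in for a task's task_kvs_off_map value: one cached offset per slot. */
struct tld_off_cache {
	int32_t off[TLD_MAX_DATA]; /* -1 means "not initialized" */
};

static void tld_cache_reset(struct tld_off_cache *cache)
{
	for (int i = 0; i < TLD_MAX_DATA; i++)
		cache->off[i] = -1;
}

/* Modeled on bpf_tld_init_var(): search umetadata once by key and cache
 * the matching offset in slot idx. Returns 0 on success, -1 if not found. */
static int tld_init_var(struct tld_off_cache *cache,
			const struct tld_umetadata *umeta,
			const char *key, int idx)
{
	assert(idx >= 0 && idx < TLD_MAX_DATA);
	for (int i = 0; i < umeta->cnt && i < TLD_MAX_DATA; i++) {
		if (strncmp(umeta->meta[i].key, key, TLD_KEY_LEN) == 0) {
			cache->off[idx] = umeta->meta[i].off;
			return 0;
		}
	}
	return -1;
}

/* Modeled on bpf_tld_lookup(): no string compare here, just the cached
 * offset plus a bounds check against the one-page udata region. */
static void *tld_lookup(const struct tld_off_cache *cache,
			unsigned char *udata, int idx, size_t size)
{
	assert(idx >= 0 && idx < TLD_MAX_DATA);
	int32_t off = cache->off[idx];

	if (off < 0 || (size_t)off + size > TLD_PAGE_SIZE)
		return NULL;
	return udata + off;
}
```

Keeping the search in the init path means the cost of each later lookup in this sketch does not depend on how many keys are registered.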
* Limitation *

Currently, it is assumed that all key-value pairs are known when a program starts. All compilation units using task local data should be statically linked together so that the values are all placed together in one udata section and can therefore be shared with bpf through two UPTRs.

The next iteration will explore how bpf task local data can work in dynamic libraries. Possibly, more udata UPTRs will be added to pin the TLS pages of dynamically loaded modules. Alternatively, the API may allocate memory for data instead of relying on __thread, slightly changing how user space interacts with task local data. The latter approach could also avoid some of the trouble of dealing with UPTR restrictions.

Some other limitations:
- Total task local data cannot exceed a page
- Only 64 task local data are supported
- Some memory is wasted for data whose size is not a power of two, due to a UPTR limitation

[0] https://lore.kernel.org/bpf/20241023234759.860539-1-martin.lau@xxxxxxxxx/

Amery Hung (2):
  selftests/bpf: Introduce task local data
  selftests/bpf: Test basic workflow of task local data

 .../bpf/prog_tests/task_local_data.c       | 159 +++++++++++++++
 .../bpf/prog_tests/task_local_data.h       |  58 ++++++
 .../bpf/prog_tests/test_task_local_data.c  | 156 +++++++++++++++
 .../selftests/bpf/progs/task_local_data.h  | 181 ++++++++++++++++++
 .../bpf/progs/test_task_local_data_basic.c |  78 ++++++++
 .../selftests/bpf/task_local_data_common.h |  49 +++++
 6 files changed, 681 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_data.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_data.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_task_local_data.c
 create mode 100644 tools/testing/selftests/bpf/progs/task_local_data.h
 create mode 100644 tools/testing/selftests/bpf/progs/test_task_local_data_basic.c
 create mode 100644 tools/testing/selftests/bpf/task_local_data_common.h

-- 
2.47.1