On Mon, Jul 28, 2025 at 6:54 AM Emil Tsalapatis <linux-lists@xxxxxxxxxxxxxxx> wrote: > > Some feedback on the cover letter: > > On Thu, Jul 17, 2025 at 12:48 PM Amery Hung <ameryhung@xxxxxxxxx> wrote: > > > > * Motivation * > > > > CPU schedulers can potentially make a better decision with hints from > > user space process. To support experimenting user space hinting with > > sched_ext, there needs a mechanism to pass a "per-task hint" from user > > space to the bpf scheduler "efficiently". > > > > I think this is underselling the feature a bit, isn't it useful for > all BPF users? > Otherwise why isn't it in the scx repo? Task local data indeed is a generic mechanism to share per-task data between user space and bpf. The v6 cover letter referenced scx to give readers a more concrete idea where this patchset comes from and how it will work end to end. I will make the motivation more centered on TLD as a per-task data sharing mechanism. > > > The proposed mechanism is task local data. Similar to pthread key or > > __thread, it allows users to define thread-specific data. In addition, > > the user pages that back task local data are pinned to the kernel to > > share with bpf programs directly. As a result, user space programs > > can directly update per-thread hints, and then bpf program can read > > the hint with little overhead. The diagram in the design section gives > > a sneak peek of how it works. > > > > > > * Overview * > > > > Task local data defines an abstract storage type for storing data specific > > to each task and provides user space and bpf libraries to access it. The > > result is a fast and easy way to share per-task data between user space > > and bpf programs. The intended use case is sched_ext, where user space > > programs will pass hints to sched_ext bpf programs to affect task > > scheduling. > > > > Same as above, why is this a sched_ext feature? Maybe some of the design > choices are informed by it, but there is nothing sched_ext specific in here. > > > Task local data is built on top of task local storage map and UPTR[0] > > to achieve fast per-task data sharing. UPTR is a type of special field > > supported in task local storage map value. A user page assigned to a UPTR > > will be pinned by the kernel when the map is updated. Therefore, user > > space programs can update data seen by bpf programs without syscalls. > > > > Additionally, unlike most bpf maps, task local data does not require a > > static map value definition. This design is driven by sched_ext, which > > would like to allow multiple developers to share a storage without the > > need to explicitly agree on the layout of it. While a centralized layout > > definition would have worked, the friction of synchronizing it across > > different repos is not desirable. This simplify code base management and > > makes experimenting easier. > > > > This text is hard to follow because the point it makes belongs in the design > section, this is a design point that does not really matter at this > point in the text. > At least make it more abstract maybe? Thanks for reviewing. I will make it more abstract and move a part of it to the design section. > > > In the rest of the cover letter, "task local data" is used to refer to > > the abstract storage and TLD is used to denote a single data entry in > > the storage. > > > > > > * Design * > > > > Task local data library provides simple APIs for user space and bpf > > through two header files, task_local_data.h and task_loca_data.bpf.h, > > Typo: task_loca_data ->task_local_data > > > respectively. The usage is illustrated in the following diagram. > > An entry of data in the task local data, TLD, first needs to be defined > > with TLD_DEFINE_KEY() with the size of the data and a name associated with > > the data. The macro defines and initialize an opaque key object of > > Typo: initialize -> initializes > > > tld_key_t type, which can be used to locate a TLD. The same key may be > > passed to tld_get_data() in different threads, and a pointer to data > > specific to the calling thread will be returned. The pointer will > > remain valid until the process terminates, so there is not need to call > > Typo: not need -> no need > > > tld_get_data() in subsequent accesses. > > > > TLD_DEFINE_KEY() is allowed to define TLDs up to roughly a page. In the > > case when a TLD can only be known and created on the fly, > > tld_create_key() can be called. Since the total TLD size cannot be known > > beforehand, a memory of size TLD_DYN_DATA_SIZE is allocated for each > > thread to accommodate them. > > > > On the bpf side, programs will use also use tld_get_data() to locate > > TLDs. The arugments contain a name and a key to a TLD. The name is > > used for the first tld_get_data() to a TLD, which will lookup the TLD > > by name and save the corresponding key to a task local data map, > > tld_key_map. The map value type, struct tld_keys, __must__ be defined by > > developers. It should contain keys used in the compilation unit. > > > > > > ┌─ Application ───────────────────────────────────────────────────────┐ > > │ TLD_DEFINE_KEY(kx, "X", 4); ┌─ library A ─────────────────────┐│ > > │ │ void func(...) ││ > > │ int main(...) │ { ││ > > │ { │ tld_key_t ky; ││ > > │ int *x; │ bool *y; ││ > > │ │ ││ > > │ x = tld_get_data(fd, kx); │ ky = tld_create_key("Y", 1);││ > > │ if (x) *x = 123; │ y = tld_get_data(fd, ky); ││ > > │ ┌────────┤ if (y) *y = true; ││ > > │ │ └─────────────────────────────────┘│ > > └───────┬─────────────────│───────────────────────────────────────────┘ > > V V > > + ─ Task local data ─ ─ ─ ─ ─ + ┌─ BPF program ──────────────────────┐ > > | ┌─ tld_data_map ──────────┐ | │ struct tld_object obj; │ > > | │ BPF Task local storage │ | │ bool *y; │ > > | │ │ | │ int *x; │ > > | │ __uptr *data │ | │ │ > > | │ __uptr *metadata │ | │ if (tld_init_object(task, &obj)) │ > > | └─────────────────────────┘ | │ return 0; │ > > | ┌─ tld_key_map ───────────┐ | │ │ > > | │ BPF Task local storage │ | │ x = tld_get_data(&obj, kx, "X", 4);│ > > | │ │ |<─┤ if (x) /* do something */ │ > > | │ tld_key_t kx; │ | │ │ > > | │ tld_key_t ky; │ | │ y = tld_get_data(&obj, ky, "Y", 1);│ > > | └─────────────────────────┘ | │ if (y) /* do something */ │ > > + ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ + └────────────────────────────────────┘ > > > > > > > > * Implementation * > > > > Task local data defines the storage to be a task local storage map with > > two UPTRs, data and metadata. Data points to a blob of memory for storing > > TLDs individual to every task with the offset of data in a page. Metadata, > > Nit: Maybe just use the name of the struct, u_tld_metadata? Will change. > > > individual to each process and shared by its threads, records the total > > number and size of TLDs and the metadata of each TLD. Metadata for a > > TLD contains the key name and the size of the TLD. > > > > struct u_tld_data { > > u64 start; > > char data[PAGE_SIZE - 8]; > > }; > > > > struct u_tld_metadata { > > u8 cnt; > > u16 size; > > struct metadata data[TLD_DATA_CNT]; > > Rename data -> metadata? That's how it is in the code. > > > }; > > > > Both user space and bpf API follow the same protocol when accessing > > task local data. A pointer to a TLD is located by a key. tld_key_t > > effectively is the offset of a TLD in data. To add a TLD, user space > > typo: user space API -> the user space API > > > API, loops through metadata->data until an empty slot is found and update > > it. It also adds sizes of prior TLDs along the way to derive the offset. > > To locate a TLD in bpf when the first time tld_get_data() is called, > > __tld_fetch_key() also loops through metadata->data until the name is > > Same here, isn't it ->metadata? > Thanks for catching all the typos! > > found. The offset is also derived by adding sizes. When the TLD is not > > found, the current TLD count is cached instead to skip name comparison > > that has been done. The detail of task local data operations can be found > > in patch 1. > > > > > > * Misc * > > > > The metadata can potentially use run-length encoding for names to reduce > > memory wastage and support save more TLDs. I have a version that works, > > but the selftest takes a bit longer to finish. More investigation needed > > to find the root cause. I will save this for the future when there is a > > need to store more than 63 TLDs. > > > > > > [0] https://lore.kernel.org/bpf/20241023234759.860539-1-martin.lau@xxxxxxxxx/ > > > > Reviewed-by: Emil Tsalapatis <emil@xxxxxxxxxxxxxxx> > > > > > --- > > > > v5 -> v6 > > - Address Andrii's comment > > - Fix verification failure in no_alu32 > > - Some cleanup > > v5: https://lore.kernel.org/bpf/20250627233958.2602271-1-ameryhung@xxxxxxxxx/ > > > > v4 -> v5 > > - Add an option to free memory on thread exit to prevent memory leak > > - Add an option to reduce memory waste if the allocator can > > use just enough memory to fullfill aligned_alloc() (e.g., glibc) > > - Tweak bpf API > > - Remove tld_fetch_key() as it does not work in init_tasl > > - tld_get_data() now tries to fetch key if it is not cached yet > > - Optimize bpf side tld_get_data() > > - Faster fast path > > - Less code > > - Use stdatomic.h in user space library with seq_cst order > > - Introduce TLD_DEFINE_KEY() as the default TLD creation API for > > easier memory management. > > - TLD_DEFINE_KEY() can consume memory up to a page and no memory > > is wasted since their size is known before per-thread data > > allocation. > > - tld_create_key() can only use up to TLD_DYN_DATA_SIZE. Since > > tld_create_key can run any time even after per-thread data > > allocation, it is impossible to predict the total size. A > > configurable size of memory is allocated on top of the total > > size of TLD_DEFINE_KEY() to accommodate dynamic key creation. > > - Add tld prefix to all macros > > - Replace map_update(NO_EXIST) in __tld_init_data() with cmpxchg() > > - No more +1,-1 dance on the bpf side > > - Reduce printf from ASSERT in race test > > - Try implementing run-length encoding for name and decide to > > save it for the future > > v4: https://lore.kernel.org/bpf/20250515211606.2697271-1-ameryhung@xxxxxxxxx/ > > > > v3 -> v4 > > - API improvements > > - Simplify API > > - Drop string obfuscation > > - Use opaque type for key > > - Better documentation > > - Implementation > > - Switch to dynamic allocation for per-task data > > - Now offer as header-only libraries > > - No TLS map pinning; leave it to users > > - Drop pthread dependency > > - Add more invalid tld_create_key() test > > - Add a race test for tld_create_key() > > v3: https://lore.kernel.org/bpf/20250425214039.2919818-1-ameryhung@xxxxxxxxx/ > > > > > > Amery Hung (3): > > selftests/bpf: Introduce task local data > > selftests/bpf: Test basic task local data operations > > selftests/bpf: Test concurrent task local data key creation > > > > .../bpf/prog_tests/task_local_data.h | 388 ++++++++++++++++++ > > .../bpf/prog_tests/test_task_local_data.c | 297 ++++++++++++++ > > .../selftests/bpf/progs/task_local_data.bpf.h | 227 ++++++++++ > > .../bpf/progs/test_task_local_data.c | 65 +++ > > 4 files changed, 977 insertions(+) > > create mode 100644 tools/testing/selftests/bpf/prog_tests/task_local_data.h > > create mode 100644 tools/testing/selftests/bpf/prog_tests/test_task_local_data.c > > create mode 100644 tools/testing/selftests/bpf/progs/task_local_data.bpf.h > > create mode 100644 tools/testing/selftests/bpf/progs/test_task_local_data.c > > > > -- > > 2.47.1 > > > >