Re: [PATCH RFC v3 0/2] Task local data API

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Fri, 2 May 2025 09:14:47 -0700

On Thu, May 1, 2025 at 9:26 PM Amery Hung <ameryhung@xxxxxxxxx> wrote:
>
> On Thu, May 1, 2025 at 1:37 PM Andrii Nakryiko
> <andrii.nakryiko@xxxxxxxxx> wrote:
> >
> > On Fri, Apr 25, 2025 at 2:40 PM Amery Hung <ameryhung@xxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > This is a respin of the uptr KV store. It is renamed to task local data (TLD)
> > > as the problem statement and the solution have changed, and it now draws
> > > more similarities to pthread thread-specific data.
> > >

[...]

> >
> > This API can be called just once for each key that the process cares
> > about. And this can be done at any point, really, very dynamically.
> > The implementation will:
> >   - (just once per process) open pinned BPF map, remember its FD;
> >   - (just once) allocate struct tld_metadata, unless we define it as
> > pre-allocated global variable;
> >   - (locklessly) check if key_name is already in tld_metadata; if yes,
> > return the already assigned offset;
> >   - (locklessly) if not, add this key and assign it an offset that is
> > offs[cnt - 1] + szs[cnt - 1] (i.e., we just tightly pack all the
> > values, though we should take care of alignment requirements, of
> > course);
> >   - return the newly assigned offset;
> >
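
To make the steps quoted above concrete, here is a minimal
single-threaded sketch in C. The struct layout, the limits
(TLD_MAX_KEYS, TLD_NAME_LEN), the fixed 8-byte alignment, the 4096-byte
data area and the name tld_key_offset() are all assumptions made for
illustration, not part of the posted patches. Opening the pinned map and
the lockless synchronization are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TLD_MAX_KEYS  32    /* hypothetical limit on number of keys */
#define TLD_NAME_LEN  32    /* hypothetical limit on key name length */
#define TLD_DATA_SIZE 4096  /* assumption: one 4 KiB data area per thread */
#define TLD_ALIGN     8     /* assumption: every value is 8-byte aligned */

/* Hypothetical per-process metadata: parallel arrays indexed by key. */
struct tld_metadata {
    int cnt;
    char names[TLD_MAX_KEYS][TLD_NAME_LEN];
    size_t offs[TLD_MAX_KEYS];
    size_t szs[TLD_MAX_KEYS];
};

/* Return the offset assigned to key_name, or -1 on error. */
static long tld_key_offset(struct tld_metadata *m, const char *key_name,
                           size_t size)
{
    size_t off = 0;
    int i;

    assert(m->cnt >= 0 && m->cnt <= TLD_MAX_KEYS);

    if (size == 0 || size > TLD_DATA_SIZE ||
        strlen(key_name) >= TLD_NAME_LEN)
        return -1;

    /* Key already registered: return the offset assigned earlier. */
    for (i = 0; i < m->cnt; i++)
        if (strcmp(m->names[i], key_name) == 0)
            return (long)m->offs[i];

    if (m->cnt == TLD_MAX_KEYS)
        return -1;

    /* Pack right after the previous value, then round up for alignment. */
    if (m->cnt > 0)
        off = m->offs[m->cnt - 1] + m->szs[m->cnt - 1];
    off = (off + TLD_ALIGN - 1) & ~(size_t)(TLD_ALIGN - 1);

    if (off > TLD_DATA_SIZE - size)
        return -1;  /* value would not fit in the data area */

    strcpy(m->names[m->cnt], key_name);
    m->offs[m->cnt] = off;
    m->szs[m->cnt] = size;
    m->cnt++;
    return (long)off;
}
```

Registering the same key name twice returns the same offset, so callers
do not need to coordinate on who registers first. A real implementation
would also have to publish new entries safely to concurrent readers.
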
> > Now, the second essential API is called for each participating thread
> > for each different key. And again, this is all very dynamic. It's
> > possible that some threads won't use any of this TLD stuff, in which
> > case there will be no overhead (memory or CPU), and not even an entry
> > in task local storage map for that thread. So, API:
> >
>
> The advantage of not wasting memory on threads that are not using TLD
> doesn't seem that clear-cut to me. If users add per-process
> hints, then this scheme can potentially use a lot more memory (i.e.,
> PAGE_SIZE * number of threads). Maybe we need another uptr for
> per-process data? Or do you think this is out of the scope of TLD and
> we should recommend other solutions?
>

I'd keep it simple. One page per thread isn't a big deal at all, in my
mind. If the application has a few threads, then a bunch of kilobytes
is not a big deal. If the application has thousands of threads, then a
few megabytes for this is the least of that application's concerns,
it's already heavy-weight as hell. I think we are overpivoting on
saving a few bytes here.
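
For a rough sense of scale, the worst case is just the number of
participating threads times the page size. A small sketch of that
arithmetic follows. The helper name tld_worst_case_bytes() is made up
for illustration, and the 4096-byte page size used in the numbers below
is an assumption that does not hold on every system.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case TLD footprint, assuming every participating thread gets
 * exactly one page. Hypothetical helper, not part of the patch set.
 */
static uint64_t tld_worst_case_bytes(uint64_t nr_threads, uint64_t page_size)
{
    /* Guard against overflow of the multiplication. */
    assert(page_size == 0 || nr_threads <= UINT64_MAX / page_size);
    return nr_threads * page_size;
}
```

With 4096-byte pages, 8 threads come to 32768 bytes (32 KiB) and 2000
threads come to 8192000 bytes (about 7.8 MiB).
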

[...]