On Thu, May 1, 2025 at 7:22 PM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> I wasn't trying to optimize those few bytes taken by szs, tbh.
> Allocating from the end of the page bakes in the assumption that we
> won't ever need more than one page. I don't know if I'd do that. But
> we can just track "next available offset" instead, so it doesn't
> really matter much.

Right. That works too.

> >
> > I'm not quite sure how different processes can do it locklessly.
>
> There are no different processes, it's all one process, many
> threads... Or is that what you meant? tld_metadata is *per process*,
> tld_data is *per thread*. Processes don't need to coordinate anything
> between themselves, only threads within the process.

Yeah. I confused myself into thinking that we need to support this
across fork/exec. Those would be different processes with their own
task local storage map elements, so any kind of signaling into a child
has to be done in user space; the bpf tls map won't help there.
So using one "tld" library in multiple threads within a single process
works. No need to complicate things by asking the kernel tls map to
coordinate. The "tld" library can keep whatever state it needs, do
centralized locking, etc.

> As for how I'd do offset allocation and key addition locklessly. You
> are right that it can't be done completely locklessly, but just
> looping and yielding probably would be fine.
>
> Then the sequence of adding the key would be something like below.
> I've modified tld_metadata a bit to make this simpler and more
> economical (and I fixed the definition of keys as an array of arrays
> of chars, oops):
>
> struct tld_metadata {
>         int cnt;
>         int next_off;
>         char keys[MAX_KEY_CNT][MAX_KEY_LEN];
>         __s16 offs[MAX_KEY_CNT];
> };
>
> struct tld_metadata *m = ...;
> const char *new_key = ...;
> int i = 0;
>
> /* all m->offs[i] are set to -1 on creation */
> again:
>
> int key_cnt = m->cnt;
> for (; i < key_cnt; i++) {
>         while (m->offs[i] < 0) /* update in progress */
>                 sched_yield();
>
>         if (strcmp(m->keys[i], new_key) == 0)
>                 return m->offs[i];
> }
>
> /* we raced, key might have been added already, recheck, but keep i */
> if (!cmpxchg(&m->cnt, key_cnt, key_cnt + 1))
>         goto again;
>
> /* slot key_cnt is ours, we need to calculate and assign offset */
> int new_off = m->next_off;
> m->next_off = new_off + key_sz;
>
> m->keys[key_cnt][0] = '\0';
> strncat(m->keys[key_cnt], new_key, MAX_KEY_LEN - 1);
>
> /* MEMORY BARRIERS SHOULD BE CAREFULLY CONSIDERED */
>
> m->offs[key_cnt] = new_off; /* this finalizes the key -> offset assignment */
>
> /* MEMORY BARRIERS SHOULD BE CAREFULLY CONSIDERED */
>
> return new_off; /* we are done */
>
> Something like that. There is that looping and yield to not miss
> someone else winning the race and adding a key, so that's the locking
> part. But given that adding a key definition is supposed to be a
> one-time operation (per key), I don't think we should be fancy with
> locking.

Something like that should work.
I wish there was some trivial futex wrapper in a .h that could be used
instead of the pthread_mutex baggage.
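
FWIW, a minimal sketch of what such a wrapper could look like. This is
just the classic two-state futex mutex from Ulrich Drepper's "Futexes
Are Tricky", done with C11 atomics and the raw futex(2) syscall. The
sys_futex()/futex_lock()/futex_unlock() names and the lock word layout
are made up for illustration; nothing like this exists in a header
today:

#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* raw futex(2) wrapper, since glibc doesn't export one */
static long sys_futex(void *uaddr, int op, uint32_t val)
{
        return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* lock word: 0 = unlocked, 1 = locked, 2 = locked with possible waiters */
static void futex_lock(_Atomic uint32_t *f)
{
        uint32_t c = 0;

        /* fast path: uncontended 0 -> 1 */
        if (atomic_compare_exchange_strong(f, &c, 1))
                return;

        /* slow path: make sure the word says "contended" before sleeping */
        if (c != 2)
                c = atomic_exchange(f, 2);
        while (c != 0) {
                sys_futex(f, FUTEX_WAIT_PRIVATE, 2);
                c = atomic_exchange(f, 2);
        }
}

static void futex_unlock(_Atomic uint32_t *f)
{
        /* 2 means someone may be sleeping in FUTEX_WAIT, wake one waiter */
        if (atomic_exchange(f, 0) == 2)
                sys_futex(f, FUTEX_WAKE_PRIVATE, 1);
}

With something like that, tld_metadata could carry a hypothetical
_Atomic __u32 lock word, and the key-addition path could take
futex_lock()/futex_unlock() around the slot assignment instead of the
cmpxchg + sched_yield() dance, if that loop ever turns out to be too
fragile.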