Hello,

> Also you misread the kcsan report.
> It says that 'read' comes from:
>
> read to 0xffff888118f3d568 of 4 bytes by task 4719 on cpu 1:
> lookup_nulls_elem_raw kernel/bpf/hashtab.c:643 [inline]
> which is reading hash and key of htab_elem while
> write side actually writes hash too:
> *(u32 *)((void *)node + lru->hash_offset) = hash;

Thanks for the clarification. I misattributed the race to the ref field.
The KCSAN report actually points to a data race between a reader,
lookup_nulls_elem_raw(), accessing the hash or key fields, and a writer,
bpf_lru_pop_free(), reinitializing and reusing the same element from the
LRU freelist without waiting for an RCU grace period.

> I think it is possible. The elem in the lru's freelist currently does not wait
> for a rcu gp before reuse. There is a chance that the rcu reader is still
> reading the hash value that was put in the freelist, while the writer is reusing
> and updating it.
>
> I think the percpu_freelist used in the regular hashmap should have similar
> behavior, so may be worth finding a common solution, such as waiting for a rcu
> gp before reusing it.

To resolve this, would it make sense to ensure that elements popped from
the free list are only reused after an RCU grace period has elapsed,
similar to how other parts of the kernel manage safe object reuse?

--
Regards,
Shankari

On Wed, Jul 16, 2025 at 2:57 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> On 7/15/25 7:49 AM, Alexei Starovoitov wrote:
> > Also you misread the kcsan report.
> >
> > It says that 'read' comes from:
> >
> > read to 0xffff888118f3d568 of 4 bytes by task 4719 on cpu 1:
> > lookup_nulls_elem_raw kernel/bpf/hashtab.c:643 [inline]
> >
> > which is reading hash and key of htab_elem while
> > write side actually writes hash too:
> > *(u32 *)((void *)node + lru->hash_offset) = hash;
> >
> > Martin,
> > is it really possible for these read/write to race ?
>
> I think it is possible. The elem in the lru's freelist currently does not wait
> for a rcu gp before reuse. There is a chance that the rcu reader is still
> reading the hash value that was put in the freelist, while the writer is reusing
> and updating it.
>
> I think the percpu_freelist used in the regular hashmap should have similar
> behavior, so may be worth finding a common solution, such as waiting for a rcu
> gp before reusing it.
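
P.S. To make the idea of reusing an element only after a grace period more concrete, here is a toy userspace model. It is only a sketch and not a proposed patch: every name in it (toy_elem, toy_freelist, toy_gp_elapsed, toy_push_free, toy_pop_free) is hypothetical and does not exist in the kernel, and the grace period is simulated by a plain counter rather than by real RCU. In the kernel the counter bump would correspond to a grace period actually elapsing, for example after synchronize_rcu() returns or a call_rcu() callback runs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; none of these names exist in the kernel. */
struct toy_elem {
	unsigned int hash;
	unsigned long freed_gp;      /* grace-period count when the element was freed */
	struct toy_elem *next;
};

struct toy_freelist {
	struct toy_elem *head;       /* oldest freed element */
	struct toy_elem *tail;       /* newest freed element */
	unsigned long completed_gp;  /* simulated number of elapsed grace periods */
};

/* Stand-in for one grace period elapsing. */
static void toy_gp_elapsed(struct toy_freelist *fl)
{
	fl->completed_gp++;
}

/* Append a freed element (FIFO) and record when it was freed. */
static void toy_push_free(struct toy_freelist *fl, struct toy_elem *e)
{
	assert(e != NULL);
	e->freed_gp = fl->completed_gp;
	e->next = NULL;
	if (fl->tail)
		fl->tail->next = e;
	else
		fl->head = e;
	fl->tail = e;
}

/*
 * Pop the oldest element only if at least one grace period has elapsed
 * since it was freed; otherwise return NULL and leave it untouched.
 * Because the list is FIFO, checking the head is sufficient.
 */
static struct toy_elem *toy_pop_free(struct toy_freelist *fl, unsigned int hash)
{
	struct toy_elem *e = fl->head;

	if (!e || e->freed_gp >= fl->completed_gp)
		return NULL;

	fl->head = e->next;
	if (!fl->head)
		fl->tail = NULL;
	e->next = NULL;
	/* In this model no pre-existing reader can still be looking at e. */
	e->hash = hash;
	return e;
}
```

In this model an element pushed to the freelist keeps its old hash until the simulated grace period has passed, so a reader that started before the free never observes the new hash being written. The obvious cost is that a pop can fail even though the list is not empty, which a real design would have to handle.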