Re: [PATCH] bpf: restrict verifier access to bpf_lru_node.ref

Hello,
>
>
> Also you misread the kcsan report.
>
> It says that 'read' comes from:
>
> read to 0xffff888118f3d568 of 4 bytes by task 4719 on cpu 1:
>   lookup_nulls_elem_raw kernel/bpf/hashtab.c:643 [inline]
>
> which is reading hash and key of htab_elem while
> write side actually writes hash too:
> *(u32 *)((void *)node + lru->hash_offset) = hash;

Thanks for the clarification. I misattributed the race to the ref
field, but the KCSAN report indeed points to a data race between a
reader, lookup_nulls_elem_raw(), accessing the hash or key fields, and
a writer, bpf_lru_pop_free(), reinitializing and reusing the same
element from the LRU freelist without waiting for an RCU grace period.
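
To check my understanding, here is a minimal userspace sketch that
replays the interleaving in a single thread. It is not kernel code:
struct toy_elem and the toy_* helpers are hypothetical stand-ins for
htab_elem and the LRU freelist, and the hash values are made up. It only
shows that a reader still holding a pointer to a freed element observes
the hash written by the next user of that element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for htab_elem: only the fields that matter here. */
struct toy_elem {
	uint32_t hash;
	uint32_t key;
	struct toy_elem *next_free;
};

static struct toy_elem *toy_freelist;

/* Put an element back on the freelist (models delete/eviction). */
static void toy_push_free(struct toy_elem *e)
{
	e->next_free = toy_freelist;
	toy_freelist = e;
}

/* Models the write side: pop and reinitialize at once, no grace period. */
static struct toy_elem *toy_pop_free(uint32_t hash, uint32_t key)
{
	struct toy_elem *e = toy_freelist;

	if (!e)
		return NULL;
	toy_freelist = e->next_free;
	e->hash = hash;
	e->key = key;
	return e;
}

/* Replays the interleaving; returns the hash the stale reader ends up seeing. */
static uint32_t toy_replay_race(void)
{
	static struct toy_elem elem;
	struct toy_elem *reader_view, *reused;

	elem.hash = 0x1111;
	elem.key = 1;
	toy_freelist = NULL;

	reader_view = &elem;                /* reader found elem in a bucket */
	toy_push_free(&elem);               /* elem is deleted and freed */
	reused = toy_pop_free(0x2222, 2);   /* writer reuses it immediately */
	assert(reused == reader_view);      /* same memory, new contents */

	return reader_view->hash;           /* reader compares against this */
}
```

Calling toy_replay_race() returns 0x2222 rather than the 0x1111 the
reader started from. In the real code the two sides run concurrently, so
the accesses can also overlap, which is what KCSAN flags.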

> I think it is possible. The elem in the lru's freelist currently does not wait
> for a rcu gp before reuse. There is a chance that the rcu reader is still
> reading the hash value that was put in the freelist, while the writer is reusing
> and updating it.
>
> I think the percpu_freelist used in the regular hashmap should have similar
> behavior, so may be worth finding a common solution, such as waiting for a rcu
> gp before reusing it.

To resolve this, would it make sense to ensure that elements popped
from the free list are reused only after an RCU grace period has
elapsed, similar to how other parts of the kernel manage safe object
reuse?
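
As a rough illustration of the idea, and not a proposed patch, the
userspace sketch below gates reuse on a grace-period counter. All names
(gp_elem, gp_freelist, gp_push_free, gp_pop_free,
gp_grace_period_elapsed) are hypothetical, and the plain counter is only
a stand-in for real RCU state; in the kernel this role would presumably
fall to something like call_rcu() or the get_state_synchronize_rcu() /
poll_state_synchronize_rcu() pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element: remembers the grace-period count at free time. */
struct gp_elem {
	uint32_t hash;
	uint64_t retired_gp;
	struct gp_elem *next;
};

/* Hypothetical FIFO freelist, so the oldest freed element is at the head. */
struct gp_freelist {
	struct gp_elem *head;
	struct gp_elem *tail;
	uint64_t completed_gp;	/* stand-in for RCU grace-period state */
};

/* Free an element and record the grace-period count it was retired in. */
static void gp_push_free(struct gp_freelist *fl, struct gp_elem *e)
{
	e->retired_gp = fl->completed_gp;
	e->next = NULL;
	if (fl->tail)
		fl->tail->next = e;
	else
		fl->head = e;
	fl->tail = e;
}

/* Models "a full grace period has elapsed since the last call". */
static void gp_grace_period_elapsed(struct gp_freelist *fl)
{
	fl->completed_gp++;
}

/* Hand out an element only if a grace period has passed since it was freed. */
static struct gp_elem *gp_pop_free(struct gp_freelist *fl, uint32_t hash)
{
	struct gp_elem *e = fl->head;

	if (!e || e->retired_gp >= fl->completed_gp)
		return NULL;	/* readers may still hold a reference */
	fl->head = e->next;
	if (!fl->head)
		fl->tail = NULL;
	e->hash = hash;		/* safe to reinitialize now */
	return e;
}

/* Returns 1 if reuse is refused before, and allowed after, a grace period. */
static int gp_demo(void)
{
	struct gp_freelist fl = { 0 };
	struct gp_elem e = { .hash = 0x1111 };

	gp_push_free(&fl, &e);
	if (gp_pop_free(&fl, 0x2222) != NULL || e.hash != 0x1111)
		return 0;
	gp_grace_period_elapsed(&fl);
	return gp_pop_free(&fl, 0x2222) == &e && e.hash == 0x2222;
}
```

The obvious cost is that a pop can fail while freed elements are still
waiting, so a real implementation would need a fallback for that case.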

--
Regards,
Shankari



On Wed, Jul 16, 2025 at 2:57 AM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> On 7/15/25 7:49 AM, Alexei Starovoitov wrote:
> > Also you misread the kcsan report.
> >
> > It says that 'read' comes from:
> >
> > read to 0xffff888118f3d568 of 4 bytes by task 4719 on cpu 1:
> >   lookup_nulls_elem_raw kernel/bpf/hashtab.c:643 [inline]
> >
> > which is reading hash and key of htab_elem while
> > write side actually writes hash too:
> > *(u32 *)((void *)node + lru->hash_offset) = hash;
> >
> > Martin,
> > is it really possible for these read/write to race ?
>
> I think it is possible. The elem in the lru's freelist currently does not wait
> for a rcu gp before reuse. There is a chance that the rcu reader is still
> reading the hash value that was put in the freelist, while the writer is reusing
> and updating it.
>
> I think the percpu_freelist used in the regular hashmap should have similar
> behavior, so may be worth finding a common solution, such as waiting for a rcu
> gp before reusing it.




