Hi, On 6/3/2025 10:45 AM, Alexei Starovoitov wrote: > On Mon, Jun 2, 2025 at 7:27 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: >> Hi, >> >> On 6/3/2025 7:08 AM, Alexei Starovoitov wrote: >>> On Sun, May 25, 2025 at 11:08 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote: >>>> From: Hou Tao <houtao1@xxxxxxxxxx> >>>> >>>> BPF hash map supports special fields in its value, and BPF program is >>>> free to manipulate these special fields even after the element is >>>> deleted from the hash map. For non-preallocated hash map, these special >>>> fields will be leaked when the map is destroyed. Therefore, implement >>>> necessary BPF memory allocator dtor to free these special fields. >>>> >>>> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx> >>>> --- >>>> kernel/bpf/hashtab.c | 34 ++++++++++++++++++++++++++++++++-- >>>> 1 file changed, 32 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c >>>> index dd6c157cb828..2531177d1464 100644 >>>> --- a/kernel/bpf/hashtab.c >>>> +++ b/kernel/bpf/hashtab.c >>>> @@ -128,6 +128,8 @@ struct htab_elem { >>>> char key[] __aligned(8); >>>> }; >>>> >>>> +static void check_and_free_fields(struct bpf_htab *htab, struct htab_elem *elem); >>>> + >>>> static inline bool htab_is_prealloc(const struct bpf_htab *htab) >>>> { >>>> return !(htab->map.map_flags & BPF_F_NO_PREALLOC); >>>> @@ -464,6 +466,33 @@ static int htab_map_alloc_check(union bpf_attr *attr) >>>> return 0; >>>> } >>>> >>>> +static void htab_ma_dtor(void *obj, void *ctx) >>>> +{ >>>> + struct bpf_htab *htab = ctx; >>>> + >>>> + /* The per-cpu pointer saved in the htab_elem may have been freed >>>> + * by htab->pcpu_ma. Therefore, freeing the special fields in the >>>> + * per-cpu pointer through the dtor of htab->pcpu_ma instead. >>>> + */ >>>> + if (htab_is_percpu(htab)) >>>> + return; >>>> + check_and_free_fields(htab, obj); >>>> +} >>> It seems the selftest in patch 3 might be working "by accident", >>> but older tests should be crashing? >> Er, didn't follow the reason why the older test should be crashing. For >> the per-cpu hash map, the per-cpu pointer gets through >> htab_elem_get_ptr() is valid until one tasks trace RCU gp passes, >> therefore, the bpf prog will always manipulate a valid per-cpu pointer. >> The comment above wants to say is that htab_elem_get_ptr() in >> htab_ma_dtor() may return a freed per-cpu pointer, because the order of >> freeing htab_elem->ptr_to_pptr and htab_elem is nondeterministic. >> >>> It looks like you're calling check_and_free_fields() twice now. >>> Once from htab_elem_free() and then again in htab_ma_dtor() ? >>> >>> If we're going for dtor then htab should delegate all clean up to it. >> Yes. check_and_free_fields() is called twice. > I meant that things will be crashing since it's called twice. > In bpf_obj_free_fields() > timer/wq are probably fine to be called multiple times. > list_head/rb_tree/kptr should be fine as well, > but uptr will double unpin. > > I doubt we really thought through the consequences of > calling bpf_obj_free_fields() on the same map value multiple times. Thanks for the explanation. I got it now. Considering that uptr is only available for task local storage, it is fine for now. But we should handle that if check_and_free_fields() is called multiple times. > >> There are three possible >> locations to free the special fields: >> >> 1) when the element is freed through bpf_mem_cache_free() >> 2) before the element is reused >> 3) before the element is freed to slab. >> >> 3) is necessary to ensure these special fields will not be leaked and 1) >> is necessary to ensure these special fields will be freed before it is >> reused. I didn't find a good way to only call it once. > My point is that we have to call it once, especially since > it's recursive in obj_drop, or... > if we have to call it multiple times we need to fix uptr and > review all other special fields carefully. I see. > >> Maybe adding a >> bit flag to the pointer of the element to indicate that it has been >> destroyed is OK ? Will try it in the next revision. > Not sure what you have in mind. Er. I was just wondering that if it is possible to the bit 0 of the element pointer to indicate that the element has been destroyed, therefore, the second invocation of dtor can be avoided. > .