Re: [RFC PATCH bpf-next 2/3] bpf: Implement bpf mem allocator dtor for hash map

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On 6/3/2025 10:45 AM, Alexei Starovoitov wrote:
> On Mon, Jun 2, 2025 at 7:27 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>> Hi,
>>
>> On 6/3/2025 7:08 AM, Alexei Starovoitov wrote:
>>> On Sun, May 25, 2025 at 11:08 PM Hou Tao <houtao@xxxxxxxxxxxxxxx> wrote:
>>>> From: Hou Tao <houtao1@xxxxxxxxxx>
>>>>
>>>> BPF hash map supports special fields in its value, and BPF program is
>>>> free to manipulate these special fields even after the element is
>>>> deleted from the hash map. For non-preallocated hash map, these special
>>>> fields will be leaked when the map is destroyed. Therefore, implement
>>>> necessary BPF memory allocator dtor to free these special fields.
>>>>
>>>> Signed-off-by: Hou Tao <houtao1@xxxxxxxxxx>
>>>> ---
>>>>  kernel/bpf/hashtab.c | 34 ++++++++++++++++++++++++++++++++--
>>>>  1 file changed, 32 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>>>> index dd6c157cb828..2531177d1464 100644
>>>> --- a/kernel/bpf/hashtab.c
>>>> +++ b/kernel/bpf/hashtab.c
>>>> @@ -128,6 +128,8 @@ struct htab_elem {
>>>>         char key[] __aligned(8);
>>>>  };
>>>>
>>>> +static void check_and_free_fields(struct bpf_htab *htab, struct htab_elem *elem);
>>>> +
>>>>  static inline bool htab_is_prealloc(const struct bpf_htab *htab)
>>>>  {
>>>>         return !(htab->map.map_flags & BPF_F_NO_PREALLOC);
>>>> @@ -464,6 +466,33 @@ static int htab_map_alloc_check(union bpf_attr *attr)
>>>>         return 0;
>>>>  }
>>>>
>>>> +static void htab_ma_dtor(void *obj, void *ctx)
>>>> +{
>>>> +       struct bpf_htab *htab = ctx;
>>>> +
>>>> +       /* The per-cpu pointer saved in the htab_elem may have been freed
>>>> +        * by htab->pcpu_ma. Therefore, freeing the special fields in the
>>>> +        * per-cpu pointer through the dtor of htab->pcpu_ma instead.
>>>> +        */
>>>> +       if (htab_is_percpu(htab))
>>>> +               return;
>>>> +       check_and_free_fields(htab, obj);
>>>> +}
>>> It seems the selftest in patch 3 might be working "by accident",
>>> but older tests should be crashing?
>> Er, didn't follow the reason why the older test should be crashing.  For
>> the per-cpu hash map, the per-cpu pointer gets through
>> htab_elem_get_ptr() is valid until one tasks trace RCU gp passes,
>> therefore, the bpf prog will always manipulate a valid per-cpu pointer.
>> The comment above wants to say is that htab_elem_get_ptr() in
>> htab_ma_dtor() may return a freed per-cpu pointer, because the order of
>> freeing htab_elem->ptr_to_pptr and htab_elem is nondeterministic.
>>
>>> It looks like you're calling check_and_free_fields() twice now.
>>> Once from htab_elem_free() and then again in htab_ma_dtor() ?
>>>
>>> If we're going for dtor then htab should delegate all clean up to it.
>> Yes. check_and_free_fields() is called twice.
> I meant that things will be crashing since it's called twice.
> In bpf_obj_free_fields()
> timer/wq are probably fine to be called multiple times.
> list_head/rb_tree/kptr should be fine as well,
> but uptr will double unpin.
>
> I doubt we really thought through the consequences of
> calling bpf_obj_free_fields() on the same map value multiple times.

Thanks for the explanation. I got it now. Considering that uptr is only
available for task local storage, it is fine for now. But we should
handle that if check_and_free_fields() is called multiple times.
>
>> There are three possible
>> locations to free the special fields:
>>
>> 1) when the element is freed through bpf_mem_cache_free()
>> 2) before the element is reused
>> 3) before the element is freed to slab.
>>
>> 3) is necessary to ensure these special fields will not be leaked and 1)
>> is necessary to ensure these special fields will be freed before it is
>> reused. I didn't find a good way to only call it once.
> My point is that we have to call it once, especially since
> it's recursive in obj_drop, or...
> if we have to call it multiple times we need to fix uptr and
> review all other special fields carefully.

I see.
>
>> Maybe adding a
>> bit flag to the pointer of the element to indicate that it has been
>> destroyed is OK ? Will try it in the next revision.
> Not sure what you have in mind.

Er. I was just wondering that if it is possible to the bit 0 of the
element pointer to indicate that the element has been destroyed,
therefore, the second invocation of dtor can be avoided.

> .





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux