Re: [PATCH bpf-next v4 3/4] bpf: Add kfuncs for read-only string operations

Viktor Malik <vmalik@xxxxxxxxxx> · Thu, 15 May 2025 14:27:28 +0200

On 5/10/25 00:03, Andrii Nakryiko wrote:
> On Fri, May 9, 2025 at 2:37 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>>
>> On Fri, 2025-05-09 at 11:20 -0700, Andrii Nakryiko wrote:
>>
>> [...]
>>
>>>> +/**
>>>> + * bpf_strchr - Find the first occurrence of a character in a string
>>>> + * @s: The string to be searched
>>>> + * @c: The character to search for
>>>> + *
>>>> + * Note that the %NUL-terminator is considered part of the string, and can
>>>> + * be searched for.
>>>> + *
>>>> + * Return:
>>>> + * * const char * - Pointer to the first occurrence of @c within @s
>>>> + * * %NULL        - @c not found in @s
>>>> + * * %-EFAULT     - Cannot read @s
>>>> + * * %-E2BIG      - @s too large
>>>> + */
>>>> +__bpf_kfunc const char *bpf_strchr(const char *s, char c)
>>>
>>> so let's say we found the character, we return a pointer to it, and
>>> that memory goes away (because we never owned it, so we don't really
>>> know what and when will happen with it). Question, will verifier allow
>>> BPF program to dereference this pointer? If yes, that's a problem. But
>>> if not, then I'm not sure there is much point in returning a pointer.

You are right, at the moment, the verifier marks the returned pointers
as `rdonly_mem_or_null` so an attempt to dereference them will result
into a verifier error. Which is clearly not very useful.

I'd say that, theoretically, the pointers returned from these kfuncs
should be treated by the verifier in the same way as the passed
pointers. That is, if PTR_TO_MAP_VALUE is passed,
PTR_TO_MAP_VALUE_OR_NULL should be returned, and so on.

>>> I'm just trying to imply that in BPF world integer-based APIs work
>>> better/safer, overall? For strings, we can switch any
>>> pointer-returning API to position-returning (or negative error) API
>>> and it would more or less naturally fit into BPF API surface, no?
>>
>> Integer based API solves the problem with memory access but is not
>> really ergonomic. W/o special logic in verifier the returned int would
>> be unbounded, hence the user would have to compare it with string
>> length before using.
>>
>> It looks like some verifier logic is necessary regardless of API being
>> integer or pointer based. In any case verifier needs additional rules
>> for each pointer type to adjust bounds on the return value or its refobj_id.

IMO the problem here is that we can't just say anything about the
returned pointer (or index) rather than it is within the bounds of the
original string (or within the passed size for bounded kfuncs). So, any
access to that pointer with an offset other than 0 will still need an
explicit bounds check.

> You can't safely dereference any pointer returned from these APIs,
> because the memory might not be there anymore.

You can't if the memory comes from an untrusted source. But what if the
memory is owned by the BPF program (e.g. on stack or in a map)? Then, it
should be possible to dereference it safely, shouldn't it? IMHO, this
would be quite a common use-case: read string into BPF memory using
bpf_probe_read_str -> use string kfunc to search it -> do something with
the returned pointer (dereference it). From ergonomics perspective, it
shouldn't be necessary to use bpf_probe_read or __get_kernel_nofault
again.

> For integers, same idea. If you use bpf_probe_read_{kernel,user} to
> read data, then verifier doesn't care about the value of integer.
> 
> But that's not ergonomic, so in some other thread few days ago I was
> proposing that we should add an untyped counterpart to bpf_core_cast()
> that would just make any memory accesses performed using
> __get_kernel_nofault() semantics. And so then:
> 
> 
> const char *str = <random value or we got it from somewhere untrusted>;
> int space_idx = bpf_strchr(str, ' ');
> if (space_idx < 0)
>   return -1; /* bad luck */
> 
> const char *s = bpf_mem_cast(str);
> char buf[64] = {};
> 
> bpf_for(i, 0, space_idx)
>     buf[i] = s[i]; /* MAGIC */
> 
> bpf_printk("STUFF BEFORE SPACE: %s", buf);

I can imagine that this would be a nice helper for accessing untrusted
memory in general but I don't think it's specific to string kfuncs. At
the moment, reading an untrusted string requires bpf_probe_read_str so
calling it after processing the string via a kfunc doesn't introduce any
extra call.

BTW note that at the moment, the kfuncs do not accept strings from
untrusted sources as the verifier doesn't know how to treat `char *`
kfunc args and treats them as scalars (which are incompatible with other
pointers). Here, changing also the kfunc args to ints would probably
help, although I think that it would not be ergonomic at all. So, we
still need some verifier work to handle `char *` args to kfuncs.

> Tbh, when dealing with libc string APIs, I still very frequently
> convert resulting pointers into indices, so I don't think it's
> actually an API regression to have index-based string APIs

I agree here. In addition, it looks to me that returning pointers would
require extra verifier work while returning indices would not. And we
still need to apply extra bounds checks or access helpers in a majority
of use-cases so we don't loose much ergonomics with integer-based APIs.