On Thu, May 15, 2025 at 5:27 AM Viktor Malik <vmalik@xxxxxxxxxx> wrote: > > On 5/10/25 00:03, Andrii Nakryiko wrote: > > On Fri, May 9, 2025 at 2:37 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > >> > >> On Fri, 2025-05-09 at 11:20 -0700, Andrii Nakryiko wrote: > >> > >> [...] > >> > >>>> +/** > >>>> + * bpf_strchr - Find the first occurrence of a character in a string > >>>> + * @s: The string to be searched > >>>> + * @c: The character to search for > >>>> + * > >>>> + * Note that the %NUL-terminator is considered part of the string, and can > >>>> + * be searched for. > >>>> + * > >>>> + * Return: > >>>> + * * const char * - Pointer to the first occurrence of @c within @s > >>>> + * * %NULL - @c not found in @s > >>>> + * * %-EFAULT - Cannot read @s > >>>> + * * %-E2BIG - @s too large > >>>> + */ > >>>> +__bpf_kfunc const char *bpf_strchr(const char *s, char c) > >>> > >>> so let's say we found the character, we return a pointer to it, and > >>> that memory goes away (because we never owned it, so we don't really > >>> know what and when will happen with it). Question, will verifier allow > >>> BPF program to dereference this pointer? If yes, that's a problem. But > >>> if not, then I'm not sure there is much point in returning a pointer. > > You are right, at the moment, the verifier marks the returned pointers > as `rdonly_mem_or_null` so an attempt to dereference them will result > into a verifier error. Which is clearly not very useful. > > I'd say that, theoretically, the pointers returned from these kfuncs > should be treated by the verifier in the same way as the passed > pointers. That is, if PTR_TO_MAP_VALUE is passed, > PTR_TO_MAP_VALUE_OR_NULL should be returned, and so on. > > >>> I'm just trying to imply that in BPF world integer-based APIs work > >>> better/safer, overall? For strings, we can switch any > >>> pointer-returning API to position-returning (or negative error) API > >>> and it would more or less naturally fit into BPF API surface, no? > >> > >> Integer based API solves the problem with memory access but is not > >> really ergonomic. W/o special logic in verifier the returned int would > >> be unbounded, hence the user would have to compare it with string > >> length before using. > >> > >> It looks like some verifier logic is necessary regardless of API being > >> integer or pointer based. In any case verifier needs additional rules > >> for each pointer type to adjust bounds on the return value or its refobj_id. > > IMO the problem here is that we can't just say anything about the > returned pointer (or index) rather than it is within the bounds of the > original string (or within the passed size for bounded kfuncs). So, any > access to that pointer with an offset other than 0 will still need an > explicit bounds check. Exactly. > > > You can't safely dereference any pointer returned from these APIs, > > because the memory might not be there anymore. > > You can't if the memory comes from an untrusted source. But what if the > memory is owned by the BPF program (e.g. on stack or in a map)? Then, it > should be possible to dereference it safely, shouldn't it? IMHO, this > would be quite a common use-case: read string into BPF memory using > bpf_probe_read_str -> use string kfunc to search it -> do something with > the returned pointer (dereference it). From ergonomics perspective, it > shouldn't be necessary to use bpf_probe_read or __get_kernel_nofault > again. For bpf_probe_read_str -> use kfunc scenario, I thought the main *goal* is to avoid the bpf_probe_read_str operation altogether. That's why we allow unsafe pointers passed into those kfuncs you are adding and why we use __get_kernel_nofault internally. So with that, you'd actually just use, say, bpf_strchr(), get back some pointer or index, calculate substring (prefix) length, and *then* maybe bpf_probe_read_str into ringbuf or local buffer, if you'd like to capture the data and do some post processing. > > > For integers, same idea. If you use bpf_probe_read_{kernel,user} to > > read data, then verifier doesn't care about the value of integer. > > > > But that's not ergonomic, so in some other thread few days ago I was > > proposing that we should add an untyped counterpart to bpf_core_cast() > > that would just make any memory accesses performed using > > __get_kernel_nofault() semantics. And so then: > > > > > > const char *str = <random value or we got it from somewhere untrusted>; > > int space_idx = bpf_strchr(str, ' '); > > if (space_idx < 0) > > return -1; /* bad luck */ > > > > const char *s = bpf_mem_cast(str); > > char buf[64] = {}; > > > > bpf_for(i, 0, space_idx) > > buf[i] = s[i]; /* MAGIC */ > > > > bpf_printk("STUFF BEFORE SPACE: %s", buf); > > I can imagine that this would be a nice helper for accessing untrusted > memory in general but I don't think it's specific to string kfuncs. At > the moment, reading an untrusted string requires bpf_probe_read_str so > calling it after processing the string via a kfunc doesn't introduce any > extra call. I might be confused, see above. My impression was that with your functions we won't need bpf_probe_read_str() and that was the whole point of your __get_kernel_nofault-based reimplementation. If not for that, we'd use trusted memory pointers and just used internal kernel strings, knowing that we are working with MAP_VALUE or PTR_TO_STACK of well-bounded size. Then again, see what I wrote above. I imagine that the user would not do bpf_probe_read_str() at all. I'll do bpf_strchr(), followed by bpf_memcast(), followed by iterating from 0 to the index returned from bpf_strchr(), if I need to process the substring. > > BTW note that at the moment, the kfuncs do not accept strings from > untrusted sources as the verifier doesn't know how to treat `char *` > kfunc args and treats them as scalars (which are incompatible with other > pointers). Here, changing also the kfunc args to ints would probably > help, although I think that it would not be ergonomic at all. So, we > still need some verifier work to handle `char *` args to kfuncs. Ok, so that's probably the missing piece. We need to teach verifiers to allow such untrusted pointers. I thought that was the whole idea behind adding these APIs: to allow such usage. > > > Tbh, when dealing with libc string APIs, I still very frequently > > convert resulting pointers into indices, so I don't think it's > > actually an API regression to have index-based string APIs > > I agree here. In addition, it looks to me that returning pointers would > require extra verifier work while returning indices would not. And we > still need to apply extra bounds checks or access helpers in a majority > of use-cases so we don't loose much ergonomics with integer-based APIs. > +1