On Fri, May 9, 2025 at 2:37 PM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote: > > On Fri, 2025-05-09 at 11:20 -0700, Andrii Nakryiko wrote: > > [...] > > > > +/** > > > + * bpf_strchr - Find the first occurrence of a character in a string > > > + * @s: The string to be searched > > > + * @c: The character to search for > > > + * > > > + * Note that the %NUL-terminator is considered part of the string, and can > > > + * be searched for. > > > + * > > > + * Return: > > > + * * const char * - Pointer to the first occurrence of @c within @s > > > + * * %NULL - @c not found in @s > > > + * * %-EFAULT - Cannot read @s > > > + * * %-E2BIG - @s too large > > > + */ > > > +__bpf_kfunc const char *bpf_strchr(const char *s, char c) > > > > so let's say we found the character, we return a pointer to it, and > > that memory goes away (because we never owned it, so we don't really > > know what and when will happen with it). Question, will verifier allow > > BPF program to dereference this pointer? If yes, that's a problem. But > > if not, then I'm not sure there is much point in returning a pointer. > > > > > > I'm just trying to imply that in BPF world integer-based APIs work > > better/safer, overall? For strings, we can switch any > > pointer-returning API to position-returning (or negative error) API > > and it would more or less naturally fit into BPF API surface, no? > > Integer based API solves the problem with memory access but is not > really ergonomic. W/o special logic in verifier the returned int would > be unbounded, hence the user would have to compare it with string > length before using. > > It looks like some verifier logic is necessary regardless of API being > integer or pointer based. In any case verifier needs additional rules > for each pointer type to adjust bounds on the return value or its refobj_id. > You can't safely dereference any pointer returned from these APIs, because the memory might not be there anymore. For integers, same idea. If you use bpf_probe_read_{kernel,user} to read data, then verifier doesn't care about the value of integer. But that's not ergonomic, so in some other thread few days ago I was proposing that we should add an untyped counterpart to bpf_core_cast() that would just make any memory accesses performed using __get_kernel_nofault() semantics. And so then: const char *str = <random value or we got it from somewhere untrusted>; int space_idx = bpf_strchr(str, ' '); if (space_idx < 0) return -1; /* bad luck */ const char *s = bpf_mem_cast(str); char buf[64] = {}; bpf_for(i, 0, space_idx) buf[i] = s[i]; /* MAGIC */ bpf_printk("STUFF BEFORE SPACE: %s", buf); Tbh, when dealing with libc string APIs, I still very frequently convert resulting pointers into indices, so I don't think it's actually an API regression to have index-based string APIs