On Mon, Jun 23, 2025 at 6:48 AM Viktor Malik <vmalik@xxxxxxxxxx> wrote: > > String operations are commonly used so this exposes the most common ones > to BPF programs. For now, we limit ourselves to operations which do not > copy memory around. > > Unfortunately, most in-kernel implementations assume that strings are > %NUL-terminated, which is not necessarily true, and therefore we cannot > use them directly in the BPF context. Instead, we open-code them using > __get_kernel_nofault instead of plain dereference to make them safe and > limit the strings length to XATTR_SIZE_MAX to make sure the functions > terminate. When __get_kernel_nofault fails, functions return -EFAULT. > Similarly, when the size bound is reached, the functions return -E2BIG. > In addition, we return -ERANGE when the passed strings are outside of > the kernel address space. > > Note that thanks to these dynamic safety checks, no other constraints > are put on the kfunc args (they are marked with the "__ign" suffix to > skip any verifier checks for them). > > All of the functions return integers, including functions which normally > (in kernel or libc) return pointers to the strings. The reason is that > since the strings are generally treated as unsafe, the pointers couldn't > be dereferenced anyways. So, instead, we return an index to the string > and let user decide what to do with it. This also nicely fits with > returning various error codes when necessary (see above). > > Suggested-by: Alexei Starovoitov <ast@xxxxxxxxxx> > Signed-off-by: Viktor Malik <vmalik@xxxxxxxxxx> > --- > kernel/bpf/helpers.c | 389 +++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 389 insertions(+) > few more comments below, beside the -ENOENT issue Alexei mentioned earlier [...] > +/** > + * bpf_strrchr - Find the last occurrence of a character in a string > + * @s__ign: The string to be searched > + * @c: The character to search for > + * > + * Return: > + * * >=0 - Index of the last occurrence of @c within @s__ign > + * * %-1 - @c not found in @s__ign > + * * %-EFAULT - Cannot read @s__ign > + * * %-E2BIG - @s__ign is too large > + * * %-ERANGE - @s__ign is outside of kernel address space > + */ > +__bpf_kfunc int bpf_strrchr(const char *s__ign, int c) > +{ > + char sc; > + int i, last = -1; > + > + if (!copy_from_kernel_nofault_allowed(s__ign, 1)) > + return -ERANGE; > + > + guard(pagefault)(); > + for (i = 0; i < XATTR_SIZE_MAX; i++) { > + __get_kernel_nofault(&sc, s__ign, char, err_out); > + if (sc == '\0') > + return last; > + if (sc == c) > + last = i; swap these two ifs, so that bpf_strrchr("blah", 0) will still return a meaningful result (effectively become bpf_strlen(), of course)? That should match strrchr's behavior in libc, if I'm reading this correctly: "The terminating NUL character is considered to be part of the string." > + s__ign++; > + } > + return -E2BIG; > +err_out: > + return -EFAULT; > +} > + [...] > +/** > + * bpf_strspn - Calculate the length of the initial substring of @s__ign which > + * only contains letters in @accept__ign > + * @s__ign: The string to be searched > + * @accept__ign: The string to search for > + * > + * Return: > + * * >=0 - The length of the initial substring of @s__ign which only > + * contains letters from @accept__ign > + * * %-EFAULT - Cannot read one of the strings > + * * %-E2BIG - One of the strings is too large > + * * %-ERANGE - One of the strings is outside of kernel address space > + */ > +__bpf_kfunc int bpf_strspn(const char *s__ign, const char *accept__ign) > +{ > + char cs, ca; > + bool found; > + int i, j; > + > + if (!copy_from_kernel_nofault_allowed(s__ign, 1) || > + !copy_from_kernel_nofault_allowed(accept__ign, 1)) { > + return -ERANGE; > + } > + > + guard(pagefault)(); > + for (i = 0; i < XATTR_SIZE_MAX; i++) { > + __get_kernel_nofault(&cs, s__ign, char, err_out); > + if (cs == '\0') > + return i; > + found = false; > + for (j = 0; j < XATTR_SIZE_MAX; j++) { > + __get_kernel_nofault(&ca, accept__ign + j, char, err_out); > + if (cs == ca) { > + found = true; > + break; > + } > + if (ca == '\0') > + break; > + } > + if (!found) > + return i; nit: you shouldn't need "found", just `ca == '\0'` would mean "not found", I think? so you'd have a succinct `if (cs == ca || ca == '\0') break;` in the innermost loop > + s__ign++; > + } > + return -E2BIG; > +err_out: > + return -EFAULT; > +} > + [...] > +/** > + * bpf_strnstr - Find the first substring in a length-limited string > + * @s1__ign: The string to be searched > + * @s2__ign: The string to search for > + * @len: the maximum number of characters to search > + * > + * Return: > + * * >=0 - Index of the first character of the first occurrence of @s2__ign > + * within the first @len characters of @s1__ign > + * * %-1 - @s2__ign not found in the first @len characters of @s1__ign > + * * %-EFAULT - Cannot read one of the strings > + * * %-E2BIG - One of the strings is too large > + * * %-ERANGE - One of the strings is outside of kernel address space > + */ > +__bpf_kfunc int bpf_strnstr(const char *s1__ign, const char *s2__ign, size_t len) > +{ > + char c1, c2; > + int i, j; > + > + if (!copy_from_kernel_nofault_allowed(s1__ign, 1) || > + !copy_from_kernel_nofault_allowed(s2__ign, 1)) { > + return -ERANGE; > + } > + > + guard(pagefault)(); > + for (i = 0; i < XATTR_SIZE_MAX; i++) { > + for (j = 0; i + j < len && j < XATTR_SIZE_MAX; j++) { > + __get_kernel_nofault(&c1, s1__ign + j, char, err_out); move this after you check `c2 == 0` below? why reading this character if we are not going to compare it? > + __get_kernel_nofault(&c2, s2__ign + j, char, err_out); > + if (c2 == '\0') > + return i; > + if (c1 == '\0') > + return -1; > + if (c1 != c2) > + break; > + } > + if (j == XATTR_SIZE_MAX) > + return -E2BIG; > + if (i + j == len) > + return -1; > + s1__ign++; > + } > + return -E2BIG; > +err_out: > + return -EFAULT; > +} > + [...]