On Wed, Aug 13, 2025 at 2:49 AM Ondrej Mosnacek <omosnace@xxxxxxxxxx> wrote: > > On Sat, Aug 9, 2025 at 2:46 AM Alexei Starovoitov > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > On Fri, Aug 8, 2025 at 4:59 PM Serge E. Hallyn <serge@xxxxxxxxxx> wrote: > > > > > > On Wed, Aug 06, 2025 at 08:18:55PM -0500, Serge E. Hallyn wrote: > > > > On Wed, Aug 06, 2025 at 04:31:05PM +0200, Ondrej Mosnacek wrote: > > > > > Don't check against the overloaded CAP_SYS_ADMINin do_jit(), but instead > > > > > use bpf_capable(), which checks against the more granular CAP_BPF first. > > > > > Going straight to CAP_SYS_ADMIN may cause unnecessary audit log spam > > > > > under SELinux, as privileged domains using BPF would usually only be > > > > > allowed CAP_BPF and not CAP_SYS_ADMIN. > > > > > > > > > > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2369326 > > > > > Fixes: d4e89d212d40 ("x86/bpf: Call branch history clearing sequence on exit") > > > > > Signed-off-by: Ondrej Mosnacek <omosnace@xxxxxxxxxx> > > > > > > > > So this seems correct, *provided* that we consider it within the purview of > > > > CAP_BPF to be able to avoid clearing the branch history buffer. > > > > true, but... > > > > > > > > > > I suspect that's the case, but it might warrant discussion. > > > > > > > > Reviewed-by: Serge Hallyn <serge@xxxxxxxxxx> > > > > > > (BTW, I'm assuming this will get pulled into a BPF tree or something, and > > > doesn't need to go into the capabilities tree. Let me know if that's wrong) > > > > Right. > > scripts/get_maintainer.pl arch/x86/net/bpf_jit_comp.c > > is your friend. > > > > Pls cc author-s of the commit in question in the future. > > Adding them now. > > > > > > > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c > > > > > index 15672cb926fc1..2a825e5745ca1 100644 > > > > > --- a/arch/x86/net/bpf_jit_comp.c > > > > > +++ b/arch/x86/net/bpf_jit_comp.c > > > > > @@ -2591,8 +2591,7 @@ emit_jmp: > > > > > seen_exit = true; > > > > > /* Update cleanup_addr */ > > > > > ctx->cleanup_addr = proglen; > > > > > - if (bpf_prog_was_classic(bpf_prog) && > > > > > - !capable(CAP_SYS_ADMIN)) { > > > > > + if (bpf_prog_was_classic(bpf_prog) && !bpf_capable()) { > > > > This looks wrong for several reasons. > > > > 1. > > bpf_capable() and CAP_BPF in general applies to eBPF only. > > There is no precedent so far to do anything differently > > for cBPF when CAP_BPF is present. > > That's not entirely true, see below. > > > 2. > > commit log states that > > "privileged domains using BPF would usually only be allowed CAP_BPF > > and not CAP_SYS_ADMIN" > > which is true for eBPF only, since cBPF is always allowed for > > all unpriv users. > > Start chrome browser and you get cBPF loaded. > > Processes using cBPF (via SO_ATTACH_FILTER) already can trigger a > CAP_BPF check - when the net.core.bpf_jit_harden sysctl is set to 1, > then the sequence sk_attach_filter() -> __get_filter() -> > bpf_prog_alloc() -> bpf_prog_alloc_no_stats() -> > bpf_jit_blinding_enabled() -> bpf_token_capable() happens for the same > iio-sensor-proxy syscall as the one that hits the CAP_SYS_ADMIN check. Agree that it's a similar case and we should standardize on the way to check whether mitigation is necessary. But I think in both cases it should be "_noaudit" version. In other words, everywhere where jit, verifier or anything else is checking caps to apply a mitigation or not we should use "_noaudit". > Because of this we have already granted the BPF capability in > Fedora/RHEL SELinux policy to many domains that would usually run as > root and that use SO_ATTACH_FILTER. The logic being that they are > legitimately using BPF + without SELinux they would be fully > privileged (root) and they would pass that check + it seemed they > could otherwise lose some performance due to the hardening (though I'm > not sure now if it applies to cBPF, so this point could be moot) + > CAP_BPF doesn't grant any excess privileges beyond this (as opposed to > e.g. CAP_SYS_ADMIN). This is what I meant behind that commit log > statement, though I didn't remember the details, so I didn't state it > as clearly as I could have (my apologies). > > Now this same usage started triggering the new plain CAP_SYS_ADMIN > check so I naturally assumed that changing it to bpf_capable() would > be the most logical solution (as it would let us keep the services > excluded from the hardening via CAP_BPF without granting the broad > CAP_SYS_ADMIN). I see. Sounds like you identified bpf_jit_blinding_enabled() case and special cased it selinux. So doing bpf_capable() in JIT would fit the existing selinux handling. > > Is the fact that CAP_BPF check is reachable via cBPF use unexpected > behavior? If both cBPF and eBPF can be JIT'd and CAP_BPF is already > being used for the "exempt from JIT hardening" semantics in one place, > why should cBPF and eBPF be treated differently? In fact, shouldn't > the decision to apply the Spectre mitigation also take into account > the net.core.bpf_jit_harden sysctl even when the program is not cBPF? > > > 3. > > glancing over bugzilla it seems that the issue is > > excessive audit spam and not related to CAP_BPF and privileges. > > If so then the fix is to use > > ns_capable_noaudit(&init_user_ns, CAP_SYS_ADMIN) > > > > 4. > > I don't understand how the patch is supposed to fix the issue. > > iio-sensor-proxy is probably unpriv. Why would it use CAP_BPF? > > It's using cBPF, so there is no reason for it to have CAP_BPF. > > So capable(CAP_BPF) will fail just like capable(CAP_SYS_ADMIN), > > but since CAP_BPF check was done first, the audit won't > > be printed, because it's some undocumented internal selinux behavior ? > > None of it is in the commit log :( > > It is not unprivileged. It runs as root and without SELinux it would > have all capabilities allowed. If it were running without any > capabilities, then indeed there would be no SELinux checks. > > > 5. > > And finally all that looks like a selinux bug. > > Just because something in the kernel is asking capable(CAP_SYS_ADMIN) > > there is no need to spam users with the wrong message: > > "SELinux is preventing iio-sensor-prox from using the 'sys_admin' capabilities." > > iio-sensor-prox is not trying to use 'sys_admin' capabilities. > > cBPF prog will be loaded anyway, with or without BHB clearing. > > Well, it depends... In this case the AVC denial informs us that the > kernel is making some decision depending on the capability and that a > decision should be made in the policy to allow or silence the access > vector. Even when the consequence is not a failure of the syscall, it > still may be useful to have the denial reported, since there is a > potential performance impact. OTOH, with CAP_SYS_ADMIN if the decision > is to not allow it, then silencing it via a dontaudit rule would > potentially hide other more critical CAP_SYS_ADMIN denials, so it's > hard to decide what is better - to silence this specific case in the > kernel vs. to let the user allow/silence the specific AV in the > policy... imo we should apply a general rule for using "_noaudit" for all checks that don't end up as a denial. There are various bpf_allow_ptr_leaks(), bpf_bypass_spec_v[14]() checks in the verifier that should be converted to "_noaudit". It's true that the check affects performance and users could be interested in the end result, but if they enable mitigations they expect the performance hit across the board, so skipping a mitigation for a privileged process is a bonus. A prog will be tiny bit faster. So not worth flagging anywhere.