Catalin Marinas <catalin.marinas@xxxxxxx> writes:

> On Fri, Aug 29, 2025 at 01:07:35AM -0700, Ankur Arora wrote:
>> diff --git a/arch/arm64/include/asm/rqspinlock.h b/arch/arm64/include/asm/rqspinlock.h
>> index a385603436e9..ce8feadeb9a9 100644
>> --- a/arch/arm64/include/asm/rqspinlock.h
>> +++ b/arch/arm64/include/asm/rqspinlock.h
>> @@ -3,6 +3,9 @@
>>  #define _ASM_RQSPINLOCK_H
>>
>>  #include <asm/barrier.h>
>> +
>> +#define res_smp_cond_load_acquire_waiting() arch_timer_evtstrm_available()
>
> More on this below, I don't think we should define it.
>
>> diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c
>> index 5ab354d55d82..8de1395422e8 100644
>> --- a/kernel/bpf/rqspinlock.c
>> +++ b/kernel/bpf/rqspinlock.c
>> @@ -82,6 +82,7 @@ struct rqspinlock_timeout {
>>  	u64 duration;
>>  	u64 cur;
>>  	u16 spin;
>> +	u8 wait;
>>  };
>>
>>  #define RES_TIMEOUT_VAL	2
>> @@ -241,26 +242,20 @@ static noinline int check_timeout(rqspinlock_t *lock, u32 mask,
>>  }
>>
>>  /*
>> - * Do not amortize with spins when res_smp_cond_load_acquire is defined,
>> - * as the macro does internal amortization for us.
>> + * Only amortize with spins when we don't have a waiting implementation.
>>   */
>> -#ifndef res_smp_cond_load_acquire
>>  #define RES_CHECK_TIMEOUT(ts, ret, mask)                              \
>>  	({                                                            \
>> -		if (!(ts).spin++)                                     \
>> +		if ((ts).wait || !(ts).spin++)                        \
>>  			(ret) = check_timeout((lock), (mask), &(ts)); \
>>  		(ret);                                                \
>>  	})
>> -#else
>> -#define RES_CHECK_TIMEOUT(ts, ret, mask)                              \
>> -	({ (ret) = check_timeout((lock), (mask), &(ts)); })
>> -#endif
>
> IIUC, RES_CHECK_TIMEOUT in the current res_smp_cond_load_acquire() usage
> doesn't amortise the spins, as the comment suggests, but rather the
> calls to check_timeout(). This is fine, it matches the behaviour of
> smp_cond_load_relaxed_timewait() you introduced in the first patch. The
> only difference is the number of spins - 200 (matching poll_idle) vs 64K
> above. Does 200 work for the above?

Works for me.
I had added this because there seemed to be a vast gulf between 64K and 200.
Happy to drop this.

>>  /*
>>   * Initialize the 'spin' member.
>>   * Set spin member to 0 to trigger AA/ABBA checks immediately.
>>   */
>> -#define RES_INIT_TIMEOUT(ts) ({ (ts).spin = 0; })
>> +#define RES_INIT_TIMEOUT(ts) ({ (ts).spin = 0; (ts).wait = res_smp_cond_load_acquire_waiting(); })
>
> First of all, I don't really like the smp_cond_load_acquire_waiting(),
> that's an implementation detail of smp_cond_load_*_timewait() that
> shouldn't leak outside. But more importantly, RES_CHECK_TIMEOUT() is
> also used outside the smp_cond_load_acquire_timewait() condition. The
> (ts).wait check only makes sense when used together with the WFE
> waiting.
>
> I would leave RES_CHECK_TIMEOUT() as is for the stand-alone cases and
> just use check_timeout() in the smp_cond_load_acquire_timewait()
> scenarios. I would also drop the res_smp_cond_load_acquire() macro since
> you now defined smp_cond_load_acquire_timewait() generically and can be
> used directly.

Sounds good.

--
ankur