Hi,

This series adds waited variants of the smp_cond_load() primitives:
smp_cond_load_relaxed_timewait() and smp_cond_load_acquire_timewait().

There are two known users for these interfaces:

 - poll_idle() [1]
 - resilient queued spinlocks [2]

For both of these cases we want to wait on a condition but also want
to terminate the wait based on a timeout.

Before describing how v2 implements these interfaces, let me recap the
problems in v1 (Catalin outlined most of these in [3]):

  smp_cond_load_relaxed_spinwait(ptr, cond_expr, time_expr_ns,
                                 time_limit_ns)

took four arguments, with ptr and cond_expr doing the usual
smp_cond_load() things, and time_expr_ns and time_limit_ns being used
to decide the terminating condition.

There were some problems in the timekeeping:

1. How often do we do the (relatively expensive) time-check?

   The choice made was once every 200 spin-wait iterations, with each
   iteration trying to idle the pipeline by executing cpu_relax().

   The choice of 200 was, of course, arbitrary and somewhat meaningless
   across architectures. On recent x86, cpu_relax()/PAUSE takes ~20-30
   cycles, but on (non-SMT) arm64 cpu_relax()/YIELD is effectively just
   a NOP. Even if each architecture had its own limit, it would still
   vary across microarchitectures.

2. On arm64, which can do better than just cpu_relax(), for instance by
   waiting for a store on an address (WFE), the implementation
   exclusively used WFE, with the spin-wait only used as a fallback for
   when the event-stream was disabled.

   One problem with this was that the worst case timeout overshoot with
   WFE is ARCH_TIMER_EVT_STREAM_PERIOD_US (100us), so there's a vast
   gulf between that and the potentially much smaller granularity of
   the spin-wait versions. In addition, the interface provided no way
   for the caller to specify or limit the overshoot.

Non-timekeeping issues:

3. The interface was useful for poll_idle() like users but was not
   usable if the caller needed to do any work.
For instance, rqspinlock uses it thus:

  smp_cond_load_acquire_timewait(v, c, 0, 1)

Here the time-check always evaluates to false and all of the logic
(ex. deadlock checking) is folded into the conditional.

With that foundation, the new interface is:

  smp_cond_load_relaxed_timewait(ptr, cond_expr, wait_policy,
                                 time_expr, time_end)

The added parameter, wait_policy, provides a mechanism for the caller
to apportion time spent spinning or, where supported, in a wait. This
is somewhat inspired by the queue_poll() mechanism used with
smp_cond_load() in arm-smmu-v3 [4].

It addresses (1) by deciding the time-check granularity based on a
time interval instead of spinning for a fixed number of iterations.

(2) is addressed by the wait_policy allowing for different slack
values: the implemented versions of wait_policy allow for a coarse or
a fine grained slack. A user defined wait_policy could choose its own
wait parameters, which would also address (3).

With that, patches 1-5 add the generic and arm64 logic:

  "asm-generic: barrier: add smp_cond_load_relaxed_timewait()"
  "asm-generic: barrier: add wait_policy handlers"
  "arm64: barrier: enable waiting in smp_cond_load_relaxed_timewait()"
  "arm64: barrier: add coarse wait for smp_cond_load_relaxed_timewait()"
  "arm64: barrier: add fine wait for smp_cond_load_relaxed_timewait()"

Patch 6 adds the acquire variant:

  "asm-generic: barrier: add smp_cond_load_acquire_timewait()"

And, finally, patch 7 lays out how this could be used for rqspinlock:

  "bpf: rqspinlock: add rqspinlock policy handler for arm64"

Any comments appreciated!
Ankur

[1] https://lore.kernel.org/lkml/20241107190818.522639-3-ankur.a.arora@xxxxxxxxxx/
[2] Uses the smp_cond_load_acquire_timewait() from v1:
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/include/asm/rqspinlock.h
[3] https://lore.kernel.org/lkml/Z8dRalfxYcJIcLGj@xxxxxxx/
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c#n223

Cc: Arnd Bergmann <arnd@xxxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx>
Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
Cc: linux-arch@xxxxxxxxxxxxxxx

Ankur Arora (7):
  asm-generic: barrier: add smp_cond_load_relaxed_timewait()
  asm-generic: barrier: add wait_policy handlers
  arm64: barrier: enable waiting in smp_cond_load_relaxed_timewait()
  arm64: barrier: add coarse wait for smp_cond_load_relaxed_timewait()
  arm64: barrier: add fine wait for smp_cond_load_relaxed_timewait()
  asm-generic: barrier: add smp_cond_load_acquire_timewait()
  bpf: rqspinlock: add rqspinlock policy handler for arm64

 arch/arm64/include/asm/barrier.h    |  82 +++++++++++++++
 arch/arm64/include/asm/rqspinlock.h |  96 ++++-------------
 include/asm-generic/barrier.h       | 150 ++++++++++++++++++++++++++++
 3 files changed, 251 insertions(+), 77 deletions(-)

--
2.43.5