On 6/5/2025 6:00 PM, Johan Hovold wrote: > On Thu, Jun 05, 2025 at 04:41:32PM +0800, Baochen Qiang wrote: >> On 6/4/2025 10:45 PM, Johan Hovold wrote: >>> Add the missing memory barrier to make sure that destination ring >>> descriptors are read after the head pointers to avoid using stale data >>> on weakly ordered architectures like aarch64. >>> >>> The barrier is added to the ath12k_hal_srng_access_begin() helper for >>> symmetry with follow-on fixes for source ring buffer corruption which >>> will add barriers to ath12k_hal_srng_access_end(). >>> >>> Note that this may fix the empty descriptor issue recently worked around >>> by commit 51ad34a47e9f ("wifi: ath12k: Add drop descriptor handling for >>> monitor ring"). >> >> why? I would expect drunk cookies are valid in case of HAL_MON_DEST_INFO0_EMPTY_DESC, >> rather than anything caused by reordering. > > Based on a quick look it seemed like this could possibly fall in the > same category as some of the other workarounds I've spotted while > looking into these ordering issues (e.g. f9fff67d2d7c ("wifi: ath11k: > Fix SKB corruption in REO destination ring")). > > If you say this one is clearly unrelated, I'll drop the comment. Praneesh, could you comment here since you made that change? > >>> @@ -343,9 +343,6 @@ static int ath12k_ce_completed_recv_next(struct ath12k_ce_pipe *pipe, >>> goto err; >>> } >>> >>> - /* Make sure descriptor is read after the head pointer. */ >>> - dma_rmb(); >>> - >>> *nbytes = ath12k_hal_ce_dst_status_get_length(desc); >>> >>> *skb = pipe->dest_ring->skb[sw_index]; >>> diff --git a/drivers/net/wireless/ath/ath12k/hal.c b/drivers/net/wireless/ath/ath12k/hal.c >>> index 91d5126ca149..9eea13ed5565 100644 >>> --- a/drivers/net/wireless/ath/ath12k/hal.c >>> +++ b/drivers/net/wireless/ath/ath12k/hal.c >>> @@ -2126,13 +2126,24 @@ void *ath12k_hal_srng_src_get_next_reaped(struct ath12k_base *ab, >>> >>> void ath12k_hal_srng_access_begin(struct ath12k_base *ab, struct hal_srng *srng) >>> { >>> + u32 hp; >>> + >>> lockdep_assert_held(&srng->lock); >>> >>> - if (srng->ring_dir == HAL_SRNG_DIR_SRC) >>> + if (srng->ring_dir == HAL_SRNG_DIR_SRC) { >>> srng->u.src_ring.cached_tp = >>> *(volatile u32 *)srng->u.src_ring.tp_addr; >>> - else >>> - srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr); >>> + } else { >>> + hp = READ_ONCE(*srng->u.dst_ring.hp_addr); >>> + >>> + if (hp != srng->u.dst_ring.cached_hp) { >> >> This consumes additional CPU cycles in hot path, which is a concern to me. >> >> Based on that, I prefer the v1 implementation. > > The conditional avoids a memory barrier in case the ring is empty, so > for all callers but ath12k_ce_completed_recv_next() it's an improvement > over v1 in that sense. > > I could make the barrier unconditional, which will only add one barrier > to ath12k_ce_completed_recv_next() in case the ring is empty compared to > v1. Perhaps that's a good compromise if you worry about the extra > comparison? I guess the unconditional barrier also has impact on performance? If so I am not sure which one is better then ... Let's just keep it as is and see what others think. > > I very much want to avoid having both explicit barriers in the caller > and barriers in the hal end() helper. I think it should be either or. > >>> + srng->u.dst_ring.cached_hp = hp; >>> + /* Make sure descriptor is read after the head >>> + * pointer. >>> + */ >>> + dma_rmb(); >>> + } >>> + } > > Johan