On 6/16/2025 5:29 PM, Praneesh P wrote: > > On 6/5/2025 4:19 PM, Baochen Qiang wrote: >> >> On 6/5/2025 6:00 PM, Johan Hovold wrote: >>> On Thu, Jun 05, 2025 at 04:41:32PM +0800, Baochen Qiang wrote: >>>> On 6/4/2025 10:45 PM, Johan Hovold wrote: >>>>> Add the missing memory barrier to make sure that destination ring >>>>> descriptors are read after the head pointers to avoid using stale data >>>>> on weakly ordered architectures like aarch64. >>>>> >>>>> The barrier is added to the ath12k_hal_srng_access_begin() helper for >>>>> symmetry with follow-on fixes for source ring buffer corruption which >>>>> will add barriers to ath12k_hal_srng_access_end(). >>>>> >>>>> Note that this may fix the empty descriptor issue recently worked around >>>>> by commit 51ad34a47e9f ("wifi: ath12k: Add drop descriptor handling for >>>>> monitor ring"). >>>> why? I would expect drunk cookies are valid in case of HAL_MON_DEST_INFO0_EMPTY_DESC, >>>> rather than anything caused by reordering. >>> Based on a quick look it seemed like this could possibly fall in the >>> same category as some of the other workarounds I've spotted while >>> looking into these ordering issues (e.g. f9fff67d2d7c ("wifi: ath11k: >>> Fix SKB corruption in REO destination ring")). >>> >>> If you say this one is clearly unrelated, I'll drop the comment. >> Praneesh, could you comment here since you made that change? > Empty/Drop descriptor is intentionally issued by the hardware during backpressure scenario > and is unrelated to the issue discussed in this patch series. Thanks Praneesh. Johan, according to that, please drop the comment. >>>>> @@ -343,9 +343,6 @@ static int ath12k_ce_completed_recv_next(struct ath12k_ce_pipe >>>>> *pipe, >>>>> goto err; >>>>> } >>>>> - /* Make sure descriptor is read after the head pointer. */ >>>>> - dma_rmb(); >>>>> - >>>>> *nbytes = ath12k_hal_ce_dst_status_get_length(desc); >>>>> *skb = pipe->dest_ring->skb[sw_index]; >>>>> diff --git a/drivers/net/wireless/ath/ath12k/hal.c b/drivers/net/wireless/ath/ath12k/ >>>>> hal.c >>>>> index 91d5126ca149..9eea13ed5565 100644 >>>>> --- a/drivers/net/wireless/ath/ath12k/hal.c >>>>> +++ b/drivers/net/wireless/ath/ath12k/hal.c >>>>> @@ -2126,13 +2126,24 @@ void *ath12k_hal_srng_src_get_next_reaped(struct ath12k_base >>>>> *ab, >>>>> void ath12k_hal_srng_access_begin(struct ath12k_base *ab, struct hal_srng *srng) >>>>> { >>>>> + u32 hp; >>>>> + >>>>> lockdep_assert_held(&srng->lock); >>>>> - if (srng->ring_dir == HAL_SRNG_DIR_SRC) >>>>> + if (srng->ring_dir == HAL_SRNG_DIR_SRC) { >>>>> srng->u.src_ring.cached_tp = >>>>> *(volatile u32 *)srng->u.src_ring.tp_addr; >>>>> - else >>>>> - srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr); >>>>> + } else { >>>>> + hp = READ_ONCE(*srng->u.dst_ring.hp_addr); >>>>> + >>>>> + if (hp != srng->u.dst_ring.cached_hp) { >>>> This consumes additional CPU cycles in hot path, which is a concern to me. >>>> >>>> Based on that, I prefer the v1 implementation. >>> The conditional avoids a memory barrier in case the ring is empty, so >>> for all callers but ath12k_ce_completed_recv_next() it's an improvement >>> over v1 in that sense. >>> >>> I could make the barrier unconditional, which will only add one barrier >>> to ath12k_ce_completed_recv_next() in case the ring is empty compared to >>> v1. Perhaps that's a good compromise if you worry about the extra >>> comparison? >> I guess the unconditional barrier also has impact on performance? If so I am not sure >> which one is better then ... >> >> Let's just keep it as is and see what others think. >> >>> I very much want to avoid having both explicit barriers in the caller >>> and barriers in the hal end() helper. I think it should be either or. >>> >>>>> + srng->u.dst_ring.cached_hp = hp; >>>>> + /* Make sure descriptor is read after the head >>>>> + * pointer. >>>>> + */ >>>>> + dma_rmb(); >>>>> + } >>>>> + } >>> Johan >>