On 6/2/2025 4:03 PM, Johan Hovold wrote:
> On Thu, May 29, 2025 at 03:03:38PM +0800, Miaoqing Pan wrote:
>> On 5/26/2025 7:48 PM, Johan Hovold wrote:
>>> Add the missing memory barriers to make sure that destination ring
>>> descriptors are read after the head pointers to avoid using stale data
>>> on weakly ordered architectures like aarch64.
>
>>> @@ -3851,6 +3851,9 @@ int ath11k_dp_process_rx_err(struct ath11k_base *ab, struct napi_struct *napi,
>>>
>>> 	ath11k_hal_srng_access_begin(ab, srng);
>>>
>>> +	/* Make sure descriptor is read after the head pointer. */
>>> +	dma_rmb();
>>> +
>>
>> Thanks Johan, for continuing to follow up on this issue. I have some
>> different opinions.
>>
>> This change somewhat deviates from the fix approach described in
>> https://lore.kernel.org/all/20250321095219.19369-1-johan+linaro@xxxxxxxxxx/.
>> In this case, the descriptor might be accessed before it is updated or
>> while it is still being updated. Therefore, a dma_rmb() should be added
>> after the call to ath11k_hal_srng_dst_get_next_entry() and before
>> accessing ath11k_hal_ce_dst_status_get_length(), to ensure that the DMA
>> has completed before reading the descriptor.
>>
>> However, in this patch, the memory barrier is used to protect the head
>> pointer (HP). I don't think a memory barrier is necessary for HP,
>> because even if an outdated HP is fetched,
>> ath11k_hal_srng_dst_get_next_entry() will return NULL and exit safely.
>
> No, the barrier is needed between reading the head pointer and accessing
> descriptor fields, that's what matters.
>
> You can still end up with reading stale descriptor data even when
> ath11k_hal_srng_dst_get_next_entry() returns non-NULL due to speculation
> (that's what happens on the X13s).

The fact is that a dma_rmb() does not even prevent speculation, no
matter where it is placed, right?

If so, the whole point of dma_rmb() is to prevent compiler reordering or
CPU reordering, but is such reordering really possible here?
The sequence is

1# reading HP
	srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr);

2# validate HP
	if (srng->u.dst_ring.tp == srng->u.dst_ring.cached_hp)
		return NULL;

3# get desc
	desc = srng->ring_base_vaddr + srng->u.dst_ring.tp;

4# accessing desc
	ath11k_hal_desc_reo_parse_err(... desc, ...)

Clearly each step depends on the results of previous steps. In this case
the compiler/CPU is expected to be smart enough to not do any
reordering, isn't it?

>
> Whether to place it before or after (or inside)
> ath11k_hal_srng_dst_get_next_entry() is a trade off between readability,
> maintainability and whether we want to avoid unnecessary barriers in
> cases like the above where we strictly only need one barrier before the
> loop (or if we want to avoid the barrier in case the ring is ever
> empty).
>
>> So, placing the memory barrier inside
>> ath11k_hal_srng_dst_get_next_entry() would be more appropriate.
>>
>> @@ -678,6 +678,8 @@ u32 *ath11k_hal_srng_dst_get_next_entry(struct
>> ath11k_base *ab,
>> 	if (srng->flags & HAL_SRNG_FLAGS_CACHED)
>> 		ath11k_hal_srng_prefetch_desc(ab, srng);
>>
>> +	dma_rmb();
>> +
>> 	return desc;
>> }
>
> So this will add a barrier in each iteration of the loop, but we only
> need a single one after reading the head pointer.
>
> [ Also note that ath11k_hal_srng_dst_peek() would similarly need a
> barrier if we were to move them into those helpers. ]
>
> Johan
>
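
For reference, the four-step sequence above (read HP, validate HP, get
desc, access desc) can be modelled in user space roughly as follows.
This is only a sketch under stated assumptions, not ath11k code: the
model_* names and RING_ENTRIES are hypothetical, and a C11
atomic_thread_fence(memory_order_acquire) is used as a stand-in for
dma_rmb(), which it only approximates. The fence sits once after the
head pointer read, which is the placement proposed in the quoted patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_ENTRIES 8u	/* hypothetical ring size */

/* Hypothetical model of a destination ring; not the ath11k struct. */
struct model_ring {
	uint32_t desc[RING_ENTRIES];	/* descriptors filled by the producer */
	_Atomic uint32_t hp;		/* head pointer, advanced by the producer */
	uint32_t cached_hp;		/* consumer's snapshot of hp */
	uint32_t tp;			/* tail pointer, advanced by the consumer */
};

/* Producer: write the descriptor first, then publish it by moving hp. */
static void model_publish(struct model_ring *r, uint32_t val)
{
	uint32_t hp = atomic_load_explicit(&r->hp, memory_order_relaxed);

	r->desc[hp] = val;
	atomic_store_explicit(&r->hp, (hp + 1) % RING_ENTRIES,
			      memory_order_release);
}

/* Step 1: snapshot hp, then order all later descriptor loads after it. */
static void model_access_begin(struct model_ring *r)
{
	r->cached_hp = atomic_load_explicit(&r->hp, memory_order_relaxed);

	/* Assumption: acquire fence as a user-space stand-in for dma_rmb(). */
	atomic_thread_fence(memory_order_acquire);
}

/* Steps 2 and 3: validate against the cached hp, then locate the desc. */
static uint32_t *model_get_next_entry(struct model_ring *r)
{
	uint32_t *desc;

	assert(r->tp < RING_ENTRIES);

	if (r->tp == r->cached_hp)
		return NULL;

	/* The address is derived from tp, not from the value loaded from hp. */
	desc = &r->desc[r->tp];
	r->tp = (r->tp + 1) % RING_ENTRIES;

	return desc;
}

/* Step 4: access every pending descriptor; here they are simply summed. */
static uint32_t model_drain(struct model_ring *r)
{
	uint32_t sum = 0;
	uint32_t *desc;

	model_access_begin(r);
	while ((desc = model_get_next_entry(r)))
		sum += *desc;

	return sum;
}
```

In this model there is one fence per drain rather than one per entry.
Note also that the descriptor address in step 3 comes from tp, so the
only link from the hp load in step 1 to the descriptor load in step 4
is the comparison in step 2.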