Re: [PATCH 1/3] wifi: ath11k: fix dest ring-buffer corruption

Baochen Qiang <quic_bqiang@xxxxxxxxxxx> · Tue, 3 Jun 2025 18:52:37 +0800

On 6/2/2025 4:03 PM, Johan Hovold wrote:
> On Thu, May 29, 2025 at 03:03:38PM +0800, Miaoqing Pan wrote:
>> On 5/26/2025 7:48 PM, Johan Hovold wrote:
>>> Add the missing memory barriers to make sure that destination ring
>>> descriptors are read after the head pointers to avoid using stale data
>>> on weakly ordered architectures like aarch64.
> 
>>> @@ -3851,6 +3851,9 @@ int ath11k_dp_process_rx_err(struct ath11k_base *ab, struct napi_struct *napi,
>>>   
>>>   	ath11k_hal_srng_access_begin(ab, srng);
>>>   
>>> +	/* Make sure descriptor is read after the head pointer. */
>>> +	dma_rmb();
>>> +
>>
>> Thanks Johan, for continuing to follow up on this issue. I have some 
>> different opinions.
>>
>> This change somewhat deviates from the fix approach described in 
>> https://lore.kernel.org/all/20250321095219.19369-1-johan+linaro@xxxxxxxxxx/. 
>> In this case, the descriptor might be accessed before it is updated or 
>> while it is still being updated. Therefore, a dma_rmb() should be added 
>> after the call to ath11k_hal_srng_dst_get_next_entry() and before 
>> accessing ath11k_hal_ce_dst_status_get_length(), to ensure that the DMA 
>> has completed before reading the descriptor.
>>
>> However, in this patch, the memory barrier is used to protect the head 
>> pointer (HP). I don't think a memory barrier is necessary for HP, 
>> because even if an outdated HP is fetched, 
>> ath11k_hal_srng_dst_get_next_entry() will return NULL and exit safely. 
> 
> No, the barrier is needed between reading the head pointer and accessing
> descriptor fields, that's what matters.
> 
> You can still end up with reading stale descriptor data even when
> ath11k_hal_srng_dst_get_next_entry() returns non-NULL due to speculation
> (that's what happens on the X13s).

A dma_rmb() does not prevent speculation no matter where it is placed, right? If so, its
whole point is to prevent compiler or CPU reordering, but is such reordering really
possible here?

The sequence is

	1# reading HP
		srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr);

	2# validate HP
		if (srng->u.dst_ring.tp == srng->u.dst_ring.cached_hp)
			return NULL;

	3# get desc
		desc = srng->ring_base_vaddr + srng->u.dst_ring.tp;

	4# accessing desc
		ath11k_hal_desc_reo_parse_err(... desc, ...)

Clearly each step depends on the result of the previous one. In that case, isn't the
compiler/CPU expected to be smart enough not to reorder them?
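
For reference, here is a minimal userspace sketch of the four steps, so the barrier
position under discussion is easy to point at. It is only an analogue under stated
assumptions: the demo_* names and struct demo_ring are hypothetical (not the ath11k
structures), the "device" is an ordinary producer function, and a C11 acquire fence
stands in for dma_rmb().

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_ENTRIES 8u

/* Hypothetical stand-in for the destination ring; not the ath11k struct. */
struct demo_ring {
	uint32_t desc[RING_ENTRIES];	/* written by the producer ("device") */
	_Atomic uint32_t hp;		/* head pointer, advanced by the producer */
	uint32_t cached_hp;		/* consumer's snapshot of hp */
	uint32_t tp;			/* tail pointer, owned by the consumer */
};

/* 1# read HP once per processing pass. */
static void demo_access_begin(struct demo_ring *r)
{
	r->cached_hp = atomic_load_explicit(&r->hp, memory_order_relaxed);
	/* Userspace analogue of the dma_rmb() position in the patch (assumption). */
	atomic_thread_fence(memory_order_acquire);
}

/* 2# validate HP, 3# get desc. The address is computed from tp, not from hp. */
static uint32_t *demo_get_next_entry(struct demo_ring *r)
{
	uint32_t *desc;

	if (r->tp == r->cached_hp)
		return NULL;

	desc = &r->desc[r->tp];
	r->tp = (r->tp + 1) % RING_ENTRIES;
	return desc;
}

/* 4# access desc: here we just sum all pending entries. */
static uint32_t demo_process(struct demo_ring *r)
{
	uint32_t sum = 0;
	uint32_t *desc;

	demo_access_begin(r);
	while ((desc = demo_get_next_entry(r)))
		sum += *desc;
	return sum;
}

/* Producer side: write the descriptor, then publish it by advancing hp. */
static void demo_produce(struct demo_ring *r, uint32_t val)
{
	uint32_t hp = atomic_load_explicit(&r->hp, memory_order_relaxed);

	r->desc[hp] = val;
	atomic_store_explicit(&r->hp, (hp + 1) % RING_ENTRIES,
			      memory_order_release);
}
```

With this layout, producing 5 and 7 and then calling demo_process() walks both entries
behind a single fence. In the sketch the only link between step 1 and step 4 is the
comparison in step 2, since the descriptor address comes from tp.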

> 
> Whether to place it before or after (or inside)
> ath11k_hal_srng_dst_get_next_entry() is a trade off between readability, 
> maintainability and whether we want to avoid unnecessary barriers in
> cases like the above where we strictly only need one barrier before the
> loop (or if we want to avoid the barrier in case the ring is ever
> empty).
> 
>> So, placing the memory barrier inside 
>> ath11k_hal_srng_dst_get_next_entry() would be more appropriate.
>>
>> @@ -678,6 +678,8 @@ u32 *ath11k_hal_srng_dst_get_next_entry(struct ath11k_base *ab,
>>          if (srng->flags & HAL_SRNG_FLAGS_CACHED)
>>                  ath11k_hal_srng_prefetch_desc(ab, srng);
>>
>> +       dma_rmb();
>> +
>>          return desc;
>>   }
> 
> So this will add a barrier in each iteration of the loop, but we only
> need a single one after reading the head pointer.
> 
> [ Also note that ath11k_hal_srng_dst_peek() would similarly need a
> barrier if we were to move them into those helpers. ]
> 
> Johan
>