On Tue, Jun 03, 2025 at 06:52:37PM +0800, Baochen Qiang wrote: > On 6/2/2025 4:03 PM, Johan Hovold wrote: > > No, the barrier is needed between reading the head pointer and accessing > > descriptor fields, that's what matters. > > > > You can still end up with reading stale descriptor data even when > > ath11k_hal_srng_dst_get_next_entry() returns non-NULL due to speculation > > (that's what happens on the X13s). > > The fact is that a dma_rmb() does not even prevent speculation, no matter where it is > placed, right? It prevents the speculated load from being used. > If so the whole point of dma_rmb() is to prevent from compiler reordering > or CPU reordering, but is it really possible? > > The sequence is > > 1# reading HP > srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr); > > 2# validate HP > if (srng->u.dst_ring.tp == srng->u.dst_ring.cached_hp) > return NULL; > > 3# get desc > desc = srng->ring_base_vaddr + srng->u.dst_ring.tp; > > 4# accessing desc > ath11k_hal_desc_reo_parse_err(... desc, ...) > > Clearly each step depends on the results of previous steps. In this case the compiler/CPU > is expected to be smart enough to not do any reordering, isn't it? Steps 3 and 4 can be done speculatively before the load in step 1 is complete as long as the result is discarded if it turns out not to be needed. Johan