On Wed, Jun 04, 2025 at 01:32:08PM +0800, Miaoqing Pan wrote: > On 6/4/2025 10:34 AM, Miaoqing Pan wrote: > > On 6/3/2025 7:51 PM, Johan Hovold wrote: > >> On Tue, Jun 03, 2025 at 06:52:37PM +0800, Baochen Qiang wrote: > >>> On 6/2/2025 4:03 PM, Johan Hovold wrote: > >> > >>>> No, the barrier is needed between reading the head pointer and > >>>> accessing > >>>> descriptor fields, that's what matters. > >>>> > >>>> You can still end up with reading stale descriptor data even when > >>>> ath11k_hal_srng_dst_get_next_entry() returns non-NULL due to > >>>> speculation > >>>> (that's what happens on the X13s). > >>> > >>> The fact is that a dma_rmb() does not even prevent speculation, no > >>> matter where it is > >>> placed, right? > >> > >> It prevents the speculated load from being used. > >> > >>> If so the whole point of dma_rmb() is to prevent from compiler > >>> reordering > >>> or CPU reordering, but is it really possible? > >>> > >>> The sequence is > >>> > >>> 1# reading HP > >>> srng->u.dst_ring.cached_hp = READ_ONCE(*srng- > >>> >u.dst_ring.hp_addr); > >>> > >>> 2# validate HP > >>> if (srng->u.dst_ring.tp == srng->u.dst_ring.cached_hp) > >>> return NULL; > >>> > >>> 3# get desc > >>> desc = srng->ring_base_vaddr + srng->u.dst_ring.tp; > >>> > >>> 4# accessing desc > >>> ath11k_hal_desc_reo_parse_err(... desc, ...) > >>> > >>> Clearly each step depends on the results of previous steps. In this > >>> case the compiler/CPU > >>> is expected to be smart enough to not do any reordering, isn't it? > >> > >> Steps 3 and 4 can be done speculatively before the load in step 1 is > >> complete as long as the result is discarded if it turns out not to be > >> needed. > > If the condition in step 2 is true and step 3 speculatively loads > > descriptor from TP before step 1, could this cause issues? > > Sorry for typo, if the condition in step 2 is false and step 3 > speculatively loads descriptor from TP before step 1, could this cause > issues? Almost correct; the descriptor can be loaded (from TP) before the head pointer is loaded and thus before the condition in step 2 has been evaluated. And if the condition in step 2 later turns out to be false, step 4 may use stale data from before the head pointer was updated. Johan