On Mon, Jul 21, 2025 at 07:49:10AM +0800, Keir Fraser wrote:
> On Sun, Jul 20, 2025 at 08:08:30AM +0800, Yao Yuan wrote:
> > On Sat, Jul 19, 2025 at 07:58:19AM +0000, Keir Fraser wrote:
> > > On Sat, Jul 19, 2025 at 10:15:56AM +0800, Yao Yuan wrote:
> > > > On Fri, Jul 18, 2025 at 08:00:17AM -0700, Sean Christopherson wrote:
> > > > > On Thu, Jul 17, 2025, Yao Yuan wrote:
> > > > > > On Wed, Jul 16, 2025 at 11:07:35AM +0800, Keir Fraser wrote:
> > > > > > > In preparation to remove synchronize_srcu() from MMIO registration,
> > > > > > > remove the distributor's dependency on this implicit barrier by
> > > > > > > direct acquire-release synchronization on the flag write and its
> > > > > > > lock-free check.
> > > > > > >
> > > > > > > Signed-off-by: Keir Fraser <keirf@xxxxxxxxxx>
> > > > > > > ---
> > > > > > >  arch/arm64/kvm/vgic/vgic-init.c | 11 ++---------
> > > > > > >  1 file changed, 2 insertions(+), 9 deletions(-)
> > > > > > >
> > > > > > > diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > > index 502b65049703..bc83672e461b 100644
> > > > > > > --- a/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > > +++ b/arch/arm64/kvm/vgic/vgic-init.c
> > > > > > > @@ -567,7 +567,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > > > >  	gpa_t dist_base;
> > > > > > >  	int ret = 0;
> > > > > > >
> > > > > > > -	if (likely(dist->ready))
> > > > > > > +	if (likely(smp_load_acquire(&dist->ready)))
> > > > > > >  		return 0;
> > > > > > >
> > > > > > >  	mutex_lock(&kvm->slots_lock);
> > > > > > > @@ -598,14 +598,7 @@ int kvm_vgic_map_resources(struct kvm *kvm)
> > > > > > >  		goto out_slots;
> > > > > > >  	}
> > > > > > >
> > > > > > > -	/*
> > > > > > > -	 * kvm_io_bus_register_dev() guarantees all readers see the new MMIO
> > > > > > > -	 * registration before returning through synchronize_srcu(), which also
> > > > > > > -	 * implies a full memory barrier. As such, marking the distributor as
> > > > > > > -	 * 'ready' here is guaranteed to be ordered after all vCPUs having seen
> > > > > > > -	 * a completely configured distributor.
> > > > > > > -	 */
> > > > > > > -	dist->ready = true;
> > > > > > > +	smp_store_release(&dist->ready, true);
> > > > > >
> > > > > > No need for the store-release and load-acquire when replacing
> > > > > > synchronize_srcu_expedited() w/ call_srcu() IIUC:
> > > > > >
> > > > > This isn't about using call_srcu(), because it's not actually about kvm->buses.
> > > > > This code is concerned with ensuring that all stores to kvm->arch.vgic are ordered
> > > > > before the store to set kvm->arch.vgic.ready, so that vCPUs never see "ready==true"
> > > > > with a half-baked distributor.
> > > > >
> > > > > In the current code, kvm_vgic_map_resources() relies on the synchronize_srcu() in
> > > > > kvm_io_bus_register_dev() to provide the ordering guarantees. Switching to
> > > > > smp_store_release() + smp_load_acquire() removes the dependency on the
> > > > > synchronize_srcu() so that the synchronize_srcu() call can be safely removed.
> > > >
> > > > Yes, I understand this and agree with your point.
> > > >
> > > > Just for discussion: I thought it should also work even w/o
> > > > introducing the load-acquire + store-release after switching to
> > > > call_srcu(): the smp_mb() in call_srcu() orders all the stores
> > > > to kvm->arch.vgic before the store to kvm->arch.vgic.ready in
> > > > the current implementation.
> > >
> > > The load-acquire would still be required, to ensure that accesses to
> > > kvm->arch.vgic do not get reordered earlier than the lock-free check
> > > of kvm->arch.vgic.ready. Otherwise that CPU could see that the vgic is
> > > initialised, but then use speculated reads of uninitialised vgic state.
> > >
> > Thanks for your explanation.
> >
> > I see. But there's "mutex_lock(&kvm->slots_lock);" before the later
> > accesses to kvm->arch.vgic, so I think the order can be
> > guaranteed. Of course, as you said, an explicit load-acquire +
> > store-release is better than the previous implicit implementation.
>
> If vgic_dist::ready is observed true by the lock-free read (the one
> which is turned into load-acquire by this patch) then the function
> immediately returns with no mutex_lock() executed. It is reads of
> vgic_dist *after* return from kvm_vgic_map_resources() that you have
> to worry about, and which require load-acquire semantics.

I think this is the main purpose of the lock-free read here: to avoid
lock contention when a VM with many vCPUs runs all of its vCPUs for
the first time together.

The store-release makes sure the changes to vgic_dist become visible
before the change to vgic_dist::ready does, but it doesn't guarantee
that vgic_dist::ready becomes visible to a reader on another CPU
**IMMEDIATELY**; thus the load-acquire on the reader side is required
for this.

Is the above understanding correct?
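To check my own understanding of the pairing, here is a minimal
userspace sketch using C11 atomics (atomic_store_explicit() with
memory_order_release and atomic_load_explicit() with
memory_order_acquire give the same release/acquire ordering as
smp_store_release()/smp_load_acquire()). The struct and field names
below are made up for illustration, not the real kvm/vgic types:

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct fake_dist {
	int cfg_a;		/* stands in for vgic_dist configuration */
	int cfg_b;
	atomic_bool ready;	/* stands in for vgic_dist::ready */
};

static struct fake_dist dist;

/* Writer: fully configure, then publish with a release store, as
 * kvm_vgic_map_resources() does after this patch. */
static void *writer(void *arg)
{
	dist.cfg_a = 1;
	dist.cfg_b = 2;
	/* Release: all stores above are ordered before ready == true
	 * becomes observable. */
	atomic_store_explicit(&dist.ready, true, memory_order_release);
	return NULL;
}

/* Reader: the lock-free check. Acquire ordering forbids the reads of
 * the configuration from being hoisted above the ready check. */
static void *reader(void *arg)
{
	if (atomic_load_explicit(&dist.ready, memory_order_acquire))
		/* Guaranteed to see cfg_a == 1 && cfg_b == 2 here. */
		printf("ready: a=%d b=%d\n", dist.cfg_a, dist.cfg_b);
	else
		/* Not an error: the real caller would fall back to the
		 * mutex_lock(&kvm->slots_lock) slow path. */
		printf("not ready yet\n");
	return NULL;
}

int main(void)
{
	pthread_t w, r;

	pthread_create(&w, NULL, writer, NULL);
	pthread_create(&r, NULL, reader, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return 0;
}

If the reader loses the race it simply observes false and takes the
slow path; the pairing only constrains ordering between the two sides.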