On Mon, Aug 11, 2025 at 01:45:46PM -0700, Sean Christopherson wrote:
> On Mon, Aug 11, 2025, Yury Norov wrote:
> > Testing cpumask for a CPU to be cleared just before setting the exact
> > same CPU is useless because the end result is always the same: CPU is
> > set.
>
> No, it is not useless. Blindly writing to the variable will unnecessarily bounce
> the cacheline, and this is a hot path.

How hot is that path? How bad is the cache contention? Is there any
evidence that the conditional cpumask_set_cpu() is worth the effort?
The original patch doesn't discuss that at all, and without a comment
the code just looks buggy.

> > While there, switch the CPU setter to a non-atomic version. Atomicity is
> > useless here
>
> No, atomicity isn't useless here either. Dropping atomicity could result in a
> CPU's bit being lost. I.e. the atomic accesses aren't for the benefit of
> smp_call_function_many_cond(); the writes are atomic so that multiple vCPUs can
> concurrently update the mask without needing additional protection.

OK, I see. Something heavy hit my head before I decided to drop
atomicity there.

> > because sev_writeback_caches() ends up with a plain
> > for_each_cpu() loop in smp_call_function_many_cond(), which is not
> > atomic by nature.
>
> That's fine. As noted in sev_writeback_caches(), if a vCPU could be running, then
> the caller is responsible for ensuring that all vCPUs flush caches before the
> memory being reclaimed is fully freed. Those guarantees are provided by KVM's
> MMU.
>
> sev_writeback_caches() => smp_call_function_many_cond() could hit false positives,
> i.e. trigger WBINVD on CPUs that couldn't possibly have accessed the memory being
> reclaimed, but such false positives are functionally benign, and are "intended"
> in the sense that we chose to prioritize simplicity over precision.

So, I don't object to dropping the patch, but it would be really nice
to have this

	if (!cpumask_test_cpu())
		cpumask_set_cpu()

pattern explained, and even better supported with performance numbers.

Thanks,
Yury
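
For reference, here is a minimal sketch of the test-before-set pattern
being discussed. The names (mark_cpu_ran, have_run_cpus) are
illustrative assumptions, not the actual KVM code. The point is that
cpumask_test_cpu() is a plain read, so the common "bit already set"
case keeps the cacheline in shared state; only the first call on a
given CPU performs the atomic set, which pulls the line exclusive:

	#include <linux/cpumask.h>
	#include <linux/smp.h>

	/*
	 * Illustrative sketch only -- not the actual KVM code.
	 * Record that the current CPU has run a vCPU.  The read-only
	 * test avoids dirtying the shared cacheline on every call;
	 * the set itself stays atomic (set_bit() under the hood) so
	 * that concurrent callers on different CPUs can't lose a bit.
	 */
	static void mark_cpu_ran(struct cpumask *have_run_cpus)
	{
		int cpu = get_cpu();	/* disable preemption, get CPU id */

		if (!cpumask_test_cpu(cpu, have_run_cpus))
			cpumask_set_cpu(cpu, have_run_cpus);

		put_cpu();
	}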