On Fri, Sep 12 2025 at 10:32, Daniel Wagner wrote: > On Wed, Sep 10, 2025 at 10:20:26AM +0200, Thomas Gleixner wrote: >> > The cpu_online_mask might change over time, it's not a static bitmap. >> > Thus it's necessary to update the blk_hk_online_mask. Doing some sort of >> > caching is certainly possible. Given that we have plenty of cpumask >> > logic operation in the cpu_group_evenly code path later, I am not so >> > sure this really makes a huge difference. >> >> Sure, but none of this is serialized against CPU hotplug operations. So >> the resulting mask, which is handed into the spreading code can be >> concurrently modified. IOW it's not as const as the code claims. > > Thanks for explaining. > > In group_cpu_evenly: > > /* > * Make a local cache of 'cpu_present_mask', so the two stages > * spread can observe consistent 'cpu_present_mask' without holding > * cpu hotplug lock, then we can reduce deadlock risk with cpu > * hotplug code. > * > * Here CPU hotplug may happen when reading `cpu_present_mask`, and > * we can live with the case because it only affects that hotplug > * CPU is handled in the 1st or 2nd stage, and either way is correct > * from API user viewpoint since 2-stage spread is sort of > * optimization. > */ > cpumask_copy(npresmsk, data_race(cpu_present_mask)); The present mask is very different from the online mask. The present mask only changes on physical hotplug when: - a offline CPU is removed from the present set of CPUs - a offline CPU is added to it. In neither case the CPU can be involved in any operation related to the actual offline/online operations. Also contrary to your approach, this code takes the possibility of a concurrently changing mask into account by taking a racy snapshot, which is immutable for the following operation. What you are doing with that static mask, makes it a target of concurrent modification, which is obviously a recipe for subtle bugs. > Turns out the two stage spread just needs consistent 'cpu_present_mask', > and remove the CPU hotplug lock by storing it into one local cache. This > way doesn't change correctness, because all CPUs are still covered. > > This sounds like I should do something similar with cpu_online_mask. Indeed. Thanks, tglx