Re: [PATCH v8 10/12] blk-mq: use hk cpus only when isolcpus=io_queue is enabled

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Sep 12 2025 at 10:32, Daniel Wagner wrote:
> On Wed, Sep 10, 2025 at 10:20:26AM +0200, Thomas Gleixner wrote:
>> > The cpu_online_mask might change over time, it's not a static bitmap.
>> > Thus it's necessary to update the blk_hk_online_mask. Doing some sort of
>> > caching is certainly possible. Given that we have plenty of cpumask
>> > logic operation in the cpu_group_evenly code path later, I am not so
>> > sure this really makes a huge difference.
>> 
>> Sure,  but none of this is serialized against CPU hotplug operations. So
>> the resulting mask, which is handed into the spreading code can be
>> concurrently modified. IOW it's not as const as the code claims.
>
> Thanks for explaining.
>
> In group_cpu_evenly:
>
> 	/*
> 	 * Make a local cache of 'cpu_present_mask', so the two stages
> 	 * spread can observe consistent 'cpu_present_mask' without holding
> 	 * cpu hotplug lock, then we can reduce deadlock risk with cpu
> 	 * hotplug code.
> 	 *
> 	 * Here CPU hotplug may happen when reading `cpu_present_mask`, and
> 	 * we can live with the case because it only affects that hotplug
> 	 * CPU is handled in the 1st or 2nd stage, and either way is correct
> 	 * from API user viewpoint since 2-stage spread is sort of
> 	 * optimization.
> 	 */
> 	cpumask_copy(npresmsk, data_race(cpu_present_mask));

The present mask is very different from the online mask. The present
mask only changes on physical hotplug when:

     - a offline CPU is removed from the present set of CPUs

     - a offline CPU is added to it.

In neither case the CPU can be involved in any operation related to the
actual offline/online operations.

Also contrary to your approach, this code takes the possibility of
a concurrently changing mask into account by taking a racy snapshot,
which is immutable for the following operation.

What you are doing with that static mask, makes it a target of
concurrent modification, which is obviously a recipe for subtle bugs.

>   Turns out the two stage spread just needs consistent 'cpu_present_mask',
>   and remove the CPU hotplug lock by storing it into one local cache.  This
>   way doesn't change correctness, because all CPUs are still covered.
>
> This sounds like I should do something similar with cpu_online_mask.

Indeed.

Thanks,

        tglx





[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux