On Thu, 2025-03-27 at 20:06 +0800, Yan Zhao wrote:
> On Fri, Mar 21, 2025 at 12:49:42PM +0100, Paolo Bonzini wrote:
> > On Wed, Mar 19, 2025 at 5:17 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > Yan posted a patch to fudge around the issue[*], I strongly objected (and still
> > > object) to making a functional and confusing code change to fudge around a lockdep
> > > false positive.
> >
> > In that thread I had made another suggestion, which Yan also tried,
> > which was to use subclasses:
> >
> > - in the sched_out path, which cannot race with the others:
> >   raw_spin_lock_nested(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu), 1);
> >
> > - in the irq and sched_in paths, which can race with each other:
> >   raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
>
> Hi Paolo, Sean, Maxim,
>
> The sched_out path may still race with the sched_in path, e.g.:
>
>       CPU 0                  CPU 1
> -----------------        ---------------
> vCPU 0 sched_out
> vCPU 1 sched_in
> vCPU 1 sched_out         vCPU 0 sched_in
>
> vCPU 0 sched_in may race with vCPU 1 sched_out on CPU 0's wakeup list.
>
>
> So, the situation is:
>   sched_in,  sched_out: race
>   sched_in,  irq:       race
>   sched_out, irq:       mutually exclusive, do not race
>
>
> Hence, do you think the subclass assignments below are reasonable?
>   irq:       subclass 0
>   sched_out: subclass 1
>   sched_in:  subclasses 0 and 1
>
> Inspired by Sean's solution, I made the patch below to inform lockdep that the
> sched_in path involves both subclasses 0 and 1 by adding the line
> "spin_acquire(&spinlock->dep_map, 1, 0, _RET_IP_)".
>
> I like it because it accurately conveys the situation to lockdep :)
> What are your thoughts?
>
> Thanks
> Yan
>
> diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
> index ec08fa3caf43..c5684225255a 100644
> --- a/arch/x86/kvm/vmx/posted_intr.c
> +++ b/arch/x86/kvm/vmx/posted_intr.c
> @@ -89,9 +89,12 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
>  	 * current pCPU if the task was migrated.
>  	 */
>  	if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR) {
> -		raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> +		raw_spinlock_t *spinlock = &per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu);
> +		raw_spin_lock(spinlock);
> +		spin_acquire(&spinlock->dep_map, 1, 0, _RET_IP_);
>  		list_del(&vmx->pi_wakeup_list);
> -		raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> +		spin_release(&spinlock->dep_map, _RET_IP_);
> +		raw_spin_unlock(spinlock);
>  	}
>
>  	dest = cpu_physical_id(cpu);
> @@ -152,7 +155,7 @@ static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu)
>
>  	local_irq_save(flags);
>
> -	raw_spin_lock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
> +	raw_spin_lock_nested(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu), 1);
>  	list_add_tail(&vmx->pi_wakeup_list,
>  		      &per_cpu(wakeup_vcpus_on_cpu, vcpu->cpu));
>  	raw_spin_unlock(&per_cpu(wakeup_vcpus_on_cpu_lock, vcpu->cpu));
>

I also agree that this is a good idea!

Best regards,
	Maxim Levitsky
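
For reference, here is a minimal, self-contained sketch of the same lockdep-subclass
annotation pattern outside of the KVM posted-interrupt code. All names here
(demo_lock, demo_irq_path, demo_sched_out_path, demo_sched_in_path) are made up for
illustration; only the lockdep calls mirror the patch above, and it is a sketch of
the idea, not the actual posted_intr.c implementation.

	/*
	 * Illustrative sketch only -- hypothetical names, not the KVM code.
	 * Three lockers of the same per-CPU raw_spinlock_t:
	 *   - irq path:       subclass 0          (raw_spin_lock)
	 *   - sched_out path: subclass 1          (raw_spin_lock_nested)
	 *   - sched_in path:  subclasses 0 and 1  (raw_spin_lock plus an
	 *                      explicit spin_acquire() on the lock's dep_map)
	 */
	#include <linux/spinlock.h>
	#include <linux/percpu.h>
	#include <linux/lockdep.h>

	static DEFINE_PER_CPU(raw_spinlock_t, demo_lock);

	/* irq path: may race with sched_in, so both use subclass 0. */
	static void demo_irq_path(int cpu)
	{
		raw_spin_lock(&per_cpu(demo_lock, cpu));
		/* ... walk the per-CPU list ... */
		raw_spin_unlock(&per_cpu(demo_lock, cpu));
	}

	/*
	 * sched_out path: cannot run concurrently with the irq path on this
	 * CPU, so it takes the lock as subclass 1.
	 */
	static void demo_sched_out_path(int cpu)
	{
		raw_spin_lock_nested(&per_cpu(demo_lock, cpu), 1);
		/* ... add to the per-CPU list ... */
		raw_spin_unlock(&per_cpu(demo_lock, cpu));
	}

	/*
	 * sched_in path: can race with both paths above, so it takes the lock
	 * normally (subclass 0) and additionally tells lockdep that it also
	 * excludes subclass 1 via an explicit dep_map acquire/release.
	 */
	static void demo_sched_in_path(int cpu)
	{
		raw_spinlock_t *lock = &per_cpu(demo_lock, cpu);

		raw_spin_lock(lock);
		spin_acquire(&lock->dep_map, 1, 0, _RET_IP_);
		/* ... remove from the per-CPU list ... */
		spin_release(&lock->dep_map, _RET_IP_);
		raw_spin_unlock(lock);
	}

The spin_acquire()/spin_release() pair only affects lockdep's bookkeeping (it compiles
away without lockdep), which is why the sched_in path can declare "I also hold
subclass 1" without taking the lock twice.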