On 6/20/2025 9:19 AM, Thomas Gleixner wrote:
> On Wed, Jun 18 2025 at 14:08, Nuno Das Neves wrote:
>> On 6/11/2025 4:07 PM, Michael Kelley wrote:
>>> From: Nuno Das Neves <nunodasneves@xxxxxxxxxxxxxxxxxxx>
>>> Sent: Tuesday, June 10, 2025 4:52 PM
>>>> +/**
>>>> + * hv_map_msi_interrupt() - "Map" the MSI IRQ in the hypervisor.
>>>> + * @data: Describes the IRQ
>>>> + * @out_entry: Hypervisor (MSI) interrupt entry (can be NULL)
>>>> + *
>>>> + * Map the IRQ in the hypervisor by issuing a MAP_DEVICE_INTERRUPT hypercall.
>>>> + */
>>>> +int hv_map_msi_interrupt(struct irq_data *data,
>>>> +			 struct hv_interrupt_entry *out_entry)
>>>> {
>>>> -	union hv_device_id device_id = hv_build_pci_dev_id(dev);
>>>> +	struct msi_desc *msidesc;
>>>> +	struct pci_dev *dev;
>>>> +	union hv_device_id device_id;
>>>> +	struct hv_interrupt_entry dummy;
>>>> +	struct irq_cfg *cfg = irqd_cfg(data);
>>>> +	const cpumask_t *affinity;
>>>> +	int cpu;
>>>> +	u64 res;
>
> https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#variable-declarations
>
>>>>
>>>> -	return hv_map_interrupt(device_id, false, cpu, vector, entry);
>>>> +	msidesc = irq_data_get_msi_desc(data);
>>>> +	dev = msi_desc_to_pci_dev(msidesc);
>>>> +	device_id = hv_build_pci_dev_id(dev);
>>>> +	affinity = irq_data_get_effective_affinity_mask(data);
>>>> +	cpu = cpumask_first_and(affinity, cpu_online_mask);
>>>
>>> Is the cpus_read_lock held at this point? I'm not sure what the
>>> overall calling sequence looks like. If it is not held, the CPU that
>>> is selected could go offline before hv_map_interrupt() is called.
>>> This computation of the target CPU is the same as in the code
>>> before this patch, but that existing code looks like it has the
>>> same problem.
>>>
>>
>> Thanks for pointing it out - It *looks* like the read lock is not held
>> everywhere this could be called, so it could indeed be a problem.
>>
>> I've been thinking about different ways around this but I lack the
>> knowledge to have an informed opinion about it:
>>
>> - We could take the cpu read lock in this function, would that work?
>
> Obviously not.
>
>> - I'm not actually sure why the code is getting the first cpu off the
>>   effective affinity mask in the first place. It is possible to get
>>   the apic id (and hence the cpu) already associated with the irq, as
>>   per e.g. x86_vector_msi_compose_msg(). Maybe we could get the cpu
>>   that way, assuming that doesn't have a similar issue.
>
> There is no reason to fiddle in the underlying low level data. The
> effective affinity mask is there for a reason.
>
>> - We could just let this race happen, maybe the outcome isn't too
>>   catastrophic?
>
> Let's terminate guesswork mode and look at the facts.
>
> The point is that hv_map_msi_interrupt() is called from:
>
>  1) hv_irq_compose_msi_msg()
>
>  2) hv_arch_irq_unmask() (in patch 4/4)
>
> Both functions are interrupt chip callbacks and invoked with the
> interrupt descriptor lock held.
>
> At the point where they are called, the effective affinity mask is
> valid and immutable. Nothing can modify it as any modification requires
> the interrupt descriptor lock to be held. This applies to the CPU
> hotplug machinery too. So this AND with cpu_online_mask is a completely
> pointless voodoo exercise.

Thanks for explaining.

> Just use:
>
>   cpu = cpumask_first(irq_data_get_effective_affinity_mask(data));
>
> and be done with it.
>
> Please fix that first with a separate patch before moving this code
> around.

Will do!

Nuno

> Thanks,
>
>         tglx