RE: pci_probe called concurrently in machine with 2 identical PCI devices causing race condition

Hi Lukas,

The problem is that when this race occurs, the second NPU (PCI device) remains uninitialized in the kernel driver. I don't think this is specific to the driver and device we are using, which is why I am asking on this mailing list.

The driver keeps an internal global array of initialized devices, together with a count of them. The working sequence is:
 - call pci_probe for the 1st NPU, store it at index 0 in the array, increment the count
 - call pci_probe for the 2nd NPU, store it at index 1, increment the count

What happens in the erroneous case:
 - call pci_probe for the 1st NPU, store it at index 0
 - call pci_probe for the 2nd NPU, store it at index 0 as well (the count has not been incremented yet)
 - increment the count in the first pci_probe
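
To make the interleaving concrete, here is a minimal userspace sketch that replays it deterministically. It is not the vendor driver's code: the names (devs, dev_count, probe_store, probe_commit), the IDs 101/102 and the split of probe into a "store" step and a "commit" step are all assumptions made for illustration.

```c
#include <assert.h>

#define MAX_DEVS 2

/* Hypothetical stand-ins for the driver's global state (0 = empty slot). */
int devs[MAX_DEVS];
int dev_count;

/* First half of a probe: read the count, store the device at that index. */
int probe_store(int id)
{
    int idx = dev_count;        /* unsynchronized read of the shared count */

    assert(idx < MAX_DEVS);
    devs[idx] = id;
    return idx;
}

/* Second half of a probe: increment the shared count. */
void probe_commit(void)
{
    dev_count++;
}

/* Replay the bad ordering: both stores happen before either increment. */
void replay_race(void)
{
    devs[0] = devs[1] = 0;
    dev_count = 0;

    probe_store(101);           /* 1st NPU lands at index 0 */
    probe_store(102);           /* 2nd NPU also lands at index 0 */
    probe_commit();
    probe_commit();
}
```

After replay_race(), slot 0 holds only the second device and slot 1 was never written, so code that walks the array finds one device missing.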

In this case, the datapath on top of these ASICs does not work, because it expects the driver to have initialized both ASICs.

I know this can be fixed in the driver with proper locking, and we have contacted the vendor. However, I think this can happen on any machine with 2 identical PCI devices, because as far as I know, existing PCI drivers usually do not assume that their probe function can be called from multiple threads.
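
For illustration only, the kind of locking meant here can be sketched in userspace with pthreads. All names (npu_slots, npu_count, npu_lock, probe_locked, probe_both) are hypothetical; in a kernel driver the analogous primitive would be a struct mutex taken with mutex_lock()/mutex_unlock() around the same two steps.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_NPUS 2

/* Hypothetical global registry, now protected by a lock. */
int npu_slots[MAX_NPUS];
int npu_count;
pthread_mutex_t npu_lock = PTHREAD_MUTEX_INITIALIZER;

/* Probe body: the index read, the store and the increment form one
 * critical section, so two callers can never pick the same index. */
void *probe_locked(void *arg)
{
    int id = *(int *)arg;

    pthread_mutex_lock(&npu_lock);
    assert(npu_count < MAX_NPUS);
    npu_slots[npu_count] = id;
    npu_count++;
    pthread_mutex_unlock(&npu_lock);
    return NULL;
}

/* Run two probes from two threads and return the resulting count. */
int probe_both(void)
{
    int ids[MAX_NPUS] = { 101, 102 };
    pthread_t threads[MAX_NPUS];
    int i;

    npu_count = 0;
    for (i = 0; i < MAX_NPUS; i++)
        pthread_create(&threads[i], NULL, probe_locked, &ids[i]);
    for (i = 0; i < MAX_NPUS; i++)
        pthread_join(threads[i], NULL);
    return npu_count;
}
```

The order in which the two devices land in the array still depends on scheduling, but each one gets its own slot.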

Thanks,
Jozef

-----Original Message-----
From: Lukas Wunner <lukas@xxxxxxxxx> 
Sent: Thursday, June 26, 2025 2:09 PM
To: Jozef Matejcik (Nokia) <jozef.matejcik@xxxxxxxxx>
Cc: linux-pci@xxxxxxxxxxxxxxx
Subject: Re: pci_probe called concurrently in machine with 2 identical PCI devices causing race condition


On Thu, Jun 26, 2025 at 10:14:00AM +0000, Jozef Matejcik (Nokia) wrote:
> We have one specific problem related to Linux PCI subsystem.
>
> We have a device with 2 identical NPUs, so 2 identical PCI devices 
> sharing the same 3rd party driver. Our problem is that _pci_probe of 
> this driver is called concurrently from 2 kernel threads. It happens 
> more frequently when kernel debug logs are enabled in GRUB, appr.
> every 20th or 30th reboot of the device.

So what exactly is the "problem"?  Does something not work?
Do you get errors or warnings?

> So the fix is specifically related to devices with multiple VFs.
> But does this take into account the setup with 2 separate, but 
> otherwise identical PCI devices? Is it possible this can occur in any 
> machine with 2 identical PCI devices?

Not unless probing of one PF creates another PF.

Thanks,

Lukas




