pci_probe called concurrently in machine with 2 identical PCI devices causing race condition

Hi kernel community,

We have a specific problem related to the Linux PCI subsystem.

We have a device with two identical NPUs, i.e. two identical PCI devices that share the same third-party driver. Our problem is that this driver's _pci_probe callback is called concurrently from two kernel threads. It happens more frequently when kernel debug logs are enabled in GRUB, approximately on every 20th to 30th reboot of the device.

I am writing this mail because this may be a generic issue in the Linux PCI subsystem that affects other people and companies as well. Please correct me if I am wrong.

While digging through the driver's source and the Linux kernel source, I found this code in pci_call_probe():

    if (cpu < nr_cpu_ids)
        error = work_on_cpu(cpu, local_pci_probe, &ddi);
    else
        error = local_pci_probe(&ddi);

This was added in commit 0b2c2a71 in 2017. Quoting part of the commit message:

    PCI: Replace the racy recursion prevention

    pci_call_probe() can called recursively when a physcial function is probed
    and the probing creates virtual functions, which are populated via
    pci_bus_add_device() which in turn can end up calling pci_call_probe()
    again.
 <end of quote>

So the fix specifically concerns devices with multiple VFs. But does it take into account a setup with two separate but otherwise identical PCI devices? Could this occur on any machine with two identical PCI devices?

Snippet from dmesg (unfortunately, I am not sure how much I can share):

[   76.586492] linux-kernel-bde (154): DO_NOT_COMMIT: in _pci_probe at 2627
[   76.586494] linux-kernel-bde (154): DO_NOT_COMMIT: ctrl addr before: 0000000000000000, _ndevices: 0
[   76.586497] linux-kernel-bde (154): DO_NOT_COMMIT: ctrl addr after: 00000000f24dc905, _ndevices: 0
[   76.595735] linux-kernel-bde (4688): DO_NOT_COMMIT: _devices at 00000000f24dc905, sizeof(*_devices): 472
[   76.603415] linux-kernel-bde (154): DO_NOT_COMMIT: ctrl->dev_type set to 256
[   76.628884] linux-kernel-bde (4688): DO_NOT_COMMIT: dev->device: 8854
[   76.644076] linux-kernel-bde (4688): DO_NOT_COMMIT: in _pci_probe at 2627
[   76.661176] linux-kernel-bde (4688): DO_NOT_COMMIT: ctrl addr before: 0000000000000000, _ndevices: 0
[   76.679854] linux-kernel-bde (4688): DO_NOT_COMMIT: ctrl addr after: 00000000f24dc905, _ndevices: 0
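
The log above is consistent with two threads (154 and 4688) both reading _ndevices as 0 and both ending up with the same ctrl address, i.e. the same slot in _devices. As an illustration only, here is a minimal userspace sketch of that pattern with a lock around the slot allocation. It uses pthreads instead of kernel primitives, and everything in it is an assumption: struct ctrl, its owner field, MAX_DEVICES, probe_lock, probe_one() and run_two_probes() are hypothetical names, and devices/ndevices merely stand in for the driver's _devices/_ndevices. It is not the driver's actual code.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_DEVICES 16          /* hypothetical table size */

/* Hypothetical per-device state; stands in for the driver's real struct. */
struct ctrl {
    int owner;                  /* which PCI device claimed this slot */
};

/* Stand-ins for the driver's _devices / _ndevices seen in the log. */
static struct ctrl devices[MAX_DEVICES];
static int ndevices;

/* Assumed fix: one lock around "pick slot, fill it, bump the counter". */
static pthread_mutex_t probe_lock = PTHREAD_MUTEX_INITIALIZER;

/* Claim the next free slot; returns its index, or -1 if the table is full. */
int probe_one(int owner)
{
    int slot = -1;

    pthread_mutex_lock(&probe_lock);
    if (ndevices < MAX_DEVICES) {
        slot = ndevices;
        devices[slot].owner = owner;
        ndevices++;             /* counter moves only after the slot is filled */
    }
    pthread_mutex_unlock(&probe_lock);
    return slot;
}

static void *probe_thread(void *arg)
{
    probe_one(*(int *)arg);
    return NULL;
}

/* Probe two "identical devices" from two threads; returns ndevices. */
int run_two_probes(void)
{
    pthread_t t[2];
    int owners[2] = { 1, 2 };
    int i, rc;

    for (i = 0; i < 2; i++) {
        rc = pthread_create(&t[i], NULL, probe_thread, &owners[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return ndevices;
}
```

With the lock in place, run_two_probes() should leave ndevices at 2 with the two slots owned by different devices. Without it, both threads can read ndevices as 0 and claim slot 0, which is what the log appears to show. In the kernel driver the analogous change would presumably be a mutex (DEFINE_MUTEX, mutex_lock(), mutex_unlock()) around the same region of _pci_probe. This is untested against the real driver.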

I checked the sources of several drivers for various PCI devices, and none of them seems to assume that the probe callback can be called from multiple threads concurrently.

Output of uname -a:
Linux Dut-A 6.1.128-13-amd64 #1 SMP PREEMPT_DYNAMIC Thu Jun 12 07:22:21 UTC 2025 x86_64 GNU/Linux

Regards,
Jozef




