Dear fwts folks,
I am looping you in to a discussion from last year. The thread is at
https://lore.kernel.org/r/a154f694-c48b-4b3b-809f-4b74ec86a924@xxxxxxxxxxxxx
Am 12.06.24 um 00:31 schrieb Bjorn Helgaas:
On Tue, Jun 11, 2024 at 11:05:31AM +0800, Yunsheng Lin wrote:
On 2024/6/11 4:27, Paul Menzel wrote:
Am 10.06.24 um 21:42 schrieb Bjorn Helgaas:
[+cc Yunsheng, thread at
https://lore.kernel.org/r/a154f694-c48b-4b3b-809f-4b74ec86a924@xxxxxxxxxxxxx]
On Sun, Jun 09, 2024 at 10:31:05AM +0200, Paul Menzel wrote:
On the servers below Linux warns:
Unknown NUMA node; performance will be reduced
This warning was added by ad5086108b9f ("PCI: Warn if no host bridge
NUMA node info"), which appeared in v5.5, so I assume this isn't new.
That commit log says:
In pci_call_probe(), we try to run driver probe functions on the node where
the device is attached. If we don't know which node the device is attached
to, the driver will likely run on the wrong node. This will still work,
but performance will not be as good as it could be.
On NUMA systems, warn if we don't know which node a PCI host bridge is
attached to. This is likely an indication that ACPI didn't supply a _PXM
method or the DT didn't supply a "numa-node-id" property.
I assume these are all ACPI systems, so likely missing _PXM.
An acpidump could confirm this.
I created an issue in the Linux Kernel Bugzilla [1] and attached
the output of `acpidump` on a Dell PowerEdge T630 there. The
DSDT contains:
    Device (PCI1)
    {
        […]
        Method (_PXM, 0, NotSerialized)  // _PXM: Device Proximity
        {
            If ((CLOD == 0x00))
            {
                Return (0x01)
            }
            Else
            {
                Return (0x02)
            }
        }
        […]
    }
I think the devices on buses 7f and ff are Intel chipset devices, and
I doubt we have drivers for any of them. They have vendor/device IDs
of 8086:6fXX, and I didn't see any reference to them:
$ git grep -i \<0x6f..\>
$
Interesting. Any idea what these chipset devices do?
If we *did* have drivers, they would certainly benefit from having
_PXM, but since there are no probe methods, I don't think it matters
that we don't know where they should run.
Maybe the message should be downgraded from "dev_warn" to "dev_info"
since there's no functional problem, and the user can't really do
anything about it.
We could also consider moving it to the actual probe path, so we don't
emit a message unless there is an affected driver.
The problem seems to be how we decide whether there is an affected
driver. Do we care about out-of-tree drivers? Doesn't an out-of-tree
driver suffer from a similar problem if the BIOS is not providing the
correct NUMA info?
I don't care about out-of-tree drivers at all. This message is only a
hint about maybe not getting the absolute best possible performance
anyway.
The 'Unknown NUMA node; performance will be reduced' warning seems to
have been added to put some pressure on the vendor to fix the BIOS as
fast as possible. Downgrading from "dev_warn" to "dev_info" or moving it
to the actual probe path does not seem to fix the problem; wouldn't it
just alleviate the pressure on the vendor to fix the BIOS?
True, BIOS vendors *might* care about fixing a warning, and likely
wouldn't even notice a dev_info.
It's possible somebody could add a test case to the firmware test
suite (https://github.com/fwts/fwts.git). Not sure if vendors care
about that either.
Would it be possible for fwts to check for this in the ACPI tables? I
think fwts already highlights everything with a severity of warning or
higher, doesn’t it? Maybe some explanation could be added, especially
as it’s not clear that this is most likely a firmware issue.
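To make the idea a bit more concrete, here is a minimal sketch in Python of the kind of check I have in mind. It is not fwts code and does not parse the ACPI tables. It only looks at the resulting sysfs state, assuming the usual `numa_node` attribute of PCI devices (which reads -1 when the node is unknown) and the `nodeN` directories under `devices/system/node`. The function names are made up for illustration, and like the kernel warning it only reports on systems with more than one NUMA node.

```python
import re
from pathlib import Path

NUMA_NO_NODE = -1  # value sysfs reports when the node is unknown


def count_numa_nodes(sysfs_root: Path) -> int:
    """Count nodeN directories under devices/system/node."""
    node_dir = sysfs_root / "devices" / "system" / "node"
    if not node_dir.is_dir():
        return 0
    return sum(1 for p in node_dir.iterdir() if re.fullmatch(r"node\d+", p.name))


def pci_devices_without_node(sysfs_root: Path) -> list[str]:
    """Return PCI device addresses whose numa_node attribute is -1."""
    dev_dir = sysfs_root / "bus" / "pci" / "devices"
    if not dev_dir.is_dir():
        return []
    unknown = []
    for dev in sorted(dev_dir.iterdir()):
        try:
            node = int((dev / "numa_node").read_text().strip())
        except (OSError, ValueError):
            continue  # attribute missing or unreadable; skip this device
        if node == NUMA_NO_NODE:
            unknown.append(dev.name)
    return unknown


def check(sysfs_root: Path = Path("/sys")) -> list[str]:
    """Report devices only on systems with more than one NUMA node."""
    if count_numa_nodes(sysfs_root) < 2:
        return []
    return pci_devices_without_node(sysfs_root)


if __name__ == "__main__":
    for addr in check():
        print(f"{addr}: unknown NUMA node (firmware likely lacks _PXM)")
```

A real fwts test would presumably evaluate _PXM for the host bridges directly, so please treat this only as an approximation of the condition to flag.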
I suspect Linux users might care about the dev_warn because I suspect
it breaks the pretty graphical boot sequence.
As far as the Linux kernel is concerned, I think making it dev_info is enough.
Kind regards,
Paul