On Tue, Apr 22, 2025 at 04:02:07PM +0300, Ilpo Järvinen wrote: > When bifurcated to x2, Xeon 6 Root Port performance is sensitive to the > configuration of Extended Tags, Max Read Request Size (MRRS), and 10-Bit > Tag Requester (note: there is currently no 10-Bit Tag support in the > kernel). While those can be configured to the recommended values by FW, > kernel may decide to overwrite the initial values. > > Unfortunately, there is no mechanism for FW to indicate OS which parts > of PCIe configuration should not be altered. Thus, the only option is > to add such logic into the kernel as quirks. > > There is a pre-existing quirk flag to disable Extended Tags. Depending > on CONFIG_PCIE_BUS_* setting, MRRS may be overwritten by what the > kernel thinks is the best for performance (the largest supported > value), resulting in performance degradation instead with these Root > Ports. (There would have been a pre-existing quirk to disallow > increasing MRRS but it is not identical to rejecting >128B MRRS.) > > Add a quirk that disallows enabling Extended Tags and setting MRRS > larger than 128B for devices under Xeon 6 Root Ports if the Root Port is > bifurcated to x2. Reject >128B MRRS only when it is going to be written > by the kernel (this assumes FW configured a good initial value for MRRS > in case the kernel is not touching MRRS at all). > > It was first attempted to always write MRRS when the quirk is needed > (always overwrite the initial value). That turned out to be quite > invasive change, however, given the complexity of the initial setup > callchain and various stages returning early when they decide no changes > are necessary, requiring override each. As such, the initial value for > MRRS is now left into the hands of FW. > > Link: https://cdrdv2.intel.com/v1/dl/getContent/837176 > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> > --- > > v2: > - Explain in changelog why FW cannot solve this on its own > - Moved the quirk under arch/x86/pci/ > - Don't NULL check value from pci_find_host_bridge() > - Added comment above the quirk about the performance degradation > - Removed all setup chain 128B quirk overrides expect for MRRS write > itself (assumes a sane initial value is set by FW) > > arch/x86/pci/fixup.c | 30 ++++++++++++++++++++++++++++++ > drivers/pci/pci.c | 15 ++++++++------- > include/linux/pci.h | 1 + > 3 files changed, 39 insertions(+), 7 deletions(-) > > diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c > index efefeb82ab61..aa9617bc4b55 100644 > --- a/arch/x86/pci/fixup.c > +++ b/arch/x86/pci/fixup.c > @@ -294,6 +294,36 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PB1, pcie_r > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC, pcie_rootport_aspm_quirk); > DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MCH_PC1, pcie_rootport_aspm_quirk); > > +/* > + * PCIe devices underneath Xeon6 PCIe Root Port bifurcated to 2x have slower > + * performance with Extended Tags and MRRS > 128B. Workaround the performance > + * problems by disabling Extended Tags and limiting MRRS to 128B. > + * > + * https://cdrdv2.intel.com/v1/dl/getContent/837176 > + */ > +static void quirk_pcie2x_no_tags_no_mrrs(struct pci_dev *pdev) > +{ > + struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus); > + u32 linkcap; > + > + pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &linkcap); > + if (FIELD_GET(PCI_EXP_LNKCAP_MLW, linkcap) != 0x2) > + return; > + > + bridge->no_ext_tags = 1; > + bridge->only_128b_mrrs = 1; My 2 cents here. Wouldn't it work if you hardcode MRRS to 128 in PCI_EXP_DEVCTL here and then set pci_host_bridge::no_inc_mrrs to 1? This would avoid introducing an extra flag and also serve the same purpose. - Mani -- மணிவண்ணன் சதாசிவம்