Re: [BUG] Thunderbolt eGPU PCI BARs incorrectly assigned, fails to assign memory

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 1 Sep 2025, Steve Oswald wrote:

> I've added the dmesg output fordyndbg="file drivers/pci/*. I wasn't
> sure if I added it with the escaped quotes correctly.
> https://gist.github.com/stepeos/cd060c7d66ab195f51ab4d5675b4e4af/raw/9cf5fc3a8c4f13588a33d61865f804f85e50470a/dmesg_linux_6.11.0_dyndbg.log

Hi again,

I think the patch below should solve your issue. If you manage to get it 
tested and are confident enough it fixed your issue, you may provide your 
Tested-by tag so I can include it into the official submission of the fix.


From 1c8ef31c6ac6616869b447a473e879b233df62db Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= <ilpo.jarvinen@xxxxxxxxxxxxxxx>
Date: Wed, 3 Sep 2025 15:53:48 +0300
Subject: [PATCH 1/1] PCI: Prevent shrinking bridge window from its required
 size
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

pci_bridge_distribute_available_resources() -> ... ->
adjust_bridge_window() is called in between __pci_bus_size_bridges()
and assigning the resources. Since the commit 948675736a77 ("PCI: Allow
adjust_bridge_window() to shrink resource if necessary")
adjust_bridge_window() can also shrink the bridge window to force it to
a smaller size. The shrunken size, however, conflicts what
__pci_bus_size_bridges() -> pbus_size_mem() calculated as the required
bridge window size. By shrinking the resource size,
adjust_bridge_window() prevents rest of the resource fitting algorithm
working as intended. Resource fitting logic is expecting assignment
failures when bridge windows need resizing, but there are cases where
failures are no longer happening after the commit 948675736a77 ("PCI:
Allow adjust_bridge_window() to shrink resource if necessary").

The commit 948675736a77 ("PCI: Allow adjust_bridge_window() to shrink
resource if necessary") justifies the change by the extra reservation
made due to hpmemsize parameter, however, the kernel code contradicts
with that statement. (For simplicity, finer-grained hpmmiosize and
hpmmiopref parameters that can be used to the same effect as hpmemsize
are ignored in this description.)

pbus_size_mem() calls calculate_memsize() twice. First with add_size=0
to find out the minimal required resource size. The second call occurs
with add_size=hpmemsize (effectively) but the result does not directly
affect the resource size only resulting in an entry on the realloc_head
list (a.k.a. add_list). Yet, adjust_bridge_window() directly changes
the resource size which does not include what is reserved due to
hpmemsize. Also, if the required size for the bridge window exceeds
hpmemsize, the parameter does not have any effect even on the second
size calculcation made by pbus_size_mem(); from calculate_memsize():

  size = max(size, add_size) + children_add_size;

The commit ae4611f1d7e9 ("PCI: Set resource size directly in
adjust_bridge_window()") that precedes the commit 948675736a77 ("PCI:
Allow adjust_bridge_window() to shrink resource if necessary") is also
related to causing this problem. Its changelog explicitly states
adjust_bridge_window() wants to "guarantee" allocation success.
Guaranteed allocations, however, are incompatible with how the other
parts of the resource fitting algorithm work. The given justification
fails to explain why guaranteed allocations at this stage are required
nor why forcing window to a smaller value than what was calculated by
pbus_size_mem() is correct. The the change might have worked by chance
in some test scenario, too small bridge window does not "guarantee"
success from the point of view of the endpoint device resource
assignments. No issue is mentioned within the changelog so it's unclear
if the change was made to fix some observed issue nor and what that
issue was.

The unwanted shrinking of a bridge window occurs, e.g., when a device
with large BARs such as eGPU is attached using Thunderbolt and the Root
Port holds less than enough resource space for the eGPU. The GPU
resources are in order of GBs and the default hotplug allocation is
mere 2M (DEFAULT_HOTPLUG_MMIO_PREF_SIZE). The problem is illustrated by
this log (filtered to the relevant content only):

pci 0000:00:07.0: PCI bridge to [bus 03-2c]
pci 0000:00:07.0:   bridge window [mem 0x6000000000-0x601bffffff 64bit pref]
pci 0000:03:00.0: PCI bridge to [bus 00]
pci 0000:03:00.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
pci 0000:03:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:03:00.0: PCI bridge to [bus 04-2c]
pcieport 0000:00:07.0: Assigned bridge window [mem 0x6000000000-0x601bffffff 64bit pref] to [bus 03-2c] cannot fit 0xc00000000 required for 0000:03:00.0 bridging to [bus 04-2c]
pci 0000:03:00.0: bridge window [mem 0x800000000-0x10003fffff 64bit pref] to [bus 04-2c] add_size 100000 add_align 100000
pcieport 0000:00:07.0: distributing available resources
pci 0000:03:00.0: bridge window [mem 0x800000000-0x10003fffff 64bit pref] shrunken by 0x00000007e4400000
pci 0000:03:00.0: bridge window [mem 0x6000000000-0x601bffffff 64bit pref]: assigned

The initial size of the Root Port's window is 448MB (0x601bffffff -
0x6000000000). __pci_bus_size_bridges() -> pbus_size_mem() calculates
the required size to be 32772 MB (0x10003fffff - 0x800000000) which
would fit the eGPU resources. adjust_bridge_window() then shrinks the
bridge window down to what is guaranteed to fit into the Root Port's
bridge window. The bridge window for 03:00.0 is also eliminated from
the add_list (a.k.a. realloc_head) list by adjust_bridge_window().

After adjustment, the resources are assigned and as the bridge window
for 03:00.0 is assigned successfully, no failure is recorded. Without a
failure, no attempt to resize the window of the Root Port is required.
The end result is eGPU not having large enough resources to work.

The commit 948675736a77 ("PCI: Allow adjust_bridge_window() to shrink
resource if necessary") also claims nested bridge windows are sized the
same, which is false. pbus_size_mem() calculates the size for the
parent bridge window by summing all the downstream resources so the
resource fitting calculates larger bridge window for the parent to
accomodate the childen. That is, hpmemsize does not result the same
size for the case where there are nested bridge windows.

In order to fix the most immediate problem, don't shrink the resource
size as hpmemsize had nothing to do with it. When considering add_size,
only reduce it up to what is added due to hpmemsize (if required size
is larger than hpmemsize, the parameter has no impact, see
calculate_memsize()).

This is not exactly a revert of the commits e4611f1d7e9 ("PCI: Set
resource size directly in adjust_bridge_window()") and 948675736a77
("PCI: Allow adjust_bridge_window() to shrink resource if necessary")
as shrinking still remains in place but is implemented differently,
and the end result behaves very differently.

It is possible that those two commits fixed some other issue that is
not described with enough detail in the changelog and undoing parts of
them results in another regression due to behavioral change.
Nonetheless, as described above, the solution by those two commits was
flawed and the issue, if one exists, should be solved in a way that is
compatible with the rest of the resource fitting algorithm instead of
working against it.

Besides shrinking, the case where adjust_bridge_window() expands the
bridge window is likely somewhat wrong as well because it removes the
entry from add_list (a.k.a. realloc_head), but it is less damaging as
that only impacts optional resources and may have no impact if
expanding by hpmemsize is larger than what add_size was. Fixing it is
left as further work.

Reported-by: Steve Oswald <stevepeter.oswald@xxxxxxxxx>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>
---
 drivers/pci/setup-bus.c | 42 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 40 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 23082bc0ca37..4dd618bc4196 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1828,6 +1828,7 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
 				 resource_size_t new_size)
 {
 	resource_size_t add_size, size = resource_size(res);
+	struct pci_dev_resource *dev_res;
 
 	if (res->parent)
 		return;
@@ -1840,9 +1841,46 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
 		pci_dbg(bridge, "bridge window %pR extended by %pa\n", res,
 			&add_size);
 	} else if (new_size < size) {
+		int idx = pci_resource_num(bridge, res);
+
+		/*
+		 * hpio/mmio/mmioprefsize hasn't been included at all? See the
+		 * add_size param at the callsites of calculate_memsize().
+		 */
+		if (!add_list)
+			return;
+
+		/* Only shrink if the hotplug extra relates to window size. */
+		switch (idx) {
+			case PCI_BRIDGE_IO_WINDOW:
+				if (size > pci_hotplug_io_size)
+					return;
+				break;
+			case PCI_BRIDGE_MEM_WINDOW:
+				if (size > pci_hotplug_mmio_size)
+					return;
+				break;
+			case PCI_BRIDGE_PREF_MEM_WINDOW:
+				if (size > pci_hotplug_mmio_pref_size)
+					return;
+				break;
+			default:
+				break;
+		}
+
+		dev_res = res_to_dev_res(add_list, res);
 		add_size = size - new_size;
-		pci_dbg(bridge, "bridge window %pR shrunken by %pa\n", res,
-			&add_size);
+		if (add_size < dev_res->add_size) {
+			dev_res->add_size -= add_size;
+			pci_dbg(bridge, "bridge window %pR optional size shrunken by %pa\n",
+				res, &add_size);
+		} else {
+			pci_dbg(bridge, "bridge window %pR optional size removed\n",
+				res);
+			remove_from_list(add_list, res);
+		}
+		return;
+
 	} else {
 		return;
 	}

base-commit: f6d41443f54856ceece0d5b584f47f681513bde4
-- 
2.39.5

[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux