On Fri, Jun 13, 2025 at 02:31:10PM -0500, Mario Limonciello wrote: > On 6/13/2025 2:07 PM, Bjorn Helgaas wrote: > > On Thu, Jun 12, 2025 at 10:12:14PM -0500, Mario Limonciello wrote: > > > From: Mario Limonciello <mario.limonciello@xxxxxxx> > > > > > > On an A+N mobile system the APU is not being selected by some desktop > > > environments for any rendering tasks. This is because the neither GPU > > > is being treated as "boot_vga" but that is what some environments use > > > to select a GPU [1]. > > > > What is "A+N" and "APU"? > > A+N is meant to refer to an AMD APU + NVIDIA dGPU. > APU is an SoC that contains a lot more IP than just x86 cores. In this > context it contains both integrated graphics and display IP. So I guess "APU is not being selected" refers to the AMD APU? > > I didn't quite follow the second sentence. I guess you're saying some > > userspace environments use the "boot_vga" sysfs file to select a GPU? > > And on this A+N system, neither device has the file? > > Yeah. Specifically this problem happens in Xorg that the library it uses > (libpciaccess) looks for a boot_vga file. That file isn't found on either > GPU and so Xorg picks the first GPU it finds in the PCI heirarchy which > happens to be the NVIDIA GPU. > > > > The VGA arbiter driver only looks at devices that report as PCI display > > > VGA class devices. Neither GPU on the system is a display VGA class > > > device: > > > > > > c5:00.0 3D controller: NVIDIA Corporation Device 2db9 (rev a1) > > > c6:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 150e (rev d1) > > > > > > So neither device gets the boot_vga sysfs file. The vga_is_boot_device() > > > function already has some handling for integrated GPUs by looking at the > > > ACPI HID entries, so if all PCI display class devices are looked at it > > > actually can detect properly with this device ordering. However if there > > > is a different ordering it could flag the wrong device. > > > > > > Modify the VGA arbiter code and matching sysfs file entries to examine all > > > PCI display class devices. After every device is added to the arbiter list > > > make a pass on all devices and explicitly check whether one is integrated. > > > > > > This will cause all GPUs to gain a `boot_vga` file, but the correct device > > > (APU in this case) will now show `1` and the incorrect device shows `0`. > > > Userspace then picks the right device as well. > > > > What determines whether the APU is the "correct" device? > > In this context is the one that is physically connected to the eDP panel. > When the "wrong" one is selected then it is used for rendering. How does the code figure out which is connected to the eDP panel? I didn't see anything in the patch that I can relate to this. This needs to be something people who are not AMD and NVIDIA experts can figure out in five years. It feels like we're fixing a point problem and next month's system might have the opposite issue, and we won't know how to make both systems work. > Without this patch the net outcome ends up being that the APU display > hardware drives the eDP but the desktop is rendered using the N dGPU. There > is a lot of latency in doing it this way, and worse off the N dGPU stays > powered on at all times. It never enters into runtime PM. > > > +{ > > > + struct pci_dev *candidate = vga_default_device(); > > > + struct vga_device *vgadev; > > > + > > > + list_for_each_entry(vgadev, &vga_list, list) { > > > + if (vga_is_boot_device(vgadev)) { > > > + /* check if one is an integrated GPU, use that if so */ > > > + if (candidate) { > > > + if (vga_arb_integrated_gpu(&candidate->dev)) > > > + break; > > > + if (vga_arb_integrated_gpu(&vgadev->pdev->dev)) { > > > + candidate = vgadev->pdev; > > > + break; > > > + } > > > + } else > > > + candidate = vgadev->pdev; > > > + } > > > + } > > > + > > > + if (candidate) > > > + vga_set_default_device(candidate); > > > +} > > > > It looks like this is related to the integrated GPU code in > > vga_is_boot_device(). Can this be done by updating the logic there, > > so it's more clear what change this patch makes? > > > > It seems like this patch would change the selection in a way that > > makes some of the vga_is_boot_device() comments obsolete. E.g., "We > > select the default VGA device in this order"? Or "we use the *last* > > [integrated GPU] because that was the previous behavior"? > > > > The end of vga_is_boot_device() mentions non-legacy (non-VGA) devices, > > but I don't remember now how we ever got there because > > vga_arb_device_init() and pci_notify() only call > > vga_arbiter_add_pci_device() for VGA devices. > > Sure I'll review the comments and update. As for moving the logic I didn't > see an obvious way to do this. This code is "tie-breaker" code in the case > that two GPUs are otherwise ranked equally. How do we break the tie? I guess we use the first one we find? The comment in vga_is_boot_device() says we expect only a single integrated GPU, but I guess this system breaks that assumption? And in the absence of other clues, vga_is_boot_device() decides that every integrated GPU it finds should be the default, so the last one wins? But now we want the first one to win? > > > - while ((pdev = > > > - pci_get_subsys(PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, > > > - PCI_ANY_ID, pdev)) != NULL) { > > > - if (pci_is_vga(pdev)) > > > - vga_arbiter_add_pci_device(pdev); > > > - } > > > + while ((pdev = pci_get_base_class(PCI_BASE_CLASS_DISPLAY, pdev))) > > > + vga_arbiter_add_pci_device(pdev); > > > > I guess pci_get_base_class(PCI_BASE_CLASS_DISPLAY) is sort of a source > > code optimization and this one-line change would be equivalent? > > > > - if (pci_is_vga(pdev)) > > + if (pci_is_display(pdev)) > > vga_arbiter_add_pci_device(pdev); > > > > If so, I think I prefer the one-liner because then everything in this > > file would use pci_is_display() and we wouldn't have to figure out the > > equivalent-ness of pci_get_base_class(PCI_BASE_CLASS_DISPLAY). > > pci_get_base_class() is a search function. It only really makes sense for > iterating. Right I'm saying that if you do this: while ((pdev = pci_get_subsys(PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, pdev)) != NULL) { if (pci_is_display(pdev)) vga_arbiter_add_pci_device(pdev); } the patch is a bit smaller and we don't have to look up pci_get_base_class() and confirm that it returns things for which pci_is_display() is true. That's just a little more analysis. Bjorn