On 5/30/25 7:38 AM, Ilpo Järvinen wrote:

cut

>>>> Reverting the following patches fixes the problem:
>>>> a34d74877c66 PCI: Restore assigned resources fully after release
>>>> 2499f5348431 PCI: Rework optional resource handling
>>>> 96336ec70264 PCI: Perform reset_resource() and build fail list in sync
>>>
>>> So it's confirmed that you needed to revert also this last commit
>>> 96336ec70264, not just the rework change?
>>
>> I needed to revert 96336ec70264 as well otherwise the build fails.
>
> Hi again,

Hi!

cut

>
> The missing helper is basically this:

cut

I used the following:
+static bool pci_resource_is_disabled_rom(const struct resource *res, int resno)
+{
+	return resno == PCI_ROM_RESOURCE && !(res->flags & IORESOURCE_ROM_ENABLE);
+}

>
> Because of this, the actual culprit could be in 2499f5348431, not it
> 96336ec70264 (which would make more sense as it does significant rework
> on the assignment algorithm).

I confirm with the above that the problem is in 2499f5348431 indeed.

cut

>> I added the suggested prints
>> (https://paste.ofcode.org/DgmZGGgS6D36nWEzmfCqMm) on top of v6.15 with
>> the downstream PCIe pixel driver and I obtain the following. Note that
>> all added prints contain "tudor" for differentiation.
>>
>> [ 15.211179][ T1107] pci 0001:01:00.0: [144d:a5a5] type 00 class
>> 0x000000 PCIe Endpoint
>> [ 15.212248][ T1107] pci 0001:01:00.0: BAR 0 [mem
>> 0x00000000-0x000fffff 64bit]
>> [ 15.212775][ T1107] pci 0001:01:00.0: ROM [mem 0x00000000-0x0000ffff
>> pref]
>> [ 15.213195][ T1107] pci 0001:01:00.0: enabling Extended Tags
>> [ 15.213720][ T1107] pci 0001:01:00.0: PME# supported from D0 D3hot
>> D3cold
>> [ 15.214035][ T1107] pci 0001:01:00.0: 15.752 Gb/s available PCIe
>> bandwidth, limited by 8.0 GT/s PCIe x2 link at 0001:00:00.0 (capable of
>> 31.506 Gb/s with 16.0 GT/s PCIe x2 link)
>> [ 15.222286][ T1107] pci 0001:01:00.0: tudor: 1: pbus_size_mem: BAR 0
>> [mem 0x00000000-0x000fffff 64bit] list empty? 1
>> [ 15.222813][ T1107] pci 0001:01:00.0: tudor: 1: pbus_size_mem: ROM
>> [mem 0x00000000-0x0000ffff pref] list empty? 1
>> [ 15.224429][ T1107] pci 0001:01:00.0: tudor: 2: pbus_size_mem: ROM
>> [mem 0x00000000-0x0000ffff pref] list empty? 0
>> [ 15.224750][ T1107] pcieport 0001:00:00.0: bridge window [mem
>> 0x00100000-0x001fffff] to [bus 01-ff] add_size 100000 add_align 100000
>>
>> [ 15.225393][ T1107] tudor : pci_assign_unassigned_bus_resources:
>> before __pci_bus_assign_resources -> list empty? 0
>> [ 15.225594][ T1107] pcieport 0001:00:00.0: tudor:
>> pdev_sort_resources: bridge window [mem 0x00100000-0x001fffff] resource
>> added in head list
>> [ 15.226078][ T1107] pcieport 0001:00:00.0: bridge window [mem
>> 0x40000000-0x401fffff]: assigned
>
> So here it ends up assigning the resource here I think.
>
>
> That print isn't one of yours in reassign_resources_sorted() so the
> assignment must have been made in assign_requested_resources_sorted(). But
> then nothing is printed out from reassign_resources_sorted() so I suspect
> __assign_resources_sorted() has short-circuited.
>
> We know that realloc_head is not empty, so that leaves the goto out from
> if (list_empty(&local_fail_head)), which kind of makes sense, all
> entries on the head list were assigned. But the code there tries to remove
> all head list resources from realloc_head so why it doesn't get removed is
> still a mystery. assign_requested_resources_sorted() doesn't seem to
> remove anything from the head list so that resource should still be on the
> head list AFAICT so it should call that remove_from_list(realloc_head,
> dev_res->res) for it.
>
> So can you see if that theory holds water and it short-circuits without
> removing the entry from realloc_head?
>

cut.

I saw your other reply. Will check a bit both and respond there directly.

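
To make the short-circuit theory quoted above easier to reason about, here is
a minimal userspace sketch. It is a toy model, not the kernel code: the
toy_* names, the integer resource IDs and the fixed-size array are all
assumptions made for illustration. It only models the step described above,
where every resource on the head list is removed from realloc_head once the
local fail list turns out to be empty. The second case is purely
hypothetical and is not something the logs confirm; it just shows that an
entry which is on realloc_head but not on the head list would survive that
cleanup.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX 8

/* Toy stand-in for a kernel resource list; holds plain integer IDs. */
struct toy_list {
	int ids[TOY_MAX];
	size_t count;
};

void toy_add(struct toy_list *l, int id)
{
	assert(l->count < TOY_MAX);
	l->ids[l->count++] = id;
}

/* Drop the first entry equal to id, if any (loosely models remove_from_list()). */
void toy_remove(struct toy_list *l, int id)
{
	for (size_t i = 0; i < l->count; i++) {
		if (l->ids[i] != id)
			continue;
		for (size_t j = i + 1; j < l->count; j++)
			l->ids[j - 1] = l->ids[j];
		l->count--;
		return;
	}
}

/*
 * Models the early exit taken when the local fail list is empty: every
 * resource on the head list is removed from realloc_head.
 * Returns how many entries are left on realloc_head afterwards.
 */
size_t toy_early_exit(const struct toy_list *head, struct toy_list *realloc_head)
{
	for (size_t i = 0; i < head->count; i++)
		toy_remove(realloc_head, head->ids[i]);
	return realloc_head->count;
}

/* Case 1: the only realloc_head entry is also on the head list. */
size_t toy_case_entry_on_head(void)
{
	struct toy_list head = { .count = 0 }, realloc_head = { .count = 0 };

	toy_add(&head, 1);
	toy_add(&realloc_head, 1);
	return toy_early_exit(&head, &realloc_head);
}

/* Case 2 (hypothetical): realloc_head holds an entry the head list lacks. */
size_t toy_case_entry_not_on_head(void)
{
	struct toy_list head = { .count = 0 }, realloc_head = { .count = 0 };

	toy_add(&head, 1);
	toy_add(&realloc_head, 1);
	toy_add(&realloc_head, 2);
	return toy_early_exit(&head, &realloc_head);
}
```

In this model the first case leaves realloc_head empty, while the second
leaves one entry behind, which is the kind of leftover a later
list_empty() check would trip over.
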
>>
>>> In any case, that BUG_ON() seems a bit drastic action for what might be
>>> just a single resource allocation failure so it should be downgraded to:
>>>
>>> 	if (WARN_ON(!list_empty(&add_list))
>>> 		free_list(&add_list);
>>>
>>> ... or WARN_ON_ONCE().
>>
>> I saw your patch doing this, the phone now boots, but obviously I still
>> see the WARN, so maybe there's still something to be fixed.
>

cut

> Now that it boots, can you please check if /proc/iomem is the same both in
> the non-working and working config. If that resource got assigned
> successfully, it might well be there is no actual differences in the
> assigned resources (which again doesn't mean there wouldn't be a bug in
> the logic as discussed above).

I confirm that /proc/iomem is identical between the no-revert kernel
carrying the WARN_ON_ONCE() change and the kernel with the blamed commit
reverted.
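
For anyone who wants to repeat the /proc/iomem comparison, a minimal sketch
follows. It assumes both dumps were captured as root (unprivileged reads
show zeroed addresses) and have already been read into memory as
NUL-terminated strings; iomem_first_diff_line() is a hypothetical helper
name, not an existing API. A plain diff of the two saved files does the same
job, this just spells out what "identical" means here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Compare two /proc/iomem dumps line by line.
 * Returns 0 if they match, otherwise the 1-based number of the first
 * line at which they differ. A missing final newline is not treated
 * as a difference.
 */
size_t iomem_first_diff_line(const char *a, const char *b)
{
	size_t line = 1;

	while (*a && *b) {
		size_t la = strcspn(a, "\n");	/* length of current line in a */
		size_t lb = strcspn(b, "\n");	/* length of current line in b */

		if (la != lb || memcmp(a, b, la) != 0)
			return line;

		a += la;
		b += lb;
		/* Step over the newline, if there is one. */
		if (*a == '\n')
			a++;
		if (*b == '\n')
			b++;
		line++;
	}

	/* One dump has extra lines left over. */
	return (*a || *b) ? line : 0;
}
```

A return value of 0 for the two captures corresponds to the "identical"
result reported above.
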