Re: [PATCH 24/25] PCI: Perform reset_resource() and build fail list in sync

D Scott Phillips <scott@xxxxxxxxxxxxxxxxxxxxxx> · Thu, 26 Jun 2025 07:53:45 -0700

Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> writes:

> On Wed, 25 Jun 2025, D Scott Phillips wrote:
>
>> Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> writes:
>> 
>> > On Wed, 18 Jun 2025, D Scott Phillips wrote:
>> >
>> >> Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> writes:
>> >> 
>> >> > Resetting a resource is problematic as it prevents attempting to allocate
>> >> > the resource later, unless something in between restores the resource.
>> >> > Similarly, if fail_head does not contain all resources that were reset,
>> >> > those resources cannot be restored later.
>> >> >
>> >> > The entire reset/restore cycle adds complexity, and leaving resources
>> >> > in the reset state causes issues for other code, such as the checks
>> >> > done in pci_enable_resources(). Take a small step towards not resetting
>> >> > resources by delaying the reset until the end of resource assignment and
>> >> > building the failure list (fail_head) in sync with the reset to avoid
>> >> > leaving behind resources that cannot be restored (for the case where the
>> >> > caller provides fail_head in the first place to allow a restore somewhere
>> >> > in the callchain, as not all callers pass a non-NULL fail_head).
>> >> >
>> >> > The Expansion ROM check is temporarily left in place while building the
>> >> > failure list until the upcoming change which reworks optional resource
>> >> > handling.
>> >> >
>> >> > Ideally, the whole resource reset could be removed, but doing that in
>> >> > one big step would make the impact intractable due to the complexity
>> >> > of all the related code.
>> >> >
>> >> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>
>> >> 
>> >> Hi Ilpo, I'm seeing a crash on arm64 at boot that I bisected to this
>> >> change. I don't think it's the same as the other issues reported. I've
>> >> confirmed the crash is still there after your follow up patches.  The
>> >> crash itself is below[1].
>> >> 
>> >> It looks like the problem begins when:
>> >> 
>> >> amdgpu_device_resize_fb_bar()
>> >>  pci_resize_resource()
>> >>   pci_reassign_bridge_resources()
>> >>    __pci_bus_size_bridges()
>> >> 
>> >> adds pci_hotplug_io_size to `realloc_head`. The io resource allocation
>> >> has failed earlier because the root port doesn't have an io window[2].
>> >> 
>> >> Then with this patch, pci_reassign_bridge_resources()'s call to
>> >> __pci_bridge_assign_resources() now returns the added io space for
>> >> hotplug in the `failed` list, where the old code dropped it instead.
>> >> 
>> >> That sends pci_reassign_bridge_resources() into the `cleanup:` path,
>> >> where I think the cleanup code doesn't properly release the resources
>> >> that were assigned by __pci_bridge_assign_resources() and there's a
>> >> conflict reported in pci_claim_resource() where a restored resource is
>> >> found as conflicting with itself:
>> >> 
>> >> > pcieport 000d:00:01.0: bridge window [mem 0x340000000000-0x340017ffffff 64bit pref]: can't claim; address conflict with PCI Bus 000d:01 [mem 0x340000000000-0x340017ffffff 64bit pref]
>> >> 
>> >> Setting `pci=hpiosize=0` avoids this crash, as does this change:
>> >> 
>> >> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
>> >> index 16d5d390599a..59ece11702da 100644
>> >> --- a/drivers/pci/setup-bus.c
>> >> +++ b/drivers/pci/setup-bus.c
>> >> @@ -2442,7 +2442,7 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
>> >>  	LIST_HEAD(saved);
>> >>  	LIST_HEAD(added);
>> >>  	LIST_HEAD(failed);
>> >> -	unsigned int i;
>> >> +	unsigned int i, relevant_fails;
>> >>  	int ret;
>> >>  
>> >>  	down_read(&pci_bus_sem);
>> >> @@ -2490,7 +2490,16 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
>> >>  	__pci_bridge_assign_resources(bridge, &added, &failed);
>> >>  	BUG_ON(!list_empty(&added));
>> >>  
>> >> -	if (!list_empty(&failed)) {
>> >> +	relevant_fails = 0;
>> >> +	list_for_each_entry(dev_res, &failed, list) {
>> >> +		restore_dev_resource(dev_res);
>> >> +		if (((dev_res->res->flags ^ type) & PCI_RES_TYPE_MASK) == 0)
>> >> +			relevant_fails++;
>> >> +	}
>> >> +	free_list(&failed);
>> >> +
>> >> +	/* Cleanup if we had failures in resources of interest */
>> >> +	if (relevant_fails != 0) {
>> >>  		ret = -ENOSPC;
>> >>  		goto cleanup;
>> >>  	}
>> >> @@ -2509,11 +2518,6 @@ int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type)
>> >>  	return 0;
>> >>  
>> >>  cleanup:
>> >> -	/* Restore size and flags */
>> >> -	list_for_each_entry(dev_res, &failed, list)
>> >> -		restore_dev_resource(dev_res);
>> >> -	free_list(&failed);
>> >> -
>> >>  	/* Revert to the old configuration */
>> >>  	list_for_each_entry(dev_res, &saved, list) {
>> >>  		struct resource *res = dev_res->res;
>> >> 
>> >> I don't know this code well enough to know if that change is completely
>> >> bonkers or what.
>> >
>> > Hi again,
>> >
>> > Thanks for all the details on what you think went wrong, they were really
>> > useful. I think you are heading in the right direction, but a more
>> > targeted fix seems enough to address this (this needs to be confirmed,
>> > please test the patch below).
>> >
>> > The most correct solution would be to make all the resource fitting code
>> > focus on the resources that match the type filter. However, that looks
>> > like way too scary a change to implement at the moment, and especially
>> > to let it end up in stable (to fix this issue). So it looks like this
>> > somewhat band-aid solution, similar to your attempt, might be better as
>> > a fix for now.
>> >
>> > In the medium term, I'd want to avoid using type as a filter and base all
>> > such decisions on matching the bridge window resource the dev resource
>> > belongs to. I've some work in that direction already which removes
>> > lots of complexity in which bridge window is going to be selected, as
>> > there will be a single place that always makes the same decision. That
>> > change is also going to simplify the internal interfaces between
>> > functions very noticeably (but the change requires more testing before
>> > I've enough confidence to submit it). That work doesn't cover this resize
>> > side yet but it should be extended there as well.
>> >
>> > So please test this somewhat band-aid patch:
>> >
>> > From 971686ed85e341e7234f8fe8b666140187f63ad1 Mon Sep 17 00:00:00 2001
>> > From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= <ilpo.jarvinen@xxxxxxxxxxxxxxx>
>> > Date: Wed, 25 Jun 2025 20:30:43 +0300
>> > Subject: [PATCH 1/1] PCI: Fix failure detectiong during resource resize
>> 
>> detection
>> 
>> > MIME-Version: 1.0
>> > Content-Type: text/plain; charset=UTF-8
>> > Content-Transfer-Encoding: 8bit
>> >
>> > Since the commit 96336ec70264 ("PCI: Perform reset_resource() and build
>> > fail list in sync") the failed list is always built and returned to let
>> > the caller decide what to do with the failures. The caller may want
>> > to retry resource fitting and assignment and before that can happen,
>> > the resources should be restored to their original state (a reset
>> > effectively clears the struct resource), which requires returning them
>> > on the failed list so that the original state remains stored in the
>> > associated struct pci_dev_resource.
>> >
>> > Resource resizing is different from the ordinary resource fitting and
>> > assignment in that it only considers part of the resources. This means
>> > failures for other resource types are not relevant at all and should be
>> > ignored. As resize doesn't unassign such unrelated resources, those
>> > resources ending up on the failed list implies that their assignment
>> > must have failed before the resize too. The check in
>> > pci_reassign_bridge_resources() to decide if the whole assignment is
>> > successful, however, is based on list emptiness, which may cause false
>> > negatives when the failed list contains resources of an unrelated type.
>> >
>> > If the failed list is not empty, call pci_required_resource_failed()
>> > and extend it to be able to filter on specific resource types too (if
>> > provided).
>> >
>> > Calling pci_required_resource_failed() at this point is slightly
>> > problematic because the resource itself is reset when the failed list
>> > is constructed in __assign_resources_sorted(). As a result,
>> > pci_resource_is_optional() does not have access to the original
>> > resource flags. This could be worked around by restoring and
>> > re-resetting the resource around the call to pci_resource_is_optional(),
>> > however, it shouldn't cause an issue as resource resizing is meant for
>> > 64-bit prefetchable resources according to Christian König (see the
>> > Link which unfortunately doesn't point directly to Christian's reply
>> > because lore didn't store that email at all).
>> >
>> > Link: https://lore.kernel.org/all/c5d1b5d8-8669-5572-75a7-0b480f581ac1@xxxxxxxxxxxxxxx/
>> > Reported-by: D Scott Phillips <scott@xxxxxxxxxxxxxxxxxxxxxx>
>> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>
>> > Cc: Christian König <christian.koenig@xxxxxxx>
>> > ---
>> >  drivers/pci/setup-bus.c | 26 ++++++++++++++++++--------
>> >  1 file changed, 18 insertions(+), 8 deletions(-)
>> >
>> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
>> > index 07c3d021a47e..8284bbdc44b4 100644
>> > --- a/drivers/pci/setup-bus.c
>> > +++ b/drivers/pci/setup-bus.c
>> > @@ -28,6 +28,10 @@
>> >  #include <linux/acpi.h>
>> >  #include "pci.h"
>> >  
>> > +#define PCI_RES_TYPE_MASK \
>> > +	(IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH |\
>> > +	 IORESOURCE_MEM_64)
>> > +
>> >  unsigned int pci_flags;
>> >  EXPORT_SYMBOL_GPL(pci_flags);
>> >  
>> > @@ -384,13 +388,19 @@ static bool pci_need_to_release(unsigned long mask, struct resource *res)
>> >  }
>> >  
>> >  /* Return: @true if assignment of a required resource failed. */
>> > -static bool pci_required_resource_failed(struct list_head *fail_head)
>> > +static bool pci_required_resource_failed(struct list_head *fail_head,
>> > +					 unsigned long type)
>> >  {
>> >  	struct pci_dev_resource *fail_res;
>> >  
>> > +	type &= ~PCI_RES_TYPE_MASK;
>> 
>> Is this meant to be `type &= PCI_RES_TYPE_MASK`? If not, then I think
>> the new `if` check below is effectively just `if (type)`.
>
> Yes, it should have been without that ~. Can you test the change with 
> that changed? I'm sorry about the extra trouble.

Hi Ilpo, no trouble at all, and thanks for your effort in fixing this
case. With the `~` removed (i.e. `type &= PCI_RES_TYPE_MASK`), the patch
keeps working for my case.
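
For anyone following along, here is a small userspace sketch of why the
stray `~` mattered. It is only an illustration, not the kernel code: the
flag values are stand-ins meant to mirror <linux/ioport.h>, all four
helper names are hypothetical, and since the `if` check from the posted
patch is not quoted in this mail, the sketch models just the masking step
plus the XOR comparison from my earlier diff.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in values for a userspace build (assumed to mirror <linux/ioport.h>). */
#define IORESOURCE_IO		0x00000100UL
#define IORESOURCE_MEM		0x00000200UL
#define IORESOURCE_PREFETCH	0x00002000UL
#define IORESOURCE_MEM_64	0x00100000UL

#define PCI_RES_TYPE_MASK \
	(IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH | \
	 IORESOURCE_MEM_64)

/* Hypothetical helper: keep only the type bits of the filter. */
static unsigned long type_filter(unsigned long type)
{
	return type & PCI_RES_TYPE_MASK;
}

/* Hypothetical helper: the variant with the stray '~', which clears
 * exactly the type bits and keeps everything else. */
static unsigned long type_filter_inverted(unsigned long type)
{
	return type & ~PCI_RES_TYPE_MASK;
}

/* Hypothetical helper: true if the type bits of flags and type agree. */
static int res_type_matches(unsigned long flags, unsigned long type)
{
	return ((flags ^ type) & PCI_RES_TYPE_MASK) == 0;
}

/* Hypothetical helper: count failed resources whose type matches the
 * filter. The array of flags stands in for the kernel's failed list. */
static unsigned int count_relevant_fails(const unsigned long *flags, size_t n,
					 unsigned long type)
{
	unsigned int relevant = 0;
	size_t i;

	assert(flags != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (res_type_matches(flags[i], type))
			relevant++;
	return relevant;
}
```

With a 64-bit prefetchable filter, the inverted mask leaves no type bits
to compare against, while the corrected mask keeps the filter intact, so
a failed io resource alone is not counted as a relevant failure.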

Tested-by: D Scott Phillips <scott@xxxxxxxxxxxxxxxxxxxxxx>
Reviewed-by: D Scott Phillips <scott@xxxxxxxxxxxxxxxxxxxxxx>