When a PCI device is removed with surprise hotplug, there may still be
attempts to attach the device to the default domain, either as part of
tear down via __iommu_release_dma_ownership() or because the removal
happens during probe in __iommu_probe_device(). In both cases
zpci_register_ioat() fails with a cc value indicating that the device
handle is invalid. This is because the device is no longer part of the
instance as far as the hypervisor is concerned.

Currently this leads to an error return and s390_iommu_attach_device()
fails. This triggers the WARN_ON() in __iommu_group_set_domain_nofail()
because attaching to the default domain must never fail.

With the device fenced by the hypervisor, no DMAs to or from memory are
possible and the IOMMU translations have no effect. Proceed as if the
registration was successful and let the hotplug event handling clean up
the device.

This is similar to how devices in the error state are handled since
commit 59bbf596791b ("iommu/s390: Make attach succeed even if the device
is in error state"), except that for removal the domain will not be
registered later. This approach was also previously discussed at the
link below.

Handle both cases, error state and removal, in a helper which checks if
the error needs to be propagated or ignored. Avoid magic number
condition codes by using the pre-existing, but never used, defines for
PCI load/store condition codes, and rename them to reflect that they
apply to all PCI instructions.

Cc: stable@xxxxxxxxxxxxxxx # v6.2
Link: https://lore.kernel.org/linux-iommu/20240808194155.GD1985367@xxxxxxxx/
Suggested-by: Jason Gunthorpe <jgg@xxxxxxxx>
Signed-off-by: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>
---
 arch/s390/include/asm/pci_insn.h | 10 +++++-----
 drivers/iommu/s390-iommu.c       | 26 +++++++++++++++++++-------
 2 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/arch/s390/include/asm/pci_insn.h b/arch/s390/include/asm/pci_insn.h
index e5f57cfe1d458276a14a5c54409fba4c43962a3a..025c6dcbf893310b473423ab9f5a21b6eaf8c623 100644
--- a/arch/s390/include/asm/pci_insn.h
+++ b/arch/s390/include/asm/pci_insn.h
@@ -16,11 +16,11 @@
 #define ZPCI_PCI_ST_FUNC_NOT_AVAIL		40
 #define ZPCI_PCI_ST_ALREADY_IN_RQ_STATE		44
 
-/* Load/Store return codes */
-#define ZPCI_PCI_LS_OK				0
-#define ZPCI_PCI_LS_ERR				1
-#define ZPCI_PCI_LS_BUSY			2
-#define ZPCI_PCI_LS_INVAL_HANDLE		3
+/* PCI instruction condition codes */
+#define ZPCI_CC_OK				0
+#define ZPCI_CC_ERR				1
+#define ZPCI_CC_BUSY				2
+#define ZPCI_CC_INVAL_HANDLE			3
 
 /* Load/Store address space identifiers */
 #define ZPCI_PCIAS_MEMIO_0			0
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index 9c80d61deb2c0bba4fae59129323cc90f998693d..f04de62288a8f0e9d640f9f2032f961c4135bc42 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -612,6 +612,23 @@ static u64 get_iota_region_flag(struct s390_domain *domain)
 	}
 }
 
+static bool reg_ioat_propagate_error(int cc, u8 status)
+{
+	/*
+	 * If the device is in the error state the reset routine
+	 * will register the IOAT of the newly set domain on re-enable
+	 */
+	if (cc == ZPCI_CC_ERR && status == ZPCI_PCI_ST_FUNC_NOT_AVAIL)
+		return false;
+	/*
+	 * If the device was removed treat registration as success
+	 * and let the subsequent error event trigger tear down.
+	 */
+	if (cc == ZPCI_CC_INVAL_HANDLE)
+		return false;
+	return cc != ZPCI_CC_OK;
+}
+
 static int s390_iommu_domain_reg_ioat(struct zpci_dev *zdev,
 				      struct iommu_domain *domain, u8 *status)
 {
@@ -696,7 +713,7 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
 
 	/* If we fail now DMA remains blocked via blocking domain */
 	cc = s390_iommu_domain_reg_ioat(zdev, domain, &status);
-	if (cc && status != ZPCI_PCI_ST_FUNC_NOT_AVAIL)
+	if (reg_ioat_propagate_error(cc, status))
 		return -EIO;
 	zdev->dma_table = s390_domain->dma_table;
 	zdev_s390_domain_update(zdev, domain);
@@ -1123,12 +1140,7 @@ static int s390_attach_dev_identity(struct iommu_domain *domain,
 
 	/* If we fail now DMA remains blocked via blocking domain */
 	cc = s390_iommu_domain_reg_ioat(zdev, domain, &status);
-
-	/*
-	 * If the device is undergoing error recovery the reset code
-	 * will re-establish the new domain.
-	 */
-	if (cc && status != ZPCI_PCI_ST_FUNC_NOT_AVAIL)
+	if (reg_ioat_propagate_error(cc, status))
 		return -EIO;
 
 	zdev_s390_domain_update(zdev, domain);

---
base-commit: b320789d6883cc00ac78ce83bccbfe7ed58afcf0
change-id: 20250904-iommu_succeed_attach_removed-fc42aa5c454d

Best regards,
-- 
Niklas Schnelle