Sorry for the huge delay. I've addressed all, following your remarks.
Some feedback inline.

On Fri, Jul 04, 2025 at 12:43:42PM -0300, Jason Gunthorpe wrote:
> On Sat, Jun 28, 2025 at 12:42:41AM -0700, Nicolin Chen wrote:
>
> >  - This only works for IOMMU drivers that implemented ops->blocked_domain
> >    correctly with pci_disable_ats().
>
> As was in the thread, it works for everyone. Even if we install an
> empty paging domain for blocking that still will stop the ATS
> invalidations from being issued. ATS remains on but this is not a
> problem.

OK. And I am dropping this validation in the PCI patch:

        /* Something wrong with the iommu driver that failed to disable ATS */
        if (dev->ats_enabled)
                pci_err(dev, "failed to stop ATS. ATS invalidation may time out\n");

> > @@ -2155,8 +2172,17 @@ int iommu_deferred_attach(struct device *dev, struct iommu_domain *domain)
> >          int ret = 0;
> >
> >          mutex_lock(&group->mutex);
> > +
> > +        /*
> > +         * There is a racy attach while the device is resetting. Defer it until
> > +         * the iommu_dev_reset_done() that attaches the device to group->domain.
> > +         */
> > +        if (device_to_group_device(dev)->pending_reset)
> > +                goto unlock;
> > +
> >          if (dev->iommu && dev->iommu->attach_deferred)
> >                  ret = __iommu_attach_device(domain, dev);
> > +unlock:
> >          mutex_unlock(&group->mutex);
>
> Actually looking at this some more maybe write it like:
>
>         /*
>          * This is called on the dma mapping fast path so avoid locking. This
>          * is racy, but we have an expectation that the driver will setup its
>          * DMAs inside probe while still single threaded to avoid racing.
>          */
>         if (dev->iommu && !READ_ONCE(dev->iommu->attach_deferred))

This triggers a build error as attach_deferred is a bit-field. So I am
changing it from "u32 attach_deferred:1" to "bool" for this.
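For reference, a minimal userspace sketch of why the bit-field is a problem:
a READ_ONCE()-style macro has to form a pointer to its argument, and C does
not allow taking the address of a bit-field. READ_ONCE_SKETCH(),
struct dev_iommu_sketch and needs_deferred_attach() are hypothetical
stand-ins for illustration only, not the kernel definitions, and
__typeof__ assumes GCC or Clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's READ_ONCE(): read x through a
 * volatile-qualified pointer. This needs &(x), so x must be addressable.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical, cut-down stand-in for struct dev_iommu. */
struct dev_iommu_sketch {
        bool attach_deferred;           /* plain bool: &field is valid */
        unsigned int require_direct:1;  /* bit-field: &field does not compile */
};

/*
 * Lockless fast-path check: returns true only when a deferred attach is
 * still pending, i.e. when the caller has to take the slow (locked) path.
 */
static inline bool needs_deferred_attach(const struct dev_iommu_sketch *iommu)
{
        /*
         * READ_ONCE_SKETCH(iommu->require_direct) would fail to build here,
         * because a bit-field has no address.
         */
        return iommu && READ_ONCE_SKETCH(iommu->attach_deferred);
}
```

With this shape, a NULL pointer or a cleared flag both mean "nothing to do",
which is the same outcome as the early return in the fast path.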
And, to keep the original logic, I think it should be:

        if (!dev->iommu || !READ_ONCE(dev->iommu->attach_deferred))

>                 return 0;
>
>         guard(mutex)(&group->mutex);

I recall Baolu mentioned that Joerg might not like the guard style so I
am keeping mutex_lock/unlock().

>         if (device_to_group_device(dev)->pending_reset)
>                 return 0;
>
>         if (!dev->iommu->attach_deferred)
>                 return 0;

I think this is redundant since the fast path already checked it.

>         return __iommu_attach_device(domain, dev);
>
> And of course it is already quite crazy to be doing FLR during a
> device probe so this is not a realistic scenario.

Hmm, I am not sure about that, as I see iommu_deferred_attach() get
mostly invoked by a dma_alloc() or even a dma_map(). So, this might
not be confined to a device probe?

> > +        if (dev->iommu->require_direct) {
> > +                dev_warn(
> > +                        dev,
> > +                        "Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.\n");
> > +                return -EINVAL;
> > +        }
>
> I don't think we can do this. eg on ARM all devices have RMRs inside
> VMs so this will completely break FLR inside a vm???
>
> Either ignore this condition with the rational that we are about to
> reset it so it doesn't matter, or we need to establish a new paging
> domain for isolation purposes that has the RMR setup.

Ah, you are right. ARM MSI in a VM uses RMR and sets this.

But does it also raise a question that a VM having RMR can't use the
blocked_domain, as __iommu_device_set_domain() has the exact same
check rejecting blocked_domain? Not sure if there would be some
unintended consequence though...

> > +        if (ret)
> > +                goto unlock;
> > +
> > +        /* Dock PASID domains to blocked_domain while retaining pasid_array */
> > +        xa_lock(&group->pasid_array);
>
> Not sure we need this lock? The group mutex already prevents mutation
> of the xa list and I dont' think it is allowed to call
> iommu_remove_dev_pasid() in an atomic context.
I see only iommu_attach_handle_get() doesn't use group->mutex. And
it's a reader. So I think it's safe to drop the xa_lock.

I added this:
        /*
         * Dock PASID domains to blocking_domain while retaining pasid_array.
         *
         * The pasid_array is mostly fenced by group->mutex, except one reader
         * in iommu_attach_handle_get(), so it's safe to read without xa_lock.
         */

Thanks!
Nicolin