On Tue, Jun 10, 2025 at 05:31:09PM +0100, Robin Murphy wrote: > On 2025-06-10 4:36 pm, Jason Gunthorpe wrote: > > On Tue, Jun 10, 2025 at 03:40:40PM +0100, Robin Murphy wrote: > > > On 2025-06-10 2:04 pm, Jason Gunthorpe wrote: > > > > On Tue, Jun 10, 2025 at 12:07:00AM -0700, Nicolin Chen wrote: > > > > > On Tue, Jun 10, 2025 at 12:26:07PM +0800, Baolu Lu wrote: > > > > > > On 6/10/25 02:45, Nicolin Chen wrote: > > > > > > > + ops = dev_iommu_ops(dev); > > > > > > > > > > > > Should this be protected by group->mutext? > > > > > > > > > > Not seemingly, but should require the iommu_probe_device_lock I > > > > > think. > > > > > > > > group and ops are not permitted to change while a driver is attached.. > > > > > > > > IIRC the FLR code in PCI doesn't always ensure that (due to the sysfs > > > > paths), so yeah, this looks troubled. iommu_probe_device_lock perhaps > > > > would fix it. > > > > > > No, iommu_probe_device_lock is for protecting access to dev->iommu in the > > > probe path until the device is definitively assigned to a group (or not). > > > Fundamentally it defends against multiple sources triggering a probe of the > > > same device in parallel - once the device *is* probed it is no longer > > > relevant, and the group mutex is the right thing to protect all subsequent > > > operations. > > > > Yes, adding iommu_probe_device_lock to iommu_deinit_device() would be > > gross. > > > > but something is required to protect the load of > > dev->iommu_group.. dev->iommu_group->mutex can't protect itself > > against a race UAF on deinit. > > Then you do iommu_group_get/put() around it as well. Same issue - can't use dev->iommu_group.kobj.kref to protect against UAF. By the time you do a try_get you've already UAF'd the memory holding the kref. It always needs some other enclosing protection. > From a quick skim I suspect it's probably OK - at least device_del() gets to > bus_remove_device()->device_remove_groups() well enough before the > BUS_NOTIFY_REMOVED_DEVICE call that triggers iommu_release_device(). Make sense, Nicolin, a well placed comment explaining this would be good > And on an unrelated thought that's just come to mind, if we ever did FLR > with PASIDs enabled, presumably that's going to wipe out the PASID > configuration, I've always been expecting the PCI FLR code to preserve the config space that belongs the iommu subsystem. PASID, ATS, PRI, etc. Never looked into it... Nicolin?? Otherwise we need a post-FLR callback to have the iommu driver reload the right config values for its current config.. That's an existing nasty bug :) > so will the caller who requested the reset actually expect > the attachments at the IOMMU end to be preserved, or would they assume to > start over from scratch? Seems like there's not necessarily one right answer > there :/ IMHO we have to preserve everything, and I think we should things back to working normally overall, if that isn't happening already. For something like VFIO preserving is desired. For DMA-API that is the right thing too. Something like mlx5, that has a robust RAS system, will unregister itself from the rdma subsystem and that triggers a natural destruction of the SVA/etc domains that might be there. We want the attachments to be left undisturbed so there is no issue cleaning them up later. Jason