On Thu, Jul 17, 2025 at 10:12:03PM +0200, Lukas Wunner wrote:
> On Thu, Jul 17, 2025 at 11:11:44AM -0400, Michael S. Tsirkin wrote:
> > On Mon, Jul 14, 2025 at 08:11:04AM +0200, Lukas Wunner wrote:
> > > On Wed, Jul 09, 2025 at 04:55:26PM -0400, Michael S. Tsirkin wrote:
> > > > At the moment, in case of a surprise removal, the regular remove
> > > > callback is invoked, exclusively. This works well, because mostly, the
> > > > cleanup would be the same.
> > > >
> > > > However, there's a race: imagine device removal was initiated by a user
> > > > action, such as driver unbind, and it in turn initiated some cleanup and
> > > > is now waiting for an interrupt from the device. If the device is now
> > > > surprise-removed, that never arrives and the remove callback hangs
> > > > forever.
> > >
> > > For PCI devices in a hotplug slot, user space can initiate "safe removal"
> > > by writing "0" to the hotplug slot's "power" file in sysfs.
> > >
> > > If the PCI device is yanked from the slot while safe removal is ongoing,
> > > there is likewise no way for the driver to know that the device is
> > > suddenly gone. That's because pciehp_unconfigure_device() only calls
> > > pci_dev_set_disconnected() in the surprise removal case, not for
> > > safe removal.
> > >
> > > The solution proposed here is thus not a complete one: It may work
> > > if user space initiated *driver* removal, but not if it initiated *safe*
> > > removal of the entire device. For virtio, that may be sufficient.
> >
> > So just as an idea, something like this can work I guess? I'm yet to
> > test this - wrote this on the go -
>
> Don't bother, it won't work:
>
> pciehp_handle_presence_or_link_change() is called from pciehp_ist(),
> the IRQ thread. During safe removal the IRQ thread is busy in
> pciehp_unconfigure_device() and waiting for the driver to unbind
> from devices being safe-removed.

Confused. I thought safe removal happens in the userspace thread that
wrote into sysfs?

> An IRQ thread is always single-threaded. There's no second instance
> of the IRQ thread being run when another interrupt is signaled.
> Rather, the IRQ thread is re-run when it has finished.
>
> In *theory* what would be possible is to plumb this into pciehp_isr().
> That's the hardirq handler. This one will indeed be run when an
> interrupt comes in while the IRQ thread is running. Normally the
> hardirq handler would just collect the events for later consumption
> by the IRQ thread. The hardirq handler could *theoretically* mark
> devices gone while they're being safe-removed.
>
> I'm saying "theoretically" because in reality I don't think this is
> a viable approach either: pciehp_ist() contains code to *ignore*
> link or presence changes if they were caused by a Secondary Bus Reset
> or Downstream Port Containment. In that case we do *not* want to mark
> devices disconnected because they're only *temporarily* inaccessible.
> This requires waiting for the SBR or DPC to conclude, which can take
> several seconds. We can't wait in the hardirq handler.
>
> So this cannot be solved with the current architecture of pciehp,
> at least not easily or in an elegant way. Sorry!
>
> Thanks,
>
> Lukas
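
To make sure I'm reading the hardirq vs. IRQ thread split correctly, here is
a rough userspace model of it. This is only a sketch and not pciehp code:
model_ctrl, model_hardirq(), model_irq_thread(), model_demo() and the EV_*
bits are names made up for illustration, and C11 atomics stand in for
whatever the kernel actually uses. It shows only that an event recorded
while the thread is busy is not acted upon until the thread runs again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event bits, named for this sketch only. */
#define EV_PRESENCE_CHANGE 0x1u
#define EV_LINK_CHANGE     0x2u

struct model_ctrl {
	atomic_uint pending_events;	/* collected by the hardirq side */
	unsigned int handled;		/* events the thread has consumed */
};

/* Hardirq side: can run at any time, must not sleep, only records events. */
static void model_hardirq(struct model_ctrl *c, unsigned int events)
{
	assert(c != NULL);
	atomic_fetch_or(&c->pending_events, events);
}

/* Thread side: a single instance, drains whatever has been collected. */
static unsigned int model_irq_thread(struct model_ctrl *c)
{
	unsigned int events;

	assert(c != NULL);
	events = atomic_exchange(&c->pending_events, 0);
	c->handled |= events;
	return events;
}

/*
 * Scenario: the thread is assumed to be busy with a safe removal when the
 * device is yanked. The hardirq side records the presence change, but
 * nothing consumes it until the thread is re-run.
 */
static bool model_demo(void)
{
	struct model_ctrl c;
	bool seen_while_busy;

	atomic_init(&c.pending_events, 0);
	c.handled = 0;

	model_hardirq(&c, EV_PRESENCE_CHANGE);	/* arrives mid-removal */
	seen_while_busy = (c.handled & EV_PRESENCE_CHANGE) != 0;

	model_irq_thread(&c);			/* thread finished, re-run */
	return !seen_while_busy && (c.handled & EV_PRESENCE_CHANGE) != 0;
}
```

In this model, marking devices gone from model_hardirq() would be the
"theoretical" option, and it leaves no room for the multi-second SBR/DPC
wait mentioned above.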