On Mon, Jun 30, 2025 at 04:07:55AM +0000, Parav Pandit wrote: > > > From: Keith Busch <kbusch@xxxxxxxxxx> > > Sent: 30 June 2025 05:10 AM > > > > On Sun, Jun 29, 2025 at 01:28:08PM -0400, Michael S. Tsirkin wrote: > > > On Sun, Jun 29, 2025 at 03:36:27PM +0200, Lukas Wunner wrote: > > > > On Sat, Jun 28, 2025 at 02:58:49PM -0400, Michael S. Tsirkin wrote: > > > > > > > > 1/ The device_lock() will reintroduce the issues solved by 74ff8864cc84. > > > > > > I see. What other way is there to prevent dev->driver from going away, > > > though? I guess I can add a new spinlock and take it both here and > > > when > > > dev->driver changes? Acceptable? > > > > You're already holding the pci_bus_sem here, so the final device 'put' > > can't have been called yet, so the device is valid and thread safe in this > > context. I think maintaining the desired lifetime of the instantiated driver is > > just a matter of reference counting within your driver. > > > > Just a thought on your patch, instead of introducing a new callback, you could > > call the existing '->error_detected()' callback with the previously set > > 'pci_channel_io_perm_failure' status. That would totally work for nvme to > > kick its cleanup much quicker than the blk_mq timeout handling we currently > > rely on for this scenario. > > error_detected() callback is also called while holding the device_lock() by report_error_detected(). > So when remove() callback is ongoing for graceful removal and driver is waiting for the request completions, > > If the error_detected() will be stuck on device lock. But I didn't suggest calling error_detected from report_error_detected. Just call it directly without device_lock. It's not very feasible to enforce a non-blocking callback, though, if speed is really a concern here.