On Tue, 2025-08-26 at 15:36 -0400, Benjamin Marzinski wrote: > On Tue, Aug 26, 2025 at 10:44:22AM +0200, Martin Wilck wrote: > > On Mon, 2025-08-25 at 20:51 -0400, Benjamin Marzinski wrote: > > > On Sun, Aug 24, 2025 at 05:26:50PM +0200, Martin Wilck wrote: > > > > > > > > > > > > + /* > > > > > + * Cannot free the reservation because the path that > > > > > is > > > > > holding it > > > > > + * is not usable. Workaround this by: > > > > > + * 1. Suspending the device > > > > > + * 2. Preempting the reservation to move it to a > > > > > usable > > > > > path > > > > > + * (this removes the registered keys on all paths > > > > > except > > > > > the > > > > > + * preempting one. Since the device is suspended, > > > > > no > > > > > IO > > > > > can > > > > > + * go to these unregistered paths and fail). > > > > > + * 3. Releasing the reservation on the path that now > > > > > holds > > > > > it. > > > > > + * 4. Resuming the device (since it no longer > > > > > matters > > > > > that > > > > > most of > > > > > + * that paths no longer have a registered key) > > > > > + * 5. Reregistering keys on all the paths > > > > > + */ > > > > > + > > > > > + if (!dm_simplecmd_noflush(DM_DEVICE_SUSPEND, mpp- > > > > > >alias, > > > > > 0)) > > > > > { > > > > > + condlog(0, "%s: release: failed to suspend > > > > > dm > > > > > device.", > > > > > > > > Why do you use dm_simplecmd_noflush() here? Shouldn't queued IO > > > > be > > > > flushed from the dm device to avoid it being sent to paths that > > > > are > > > > going to be unregistered? > > > > > > > > > > I'm pretty certain that DM will still flush all the IO from the > > > target > > > to DM core before suspending, even with dm_simplecmd_noflush() > > > set. > > > In > > > request based multipath, queued IOs are never stored in the > > > target. > > > In > > > bio based multipath, they are, but they will get flushed back up > > > to > > > DM > > > core when suspending and queued there. No IO should happen > > > through > > > the > > > target after the suspend, until the resume. > > > dm_simplecmd_noflush() > > > just > > > keeps multipath from failing any IO that it had queueing, and > > > it's > > > only > > > really necessary when we resize the device, because if we shrink > > > the > > > device, outstanding IO might be outside the new bounds. > > > > OK, thanks for the clarification. I guess I've never fully > > understood > > the way queueing works in dm. > > > > What about queueing in the path devices? We'll be removing > > registration > > keys, so IO sent by the SCSI layer may end up with RESERVATION > > CONFLICT > > errors. To my understanding, without the DM_NOFLUSH_FLAG the kernel > > will freeze the queue and flush everything, as if the device was > > closed > > during shutdown. If DM_NOFLUSH_FLAG is set, this won't happen. > > What's > > preventing the SCSI layer from sending IO while we're modifying the > > registrations? > > In __dm_suspend() we block all new IOs to the dm device here: > https://github.com/torvalds/linux/blob/fab1beda7597fac1cecc01707d55eadb6bbe773c/drivers/md/dm.c#L2955-L2966 > > Once we know that no new IOs are getting sent to the target, we wait > for > all the IOs that were send to the target to get completed by calling > dm_wait_for_completion() here: > > https://github.com/torvalds/linux/blob/fab1beda7597fac1cecc01707d55eadb6bbe773c/drivers/md/dm.c#L2973 I saw that code. I realize now that dm_wait_for_completion() will finish all IO on underlying path devices. I wasn't 100% certain about that in the first place. I also saw that lock_fs() is not called if we set the noflush flag. And I am wondering if that's what we want. If there's dirty data on the file system and we drop our registration key, the file system will most probably error out. Martin