On Tue, 2025-08-26 at 10:44 +0200, Martin Wilck wrote:
> On Mon, 2025-08-25 at 20:51 -0400, Benjamin Marzinski wrote:
> > On Sun, Aug 24, 2025 at 05:26:50PM +0200, Martin Wilck wrote:
> > >
> > > > + /*
> > > > +  * Cannot free the reservation because the path that is holding it
> > > > +  * is not usable. Workaround this by:
> > > > +  * 1. Suspending the device
> > > > +  * 2. Preempting the reservation to move it to a usable path
> > > > +  *    (this removes the registered keys on all paths except the
> > > > +  *    preempting one. Since the device is suspended, no IO can
> > > > +  *    go to these unregistered paths and fail).
> > > > +  * 3. Releasing the reservation on the path that now holds it.
> > > > +  * 4. Resuming the device (since it no longer matters that most of
> > > > +  *    that paths no longer have a registered key)
> > > > +  * 5. Reregistering keys on all the paths
> > > > +  */
> > > > +
> > > > + if (!dm_simplecmd_noflush(DM_DEVICE_SUSPEND, mpp->alias, 0)) {
> > > > +  condlog(0, "%s: release: failed to suspend dm device.",
> > >
> > > Why do you use dm_simplecmd_noflush() here? Shouldn't queued IO be
> > > flushed from the dm device to avoid it being sent to paths that are
> > > going to be unregistered?
> > >
> >
> > I'm pretty certain that DM will still flush all the IO from the target
> > to DM core before suspending, even with dm_simplecmd_noflush() set. In
> > request based multipath, queued IOs are never stored in the target. In
> > bio based multipath, they are, but they will get flushed back up to DM
> > core when suspending and queued there. No IO should happen through the
> > target after the suspend, until the resume. dm_simplecmd_noflush() just
> > keeps multipath from failing any IO that it had queueing, and it's only
> > really necessary when we resize the device, because if we shrink the
> > device, outstanding IO might be outside the new bounds.
>
> OK, thanks for the clarification. I guess I've never fully understood
> the way queueing works in dm.
>
> What about queueing in the path devices? We'll be removing registration
> keys, so IO sent by the SCSI layer may end up with RESERVATION CONFLICT
> errors. To my understanding, without the DM_NOFLUSH_FLAG the kernel
> will freeze the queue and flush everything, as if the device was closed
> during shutdown. If DM_NOFLUSH_FLAG is set, this won't happen. What's
> preventing the SCSI layer from sending IO while we're modifying the
> registrations?

I just re-read commit 2e93ccc1933d ("[PATCH] dm: suspend: add noflush
pushback"), which explains that the noflush flag is mainly meant for
handling queue_if_no_path situations, in particular being able to
reload a queueing multipath map with new paths when all paths have
failed. That's exactly what you wrote: queueing IO in generic dm rather
than in the target in order to facilitate map reloads.

Here we're looking at a very different situation. RESERVATION CONFLICT
errors won't cause queueing, because the kernel doesn't classify them
as path errors. If we need to preempt a reservation held by another
host, we'd rather not send IO to the device yet. If we have a
reservation we want to give up, we should make sure to have all
outstanding IO sent to the storage beforehand. In other situations,
like just registering keys without reservations being present,
flushing won't be necessary, but it won't hurt either, AFAICS.

I am not sure whether it makes sense to try sending PRIN or PROUT
commands to maps in queueing state. Depending on the nature of the
errors that caused the paths to fail, we may or may not be able to send
this type of command.
PRIN/PROUT must still work for devices in STANDBY state, but there are
other situations where they wouldn't work, or would even hang.

I tend to think that sending PRIN or PROUT commands to queueing devices
is an extreme corner case. Perhaps we should just refuse to do this
(realizing that we can't avoid a device entering queueing mode while
we're processing PR commands, but that's an even more extreme corner
case). If we can assume that the device is not queueing, it might
actually be better to leave the NOFLUSH flag clear, making sure that
the queue is frozen and all IO is actually flushed before changing PR
keys or reservations.

Am I getting something wrong here?

Martin
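The ordering constraint in the quoted patch comment (release the reservation before resuming the map) can be checked against a toy model. This is a hypothetical sketch, not multipath-tools code: every name in it is invented, it assumes a "registrants only" reservation type (an unregistered path gets RESERVATION CONFLICT while any reservation exists), and it assumes a suspended map dispatches no IO at all. It deliberately does not model IO already sitting in the path devices' queues, which is exactly the open question about DM_NOFLUSH_FLAG above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one multipath map; not multipath-tools code. */
#define NPATHS 4

struct pr_model {
	bool usable[NPATHS];     /* path can carry IO */
	bool registered[NPATHS]; /* host's key is registered on this path */
	int holder;              /* path holding the reservation, -1 if none */
	bool suspended;          /* assumption: suspended map dispatches no IO */
};

/*
 * Count usable paths whose IO would fail with RESERVATION CONFLICT,
 * assuming a registrants-only reservation type. In-flight IO on the
 * path devices is NOT modelled.
 */
static int conflicting_paths(const struct pr_model *m)
{
	int n = 0;

	if (m->suspended || m->holder < 0)
		return 0;
	for (int p = 0; p < NPATHS; p++)
		if (m->usable[p] && !m->registered[p])
			n++;
	return n;
}

static int max_int(int a, int b)
{
	return a > b ? a : b;
}

/*
 * Reservation starts on unusable path 0 and is moved to new_path.
 * If release_before_resume is false, steps 3 and 4 are swapped.
 * Returns the largest number of conflicting paths seen after any step.
 */
static int worst_conflicts(int new_path, bool release_before_resume)
{
	struct pr_model m = {
		.usable = { false, true, true, true },
		.registered = { true, true, true, true },
		.holder = 0,
		.suspended = false,
	};
	int worst = 0;

	assert(new_path > 0 && new_path < NPATHS);

	m.suspended = true;                       /* 1. suspend */
	worst = max_int(worst, conflicting_paths(&m));

	for (int p = 0; p < NPATHS; p++)          /* 2. preempt from new_path */
		m.registered[p] = (p == new_path);
	m.holder = new_path;
	worst = max_int(worst, conflicting_paths(&m));

	if (release_before_resume) {
		m.holder = -1;                    /* 3. release */
		worst = max_int(worst, conflicting_paths(&m));
		m.suspended = false;              /* 4. resume */
		worst = max_int(worst, conflicting_paths(&m));
	} else {
		m.suspended = false;              /* resume first (wrong order) */
		worst = max_int(worst, conflicting_paths(&m));
		m.holder = -1;                    /* then release */
		worst = max_int(worst, conflicting_paths(&m));
	}

	for (int p = 0; p < NPATHS; p++)          /* 5. reregister */
		m.registered[p] = true;
	worst = max_int(worst, conflicting_paths(&m));

	return worst;
}
```

With the patch's order, `worst_conflicts(1, true)` returns 0. Swapping steps 3 and 4, `worst_conflicts(1, false)` returns 2: paths 2 and 3 are usable again while path 1 still holds a reservation they are no longer registered for. Under this model the suspend only protects steps 2 and 3 if nothing is still pending below dm, which is why the flush question matters.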