Re: [PATCH 2/3] multipathd: re-add paths skipped because they were offline

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2025-03-28 at 12:36 -0400, Benjamin Marzinski wrote:
> On Tue, Mar 25, 2025 at 11:33:18PM +0100, Martin Wilck wrote:
> > On Mon, 2025-03-24 at 16:55 -0400, Benjamin Marzinski wrote:
> > > When a new device is added by the multipath command, multipathd
> > > may
> > > know
> > > of other paths that cannot be added to the device because they
> > > are
> > > currently offline. Instead of ignoring these paths, multipathd
> > > will
> > > now
> > > re-add them when they come back online. To do this, it multipathd
> > > needs
> > > a new device initialized state, INIT_OFFLINE, to track devices
> > > that
> > > were
> > > in INIT_OK, but could not be added to an existing multipath
> > > device
> > > because they were offline. These paths are handled along with the
> > > other
> > > uninitialized paths.
> > > 
> > > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>
> > 
> > This looks good in general, but I'm not certain about INIT_OFFLINE.
> > The
> > property "path was temporarily offline while we tried to add it to
> > a
> > map" is not logically related to path initialization, IMO. Also, we
> > have plenty of (pp->initialized != INIT_xyz) conditionals in the
> > code,
> > and your patch touches only 2 of them. I find it difficult to rule
> > out
> > that some of them might be broken by the additional init state.
> > Have
> > you double-checked them all?
> 
> Yeah. I may have overlooked something, but I did look through them. I
> just double-checked pathinfo(), since if that's called with DI_ALL
> and
> succeeds, it will unconditionally set the path to INIT_OK.  There
> isn't
> a case where that could happen to a path in INIT_OFFLINE, but it
> would
> make sense to add a check there, just to make it obvious that this
> shouldn't move the path from INIT_OFFLINE to INIT_OK. Even if
> DI_CHECKER
> is set and pathinfo() discovers that the path has come back up, It's
> easier to just wait for checkerloop to check the path, and handle
> adding
> it the the multipath device there.
> 
> I should point out what this patch doesn't do, since that might clear
> up
> some cases where you would think I need to handle offline paths. This
> patch is only taking care of the case where a multipath device
> exists,
> but multipathd won't start monitoring it because it can't reload the
> table with the offline paths. It does not handle the case where
> offline
> paths exist, and that keeps multipathd from creating a device in the
> first place.
> 
> The easiest way for that to happen is if the device WWID is not in
> the
> wwids file and find_multipaths is set to "on" or "smart", where a
> device
> will get multipathed when a second path appears. In this case, you
> could
> add a single path and fully initialize it, then have that path go
> offline, and later add a second path. Multipathd would see that there
> are two paths, and try to create a multipath device. However the
> offline path would keep the device from being created. Since we
> call pathinfo(DI_CHECKER) in adopt_paths(), multipathd should know
> that the path is offline before it tries to create the device, so
> we could create the device without any offline paths, and then set
> those
> paths to INIT_OFFLINE, for later addition.
> 
> This case seemed less important to me than the one where we end up
> with
> an untracked mlutipath device. This problem has always existed and
> it's
> immediately obvious when it occurs (there's no multipath device), and
> yet I've never gotten a bug report for it. This makes sense because
> it's
> pretty hard to hit. Usually, the path will either not get initialized
> in
> the first place because it's offline (in which case we won't attempt
> to
> use it as part of a multipath device) or multipathd will create a
> multipath device using it as soon as the path gets added, leaving a
> very
> small window between when multipathd initializes the path and when
> it creates the multipath device. The larger window for the path to
> go offline only happens in the case where multipathd doesn't know
> to create a multipath device right away.
> 
> The case with the untracked device is pretty much as hard to hit, and
> has also always existed, but it's not obvious to the user that their
> multipath device isn't being tracked by multipathd. That seems much
> worse to me. I'm considering fixing the case where the device doesn't
> get created at all, but that can happen in multiple places, so I
> think
> it'll be a little messier. Also, I can't decide if multipathd should
> try
> to create the device with all the paths first, and only ignore the
> offline paths on retries, if the first create attempt fails, or if it
> should always skip the offline paths, at least if if just checked
> their
> state in adopt_paths(). Thoughts?

I agree that this is a corner case, and I have also never seen a bug
report about it. I wouldn't want to spend a lot of work on it.

Arguably the inconsistency is in the kernel, as the kernel doesn't mind
carrying around an offline device in a map after it has been added, but
refuses to add it in the first place while offline.

But I'm not keen on changing anything in that area, either.

Martin






[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux