Re: [PATCH 15/15] libmpathpersist: Add safety check for preempting on key change

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 2025-08-24 at 23:00 +0200, Martin Wilck wrote:
> On Thu, 2025-07-10 at 14:11 -0400, Benjamin Marzinski wrote:
> > When a reservation key is changed from one non-zero value to
> > another,
> > libmpathpersist checks if the old key is still holding the
> > reservation,
> > and preempts it if it is. This was only safe if two nodes never
> > share
> > the same key. If a node uses the same key as another node that is
> > holding the reservation, and switches keys so that it no longer
> > matches,
> > it will end up preempting the reservation. This is clearly
> > unexpected
> > behavior, and it can come about by a simple accident of registering
> > a
> > device with the wrong key, and then immediately fixing it.
> > 
> > To handle this, add code to track if a device is the reservation
> > holder
> > to multipathd. multipathd now has three new commands "getprhold",
> > "setprhold", and "unsetprhold". These commands work like the
> > equivalent
> > *prstatus commands. libmpathpersist calls setprhold on RESERVE
> > commands
> > and PREEMPT commands when the preempted key is holding the
> > reservation
> > and unsetprhold on RELEASE commands. Also, calling unsetprflag
> > causes
> > prhold to be unset as well, so CLEAR commands and REGISTER commands
> > with
> > a 0x0 service action key will also unset prhold. libmpathpersist()
> > will
> > also unset prhold if it notices that the device cannot be holding a
> > reservation in preempt_missing_path().
> > 
> > When a new multipath device is created, its initial prhold state is
> > PR_UNKNOWN until it checks the current reservation, just like with
> > prflag. If multipathd ever finds that a device's registration has
> > been
> > preempted or cleared in update_map_pr(), it unsets prhold, just
> > like
> > with prflag.
> > 
> > Now, before libmpathpersist preempts a reservation when changing
> > keys
> > it also checks if multipathd thinks that the device is holding
> > the reservation. If it does not, then libmpathpersist won't preempt
> > the key. It will assume that another node is holding the
> > reservation
> > with the same key.
> > 
> > I should note that this safety check only stops a node not holding
> > the
> > reservation from preempting the node holding the reservation. If
> > the
> > node holding the reservation changes its key, but it fails to
> > change
> > the
> > resevation key, because that path is down or gone, it will still
> > issue
> > the preempt to move the reservation to a usable path, even if
> > another
> > node is using the same key. This will remove the registrations for
> > that
> > other node. It also will not work correctly if multipathd stops
> > tracking
> > a device for some reason. It's only a best-effort safety check.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@xxxxxxxxxx>
> > ---
> >  libmpathpersist/mpath_persist_int.c | 87 ++++++++++++++++++++++++-
> > --
> > --
> >  libmpathpersist/mpath_updatepr.c    | 19 ++++++-
> >  libmpathpersist/mpathpr.h           |  2 +
> >  libmultipath/structs.h              |  1 +
> >  multipathd/callbacks.c              |  3 +
> >  multipathd/cli.c                    |  4 +-
> >  multipathd/cli.h                    |  3 +
> >  multipathd/cli_handlers.c           | 48 ++++++++++++++++
> >  multipathd/main.c                   | 33 ++++++++++-
> >  9 files changed, 182 insertions(+), 18 deletions(-)
> > 
> 
> Two questions all the way down.
> 
> >  
> > +static void check_prhold(struct multipath *mpp, struct path *pp)
> > +{
> > +	struct prin_resp resp = {0};
> > +	int status;
> > +
> > +	if (mpp->prflag == PR_UNSET) {
> > +		mpp->prhold = PR_UNSET;
> > +		return;
> > +	}
> > +	if (mpp->prflag != PR_SET || mpp->prhold != PR_UNKNOWN)
> > +		return;
> 
> How can this situation come to pass? prflag must be PR_UNKNOWN. 
> Shouldn't we reset prhold to PR_UNKNOWN as well?

Forget this remark. I was confused.

> 
> > +
> > +	status = prin_do_scsi_ioctl(pp->dev, MPATH_PRIN_RRES_SA,
> > &resp, 0);
> > +	if (status != MPATH_PR_SUCCESS) {
> > +		condlog (0, "%s: pr in read reservation command
> > failed: %d",
> > +			 mpp->wwid, status);
> > +		return;
> > +	}
> > +	mpp->prhold = PR_UNSET;
> > +	if (!resp.prin_descriptor.prin_readresv.additional_length)
> > +		return;
> > +
> > +	if (memcmp(&mpp->reservation_key,
> > resp.prin_descriptor.prin_readresv.key, 8) == 0)
> > +		mpp->prhold = PR_SET;
> > +}
> > +
> >  static void mpath_pr_event_handle(struct path *pp)
> >  {
> >  	struct multipath *mpp = pp->mpp;
> > @@ -4233,8 +4259,13 @@ static void mpath_pr_event_handle(struct
> > path
> > *pp)
> >  		return;
> >  	}
> >  
> > -	if (update_map_pr(mpp, pp) != MPATH_PR_SUCCESS)
> > +	if (update_map_pr(mpp, pp) != MPATH_PR_SUCCESS) {
> > +		if (mpp->prflag == PR_UNSET)
> > +			mpp->prhold = PR_UNSET;
> 
> Perhaps we should move setting prhold to update_map_pr()?

You fixed this in your 2nd set.

Martin





[Index of Archives]     [DM Crypt]     [Fedora Desktop]     [ATA RAID]     [Fedora Marketing]     [Fedora Packaging]     [Fedora SELinux]     [Yosemite Discussion]     [KDE Users]     [Fedora Docs]

  Powered by Linux