Re: [PATCH v2 2/2] lockd: while grace prefer to fail with nlm_lck_denied_grace_period

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Aug 21, 2025 at 2:39 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
> On 8/21/25 2:33 PM, Olga Kornievskaia wrote:
> > On Thu, Aug 21, 2025 at 2:24 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> >>
> >> On 8/21/25 2:20 PM, Olga Kornievskaia wrote:
> >>> On Thu, Aug 21, 2025 at 11:09 AM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> >>>>
> >>>> On 8/21/25 9:56 AM, Olga Kornievskaia wrote:
> >>>>> On Wed, Aug 20, 2025 at 7:15 PM NeilBrown <neil@xxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> On Thu, 14 Aug 2025, Olga Kornievskaia wrote:
> >>>>>>> On Tue, Aug 12, 2025 at 8:05 PM NeilBrown <neil@xxxxxxxxxx> wrote:
> >>>>>>>>
> >>>>>>>> On Wed, 13 Aug 2025, Olga Kornievskaia wrote:
> >>>>>>>>> When nfsd is in grace and receives an NLM LOCK request which turns
> >>>>>>>>> out to have a conflicting delegation, return that the server is in
> >>>>>>>>> grace.
> >>>>>>>>>
> >>>>>>>>> Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
> >>>>>>>>> Signed-off-by: Olga Kornievskaia <okorniev@xxxxxxxxxx>
> >>>>>>>>> ---
> >>>>>>>>>  fs/lockd/svc4proc.c | 15 +++++++++++++--
> >>>>>>>>>  1 file changed, 13 insertions(+), 2 deletions(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c
> >>>>>>>>> index 109e5caae8c7..7ac4af5c9875 100644
> >>>>>>>>> --- a/fs/lockd/svc4proc.c
> >>>>>>>>> +++ b/fs/lockd/svc4proc.c
> >>>>>>>>> @@ -141,8 +141,19 @@ __nlm4svc_proc_lock(struct svc_rqst *rqstp, struct nlm_res *resp)
> >>>>>>>>>       resp->cookie = argp->cookie;
> >>>>>>>>>
> >>>>>>>>>       /* Obtain client and file */
> >>>>>>>>> -     if ((resp->status = nlm4svc_retrieve_args(rqstp, argp, &host, &file)))
> >>>>>>>>> -             return resp->status == nlm_drop_reply ? rpc_drop_reply :rpc_success;
> >>>>>>>>> +     resp->status = nlm4svc_retrieve_args(rqstp, argp, &host, &file);
> >>>>>>>>> +     switch (resp->status) {
> >>>>>>>>> +     case 0:
> >>>>>>>>> +             break;
> >>>>>>>>> +     case nlm_drop_reply:
> >>>>>>>>> +             if (locks_in_grace(SVC_NET(rqstp))) {
> >>>>>>>>> +                     resp->status = nlm_lck_denied_grace_period;
> >>>>>>>>
> >>>>>>>> I think this is wrong.  If the lock request has the "reclaim" flag set,
> >>>>>>>> then nlm_lck_denied_grace_period is not a meaningful error.
> >>>>>>>> nlm4svc_retrieve_args() returns nlm_drop_reply when there is a delay
> >>>>>>>> getting a response to an upcall to mountd.  For NLM the request really
> >>>>>>>> must be dropped.
> >>>>>>>
> >>>>>>> Thank you for pointing out this case so we are suggesting to.
> >>>>>>>
> >>>>>>> if (locks_in_grace(SVC_NET(rqstp)) && !argp->reclaim)
> >>>>>>>
> >>>>>>> However, I've been looking and looking but I cannot figure out how
> >>>>>>> nlm4svc_retrieve_args() would ever get nlm_drop_reply. You say it can
> >>>>>>> happen during the upcall to mountd. So that happens within nfsd_open()
> >>>>>>> call and a part of fh_verify().
> >>>>>>> To return nlm_drop_reply, nlm_fopen() must have gotten nfserr_dropit
> >>>>>>> from the nfsd_open().  I have searched and searched but I don't see
> >>>>>>> anything that ever sets nfserr_dropit (NFSERR_DROPIT).
> >>>>>>>
> >>>>>>> I searched the logs and nfserr_dropit was an error that was EAGAIN
> >>>>>>> translated into but then removed by the following patch.
> >>>>>>
> >>>>>> Oh.  I didn't know that.
> >>>>>> We now use RQ_DROPME instead.
> >>>>>> I guess we should remove NFSERR_DROPIT completely as it not used at all
> >>>>>> any more.
> >>>>>>
> >>>>>> Though returning nfserr_jukebox to an v2 client isn't a good idea....
> >>>>>
> >>>>> I'll take your word for you.
> >>>>>
> >>>>>> So I guess my main complaint isn't valid, but I still don't like this
> >>>>>> patch.  It seems an untidy place to put the locks_in_grace test.
> >>>>>> Other callers of nlm4svc_retrieve_args() check locks_in_grace() before
> >>>>>> making that call.  __nlm4svc_proc_lock probably should too.  Or is there
> >>>>>> a reason that it is delayed until the middle of nlmsvc_lock()..
> >>>>>
> >>>>> I thought the same about why not adding the in_grace check and decided
> >>>>> that it was probably because you dont want to deny a lock if there are
> >>>>> no conflicts. If we add it, somebody might notice and will complain
> >>>>> that they can't get their lock when there are no conflicts.
> >>>>>
> >>>>>> The patch is not needed and isn't clearly an improvement, so I would
> >>>>>> rather it were dropped.
> >>>>>
> >>>>> I'm not against dropping this patch if the conclusion is that dropping
> >>>>> the packet is no worse than returning in_grace error.
> >>>>
> >>>> I dropped both of these from nfsd-testing. If a fix is still needed,
> >>>> let's start again with fresh patches.
> >>>
> >>> Can you clarify when you said "both"?
> >>>
> >>> What objections are there for the 1st patch in the series. It solves a
> >>> problem and a fix is needed.
> >>
> >> There are two reasons to drop the first patch.
> >>
> >> 1. Neil's "remove nfserr_dropit" patch doesn't apply unless both patches
> >> are reverted.
> >>
> >> 2. As I said above, NFSv2 does not have a mechanism like NFS3ERR_JUKEBOX
> >> to request that the client wait a bit and resend.
> >
> > ERR_JUKEBOX is not returned to another v2 or v3.
> >
> > Patch1 (nfsd: nfserr_jukebox in nlm_fopen should lead to a retry)
> > translates err_jukebox into the nlm_drop_reply and returns to lockd.
> > As the result, no error is returned to the client but the reply is
> > dropped all together.
>
> If you want NLM to drop the response, then set RQ_DROPME. Using
> nfserr_jukebox here is confusing -- it means "return a response to the
> client that requests a resend". You want NLM to /not send a response/,
> and we have a specific mechanism for that.

I don't understand why the suggestion holds value.

nlm_open() is going to call nfsd_open() which will return
nfserr_jukebox (for among other conditions) when there is a
conflicting delegation. Currently, nlm_fopen() would return nlm_failed
(it does not recognize nfserr_jukebox error code). nlm_fopen() has to
identify that nfserr_jukebox error as special to mean that the request
needs to be dropped. There is already a mechanism for it and that is
to return nlm_drop_reply.

What purpose would be to set RQ_DROPME for the nfserr_jukebox in
nlm_fopen() (note that nfserr_jukebox needs to be identified) . What
error should one be returning instead of "nlm_drop_reply"?  Why is
returning some other error code + setting RQ_DROPME be more "clear"?

>
>
> >> So, if 1/2 has been tested with NFSv2 and does not cause NFSD to leak
> >> nfserr_jukebox to NFSv2 clients, then please rebase that one on the
> >> current nfsd-testing branch and post it again.
> >>
> >>
> >>> This one I agree is not needed but I
> >>> thought was an improvement.
> >>>
> >>>>
> >>>>
> >>>>>> Thanks,
> >>>>>> NeilBrown
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> commit 062304a815fe10068c478a4a3f28cf091c55cb82
> >>>>>>> Author: J. Bruce Fields <bfields@xxxxxxxxxxxx>
> >>>>>>> Date:   Sun Jan 2 22:05:33 2011 -0500
> >>>>>>>
> >>>>>>>     nfsd: stop translating EAGAIN to nfserr_dropit
> >>>>>>>
> >>>>>>> diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
> >>>>>>> index dc9c2e3fd1b8..fd608a27a8d5 100644
> >>>>>>> --- a/fs/nfsd/nfsproc.c
> >>>>>>> +++ b/fs/nfsd/nfsproc.c
> >>>>>>> @@ -735,7 +735,8 @@ nfserrno (int errno)
> >>>>>>>                 { nfserr_stale, -ESTALE },
> >>>>>>>                 { nfserr_jukebox, -ETIMEDOUT },
> >>>>>>>                 { nfserr_jukebox, -ERESTARTSYS },
> >>>>>>> -               { nfserr_dropit, -EAGAIN },
> >>>>>>> +               { nfserr_jukebox, -EAGAIN },
> >>>>>>> +               { nfserr_jukebox, -EWOULDBLOCK },
> >>>>>>>                 { nfserr_jukebox, -ENOMEM },
> >>>>>>>                 { nfserr_badname, -ESRCH },
> >>>>>>>                 { nfserr_io, -ETXTBSY },
> >>>>>>>
> >>>>>>> so if fh_verify is failing whatever it is returning, it is not
> >>>>>>> nfserr_dropit nor is it nfserr_jukebox which means nlm_fopen() would
> >>>>>>> translate it to nlm_failed which with my patch would not trigger
> >>>>>>> nlm_lck_denied_grace_period error but resp->status would be set to
> >>>>>>> nlm_failed.
> >>>>>>>
> >>>>>>> So I circle back to I hope that convinces you that we don't need a
> >>>>>>> check for the reclaim lock.
> >>>>>>>
> >>>>>>> I believe nlm_drop_reply is nfsd_open's jukebox error, one of which is
> >>>>>>> delegation recall. it can be a memory failure. But I'm sure when
> >>>>>>> EWOULDBLOCK occurs.
> >>>>>>>
> >>>>>>>> At the very least we need to guard against the reclaim flag being set in
> >>>>>>>> the above test.  But I would much rather a more clear distinction were
> >>>>>>>> made between "drop because of a conflicting delegation" and "drop
> >>>>>>>> because of a delay getting upcall response".
> >>>>>>>> Maybe a new "nlm_conflicting_delegtion" return from ->fopen which nlm4
> >>>>>>>> (and ideally nlm2) handles appropriately.
> >>>>>>>
> >>>>>>>
> >>>>>>>> NeilBrown
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> +                     return rpc_success;
> >>>>>>>>> +             }
> >>>>>>>>> +             return nlm_drop_reply;
> >>>>>>>>> +     default:
> >>>>>>>>> +             return rpc_success;
> >>>>>>>>> +     }
> >>>>>>>>>
> >>>>>>>>>       /* Now try to lock the file */
> >>>>>>>>>       resp->status = nlmsvc_lock(rqstp, file, host, &argp->lock,
> >>>>>>>>> --
> >>>>>>>>> 2.47.1
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Chuck Lever
> >>>
> >>
> >>
> >> --
> >> Chuck Lever
>
>
> --
> Chuck Lever





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux