On Thu, Aug 21, 2025 at 2:39 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote: > > On 8/21/25 2:33 PM, Olga Kornievskaia wrote: > > On Thu, Aug 21, 2025 at 2:24 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote: > >> > >> On 8/21/25 2:20 PM, Olga Kornievskaia wrote: > >>> On Thu, Aug 21, 2025 at 11:09 AM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote: > >>>> > >>>> On 8/21/25 9:56 AM, Olga Kornievskaia wrote: > >>>>> On Wed, Aug 20, 2025 at 7:15 PM NeilBrown <neil@xxxxxxxxxx> wrote: > >>>>>> > >>>>>> On Thu, 14 Aug 2025, Olga Kornievskaia wrote: > >>>>>>> On Tue, Aug 12, 2025 at 8:05 PM NeilBrown <neil@xxxxxxxxxx> wrote: > >>>>>>>> > >>>>>>>> On Wed, 13 Aug 2025, Olga Kornievskaia wrote: > >>>>>>>>> When nfsd is in grace and receives an NLM LOCK request which turns > >>>>>>>>> out to have a conflicting delegation, return that the server is in > >>>>>>>>> grace. > >>>>>>>>> > >>>>>>>>> Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx> > >>>>>>>>> Signed-off-by: Olga Kornievskaia <okorniev@xxxxxxxxxx> > >>>>>>>>> --- > >>>>>>>>> fs/lockd/svc4proc.c | 15 +++++++++++++-- > >>>>>>>>> 1 file changed, 13 insertions(+), 2 deletions(-) > >>>>>>>>> > >>>>>>>>> diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c > >>>>>>>>> index 109e5caae8c7..7ac4af5c9875 100644 > >>>>>>>>> --- a/fs/lockd/svc4proc.c > >>>>>>>>> +++ b/fs/lockd/svc4proc.c > >>>>>>>>> @@ -141,8 +141,19 @@ __nlm4svc_proc_lock(struct svc_rqst *rqstp, struct nlm_res *resp) > >>>>>>>>> resp->cookie = argp->cookie; > >>>>>>>>> > >>>>>>>>> /* Obtain client and file */ > >>>>>>>>> - if ((resp->status = nlm4svc_retrieve_args(rqstp, argp, &host, &file))) > >>>>>>>>> - return resp->status == nlm_drop_reply ? rpc_drop_reply :rpc_success; > >>>>>>>>> + resp->status = nlm4svc_retrieve_args(rqstp, argp, &host, &file); > >>>>>>>>> + switch (resp->status) { > >>>>>>>>> + case 0: > >>>>>>>>> + break; > >>>>>>>>> + case nlm_drop_reply: > >>>>>>>>> + if (locks_in_grace(SVC_NET(rqstp))) { > >>>>>>>>> + resp->status = nlm_lck_denied_grace_period; > >>>>>>>> > >>>>>>>> I think this is wrong. If the lock request has the "reclaim" flag set, > >>>>>>>> then nlm_lck_denied_grace_period is not a meaningful error. > >>>>>>>> nlm4svc_retrieve_args() returns nlm_drop_reply when there is a delay > >>>>>>>> getting a response to an upcall to mountd. For NLM the request really > >>>>>>>> must be dropped. > >>>>>>> > >>>>>>> Thank you for pointing out this case so we are suggesting to. > >>>>>>> > >>>>>>> if (locks_in_grace(SVC_NET(rqstp)) && !argp->reclaim) > >>>>>>> > >>>>>>> However, I've been looking and looking but I cannot figure out how > >>>>>>> nlm4svc_retrieve_args() would ever get nlm_drop_reply. You say it can > >>>>>>> happen during the upcall to mountd. So that happens within nfsd_open() > >>>>>>> call and a part of fh_verify(). > >>>>>>> To return nlm_drop_reply, nlm_fopen() must have gotten nfserr_dropit > >>>>>>> from the nfsd_open(). I have searched and searched but I don't see > >>>>>>> anything that ever sets nfserr_dropit (NFSERR_DROPIT). > >>>>>>> > >>>>>>> I searched the logs and nfserr_dropit was an error that was EAGAIN > >>>>>>> translated into but then removed by the following patch. > >>>>>> > >>>>>> Oh. I didn't know that. > >>>>>> We now use RQ_DROPME instead. > >>>>>> I guess we should remove NFSERR_DROPIT completely as it not used at all > >>>>>> any more. > >>>>>> > >>>>>> Though returning nfserr_jukebox to an v2 client isn't a good idea.... > >>>>> > >>>>> I'll take your word for you. > >>>>> > >>>>>> So I guess my main complaint isn't valid, but I still don't like this > >>>>>> patch. It seems an untidy place to put the locks_in_grace test. > >>>>>> Other callers of nlm4svc_retrieve_args() check locks_in_grace() before > >>>>>> making that call. __nlm4svc_proc_lock probably should too. Or is there > >>>>>> a reason that it is delayed until the middle of nlmsvc_lock().. > >>>>> > >>>>> I thought the same about why not adding the in_grace check and decided > >>>>> that it was probably because you dont want to deny a lock if there are > >>>>> no conflicts. If we add it, somebody might notice and will complain > >>>>> that they can't get their lock when there are no conflicts. > >>>>> > >>>>>> The patch is not needed and isn't clearly an improvement, so I would > >>>>>> rather it were dropped. > >>>>> > >>>>> I'm not against dropping this patch if the conclusion is that dropping > >>>>> the packet is no worse than returning in_grace error. > >>>> > >>>> I dropped both of these from nfsd-testing. If a fix is still needed, > >>>> let's start again with fresh patches. > >>> > >>> Can you clarify when you said "both"? > >>> > >>> What objections are there for the 1st patch in the series. It solves a > >>> problem and a fix is needed. > >> > >> There are two reasons to drop the first patch. > >> > >> 1. Neil's "remove nfserr_dropit" patch doesn't apply unless both patches > >> are reverted. > >> > >> 2. As I said above, NFSv2 does not have a mechanism like NFS3ERR_JUKEBOX > >> to request that the client wait a bit and resend. > > > > ERR_JUKEBOX is not returned to another v2 or v3. > > > > Patch1 (nfsd: nfserr_jukebox in nlm_fopen should lead to a retry) > > translates err_jukebox into the nlm_drop_reply and returns to lockd. > > As the result, no error is returned to the client but the reply is > > dropped all together. > > If you want NLM to drop the response, then set RQ_DROPME. Using > nfserr_jukebox here is confusing -- it means "return a response to the > client that requests a resend". You want NLM to /not send a response/, > and we have a specific mechanism for that. I don't understand why the suggestion holds value. nlm_open() is going to call nfsd_open() which will return nfserr_jukebox (for among other conditions) when there is a conflicting delegation. Currently, nlm_fopen() would return nlm_failed (it does not recognize nfserr_jukebox error code). nlm_fopen() has to identify that nfserr_jukebox error as special to mean that the request needs to be dropped. There is already a mechanism for it and that is to return nlm_drop_reply. What purpose would be to set RQ_DROPME for the nfserr_jukebox in nlm_fopen() (note that nfserr_jukebox needs to be identified) . What error should one be returning instead of "nlm_drop_reply"? Why is returning some other error code + setting RQ_DROPME be more "clear"? > > > >> So, if 1/2 has been tested with NFSv2 and does not cause NFSD to leak > >> nfserr_jukebox to NFSv2 clients, then please rebase that one on the > >> current nfsd-testing branch and post it again. > >> > >> > >>> This one I agree is not needed but I > >>> thought was an improvement. > >>> > >>>> > >>>> > >>>>>> Thanks, > >>>>>> NeilBrown > >>>>>> > >>>>>> > >>>>>>> > >>>>>>> commit 062304a815fe10068c478a4a3f28cf091c55cb82 > >>>>>>> Author: J. Bruce Fields <bfields@xxxxxxxxxxxx> > >>>>>>> Date: Sun Jan 2 22:05:33 2011 -0500 > >>>>>>> > >>>>>>> nfsd: stop translating EAGAIN to nfserr_dropit > >>>>>>> > >>>>>>> diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c > >>>>>>> index dc9c2e3fd1b8..fd608a27a8d5 100644 > >>>>>>> --- a/fs/nfsd/nfsproc.c > >>>>>>> +++ b/fs/nfsd/nfsproc.c > >>>>>>> @@ -735,7 +735,8 @@ nfserrno (int errno) > >>>>>>> { nfserr_stale, -ESTALE }, > >>>>>>> { nfserr_jukebox, -ETIMEDOUT }, > >>>>>>> { nfserr_jukebox, -ERESTARTSYS }, > >>>>>>> - { nfserr_dropit, -EAGAIN }, > >>>>>>> + { nfserr_jukebox, -EAGAIN }, > >>>>>>> + { nfserr_jukebox, -EWOULDBLOCK }, > >>>>>>> { nfserr_jukebox, -ENOMEM }, > >>>>>>> { nfserr_badname, -ESRCH }, > >>>>>>> { nfserr_io, -ETXTBSY }, > >>>>>>> > >>>>>>> so if fh_verify is failing whatever it is returning, it is not > >>>>>>> nfserr_dropit nor is it nfserr_jukebox which means nlm_fopen() would > >>>>>>> translate it to nlm_failed which with my patch would not trigger > >>>>>>> nlm_lck_denied_grace_period error but resp->status would be set to > >>>>>>> nlm_failed. > >>>>>>> > >>>>>>> So I circle back to I hope that convinces you that we don't need a > >>>>>>> check for the reclaim lock. > >>>>>>> > >>>>>>> I believe nlm_drop_reply is nfsd_open's jukebox error, one of which is > >>>>>>> delegation recall. it can be a memory failure. But I'm sure when > >>>>>>> EWOULDBLOCK occurs. > >>>>>>> > >>>>>>>> At the very least we need to guard against the reclaim flag being set in > >>>>>>>> the above test. But I would much rather a more clear distinction were > >>>>>>>> made between "drop because of a conflicting delegation" and "drop > >>>>>>>> because of a delay getting upcall response". > >>>>>>>> Maybe a new "nlm_conflicting_delegtion" return from ->fopen which nlm4 > >>>>>>>> (and ideally nlm2) handles appropriately. > >>>>>>> > >>>>>>> > >>>>>>>> NeilBrown > >>>>>>>> > >>>>>>>> > >>>>>>>>> + return rpc_success; > >>>>>>>>> + } > >>>>>>>>> + return nlm_drop_reply; > >>>>>>>>> + default: > >>>>>>>>> + return rpc_success; > >>>>>>>>> + } > >>>>>>>>> > >>>>>>>>> /* Now try to lock the file */ > >>>>>>>>> resp->status = nlmsvc_lock(rqstp, file, host, &argp->lock, > >>>>>>>>> -- > >>>>>>>>> 2.47.1 > >>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>> > >>>> > >>>> -- > >>>> Chuck Lever > >>> > >> > >> > >> -- > >> Chuck Lever > > > -- > Chuck Lever