Thanks > -----Original Message----- > From: Calum Mackay [mailto:calum.mackay@xxxxxxxxxx] > Sent: Thursday, August 21, 2025 10:41 AM > To: Frank Filz <ffilzlnx@xxxxxxxxxxxxxx>; linux-nfs@xxxxxxxxxxxxxxx > Cc: Calum Mackay <calum.mackay@xxxxxxxxxx>; 'Ofir Vainshtein' > <ofirvins@xxxxxxxxxx>; 'Chuck Lever' <chuck.lever@xxxxxxxxxx> > Subject: Re: PYNFS LOCK20 Blocking Lock Test Case > > On 21/08/2025 6:30 pm, Frank Filz wrote: > > Ah, I see the logic the test case is expecting. > > > > For Ganesha, we maintain the blocking lock so long as the clientid is being > renewed, so we don't start the timer for claiming the lock until the lock becomes > available which seems to be allowed per the RFC. Maybe we just need to not run > that test case. > > > > But it would be nice to have a similar test case that just takes too long after > the lock is available to retry. > > Thanks Frank; Chuck also made a similar suggestion to me privately. I'll have a > look at either adjusting this test to report information that the lock was > obtained "early", and/or a separate/optional test that more closely matches the > RFC wording, regardless of how the server might behave. > > Of course, the RFC wording, and lack of nominative language, suggests this is an > implementation choice, and thus difficult for pynfs tests to adjudicate on. Yes, that can be tricky. I added a ganesha tag so there could be a few implementation specific tests. > thanks, > calum. > > > > > Part of the challenge is we share a lot of logic between 4.0 and 4.1 and with > the actual callback in 4.1, there is no expectation of the client polling for the > lock. > > > > Let me mull this one over more. > > > > Thanks > > > > Frank > > > >> -----Original Message----- > >> From: Calum Mackay [mailto:calum.mackay@xxxxxxxxxx] > >> Sent: Wednesday, August 13, 2025 6:30 PM > >> To: Frank Filz <ffilzlnx@xxxxxxxxxxxxxx>; linux-nfs@xxxxxxxxxxxxxxx > >> Cc: Calum Mackay <calum.mackay@xxxxxxxxxx>; 'Ofir Vainshtein' > >> <ofirvins@xxxxxxxxxx>; Chuck Lever <chuck.lever@xxxxxxxxxx> > >> Subject: Re: PYNFS LOCK20 Blocking Lock Test Case > >> > >> On 12/08/2025 5:55 pm, Frank Filz wrote: > >>> I believe this test case is wrong, relevant text from RFC: > >>> > >>> Some clients require the support of blocking locks. The NFSv4 > >>> protocol must not rely on a callback mechanism and therefore is > >>> unable to notify a client when a previously denied lock has been granted. > >>> Clients have no choice but to continually poll for the lock. This > >>> presents a fairness problem. Two new lock types are added, READW and > >>> WRITEW, and are used to indicate to the server that the client is > >>> requesting a blocking lock. The server should maintain an ordered > >>> list of pending blocking locks. When the conflicting lock is > >>> released, the server may wait the lease period for the first waiting > >>> client to re-request the lock. After the lease period expires, the > >>> next waiting client request is allowed the lock. > >>> > >>> Test case: > >>> > >>> # Standard owner opens and locks a file > >>> fh1, stateid1 = c.create_confirm(t.word(), > >> deny=OPEN4_SHARE_DENY_NONE) > >>> res1 = c.lock_file(t.word(), fh1, stateid1, type=WRITE_LT) > >>> check(res1, msg="Locking file %s" % t.word()) > >>> # Second owner is denied a blocking lock > >>> file = c.homedir + [t.word()] > >>> fh2, stateid2 = c.open_confirm(b"owner2", file, > >>> access=OPEN4_SHARE_ACCESS_BOTH, > >>> deny=OPEN4_SHARE_DENY_NONE) > >>> res2 = c.lock_file(b"owner2", fh2, stateid2, > >>> type=WRITEW_LT, lockowner=b"lockowner2_LOCK20") > >>> check(res2, NFS4ERR_DENIED, msg="Conflicting lock on %s" % t.word()) > >>> sleeptime = c.getLeaseTime() // 2 > >>> # Wait for queued lock to timeout > >>> for i in range(3): > >>> env.sleep(sleeptime, "Waiting for queued blocking lock to timeout") > >>> res = c.compound([op.renew(c.clientid)]) > >>> check(res, [NFS4_OK, NFS4ERR_CB_PATH_DOWN]) > >>> # Standard owner releases lock > >>> res1 = c.unlock_file(1, fh1, res1.lockid) > >>> check(res1) > >>> # Third owner tries to butt in and steal lock second owner is > >>> waiting for > >>> # Should work, since second owner let his turn expire > >>> file = c.homedir + [t.word()] > >>> fh3, stateid3 = c.open_confirm(b"owner3", file, > >>> access=OPEN4_SHARE_ACCESS_BOTH, > >>> deny=OPEN4_SHARE_DENY_NONE) > >>> res3 = c.lock_file(b"owner3", fh3, stateid3, > >>> type=WRITEW_LT, lockowner=b"lockowner3_LOCK20") > >>> check(res3, msg="Grabbing lock after another owner let his 'turn' > >>> expire") > >>> > >>> Note that the RFC indicated the client has one lease period AFTER > >>> the conflicting lock is released to retry while the test case waits > >>> 1.5 lease period after requesting the blocking lock before it > >>> releases the conflicting lock... > >>> > >>> Am I reading things right? > >> > >> I see what you mean. > >> > >> But since a waiting blocking lock client obviously doesn't know when > >> the lock- holding client is going to release its lock, the waiting > >> client has to start polling regularly as soon as its initial blocking > >> lock request is denied. It has to poll at least once per lease period. > >> > >> If the server notices that a waiting client hasn't polled once in a > >> lease period, after its initial blocking lock request was denied, > >> then it seems reasonable for the server to forget that waiting > >> client's interest in the pending lock there and then. There's no need > >> for the server to wait a further lease period after the lock is released. > >> > >> > >> Looking at the current Linux nfsd code, that does seem to be what it > >> does. I see that when the server adds the blocking lock request to > >> its pending list, it adds the current timestamp to it, i.e. the time that the > blocking lock was requested. > >> > >> The nfsd background clean-up thread (which runs at least once per > >> lease > >> period) removes any pending blocking lock requests if a lease period > >> has passed since they were placed on the list (i.e. when the blocking lock was > requested). > >> There's a corresponding comment: > >> > >> /* > >> * It's possible for a client to try and acquire an already held lock > >> * that is being held for a long time, and then lose interest in it. > >> * So, we clean out any un-revisited request after a lease period > >> * under the assumption that the client is no longer interested. > >> > >> https://elixir.bootlin.com/linux/v6.16/source/fs/nfsd/nfs4state.c#L68 > >> 24 > >> > >> > >> There's no pending locks action taken on lock release. The timing is > >> based solely on when the blocking READW/WRITEW request occurred, i.e. > >> the res2 WRITEW in the pynfs test, which is before the sleep. > >> > >> So, whilst the RFC may seem to suggest the timer should start at lock > >> release, it doesn't seem unreasonable for the NFS server to start the > >> timer earlier, at the blocking lock request, to avoid an unnecessary > >> delay upon lock release if the client has lost interest in the lock, i.e. it isn't > polling. > >> > >> > >> Presumably, the pynfs test was originally written to match NFS server > >> behaviour, rather than RFC wording. I'm not sure what other NFS servers do > in this case. > >> Waiting longer wouldn't change the test result in this case, I think. > >> > >> > >> Does that seem reasonable to you? > >> > >> thanks, > >> calum. > > > > -- > Calum Mackay > Linux Kernel Engineering > Oracle Linux and Virtualisation