Re: PYNFS LOCK20 Blocking Lock Test Case

Calum Mackay <calum.mackay@xxxxxxxxxx> · Thu, 14 Aug 2025 02:29:42 +0100

On 12/08/2025 5:55 pm, Frank Filz wrote:
I believe this test case is wrong, relevant text from RFC:

Some clients require the support of blocking locks. The NFSv4
protocol must not rely on a callback mechanism and therefore is
unable to notify a client when a previously denied lock has been
granted. Clients have no choice but to continually poll for the
lock. This presents a fairness problem. Two new lock types are
added, READW and WRITEW, and are used to indicate to the server that
the client is requesting a blocking lock. The server should maintain
an ordered list of pending blocking locks. When the conflicting lock
is released, the server may wait the lease period for the first
waiting client to re-request the lock. After the lease period
expires, the next waiting client request is allowed the lock.

Test case:

     # Standard owner opens and locks a file
     fh1, stateid1 = c.create_confirm(t.word(), deny=OPEN4_SHARE_DENY_NONE)
     res1 = c.lock_file(t.word(), fh1, stateid1, type=WRITE_LT)
     check(res1, msg="Locking file %s" % t.word())
     # Second owner is denied a blocking lock
     file = c.homedir + [t.word()]
     fh2, stateid2 = c.open_confirm(b"owner2", file,
                                    access=OPEN4_SHARE_ACCESS_BOTH,
                                    deny=OPEN4_SHARE_DENY_NONE)
     res2 = c.lock_file(b"owner2", fh2, stateid2,
                        type=WRITEW_LT, lockowner=b"lockowner2_LOCK20")
     check(res2, NFS4ERR_DENIED, msg="Conflicting lock on %s" % t.word())
     sleeptime = c.getLeaseTime() // 2
     # Wait for queued lock to timeout
     for i in range(3):
         env.sleep(sleeptime, "Waiting for queued blocking lock to timeout")
         res = c.compound([op.renew(c.clientid)])
         check(res, [NFS4_OK, NFS4ERR_CB_PATH_DOWN])
     # Standard owner releases lock
     res1 = c.unlock_file(1, fh1, res1.lockid)
     check(res1)
     # Third owner tries to butt in and steal lock second owner is waiting
for
     # Should work, since second owner let his turn expire
     file = c.homedir + [t.word()]
     fh3, stateid3 = c.open_confirm(b"owner3", file,
                                    access=OPEN4_SHARE_ACCESS_BOTH,
                                    deny=OPEN4_SHARE_DENY_NONE)
     res3 = c.lock_file(b"owner3", fh3, stateid3,
                        type=WRITEW_LT, lockowner=b"lockowner3_LOCK20")
     check(res3, msg="Grabbing lock after another owner let his 'turn'
expire")

Note that the RFC indicated the client has one lease period AFTER the
conflicting lock is released to retry while the test case waits 1.5 lease
period after requesting the blocking lock before it releases the conflicting
lock...

Am I reading things right?

I see what you mean.

But since a waiting blocking lock client obviously doesn't know when the 
lock-holding client is going to release its lock, the waiting client has 
to start polling regularly as soon as its initial blocking lock request 
is denied. It has to poll at least once per lease period.

If the server notices that a waiting client hasn't polled once in a 
lease period, after its initial blocking lock request was denied, then 
it seems reasonable for the server to forget that waiting client's 
interest in the pending lock there and then. There's no need for the 
server to wait a further lease period after the lock is released.

Looking at the current Linux nfsd code, that does seem to be what it 
does. I see that when the server adds the blocking lock request to its 
pending list, it adds the current timestamp to it, i.e. the time that 
the blocking lock was requested.

The nfsd background clean-up thread (which runs at least once per lease 
period) removes any pending blocking lock requests if a lease period has 
passed since they were placed on the list (i.e. when the blocking lock 
was requested). There's a corresponding comment:

	/*
	 * It's possible for a client to try and acquire an already held lock
	 * that is being held for a long time, and then lose interest in it.
	 * So, we clean out any un-revisited request after a lease period
	 * under the assumption that the client is no longer interested.

https://elixir.bootlin.com/linux/v6.16/source/fs/nfsd/nfs4state.c#L6824

There's no pending locks action taken on lock release. The timing is 
based solely on when the blocking READW/WRITEW request occurred, i.e. 
the res2 WRITEW in the pynfs test, which is before the sleep.

So, whilst the RFC may seem to suggest the timer should start at lock 
release, it doesn't seem unreasonable for the NFS server to start the 
timer earlier, at the blocking lock request, to avoid an unnecessary 
delay upon lock release if the client has lost interest in the lock, 
i.e. it isn't polling.

Presumably, the pynfs test was originally written to match NFS server 
behaviour, rather than RFC wording. I'm not sure what other NFS servers 
do in this case. Waiting longer wouldn't change the test result in this 
case, I think.

Does that seem reasonable to you?

thanks,
calum.