On Tue, 2025-06-10 at 05:42 -0700, Rick Macklem wrote:
> On Tue, Jun 10, 2025 at 4:51 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> >
> > On Mon, 2025-06-09 at 18:06 -0700, Rick Macklem wrote:
> > > On Mon, Jun 9, 2025 at 5:17 PM Dai Ngo <dai.ngo@xxxxxxxxxx> wrote:
> > > >
> > > > On 6/9/25 4:35 PM, Rick Macklem wrote:
> > > > > Hi,
> > > > >
> > > > > I hope you don't mind a cross-post, but I thought both groups
> > > > > might find this interesting...
> > > > >
> > > > > I have been creating a compound RPC that does REMOVE and
> > > > > then tries to determine if the file object has been removed and
> > > > > I was surprised to see quite different results from the Linux knfsd
> > > > > and Solaris 11.4 NFSv4.1/4.2 servers. I think both these servers
> > > > > provide FH4_PERSISTENT file handles, although I suppose I
> > > > > should check that?
> > > > >
> > > > > First, the test OPEN/CREATEs a regular file called "foo" (only one
> > > > > hard link) and acquires a write delegation for it.
> > > > > Then a compound does the following:
> > > > > ...
> > > > > REMOVE foo
> > > > > PUTFH fh for foo
> > > > > GETATTR
> > > > >
> > > > > For the Solaris 11.4 server, the server CB_RECALLs the
> > > > > delegation and then replies NFS4ERR_STALE for the PUTFH above.
> > > > > (The FreeBSD server currently does the same.)
> > > > >
> > > > > For a fairly recent Linux (6.12) knfsd, the above replies NFS_OK
> > > > > with nlinks == 0 in the GETATTR reply.
> > > > >
> > > > > Hmm. So I've looked in RFC8881 (I'm terrible at reading it so I
> > > > > probably missed something) and I cannot find anything that states
> > > > > either of the above behaviours is incorrect.
> >
> > This seems outside the scope of the spec. What you're probably seeing
> > is just differences in the implementation details of the two servers.
> >
> > > > > (NFS4ERR_STALE is listed as an error code for PUTFH, but the
> > > > > description of PUTFH only says that it sets the CFH to the fh arg.
> > > > > It does not say anything w.r.t. the fh arg. needing to be for a file
> > > > > that still exists.) Neither of these servers sets
> > > > > OPEN4_RESULT_PRESERVE_UNLINKED in the OPEN reply.
> > > > >
> > > > > So, it looks like "file object no longer exists" is indicated either
> > > > > by a NFS4ERR_STALE reply to either PUTFH or GETATTR
> > > > > OR
> > > > > by a successful reply, but with nlinks == 0 for the GETATTR reply.
> > > > >
> > > > > To be honest, I kinda like the Linux knfsd version, but I am wondering
> > > > > if others think that both of these replies are correct?
> > > > >
> > > > > Also, is the CB_RECALL needed when the delegation is held by
> > > > > the same client as the one doing the REMOVE?
> > > >
> > > > The Linux NFSD detects the delegation belongs to the same client that
> > > > causes the conflict (due to REMOVE) and skips the CB_RECALL. This is
> > > > an optimization based on the assumption that the client would handle
> > > > the conflict locally.
> > > And then what does the server do with the delegation?
> > > - Does it just discard it, since the file object has been deleted?
> > > OR
> > > - Does it guarantee that a DELEGRETURN done after the REMOVE will
> > > still work (which seems to be the case for the 6.12 server I am using for
> > > testing).
> > >
> > The latter. The file on the server is still being held open by virtue
> > of the fact that the client holds a delegation stateid on it.
> >
> > The inode will still exist in core (with nlinks == 0) until its last
> > reference is released (here, when the client does the final
> > DELEGRETURN). Aside from the fact that it's now disconnected from the
> > filesystem namespace, it's still "alive", and reachable via filehandle.
> Thanks for the info. (I had a hunch it was held by the delegation.)
> I'll guess that implies that LINK could still be done, bumping nlink to 1
> before the DELEGRETURN?
> That means that nlink == 0 only guarantees
> that the file object will be deleted if the client holds a write delegation and
> ensures that LINK is not allowed before DELEGRETURN.
>

I believe that LINK is actually prevented at that point. The VFS only
allows flink() to work when nlink == 0 on O_TMPFILE files, IIRC. IMO,
that's a Linux implementation detail rather than something the NFS
protocol or POSIX requires.

> Although trying to avoid the WRITE, WRITE,...COMMIT to the server
> just before a file is deleted seems worth the effort, it never seems to
> be as easy as you'd think.
>

Definitely. The problem of course is that you can't really know whether
a REMOVE will actually delete the file. It'll remove the name, but
link() could have raced in, and at that point you sort of have to do
the writes.

> >
> > > >
> > > > If the REMOVE was done by another client, the REMOVE will not complete
> > > > until the delegation is returned. If the PUTFH comes after the REMOVE
> > > > was completed, it'll fail with NFS4ERR_STALE since the file, specified
> > > > by the file handle, no longer exists.
> > > Assuming the statement w.r.t. "fail with NFS4ERR_STALE" only applies to
> > > "REMOVE done by another client" then that sounds fine.
> > > However if the "fail with NFS4ERR_STALE is supposed to happen after
> > > REMOVE for same client" then that is not what I am seeing.
> > > If you are curious, the packet trace is here. (Look at packet#58).
> > > https://people.freebsd.org/~rmacklem/linux-remove.pcap
> > >
> > > Btw, in case you are curious why I am doing this testing, I am trying
> > > to figure out a good way for the FreeBSD client to handle temporary
> > > files. Typically on POSIX they are done via the syscalls:
> > >
> > > fd = open("foo", O_CREAT ...);
> > > unlink("foo");
> > > write(fd,..), write(fd,..)...
> > > read(fd,...), read(fd,...)...
> > > close(fd);
> > >
> > > If this happens quickly and is not too much writing, the writes
> > > copy data into buffers/pages, the reads read the data out of
> > > the pages and then it all gets deleted.
> > >
> > Yep, common pattern.
> >
> > > Unfortunately, the CB_RECALL forces the NFSv4.n client
> > > to do WRITE, WRITE,..COMMIT and then DELEGRETURN.
> > > Then the REMOVE throws all the data away on the NFSv4.n
> > > server.
> > > --> As such, I really like not doing the CB_RECALL for "same client".
> > > My concern is "what happens to the delegation after the file object ("foo")
> > > gets deleted?
> > > It either needs to be thrown away by the NFSv4.n server or the
> > > PUTFH, DELEGRETURN needs to work after the REMOVE.
> >
> > I think the latter. A REMOVE just removes the filename from the
> > namespace. What happens to the underlying inode/vnode/whathaveyou is
> > undefined by the protocol. The delegation is effectively holding the
> > file open, so it needs to continue to exist on the server, just as the
> > file "foo" in your example above must exist after the unlink().
> >
> > > Otherwise, the NFSv4.n server may get constipated by the delegations,
> > > which might be called stale, since the file object has been deleted.
> > >
> > > --> I can do PUTFH, GETATTR after REMOVE in the same compound,
> > > to find out if the file object has been deleted. But then, if a
> > > PUTFH, DELEGRETURN fails with NFS4ERR_STALE, can I get
> > > away with saying "the server should just discard the delegation as
> > > the client already has done so??.
> > >
> > > Thanks for your comments, rick
> > >
> > If you still have an outstanding delegation after a REMOVE, then
> > returning ESTALE on the filehandle at that point seems wrong. The
> > delegation still exists, so the underlying filehandle should still
> > exist.
> >
> > Linux doesn't generally throw back an NFS4ERR_STALE until it just can't
> > find the inode at all anymore.
> > A dentry holds a reference to the inode,
> > and open files hold a reference to the dentry. The remove just
> > disconnects the dentry from the namespace and drops its refcount. When
> > the DELEGRETURN issues the last close, the inode gets cleaned up and at
> > that point you can't find it by filehandle anymore.
> >
> > You probably want to aim for similar behavior in FreeBSD?
> I'm not sure. So long as the server guarantees that the file object has been
> deleted by the REMOVE, throwing NFS4ERR_STALE seems a reasonable alternative?
>

At that point won't you have to start returning writeback errors back
to userland? What if you do this?

    fd = open("foo", O_CREAT ...);
    unlink("foo");
    write(fd,..), write(fd,..)...
    fsync(fd);

In the absence of a delegation, won't the fsync get back an error here
because the file is now stale?

> Note that the FreeBSD server does not handle NFSv4 OPENs and
> DELEGATIONs like a POSIX open(2), so the file handle is no longer
> valid once nlink == 0 on the underlying vnode/inode.
> (Again, I don't think there is anything in RFC8881 that specifies
> what is correct behaviour for this?)
>
> It's a case where I'd like to be able to test against all extant servers,
> but none of the others show up at Bakeathons these days. Sigh.
>
> Thanks for your comments, rick
>
> >
> > > >
> > > > -Dai
> > > >
> > > > > (I don't think it is, but there is a discussion in 18.25.4 which says
> > > > > "When the determination above cannot be made definitively because
> > > > > delegations are being held, they MUST be recalled.." but everything
> > > > > above that is a may/MAY, so it is not obvious to me if a server really
> > > > > needs to care?)
> > > > >
> > > > > Any comments? Thanks, rick
> > > > > ps: I am amazed when I learn these things about NFSv4.n after all
> > > > > these years.
> > > > >
> > > >
> > --
> > Jeff Layton <jlayton@xxxxxxxxxx>

--
Jeff Layton <jlayton@xxxxxxxxxx>
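
For anyone who wants to check the local-file-system baseline that the
open/unlink/write/fsync question is measured against, here is a rough
sketch in Python. It is not from the thread: the function name
unlinked_tempfile_demo and the payload are made up for illustration,
and it assumes a local POSIX file system. On an NFS mount the client's
own handling of unlink on an open file may well give different results,
which is rather the point of the discussion.

```python
import os
import tempfile


def unlinked_tempfile_demo(directory, payload=b"scratch data"):
    """Run open/unlink/write/fsync/read on a file named "foo" in `directory`.

    Hypothetical helper for illustration only.
    Returns (nlink_after_unlink, data_read_back).
    """
    path = os.path.join(directory, "foo")
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600)
    try:
        os.unlink(path)                # the name is gone; the open fd keeps the inode
        nlink = os.fstat(fd).st_nlink  # link count as seen through the open fd
        os.write(fd, payload)
        os.fsync(fd)                   # the step the ESTALE question is about
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(payload))
    finally:
        os.close(fd)                   # last reference dropped; storage is freed
    return nlink, data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        nlink, data = unlinked_tempfile_demo(d)
        print("nlink after unlink:", nlink)
        print("read back:", data)
```

Running it with the temporary directory on an NFSv4 mount, against each
server in turn, would be one way to see whether the fsync() step ever
surfaces an error to userland.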