On Tue, Jun 10, 2025 at 6:28 AM Dai Ngo <dai.ngo@xxxxxxxxxx> wrote:
>
> On 6/10/25 6:16 AM, Rick Macklem wrote:
> On Tue, Jun 10, 2025 at 4:58 AM Dai Ngo <dai.ngo@xxxxxxxxxx> wrote:
> On 6/9/25 6:06 PM, Rick Macklem wrote:
> On Mon, Jun 9, 2025 at 5:17 PM Dai Ngo <dai.ngo@xxxxxxxxxx> wrote:
> On 6/9/25 4:35 PM, Rick Macklem wrote:
> Hi,
>
> I hope you don't mind a cross-post, but I thought both groups
> might find this interesting...
>
> I have been creating a compound RPC that does REMOVE and
> then tries to determine if the file object has been removed and
> I was surprised to see quite different results from the Linux knfsd
> and Solaris 11.4 NFSv4.1/4.2 servers. I think both these servers
> provide FH4_PERSISTENT file handles, although I suppose I
> should check that?
>
> First, the test OPEN/CREATEs a regular file called "foo" (only one
> hard link) and acquires a write delegation for it.
> Then a compound does the following:
> ...
> REMOVE foo
> PUTFH fh for foo
> GETATTR
>
> For the Solaris 11.4 server, the server CB_RECALLs the
> delegation and then replies NFS4ERR_STALE for the PUTFH above.
> (The FreeBSD server currently does the same.)
>
> For a fairly recent Linux (6.12) knfsd, the above replies NFS_OK
> with nlinks == 0 in the GETATTR reply.
>
> Hmm. So I've looked in RFC8881 (I'm terrible at reading it so I
> probably missed something) and I cannot find anything that states
> either of the above behaviours is incorrect.
> (NFS4ERR_STALE is listed as an error code for PUTFH, but the
> description of PUTFH only says that it sets the CFH to the fh arg.
> It does not say anything w.r.t. the fh arg. needing to be for a file
> that still exists.) Neither of these servers sets
> OPEN4_RESULT_PRESERVE_UNLINKED in the OPEN reply.
>
> So, it looks like "file object no longer exists" is indicated either
> by a NFS4ERR_STALE reply to either PUTFH or GETATTR
> OR
> by a successful reply, but with nlinks == 0 for the GETATTR reply.
The nlink == 0 case does not work, since another client can do a LINK.
It does imply that the WRITE, WRITE,...COMMIT, DELEGRETURN can be done
asynchronously from the unlink(2) or close(2) that is causing the REMOVE.

>
> To be honest, I kinda like the Linux knfsd version, but I am wondering
> if others think that both of these replies are correct?
>
> Also, is the CB_RECALL needed when the delegation is held by
> the same client as the one doing the REMOVE?
>
> The Linux NFSD detects the delegation belongs to the same client that
> causes the conflict (due to REMOVE) and skips the CB_RECALL. This is
> an optimization based on the assumption that the client would handle
> the conflict locally.
>
> And then what does the server do with the delegation?
> - Does it just discard it, since the file object has been deleted?
> OR
> - Does it guarantee that a DELEGRETURN done after the REMOVE will
> still work (which seems to be the case for the 6.12 server I am using for
> testing).
>
> The delegation remains valid but the file was removed from the namespace.
> This is why the PUTFH and GETATTR in your test did not fail. However, any
> lookup of the file will fail.
>
> If the REMOVE was done by another client, the REMOVE will not complete
> until the delegation is returned. If the PUTFH comes after the REMOVE
> was completed, it'll fail with NFS4ERR_STALE since the file, specified
> by the file handle, no longer exists.
>
> Assuming the statement w.r.t. "fail with NFS4ERR_STALE" only applies to
> "REMOVE done by another client" then that sounds fine.
>
> Correction: even if the REMOVE was done by another client and the
> delegation was recalled from the 1st client, the open stateid of the file
> remains valid until the client sends the CLOSE. So the PUTFH won't fail
> regardless of which client sends the REMOVE.
>
> So, should your server be setting OPEN4_RESULT_PRESERVE_UNLINKED
> in OPEN replies, given this semantic?
> --> If the "FH remains valid after REMOVE drops nlink to 0" semantic
> were indicated by the OPEN4_RESULT_PRESERVE_UNLINKED flag, a client
> could check for this flag and handle it appropriately.
>
> I believe the Linux NFSD currently does not support OPEN4_RESULT_PRESERVE_UNLINKED.

Yes, if the server will throw away the file object when it reboots,
you cannot set this flag.

Thanks for all your comments, rick

>
> -Dai
>
> rick
>
> However if the "fail with NFS4ERR_STALE is supposed to happen after
> REMOVE for same client" then that is not what I am seeing.
> If you are curious, the packet trace is here. (Look at packet#58).
> https://urldefense.com/v3/__https://people.freebsd.org/*rmacklem/linux-remove.pcap__;fg!!ACWV5N9M2RV99hQ!IEcffaAAeLhuzaJUO5rQOv0jUUk4ltuMpfqT83lLFkRL9cqOZEvZ-8GGjvoqlVAQKi_FAAhsKEl5NjvS0OLJ$
>
> Btw, in case you are curious why I am doing this testing, I am trying
> to figure out a good way for the FreeBSD client to handle temporary
> files. Typically on POSIX they are done via the syscalls:
>
> fd = open("foo", O_CREAT ...);
> unlink("foo");
> write(fd,..), write(fd,..)...
> read(fd,...), read(fd,...)...
> close(fd);
>
> If this happens quickly and is not too much writing, the writes
> copy data into buffers/pages, the reads read the data out of
> the pages and then it all gets deleted.
>
> Unfortunately, the CB_RECALL forces the NFSv4.n client
> to do WRITE, WRITE,..COMMIT and then DELEGRETURN.
> Then the REMOVE throws all the data away on the NFSv4.n
> server.
> --> As such, I really like not doing the CB_RECALL for "same client".
> My concern is "what happens to the delegation after the file object ("foo")
> gets deleted?"
> It either needs to be thrown away by the NFSv4.n server or the
> PUTFH, DELEGRETURN needs to work after the REMOVE.
>
> The PUTFH and DELEGRETURN continue to work after the REMOVE. The open
> stateid and delegation stateid on the server are destroyed only after
> the client sends the DELEGRETURN and CLOSE.
>
> Otherwise, the NFSv4.n server may get constipated by the delegations,
> which might be called stale, since the file object has been deleted.
>
> --> I can do PUTFH, GETATTR after REMOVE in the same compound,
> to find out if the file object has been deleted. But then, if a
> PUTFH, DELEGRETURN fails with NFS4ERR_STALE, can I get
> away with saying "the server should just discard the delegation as
> the client already has done so"?
>
> You can try your test but I believe the PUTFH and GETATTR won't fail
> after the REMOVE.
>
> -Dai
>
> Thanks for your comments, rick
>
> -Dai
>
> (I don't think it is, but there is a discussion in 18.25.4 which says
> "When the determination above cannot be made definitively because
> delegations are being held, they MUST be recalled.." but everything
> above that is a may/MAY, so it is not obvious to me if a server really
> needs to care?)
>
> Any comments? Thanks, rick
> ps: I am amazed when I learn these things about NFSv4.n after all
> these years.
>
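
For anyone who wants to reproduce the temporary-file pattern quoted above on the client side, here is a minimal sketch in C. It only exercises the local syscall sequence (open, unlink, write, read, close) and reports the link count fstat(2) sees after the unlink; it says nothing about what a given NFSv4.n server puts on the wire. The helper name scratch_roundtrip and its parameters are made up for this illustration and do not come from the thread or from any client implementation.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): create `path`, unlink it
 * while the descriptor is still open, write `msg`, read it back into
 * `out`, and report the link count fstat(2) sees after the unlink.
 * Returns 0 on success, -1 on any failure.
 */
int scratch_roundtrip(const char *path, const char *msg,
                      char *out, size_t outlen, long *nlink_after)
{
    size_t len = strlen(msg);
    struct stat st;
    ssize_t n;
    int fd;

    /* Need room for the message plus the terminating NUL. */
    if (len + 1 > outlen)
        return -1;

    fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    /* The name goes away here; the open descriptor keeps the object alive. */
    if (unlink(path) != 0)
        goto fail;

    /*
     * On a local file system this is typically 0 at this point. Over NFS
     * the value depends on how the client and server handle the unlink.
     */
    if (fstat(fd, &st) != 0)
        goto fail;
    *nlink_after = (long)st.st_nlink;

    n = write(fd, msg, len);
    if (n < 0 || (size_t)n != len)
        goto fail;

    /* Rewind and read the data back through the same descriptor. */
    if (lseek(fd, 0, SEEK_SET) != 0)
        goto fail;
    n = read(fd, out, len);
    if (n < 0 || (size_t)n != len)
        goto fail;
    out[len] = '\0';

    /* Closing the last descriptor lets the file system discard the data. */
    return close(fd);

fail:
    close(fd);
    return -1;
}
```

Calling scratch_roundtrip("foo", "hello", buf, sizeof buf, &nlink) from a directory on the mount under test, while capturing packets, should show whether the client sends WRITE/COMMIT/DELEGRETURN before the REMOVE or defers them.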