On Tue, Jun 10, 2025 at 4:58 AM Dai Ngo <dai.ngo@xxxxxxxxxx> wrote:
>
>
> On 6/9/25 6:06 PM, Rick Macklem wrote:
> > On Mon, Jun 9, 2025 at 5:17 PM Dai Ngo <dai.ngo@xxxxxxxxxx> wrote:
> >> On 6/9/25 4:35 PM, Rick Macklem wrote:
> >>> Hi,
> >>>
> >>> I hope you don't mind a cross-post, but I thought both groups
> >>> might find this interesting...
> >>>
> >>> I have been creating a compound RPC that does REMOVE and
> >>> then tries to determine if the file object has been removed and
> >>> I was surprised to see quite different results from the Linux knfsd
> >>> and Solaris 11.4 NFSv4.1/4.2 servers. I think both these servers
> >>> provide FH4_PERSISTENT file handles, although I suppose I
> >>> should check that?
> >>>
> >>> First, the test OPEN/CREATEs a regular file called "foo" (only one
> >>> hard link) and acquires a write delegation for it.
> >>> Then a compound does the following:
> >>> ...
> >>> REMOVE foo
> >>> PUTFH fh for foo
> >>> GETATTR
> >>>
> >>> For the Solaris 11.4 server, the server CB_RECALLs the
> >>> delegation and then replies NFS4ERR_STALE for the PUTFH above.
> >>> (The FreeBSD server currently does the same.)
> >>>
> >>> For a fairly recent Linux (6.12) knfsd, the above replies NFS_OK
> >>> with nlinks == 0 in the GETATTR reply.
> >>>
> >>> Hmm. So I've looked in RFC8881 (I'm terrible at reading it so I
> >>> probably missed something) and I cannot find anything that states
> >>> either of the above behaviours is incorrect.
> >>> (NFS4ERR_STALE is listed as an error code for PUTFH, but the
> >>> description of PUTFH only says that it sets the CFH to the fh arg.
> >>> It does not say anything w.r.t. the fh arg. needing to be for a file
> >>> that still exists.) Neither of these servers sets
> >>> OPEN4_RESULT_PRESERVE_UNLINKED in the OPEN reply.
> >>>
> >>> So, it looks like "file object no longer exists" is indicated either
> >>> by an NFS4ERR_STALE reply to either PUTFH or GETATTR
> >>> OR
> >>> by a successful reply, but with nlinks == 0 for the GETATTR reply.
> >>>
> >>> To be honest, I kinda like the Linux knfsd version, but I am wondering
> >>> if others think that both of these replies are correct?
> >>>
> >>> Also, is the CB_RECALL needed when the delegation is held by
> >>> the same client as the one doing the REMOVE?
> >> The Linux NFSD detects the delegation belongs to the same client that
> >> causes the conflict (due to REMOVE) and skips the CB_RECALL. This is
> >> an optimization based on the assumption that the client would handle
> >> the conflict locally.
> > And then what does the server do with the delegation?
> > - Does it just discard it, since the file object has been deleted?
> > OR
> > - Does it guarantee that a DELEGRETURN done after the REMOVE will
> > still work (which seems to be the case for the 6.12 server I am using for
> > testing).
>
> The delegation remains valid but the file was removed from the namespace.
> This is why the PUTFH and GETATTR in your test did not fail. However, any
> lookup of the file will fail.
>
> >
> >> If the REMOVE was done by another client, the REMOVE will not complete
> >> until the delegation is returned. If the PUTFH comes after the REMOVE
> >> was completed, it'll fail with NFS4ERR_STALE since the file, specified
> >> by the file handle, no longer exists.
> > Assuming the statement w.r.t. "fail with NFS4ERR_STALE" only applies to
> > "REMOVE done by another client" then that sounds fine.
>
> Correction: even if the REMOVE was done by another client and the
> delegation was recalled from the 1st client, the open stateid of the file
> remains valid until the client sends the CLOSE. So the PUTFH won't fail
> regardless of which client sends the REMOVE.
So, should your server be setting OPEN4_RESULT_PRESERVE_UNLINKED
in OPEN replies, given this semantic?
--> If the "FH remains valid after REMOVE drops nlink to 0" semantic
were indicated by the OPEN4_RESULT_PRESERVE_UNLINKED flag, a client
could check for this flag and handle it appropriately.

rick

>
> > However if the "fail with NFS4ERR_STALE is supposed to happen after
> > REMOVE for same client" then that is not what I am seeing.
> > If you are curious, the packet trace is here. (Look at packet#58).
> > https://urldefense.com/v3/__https://people.freebsd.org/*rmacklem/linux-remove.pcap__;fg!!ACWV5N9M2RV99hQ!IEcffaAAeLhuzaJUO5rQOv0jUUk4ltuMpfqT83lLFkRL9cqOZEvZ-8GGjvoqlVAQKi_FAAhsKEl5NjvS0OLJ$
> >
> > Btw, in case you are curious why I am doing this testing, I am trying
> > to figure out a good way for the FreeBSD client to handle temporary
> > files. Typically on POSIX they are done via the syscalls:
> >
> > fd = open("foo", O_CREAT ...);
> > unlink("foo");
> > write(fd,..), write(fd,..)...
> > read(fd,...), read(fd,...)...
> > close(fd);
> >
> > If this happens quickly and is not too much writing, the writes
> > copy data into buffers/pages, the reads read the data out of
> > the pages and then it all gets deleted.
> >
> > Unfortunately, the CB_RECALL forces the NFSv4.n client
> > to do WRITE, WRITE,..COMMIT and then DELEGRETURN.
> > Then the REMOVE throws all the data away on the NFSv4.n
> > server.
> > --> As such, I really like not doing the CB_RECALL for "same client".
> > My concern is "what happens to the delegation after the file object ("foo")
> > gets deleted?"
> > It either needs to be thrown away by the NFSv4.n server or the
> > PUTFH, DELEGRETURN needs to work after the REMOVE.
>
> The PUTFH and DELEGRETURN continue to work after the REMOVE. The open
> stateid and delegation stateid on the server are destroyed only after
> the client sends the DELEGRETURN and CLOSE.
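
For reference, the temporary-file sequence quoted above can be written out as a small C helper. This is only a sketch, assuming a POSIX system: the name scratch_roundtrip and its parameters are made up for illustration and come from neither this thread nor any FreeBSD code, and short reads/writes are simply treated as errors.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create `path`, unlink it while it is still open,
 * write `len` bytes from `data`, read them back into `out`, then close.
 * Returns the link count reported by fstat() after the unlink, or -1 on
 * any error. Short reads and writes are treated as errors for brevity.
 */
long scratch_roundtrip(const char *path, const char *data, size_t len,
                       char *out)
{
    struct stat st;
    long nlink = -1;
    int fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);

    if (fd < 0)
        return -1;
    if (unlink(path) != 0)              /* name is gone, fd stays usable */
        goto done;
    if (write(fd, data, len) != (ssize_t)len)
        goto done;
    if (lseek(fd, 0, SEEK_SET) != 0)    /* rewind before reading back */
        goto done;
    if (read(fd, out, len) != (ssize_t)len)
        goto done;
    if (fstat(fd, &st) != 0)
        goto done;
    nlink = (long)st.st_nlink;
done:
    close(fd);                          /* last reference: data is discarded */
    return nlink;
}
```

On a local filesystem the returned link count is normally 0. Over NFS the value depends on how the client and server handle the unlink-while-open case, which is the question in this thread.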
>
> > Otherwise, the NFSv4.n server may get constipated by the delegations,
> > which might be called stale, since the file object has been deleted.
> >
> > --> I can do PUTFH, GETATTR after REMOVE in the same compound,
> > to find out if the file object has been deleted. But then, if a
> > PUTFH, DELEGRETURN fails with NFS4ERR_STALE, can I get
> > away with saying "the server should just discard the delegation as
> > the client already has done so"?
>
> You can try your test but I believe the PUTFH and GETATTR won't fail
> after the REMOVE.
>
> -Dai
>
> >
> > Thanks for your comments, rick
> >
> >> -Dai
> >>
> >>> (I don't think it is, but there is a discussion in 18.25.4 which says
> >>> "When the determination above cannot be made definitively because
> >>> delegations are being held, they MUST be recalled.." but everything
> >>> above that is a may/MAY, so it is not obvious to me if a server really
> >>> needs to case?)
> >>>
> >>> Any comments? Thanks, rick
> >>> ps: I am amazed when I learn these things about NFSv4.n after all
> >>> these years.
> >>>
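
The two reply shapes discussed in this thread (NFS4ERR_STALE versus success with nlinks == 0) could be folded into one client-side check on the REMOVE, PUTFH, GETATTR compound. A minimal sketch follows. The enum, the function name classify_after_remove, and the idea of passing the per-operation status values in directly are assumptions for illustration, not code from any existing client; only the numeric status values are taken from RFC 8881.

```c
#include <stdint.h>

/* Status values as defined in RFC 8881. */
#define NFS4_OK        0
#define NFS4ERR_STALE  70

/* Hypothetical client-side view of the file object after REMOVE. */
enum fobj_state {
    FOBJ_LINKED,        /* numlinks > 0: other hard links remain */
    FOBJ_UNLINKED_OPEN, /* FH still usable, numlinks == 0 (Linux knfsd reply) */
    FOBJ_GONE,          /* NFS4ERR_STALE (Solaris 11.4 / FreeBSD reply) */
    FOBJ_UNKNOWN        /* any other error: nothing can be concluded */
};

/*
 * Classify the replies to the PUTFH and GETATTR that follow REMOVE.
 * If PUTFH failed, the compound stopped there, so getattr_status and
 * numlinks are ignored in that case.
 */
enum fobj_state
classify_after_remove(uint32_t putfh_status, uint32_t getattr_status,
                      uint32_t numlinks)
{
    if (putfh_status == NFS4ERR_STALE)
        return FOBJ_GONE;
    if (putfh_status != NFS4_OK)
        return FOBJ_UNKNOWN;
    /* PUTFH succeeded, so the GETATTR result decides. */
    if (getattr_status == NFS4ERR_STALE)
        return FOBJ_GONE;
    if (getattr_status != NFS4_OK)
        return FOBJ_UNKNOWN;
    return numlinks == 0 ? FOBJ_UNLINKED_OPEN : FOBJ_LINKED;
}
```

A client might treat FOBJ_UNLINKED_OPEN as a hint that a later PUTFH, DELEGRETURN is expected to succeed, and FOBJ_GONE as a hint that the delegation can be dropped locally. Whether that second inference is safe is exactly the open question above.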