Re: simple NFSv4.1/4.2 test of remove while holding a delegation


On Tue, Jun 10, 2025 at 4:58 AM Dai Ngo <dai.ngo@xxxxxxxxxx> wrote:
>
>
> On 6/9/25 6:06 PM, Rick Macklem wrote:
> > On Mon, Jun 9, 2025 at 5:17 PM Dai Ngo <dai.ngo@xxxxxxxxxx> wrote:
> >> On 6/9/25 4:35 PM, Rick Macklem wrote:
> >>> Hi,
> >>>
> >>> I hope you don't mind a cross-post, but I thought both groups
> >>> might find this interesting...
> >>>
> >>> I have been creating a compound RPC that does REMOVE and
> >>> then tries to determine if the file object has been removed and
> >>> I was surprised to see quite different results from the Linux knfsd
> >>> and Solaris 11.4 NFSv4.1/4.2 servers. I think both these servers
> >>> provide FH4_PERSISTENT file handles, although I suppose I
> >>> should check that?
> >>>
> >>> First, the test OPEN/CREATEs a regular file called "foo" (only one
> >>> hard link) and acquires a write delegation for it.
> >>> Then a compound does the following:
> >>> ...
> >>> REMOVE foo
> >>> PUTFH fh for foo
> >>> GETATTR
> >>>
> >>> For the Solaris 11.4 server, the server CB_RECALLs the
> >>> delegation and then replies NFS4ERR_STALE for the PUTFH above.
> >>> (The FreeBSD server currently does the same.)
> >>>
> >>> For a fairly recent Linux (6.12) knfsd, the above replies NFS_OK
> >>> with nlinks == 0 in the GETATTR reply.
> >>>
> >>> Hmm. So I've looked in RFC8881 (I'm terrible at reading it so I
> >>> probably missed something) and I cannot find anything that states
> >>> either of the above behaviours is incorrect.
> >>> (NFS4ERR_STALE is listed as an error code for PUTFH, but the
> >>> description of PUTFH only says that it sets the CFH to the fh arg.
> >>> It does not say anything w.r.t. the fh arg. needing to be for a file
> >>> that still exists.) Neither of these servers sets
> >>> OPEN4_RESULT_PRESERVE_UNLINKED in the OPEN reply.
> >>>
> >>> So, it looks like "file object no longer exists" is indicated either
> >>> by a NFS4ERR_STALE reply to either PUTFH or GETATTR
> >>> OR
> >>> by a successful reply, but with nlinks == 0 for the GETATTR reply.
> >>>
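The two "file object no longer exists" indications described above can be folded into one client-side test. What follows is only a rough sketch under stated assumptions: file_object_gone() and its parameters are hypothetical names, not taken from any client, and the status values are the NFS4_OK and NFS4ERR_STALE codes from RFC 8881. Any other error is treated as "not determined" and reported as false.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status codes as defined in RFC 8881. */
#define NFS4_OK        0
#define NFS4ERR_STALE  70

/*
 * Hypothetical helper: decide whether the file object is gone, given the
 * results of PUTFH and GETATTR issued after REMOVE in the same compound.
 * numlinks is only meaningful when both operations returned NFS4_OK.
 */
static bool file_object_gone(uint32_t putfh_status, uint32_t getattr_status,
                             uint32_t numlinks)
{
    /* Solaris/FreeBSD style: the file handle itself is rejected. */
    if (putfh_status == NFS4ERR_STALE)
        return true;
    if (putfh_status != NFS4_OK)
        return false;           /* some other error: nothing learned */

    if (getattr_status == NFS4ERR_STALE)
        return true;
    if (getattr_status != NFS4_OK)
        return false;           /* some other error: nothing learned */

    /* Linux knfsd style: success, but no links remain. */
    return numlinks == 0;
}
```

With this, both a STALE reply to PUTFH and an NFS4_OK reply carrying a link count of 0 lead the client to the same conclusion.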
> >>> To be honest, I kinda like the Linux knfsd version, but I am wondering
> >>> if others think that both of these replies are correct?
> >>>
> >>> Also, is the CB_RECALL needed when the delegation is held by
> >>> the same client as the one doing the REMOVE?
> >> The Linux NFSD detects the delegation belongs to the same client that
> >> causes the conflict (due to REMOVE) and skips the CB_RECALL. This is
> >> an optimization based on the assumption that the client would handle
> >> the conflict locally.
> > And then what does the server do with the delegation?
> > - Does it just discard it, since the file object has been deleted?
> > OR
> > - Does it guarantee that a DELEGRETURN done after the REMOVE will
> >    still work (which seems to be the case for the 6.12 server I am using for
> >    testing).
>
> The delegation remains valid but the file was removed from the namespace.
> This is why the PUTFH and GETATTR in your test did not fail. However, any
> lookup of the file will fail.
>
> >
> >> If the REMOVE was done by another client, the REMOVE will not complete
> >> until the delegation is returned. If the PUTFH comes after the REMOVE
> >> was completed, it'll fail with NFS4ERR_STALE since the file, specified
> >> by the file handle, no longer exists.
> > Assuming the statement w.r.t. "fail with NFS4ERR_STALE" only applies to
> > "REMOVE done by another client" then that sounds fine.
>
> Correction: even if the REMOVE was done by another client and the
> delegation was recalled from the 1st client, the open stateid of the file
> remains valid until the client sends the CLOSE. So the PUTFH won't fail
> regardless of which client sends the REMOVE.
So, should your server be setting OPEN4_RESULT_PRESERVE_UNLINKED
in OPEN replies, given these semantics?
--> If the "FH remains valid after REMOVE drops nlink to 0" semantic
     were indicated by the OPEN4_RESULT_PRESERVE_UNLINKED flag, a client
     could check for this flag and handle it appropriately.
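
For example, a client-side check could look roughly like this. It is only a sketch: fh_survives_unlink() is a made-up name, not code from any existing client, and the flag value is the one RFC 8881 gives for OPEN4_RESULT_PRESERVE_UNLINKED in the OPEN result flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* OPEN result flag value as given in RFC 8881. */
#define OPEN4_RESULT_PRESERVE_UNLINKED 0x00000008

/*
 * Hypothetical helper: given the rflags word from an OPEN reply, report
 * whether the server has promised to keep the file object (and so its
 * file handle) usable after the last link is removed while it is open.
 */
static bool fh_survives_unlink(uint32_t open_rflags)
{
    return (open_rflags & OPEN4_RESULT_PRESERVE_UNLINKED) != 0;
}
```

A client could record this per open and, when it is false, treat NFS4ERR_STALE after REMOVE as an expected outcome rather than an error.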

rick

>
> > However, if "fail with NFS4ERR_STALE" is supposed to happen after a
> > REMOVE by the same client, then that is not what I am seeing.
> > If you are curious, the packet trace is here. (Look at packet#58).
> > https://urldefense.com/v3/__https://people.freebsd.org/*rmacklem/linux-remove.pcap__;fg!!ACWV5N9M2RV99hQ!IEcffaAAeLhuzaJUO5rQOv0jUUk4ltuMpfqT83lLFkRL9cqOZEvZ-8GGjvoqlVAQKi_FAAhsKEl5NjvS0OLJ$
> >
> > Btw, in case you are curious why I am doing this testing, I am trying
> > to figure out a good way for the FreeBSD client to handle temporary
> > files. Typically on POSIX they are done via the syscalls:
> >
> > fd = open("foo", O_CREAT ...);
> > unlink("foo");
> > write(fd,..), write(fd,..)...
> > read(fd,...), read(fd,...)...
> > close(fd);
> >
> > If this happens quickly and is not too much writing, the writes
> > copy data into buffers/pages, the reads read the data out of
> > the pages and then it all gets deleted.
> >
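The syscall sequence above can be written out as a small compilable function. This is a sketch only: temp_file_roundtrip() is a hypothetical name, the open flags and mode are assumptions filling in the elided arguments, and it shows the local POSIX behaviour, not what any NFS client sends on the wire.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a file, unlink it while it is still open,
 * then write and read back through the descriptor. Returns 0 if the data
 * read matches the data written, -1 on any failure.
 */
int temp_file_roundtrip(const char *path)
{
    static const char msg[] = "scratch data";
    char buf[sizeof(msg)];

    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Drop the only name; the open descriptor keeps the file usable. */
    if (unlink(path) != 0) {
        close(fd);
        return -1;
    }

    if (write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg) ||
        lseek(fd, 0, SEEK_SET) != 0 ||
        read(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
        close(fd);
        return -1;
    }

    /* The final close releases the file's storage. */
    close(fd);
    return memcmp(buf, msg, sizeof(msg)) == 0 ? 0 : -1;
}
```

After the call returns, the name no longer exists in the directory, which matches the "it all gets deleted" step described above.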
> > Unfortunately, the CB_RECALL forces the NFSv4.n client
> > to do WRITE, WRITE,..COMMIT and then DELEGRETURN.
> > Then the REMOVE throws all the data away on the NFSv4.n
> > server.
> > --> As such, I really like not doing the CB_RECALL for "same client".
> > My concern is "what happens to the delegation after the file object ("foo")
> > gets deleted?"
> > It either needs to be thrown away by the NFSv4.n server or the
> > PUTFH, DELEGRETURN needs to work after the REMOVE.
>
> The PUTFH and DELEGRETURN continue to work after the REMOVE. The open
> stateid and delegation stateid on the server are destroyed only after
> the client sends the DELEGRETURN and CLOSE.
>
> > Otherwise, the NFSv4.n server may get constipated by the delegations,
> > which might be called stale, since the file object has been deleted.
> >
> > --> I can do PUTFH, GETATTR after REMOVE in the same compound,
> >       to find out if the file object has been deleted. But then, if a
> >       PUTFH, DELEGRETURN fails with NFS4ERR_STALE, can I get
> >       away with saying "the server should just discard the delegation as
> >       the client already has done so"?
>
> You can try your test but I believe the PUTFH and GETATTR won't fail
> after the REMOVE.
>
> -Dai
>
> >
> > Thanks for your comments, rick
> >
> >> -Dai
> >>
> >>> (I don't think it is, but there is a discussion in 18.25.4 which says
> >>> "When the determination above cannot be made definitively because
> >>> delegations are being held, they MUST be recalled.." but everything
> >>> above that is a may/MAY, so it is not obvious to me if a server really
> >>> needs to care?)
> >>>
> >>> Any comments? Thanks, rick
> >>> ps: I am amazed when I learn these things about NFSv4.n after all
> >>>         these years.
> >>>




