Re: simple NFSv4.1/4.2 test of remove while holding a delegation


 



On Tue, Jun 10, 2025 at 6:28 AM Dai Ngo <dai.ngo@xxxxxxxxxx> wrote:
>
>
> On 6/10/25 6:16 AM, Rick Macklem wrote:
>
> On Tue, Jun 10, 2025 at 4:58 AM Dai Ngo <dai.ngo@xxxxxxxxxx> wrote:
>
> On 6/9/25 6:06 PM, Rick Macklem wrote:
>
> On Mon, Jun 9, 2025 at 5:17 PM Dai Ngo <dai.ngo@xxxxxxxxxx> wrote:
>
> On 6/9/25 4:35 PM, Rick Macklem wrote:
>
> Hi,
>
> I hope you don't mind a cross-post, but I thought both groups
> might find this interesting...
>
> I have been creating a compound RPC that does REMOVE and
> then tries to determine if the file object has been removed and
> I was surprised to see quite different results from the Linux knfsd
> and Solaris 11.4 NFSv4.1/4.2 servers. I think both these servers
> provide FH4_PERSISTENT file handles, although I suppose I
> should check that?
>
> First, the test OPEN/CREATEs a regular file called "foo" (only one
> hard link) and acquires a write delegation for it.
> Then a compound does the following:
> ...
> REMOVE foo
> PUTFH fh for foo
> GETATTR
>
> For the Solaris 11.4 server, the server CB_RECALLs the
> delegation and then replies NFS4ERR_STALE for the PUTFH above.
> (The FreeBSD server currently does the same.)
>
> For a fairly recent Linux (6.12) knfsd, the above replies NFS_OK
> with nlinks == 0 in the GETATTR reply.
>
> Hmm. So I've looked in RFC8881 (I'm terrible at reading it so I
> probably missed something) and I cannot find anything that states
> either of the above behaviours is incorrect.
> (NFS4ERR_STALE is listed as an error code for PUTFH, but the
> description of PUTFH only says that it sets the CFH to the fh arg.
> It does not say anything w.r.t. the fh arg. needing to be for a file
> that still exists.) Neither of these servers sets
> OPEN4_RESULT_PRESERVE_UNLINKED in the OPEN reply.
>
> So, it looks like "file object no longer exists" is indicated either
> by a NFS4ERR_STALE reply to either PUTFH or GETATTR
> OR
> by a successful reply, but with nlinks == 0 for the GETATTR reply.
The nlink == 0 case does not work, since another client can do a LINK.
It does imply that the WRITE, WRITE,...COMMIT, DELEGRETURN can
be done asynchronously from the unlink(2) or close(2) that is causing
the REMOVE.

>
> To be honest, I kinda like the Linux knfsd version, but I am wondering
> if others think that both of these replies are correct?
>
> Also, is the CB_RECALL needed when the delegation is held by
> the same client as the one doing the REMOVE?
>
> The Linux NFSD detects the delegation belongs to the same client that
> causes the conflict (due to REMOVE) and skips the CB_RECALL. This is
> an optimization based on the assumption that the client would handle
> the conflict locally.
>
> And then what does the server do with the delegation?
> - Does it just discard it, since the file object has been deleted?
> OR
> - Does it guarantee that a DELEGRETURN done after the REMOVE will
>    still work (which seems to be the case for the 6.12 server I am using for
>    testing).
>
> The delegation remains valid but the file was removed from the namespace.
> This is why the PUTFH and GETATTR in your test did not fail. However, any
> lookup of the file will fail.
>
> If the REMOVE was done by another client, the REMOVE will not complete
> until the delegation is returned. If the PUTFH comes after the REMOVE
> was completed, it'll fail with NFS4ERR_STALE since the file, specified
> by the file handle, no longer exists.
>
> Assuming the statement w.r.t. "fail with NFS4ERR_STALE" only applies to
> "REMOVE done by another client" then that sounds fine.
>
> Correction: even if the REMOVE was done by another client and the
> delegation was recalled from the 1st client, the open stateid of the file
> remains valid until the client sends the CLOSE. So the PUTFH won't fail
> regardless of which client sends the REMOVE.
>
> So, should your server be setting OPEN4_RESULT_PRESERVE_UNLINKED
> in OPEN replies, given this semantic?
> --> If the "FH remains valid after REMOVE drops nlink to 0" semantic
>      were indicated by the OPEN4_RESULT_PRESERVE_UNLINKED flag, a client
>      could check for this flag and handle it appropriately.
>
> I believe the Linux NFSD currently does not support OPEN4_RESULT_PRESERVE_UNLINKED.
Yes, if the server will throw away the file object when it reboots, you
cannot set this flag.

Thanks for all your comments, rick

>
> -Dai
>
> rick
>
> However, if the "fail with NFS4ERR_STALE" is supposed to happen after
> REMOVE for the same client, then that is not what I am seeing.
> If you are curious, the packet trace is here. (Look at packet#58).
> https://people.freebsd.org/~rmacklem/linux-remove.pcap
>
> Btw, in case you are curious why I am doing this testing, I am trying
> to figure out a good way for the FreeBSD client to handle temporary
> files. Typically on POSIX they are done via the syscalls:
>
> fd = open("foo", O_CREAT ...);
> unlink("foo");
> write(fd,..), write(fd,..)...
> read(fd,...), read(fd,...)...
> close(fd);
>
> If this happens quickly and is not too much writing, the writes
> copy data into buffers/pages, the reads read the data out of
> the pages and then it all gets deleted.
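
Spelled out as compilable C, the pattern looks roughly like the sketch
below. The function name, the 0600 mode, the O_EXCL flag and the 256 byte
buffer are my own choices for the example, not anything an application is
required to use. It returns the link count that fstat() sees once the
name is gone, which a local file system is expected to report as 0.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create `path`, unlink it while it is still open, write `msg` through
 * the descriptor, read it back, then close. Returns the link count that
 * fstat() reported after the unlink, or -1 on any failure.
 */
long scratch_file_roundtrip(const char *path, const char *msg)
{
    assert(path != NULL && msg != NULL);
    size_t len = strlen(msg);
    char buf[256];
    struct stat st;
    long nlink = -1;

    if (len > sizeof(buf))
        return -1;
    int fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    /* The name goes away here; the data stays reachable through fd. */
    if (unlink(path) == 0 &&
        write(fd, msg, len) == (ssize_t)len &&
        lseek(fd, 0, SEEK_SET) == 0 &&
        read(fd, buf, len) == (ssize_t)len &&
        memcmp(buf, msg, len) == 0 &&
        fstat(fd, &st) == 0)
        nlink = (long)st.st_nlink;
    close(fd);  /* last reference dropped; the data can be discarded */
    return nlink;
}
```

Calling scratch_file_roundtrip("/tmp/foo", "some data") exercises the
whole sequence, and nothing should be left behind under that name.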
>
> Unfortunately, the CB_RECALL forces the NFSv4.n client
> to do WRITE, WRITE,..COMMIT and then DELEGRETURN.
> Then the REMOVE throws all the data away on the NFSv4.n
> server.
> --> As such, I really like not doing the CB_RECALL for "same client".
> My concern is "what happens to the delegation after the file object ("foo")
> gets deleted?"
> It either needs to be thrown away by the NFSv4.n server or the
> PUTFH, DELEGRETURN needs to work after the REMOVE.
>
> The PUTFH and DELEGRETURN continue to work after the REMOVE. The open
> stateid and delegation stateid on the server are destroyed only after
> the client sends the DELEGRETURN and CLOSE.
>
> Otherwise, the NFSv4.n server may get constipated by the delegations,
> which might be called stale, since the file object has been deleted.
>
> --> I can do PUTFH, GETATTR after REMOVE in the same compound,
>       to find out if the file object has been deleted. But then, if a
>       PUTFH, DELEGRETURN fails with NFS4ERR_STALE, can I get
>       away with saying "the server should just discard the delegation as
>       the client already has done so"?
>
> You can try your test but I believe the PUTFH and GETATTR won't fail
> after the REMOVE.
>
> -Dai
>
> Thanks for your comments, rick
>
> -Dai
>
> (I don't think it is, but there is a discussion in 18.25.4 which says
> "When the determination above cannot be made definitively because
> delegations are being held, they MUST be recalled.." but everything
> above that is a may/MAY, so it is not obvious to me if a server really
> needs to care?)
>
> Any comments? Thanks, rick
> ps: I am amazed when I learn these things about NFSv4.n after all
>         these years.
>




