Re: [Question]nfs: never returned delegation

Li Lingfeng <lilingfeng3@xxxxxxxxxx> · Mon, 1 Sep 2025 22:12:12 +0800

Hi,

在 2025/9/1 19:40, Jeff Layton 写道:
On Mon, 2025-09-01 at 17:07 +0800, Li Lingfeng wrote:
Hi,

在 2025/8/11 21:03, Trond Myklebust 写道:
On Mon, 2025-08-11 at 20:48 +0800, zhangjian (CG) wrote:
Recently, we meet a NFS problem in 5.10. There are so many
test_state_id request after a non-privilaged request in tcpdump
result. There are 40w+ delegations in client (I read the delegation
list from /proc/kcore).
Firstly, I think state manager cost a lot in
nfs_server_reap_expired_delegations. But I see they are all in
NFS_DELEGATION_REVOKED state except 6 in NFS_DELEGATION_REFERENCED (I
read this from /proc/kcore too).
I analyze NFS code and find if NFSPROC4_CLNT_DELEGRETURN procedure
meet ETIMEOUT, delegation will be marked as NFS4ERR_DELEG_REVOKED and
never return it again. NFS server will keep the revoked delegation in
clp->cl_revoked forever. This will result in following sequence
response with RECALLABLE_STATE_REVOKED flag. Client will send
test_state_id request for all non-revoked delegation.
This can only be solved by restarting NFS server.
I think ETIMEOUT in NFSPROC4_CLNT_DELEGRETURN procedure may be not
the only case that cause lots of non-terminable test_state_id
requests after any non-privilaged request.
Wish NFS experts give some advices on this problem.

You have the following options:

     1. Don't ever use "soft" or "softerr" on the NFS client.
     2. Reboot your server every now and again.
     3. Change the server code to not bother caching revoked state. Doing
        so is rather pointless, since there is nothing a client can do
        differently when presented with NFS4ERR_DELEG_REVOKED vs.
        NFS4ERR_BAD_STATEID.
     4. Change the server code to garbage collect revoked stateids after
        a while.

I found that a server-side bug could also cause such behavior, and I've
reproduced the issue based on the master (commit b320789d6883).
nfs4_laundromat                       nfsd4_delegreturn
I think you may be right about the race. The details are a little off
though. The important bit here is that the laundromat also calls this
unhash_delegation_locked before doing the list_add/del.
   list_add // add dp to reaplist
            // by dl_recall_lru
   list_del_init // delete dp from
                 // reaplist
                                         destroy_delegation
                                          unhash_delegation_locked
...which _should_ make the above unhash_delegation_locked return false,
so that list_del_init never happens.
Thank you for your correction. The delegreturn indeed does not perform
list_del_init in such concurrent scenarios.

                                           list_del_init
                                           // dp was not added to any list
                                           // via dl_recall_lru
   revoke_delegation
   list_add // add dp to cl_revoked
            // by dl_recall_lru

The delegation will be left in cl_revoked.

I agree with Trond's suggestion to change the server code to fix it.


...but there is at least one variation on what you wrote above where it
could get stuck back on the cl_revoked list after the delegreturn. The
delegreturn does set the SC_STATUS_CLOSED bit on the stateid, so
something like this (untested) patch, perhaps?
However, as you noted, since laundromat calls unhash_delegation_locked
first, I think the delegreturn will skip setting SC_STATUS_CLOSED due to
delegation_hashed returning false in unhash_delegation_locked.

------------8<----------

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index d2d5e8e397a4..e594ded49e60 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1506,7 +1506,7 @@ static void revoke_delegation(struct nfs4_delegation *dp)
         trace_nfsd_stid_revoke(&dp->dl_stid);
  
         spin_lock(&clp->cl_lock);
-       if (dp->dl_stid.sc_status & SC_STATUS_FREED) {
+       if (dp->dl_stid.sc_status & (SC_STATUS_FREED | SC_STATUS_CLOSED)) {
                 list_del_init(&dp->dl_recall_lru);
                 goto out;
         }

Thanks,
Lingfeng