Re: [Question]nfs: never returned delegation

Jeff Layton <jlayton@xxxxxxxxxx> · Mon, 01 Sep 2025 07:40:37 -0400

On Mon, 2025-09-01 at 17:07 +0800, Li Lingfeng wrote:
> Hi,
> 
> 在 2025/8/11 21:03, Trond Myklebust 写道:
> > On Mon, 2025-08-11 at 20:48 +0800, zhangjian (CG) wrote:
> > > Recently, we meet a NFS problem in 5.10. There are so many
> > > test_state_id request after a non-privilaged request in tcpdump
> > > result. There are 40w+ delegations in client (I read the delegation
> > > list from /proc/kcore).
> > > Firstly, I think state manager cost a lot in
> > > nfs_server_reap_expired_delegations. But I see they are all in
> > > NFS_DELEGATION_REVOKED state except 6 in NFS_DELEGATION_REFERENCED (I
> > > read this from /proc/kcore too).
> > > I analyze NFS code and find if NFSPROC4_CLNT_DELEGRETURN procedure
> > > meet ETIMEOUT, delegation will be marked as NFS4ERR_DELEG_REVOKED and
> > > never return it again. NFS server will keep the revoked delegation in
> > > clp->cl_revoked forever. This will result in following sequence
> > > response with RECALLABLE_STATE_REVOKED flag. Client will send
> > > test_state_id request for all non-revoked delegation.
> > > This can only be solved by restarting NFS server.
> > > I think ETIMEOUT in NFSPROC4_CLNT_DELEGRETURN procedure may be not
> > > the only case that cause lots of non-terminable test_state_id
> > > requests after any non-privilaged request.
> > > Wish NFS experts give some advices on this problem.
> > > 
> > You have the following options:
> > 
> >     1. Don't ever use "soft" or "softerr" on the NFS client.
> >     2. Reboot your server every now and again.
> >     3. Change the server code to not bother caching revoked state. Doing
> >        so is rather pointless, since there is nothing a client can do
> >        differently when presented with NFS4ERR_DELEG_REVOKED vs.
> >        NFS4ERR_BAD_STATEID.
> >     4. Change the server code to garbage collect revoked stateids after
> >        a while.
> > 
> I found that a server-side bug could also cause such behavior, and I've
> reproduced the issue based on the master (commit b320789d6883).
> nfs4_laundromat                       nfsd4_delegreturn

I think you may be right about the race. The details are a little off
though. The important bit here is that the laundromat also calls this
unhash_delegation_locked before doing the list_add/del.

>   list_add // add dp to reaplist
>            // by dl_recall_lru
>   list_del_init // delete dp from
>                 // reaplist
>                                         destroy_delegation
>                                          unhash_delegation_locked

...which _should_ make the above unhash_delegation_locked return false,
so that list_del_init never happens.

>                                           list_del_init
>                                           // dp was not added to any list
>                                           // via dl_recall_lru
>   revoke_delegation
>   list_add // add dp to cl_revoked
>            // by dl_recall_lru
> 
> The delegation will be left in cl_revoked.
> 
> I agree with Trond's suggestion to change the server code to fix it.
> 
> 

...but there is at least one variation on what you wrote above where it
could get stuck back on the cl_revoked list after the delegreturn. The
delegreturn does set the SC_STATUS_CLOSED bit on the stateid, so
something like this (untested) patch, perhaps?

------------8<----------

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index d2d5e8e397a4..e594ded49e60 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1506,7 +1506,7 @@ static void revoke_delegation(struct nfs4_delegation *dp)
        trace_nfsd_stid_revoke(&dp->dl_stid);
 
        spin_lock(&clp->cl_lock);
-       if (dp->dl_stid.sc_status & SC_STATUS_FREED) {
+       if (dp->dl_stid.sc_status & (SC_STATUS_FREED | SC_STATUS_CLOSED)) {
                list_del_init(&dp->dl_recall_lru);
                goto out;
        }


-- 
Jeff Layton <jlayton@xxxxxxxxxx>