On Mon, 2025-09-01 at 17:07 +0800, Li Lingfeng wrote:
> Hi,
>
> On 2025/8/11 21:03, Trond Myklebust wrote:
> > On Mon, 2025-08-11 at 20:48 +0800, zhangjian (CG) wrote:
> > > Recently, we meet a NFS problem in 5.10. There are so many
> > > test_state_id request after a non-privileged request in tcpdump
> > > result. There are 400,000+ delegations in client (I read the
> > > delegation list from /proc/kcore).
> > > Firstly, I think state manager cost a lot in
> > > nfs_server_reap_expired_delegations. But I see they are all in
> > > NFS_DELEGATION_REVOKED state except 6 in NFS_DELEGATION_REFERENCED (I
> > > read this from /proc/kcore too).
> > > I analyze NFS code and find if NFSPROC4_CLNT_DELEGRETURN procedure
> > > meet ETIMEOUT, delegation will be marked as NFS4ERR_DELEG_REVOKED and
> > > never return it again. NFS server will keep the revoked delegation in
> > > clp->cl_revoked forever. This will result in following sequence
> > > response with RECALLABLE_STATE_REVOKED flag. Client will send
> > > test_state_id request for all non-revoked delegation.
> > > This can only be solved by restarting NFS server.
> > > I think ETIMEOUT in NFSPROC4_CLNT_DELEGRETURN procedure may be not
> > > the only case that cause lots of non-terminable test_state_id
> > > requests after any non-privileged request.
> > > Wish NFS experts give some advices on this problem.
> > >
> > You have the following options:
> >
> >    1. Don't ever use "soft" or "softerr" on the NFS client.
> >    2. Reboot your server every now and again.
> >    3. Change the server code to not bother caching revoked state. Doing
> >       so is rather pointless, since there is nothing a client can do
> >       differently when presented with NFS4ERR_DELEG_REVOKED vs.
> >       NFS4ERR_BAD_STATEID.
> >    4. Change the server code to garbage collect revoked stateids after
> >       a while.
> >
> I found that a server-side bug could also cause such behavior, and I've
> reproduced the issue based on the master (commit b320789d6883).
>
> nfs4_laundromat                       nfsd4_delegreturn

I think you may be right about the race. The details are a little off
though. The important bit here is that the laundromat also calls this

    unhash_delegation_locked

before doing the list_add/del.

>  list_add // add dp to reaplist
>           // by dl_recall_lru
>  list_del_init // delete dp from
>                // reaplist
>                                        destroy_delegation
>                                         unhash_delegation_locked

...which _should_ make the above unhash_delegation_locked return false,
so that list_del_init never happens.

>                                          list_del_init
>                                          // dp was not added to any list
>                                          // via dl_recall_lru
>  revoke_delegation
>   list_add // add dp to cl_revoked
>            // by dl_recall_lru
>
> The delegation will be left in cl_revoked.
>
> I agree with Trond's suggestion to change the server code to fix it.
>
>

...but there is at least one variation on what you wrote above where it
could get stuck back on the cl_revoked list after the delegreturn. The
delegreturn does set the SC_STATUS_CLOSED bit on the stateid, so
something like this (untested) patch, perhaps?

------------8<----------

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index d2d5e8e397a4..e594ded49e60 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1506,7 +1506,7 @@ static void revoke_delegation(struct nfs4_delegation *dp)
 	trace_nfsd_stid_revoke(&dp->dl_stid);
 
 	spin_lock(&clp->cl_lock);
-	if (dp->dl_stid.sc_status & SC_STATUS_FREED) {
+	if (dp->dl_stid.sc_status & (SC_STATUS_FREED | SC_STATUS_CLOSED)) {
 		list_del_init(&dp->dl_recall_lru);
 		goto out;
 	}
-- 
Jeff Layton <jlayton@xxxxxxxxxx>