On 2025/8/11 21:03, Trond Myklebust wrote:
> On Mon, 2025-08-11 at 20:48 +0800, zhangjian (CG) wrote:
>> Recently, we hit an NFS problem on 5.10. In the tcpdump output there
>> are a great many test_state_id requests after every non-privileged
>> request. There are 400,000+ delegations on the client (I read the
>> delegation list from /proc/kcore).
>> At first, I thought the state manager was spending a lot of time in
>> nfs_server_reap_expired_delegations. But I see that they are all in
>> the NFS_DELEGATION_REVOKED state, except 6 in
>> NFS_DELEGATION_REFERENCED (I read this from /proc/kcore too).
>> I analyzed the NFS code and found that if the
>> NFSPROC4_CLNT_DELEGRETURN procedure hits ETIMEDOUT, the delegation is
>> marked as NFS4ERR_DELEG_REVOKED and the client never tries to return
>> it again. The NFS server keeps the revoked delegation in
>> clp->cl_revoked forever. As a result, every following sequence
>> response carries the RECALLABLE_STATE_REVOKED flag, and the client
>> sends a test_state_id request for all non-revoked delegations.
>> This can only be solved by restarting the NFS server.
>> I think ETIMEDOUT in the NFSPROC4_CLNT_DELEGRETURN procedure may not
>> be the only case that causes lots of never-ending test_state_id
>> requests after any non-privileged request.
>> I hope the NFS experts can give some advice on this problem.
>>
>
> You have the following options:
>
> 1. Don't ever use "soft" or "softerr" on the NFS client.
> 2. Reboot your server every now and again.
> 3. Change the server code to not bother caching revoked state. Doing
>    so is rather pointless, since there is nothing a client can do
>    differently when presented with NFS4ERR_DELEG_REVOKED vs.
>    NFS4ERR_BAD_STATEID.
> 4. Change the server code to garbage collect revoked stateids after
>    a while.
>

Thanks a lot for the reply. A client timeout in the delegation-return
procedure may not be the only case in which the server keeps a
delegation on the clp->cl_revoked list forever. I think garbage
collecting revoked stateids after a while (option 4) is the more
reasonable way to avoid this problem.
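
To make option 4 concrete, here is a minimal userspace sketch in C of
what "garbage collect revoked stateids after a while" could look like.
It is not kernel code and not the nfsd implementation: struct client,
struct revoked_stateid, revoke_stateid(), reap_revoked(),
has_revoked_state() and the max_age parameter are all hypothetical
names made up for illustration, and the singly linked list only stands
in for clp->cl_revoked.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for one revoked delegation stateid on the server. */
struct revoked_stateid {
	unsigned int id;                /* illustrative identifier only */
	time_t revoked_at;              /* time at which it was revoked */
	struct revoked_stateid *next;
};

/* Hypothetical stand-in for the per-client state holding cl_revoked. */
struct client {
	struct revoked_stateid *revoked;
	size_t nr_revoked;
};

/* Remember a revoked stateid. Returns 0 on success, -1 if out of memory. */
static int revoke_stateid(struct client *clp, unsigned int id, time_t now)
{
	struct revoked_stateid *rs;

	assert(clp != NULL);
	rs = malloc(sizeof(*rs));
	if (!rs)
		return -1;
	rs->id = id;
	rs->revoked_at = now;
	rs->next = clp->revoked;
	clp->revoked = rs;
	clp->nr_revoked++;
	return 0;
}

/*
 * Drop every revoked stateid that is at least max_age seconds old.
 * Returns the number of entries freed.
 */
static size_t reap_revoked(struct client *clp, time_t now, time_t max_age)
{
	struct revoked_stateid **pp;
	size_t reaped = 0;

	assert(clp != NULL);
	pp = &clp->revoked;
	while (*pp) {
		struct revoked_stateid *rs = *pp;

		if (now - rs->revoked_at >= max_age) {
			*pp = rs->next;         /* unlink, then free */
			free(rs);
			clp->nr_revoked--;
			reaped++;
		} else {
			pp = &rs->next;
		}
	}
	return reaped;
}

/* Nonzero while the server would still have revoked state to report. */
static int has_revoked_state(const struct client *clp)
{
	return clp->revoked != NULL;
}
```

In this sketch, a periodic worker would call reap_revoked() for each
client. Once the list is empty, has_revoked_state() returns 0, which is
the point at which a server following this scheme could stop setting
the RECALLABLE_STATE_REVOKED flag, so the client would stop issuing
test_state_id for its remaining delegations. What max_age should be is
a policy question that the sketch leaves open.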