On 17 Apr 2025, at 9:59, Trond Myklebust wrote: > On Thu, 2025-04-17 at 09:28 -0400, Benjamin Coddington wrote: >> Hey Trond, Anna, et al. >> >> I'm looking at working on nfs_server_reap_expired_delegations() because >> the work to walk that list is order n(n+1)/2 and the list can grow very >> large due to some servers doing SEQ4_STATUS_ADMIN_STATE_REVOKED these >> days. It can currently grow unlimited by 5k delegation watermark. > > Why would we be seeing an increase of SEQ4_STATUS_ADMIN_STATE_REVOKED > cases? That should normally just be seen on network outages that last > longer than a full lease period. No idea - but its happening. Maybe I've been too hasty to blame ADMIN_STATE_REVOKED, but we do see in the results many NFS_DELEGATION_REVOKED states. Perhaps the linux server's admin revocation interfaces are gaining popular use for some reason. >> First observation is that we don't remove revoked states from the list >> even after doing FREE_STATEID, so we're still doing walks across >> delegation state we'll never use again. I think we can fix this by >> plumbing in the error result from FREE_STATEID.. so that's a potential >> bit of work. >> >> I'm tempted to just do: >> >> @@ -1342,7 +1346,7 @@ nfs_delegation_test_free_expired(struct inode >> *inode, >> return; >> status = ops->test_and_free_expired(server, stateid, cred); >> if (status == -NFS4ERR_EXPIRED || status == - >> NFS4ERR_BAD_STATEID) >> - nfs_remove_bad_delegation(inode, stateid); >> + nfs_delegation_mark_returned(inode, stateid); >> } >> >> .. but I think that gets us up to not tracking state the server might >> still be tracking. > > Just marking the delegation as being returned isn't helpful. That > doesn't trigger any action to recover the state. That's the only path to removing the delegation from the nfs_server's delegations list, and seems to do all the right steps for this case.. but calling it here can't currently account for cases where FREE_STATEID wasn't successful. >> >> Other approaches might be to walk the list once moving the work to a >> temporary list and then operate on that linearly. >> >> Advice, thoughts, or direction welcomed.. I'll probably work on >> splitting out nfs41_free_stateid() from test_and_free_expired(), so the >> delegation code can know for sure that we're done with that state. > > Setting up a separate list for state that needs recovery of some sort > might be useful. However that doesn't solve the problem that you have > to scan the entire list of stateids every time the server sets > SEQ4_STATUS_ADMIN_STATE_REVOKED. The latter is a protocol requirement, > which is why if we're seeing a lot of it happening, then we need to > solve that problem first. Well, we did have some misbehaving servers that are getting fixed, but its still entirely possible to end up with client systems that keep old revoked state on that list (that it has already done FREE_STATEID for), and once we have to walk and test it things get hot. We're seeing this on clients that are /not/ on misbehaving servers, we've seen some clients with ~500k entries on that list. I'd really like a way to trim the list, thus the desire to remove state after a successful FREE_STATEID. Thanks very much for your look and discussion. Ben