On Tue, 2025-09-02 at 10:22 +0800, Li Lingfeng wrote: > When file access conflicts occur between clients, the server recalls > delegations. If the client holding delegation fails to return it after > a recall, nfs4_laundromat adds the delegation to cl_revoked list. > This causes subsequent SEQUENCE operations to set the > SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag, forcing the client to > validate all delegations and return the revoked one. > > However, if the client fails to return the delegation due to a timeout > after receiving the recall or a server bug, the delegation remains in the > server's cl_revoked list. The client marks it revoked and won't find it > upon detecting SEQ4_STATUS_RECALLABLE_STATE_REVOKED. This leads to a loop: > the server persistently sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED, and the > client repeatedly tests all delegations, severely impacting performance > when numerous delegations exist. > It is a performance impact, but I don't get the "loop" here. Are you saying that this problem compounds itself? That testing all delegations causes others to be revoked? > Since abnormal delegations are removed from flc_lease via nfs4_laundromat > --> revoke_delegation --> destroy_unhashed_deleg --> > nfs4_unlock_deleg_lease --> kernel_setlease, and do not block new open > requests indefinitely, retaining such a delegation on the server is > unnecessary. > > Reported-by: Zhang Jian <zhangjian496@xxxxxxxxxx> > Fixes: 3bd64a5ba171 ("nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED") > Closes: https://lore.kernel.org/all/ff8debe9-6877-4cf7-ba29-fc98eae0ffa0@xxxxxxxxxx/ > Signed-off-by: Li Lingfeng <lilingfeng3@xxxxxxxxxx> > --- > fs/nfsd/nfs4state.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > index 88c347957da5..aa65a685dbb9 100644 > --- a/fs/nfsd/nfs4state.c > +++ b/fs/nfsd/nfs4state.c > @@ -4326,6 +4326,8 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > int buflen; > struct net *net = SVC_NET(rqstp); > struct nfsd_net *nn = net_generic(net, nfsd_net_id); > + struct list_head *pos, *next; > + struct nfs4_delegation *dp; > > if (resp->opcnt != 1) > return nfserr_sequence_pos; > @@ -4470,6 +4472,15 @@ nfsd4_sequence(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, > default: > seq->status_flags = 0; > } > + if (!list_empty(&clp->cl_revoked)) { > + list_for_each_safe(pos, next, &clp->cl_revoked) { > + dp = list_entry(pos, struct nfs4_delegation, dl_recall_lru); > + if (dp->dl_time < (ktime_get_boottime_seconds() - 2 * nn->nfsd4_lease)) { > + list_del_init(&dp->dl_recall_lru); > + nfs4_put_stid(&dp->dl_stid); > + } > + } > + } > if (!list_empty(&clp->cl_revoked)) > seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED; > if (atomic_read(&clp->cl_admin_revoked)) This seems like a violation of the spec. AIUI, the server is required to hang onto a record of the delegation until the client does the TEST_STATEID/FREE_STATEID dance to remove it. Just discarding them like this seems wrong. Should we instead just administratively evict the client since it's clearly not behaving right in this case? -- Jeff Layton <jlayton@xxxxxxxxxx>