Re: [PATCH] nfsd: remove long-standing revoked delegations by force

Hi,

On 2025/9/3 11:46, zhangjian (CG) wrote:
Hello experts,

Suppose that, on a hard-mounted NFS client, we see that every delegation
which is also on the server's cl_revoked list has changed from
NFS_DELEGATION_RETURN_IF_CLOSED|NFS_DELEGATION_REVOKED|NFS_DELEGATION_TEST_EXPIRED
to NFS_DELEGATION_RETURN_IF_CLOSED|NFS_DELEGATION_REVOKED. Can anyone
offer a hypothesis about this problem?

By the way, the problem can be masked by decreasing the file count on
the server.

Thanks,
zhangjian

I think NFS_DELEGATION_TEST_EXPIRED is cleared as follows:
nfs4_state_manager
 nfs4_do_reclaim
  nfs4_reclaim_open_state
   __nfs4_reclaim_open_state // get nfs4_state from sp->so_states
    nfs41_open_expired // status = ops->recover_open
     nfs41_check_delegation_stateid
      test_and_clear_bit // NFS_DELEGATION_TEST_EXPIRED
After the bug in [1] is triggered, although the delegation is no longer on
server->delegations, it can still be obtained by traversing sp->so_states.
However, I cannot find the connection between the number of files on the
server and this issue.
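
To make the flag transition concrete, here is a minimal userspace sketch
of what the last step of the call chain above does to the delegation
flags. It is only a model: the bit positions and the model_* names are
hypothetical, and the helper is a non-atomic stand-in for the kernel's
test_and_clear_bit(), not the real client code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions, for illustration only; the values used
 * by the real NFS client may differ. */
enum {
	MODEL_DELEG_RETURN_IF_CLOSED = 0,
	MODEL_DELEG_REVOKED          = 1,
	MODEL_DELEG_TEST_EXPIRED     = 2,
};

/* Non-atomic userspace stand-in for test_and_clear_bit(): report
 * whether the bit was set, and leave it cleared either way. */
static bool model_test_and_clear_bit(int nr, unsigned long *flags)
{
	unsigned long mask = 1UL << nr;
	bool was_set = (*flags & mask) != 0;

	*flags &= ~mask;
	return was_set;
}

/* Models the final step of the chain: TEST_EXPIRED is consumed the
 * first time the delegation is checked, and every other flag is left
 * untouched. Returns true if the bit was set before the call. */
static bool model_check_delegation(unsigned long *flags)
{
	return model_test_and_clear_bit(MODEL_DELEG_TEST_EXPIRED, flags);
}
```

Starting from RETURN_IF_CLOSED|REVOKED|TEST_EXPIRED, one call to
model_check_delegation() leaves RETURN_IF_CLOSED|REVOKED, which matches
the state zhangjian reported. A second call finds the bit already clear,
so in this model nothing would prompt another check of the same
delegation.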

Thanks,
Lingfeng


On 2025/9/2 20:43, Benjamin Coddington wrote:
On 2 Sep 2025, at 8:10, Li Lingfeng wrote:

Our expected outcome was that the client would release the abnormal
delegation via TEST_STATEID/FREE_STATEID upon detecting its invalidity.
However, this problematic delegation is no longer present in the
client's server->delegations list, whether due to client-side timeouts or
the server-side bug [1].

How does the client time out TEST_STATEID - are you mounting with 'soft'?

We should find the server-side bug and fix it rather than write code to
paper over it.  I do think the synchronization of state here is a bit
fragile and wish the protocol had a generation, sequence, or marker for
setting SEQ4_STATUS_ bits.

Should we instead just administratively evict the client since it's
clearly not behaving right in this case?

Thanks for the suggestion. While administratively evicting the client would
certainly resolve the immediate delegation issue, I'm concerned that approach
might be a bit heavy-handed.
The problematic behavior seems isolated to a single delegation. Meanwhile,
the client itself likely has numerous other open files and active state on
the server. Forcing a complete client reconnect would tear down all that
state, which could cause significant application disruption and be perceived
as a service outage from the client's perspective.

[1] https://lore.kernel.org/all/de669327-c93a-49e5-a53b-bda9e67d34a2@xxxxxxxxxx/
^^ in this thread you reference v5.10 - there was a knfsd fix for a
cl_revoked leak "3b816601e279", and there have been 3 or 4 fixes since then
that address problems in, and optimize, the client's walk of delegations.
Jeff pointed out that there have been fixes in these areas.  Are you still
seeing this problem with all of those fixes included?

Ben






