Re: [RFC PATCH 1/1] nfsd: delay re-registering of listeners after listener removal

Chuck Lever <chuck.lever@xxxxxxxxxx> · Fri, 22 Aug 2025 10:19:15 -0400

On 8/21/25 5:01 PM, Chuck Lever wrote:
> On 8/21/25 4:43 PM, Olga Kornievskaia wrote:
>> This patch tries to address the following failure:
>> nfsdctl threads 0
>> nfsdctl listener +rdma::20049
>> nfsdctl listener +tcp::2049
>> nfsdctl listener -tcp::2049
>> nfsdctl: Error: Cannot assign requested address
>>
>> The reason for the failure is due to the fact that socket cleanup only
>> happens in __svc_rdma_free() which is a deferred work triggers when an
>> rdma transport is destroyed. To remove a listener nfsdctl is forced to
>> first remove all transports via svc_xprt_destroy_all() and then re-add
>> the ones that are left. Due to the fact that there isn't a way to
>> delete a particular entry from a list where permanent sockets are
>> stored.
> 
> The issue is specifically with llist, which does not permit the
> deletion of any entry other than the first on the list.
> 
> 
>> Going back to the deferred work done in __svc_rdma_free(), the
>> work might not get to run before nfsd_nl_listener_set_doit() creates
>> the new transports. As a result, it finds that something is still
>> listening of the rdma port and rdma_bind_addr() fails.
>>
>> Proposed solution is to add a delay after svc_xprt_destroy_all() to
>> allow for the deferred work to run.
>>
>> --- Is the chosen value of 1s enough to ensure socket goes away?
>> I can't guarantee that.
> 
> Adding a sleep and hoping it works is ... not a proper fix. The
> msleep() in svc_xprt_destroy_all() is part of a polling loop,
> and it is only waiting for the xprt lists to become empty. You're
> not polling here (ie, checking for completion before sleeping).
> 
> 
>> --- Alternatives that i can think of:
>> (1) to go back to listener removal approach that added removal of
>> entry to the llist api. That would not require a removal of all
>> transport causing this problem to occur. Earlier it was preferred
>> not to change llist api.
>> (2) some method of checking that all deferred work occuring in
>> svc_xprt_destroy_all() completed.
> 
> Jeff (and perhaps Lorenzo) need to go back to the original reasons why
> this was done and rework it. I think we were avoiding holding the
> nfsd mutex in here?
> 
> Complete shutdown of a transport always involve some deferred
> activity, and there's a reference count involved as well. I can't
> see how the current destroy/re-insert mechanism can be made reliable.

One thought is that instead of using svc_xprt_destroy_all(), can we
simply call svc_xprt_close() on the target transport?

The question about what to wait for to ensure the transport is gone
is a different problem: the memory backing the svc_xprt will be
freed, so we can't, say, "wait on bit" on one of the transport
flags.

-- 
Chuck Lever