Re: [RFC PATCH 1/1] nfsd: delay re-registering of listeners after listener removal

"NeilBrown" <neil@xxxxxxxxxx> · Sun, 24 Aug 2025 09:57:22 +1000

On Sat, 23 Aug 2025, Chuck Lever wrote:
> On 8/21/25 5:01 PM, Chuck Lever wrote:
> > On 8/21/25 4:43 PM, Olga Kornievskaia wrote:
> >> This patch tries to address the following failure:
> >> nfsdctl threads 0
> >> nfsdctl listener +rdma::20049
> >> nfsdctl listener +tcp::2049
> >> nfsdctl listener -tcp::2049
> >> nfsdctl: Error: Cannot assign requested address
> >>
> >> The reason for the failure is due to the fact that socket cleanup only
> >> happens in __svc_rdma_free() which is a deferred work triggers when an
> >> rdma transport is destroyed. To remove a listener nfsdctl is forced to
> >> first remove all transports via svc_xprt_destroy_all() and then re-add
> >> the ones that are left. Due to the fact that there isn't a way to
> >> delete a particular entry from a list where permanent sockets are
> >> stored.
> > 
> > The issue is specifically with llist, which does not permit the
> > deletion of any entry other than the first on the list.
> > 
> > 
> >> Going back to the deferred work done in __svc_rdma_free(), the
> >> work might not get to run before nfsd_nl_listener_set_doit() creates
> >> the new transports. As a result, it finds that something is still
> >> listening of the rdma port and rdma_bind_addr() fails.
> >>
> >> Proposed solution is to add a delay after svc_xprt_destroy_all() to
> >> allow for the deferred work to run.
> >>
> >> --- Is the chosen value of 1s enough to ensure socket goes away?
> >> I can't guarantee that.
> > 
> > Adding a sleep and hoping it works is ... not a proper fix. The
> > msleep() in svc_xprt_destroy_all() is part of a polling loop,
> > and it is only waiting for the xprt lists to become empty. You're
> > not polling here (ie, checking for completion before sleeping).
> > 
> > 
> >> --- Alternatives that i can think of:
> >> (1) to go back to listener removal approach that added removal of
> >> entry to the llist api. That would not require a removal of all
> >> transport causing this problem to occur. Earlier it was preferred
> >> not to change llist api.
> >> (2) some method of checking that all deferred work occuring in
> >> svc_xprt_destroy_all() completed.
> > 
> > Jeff (and perhaps Lorenzo) need to go back to the original reasons why
> > this was done and rework it. I think we were avoiding holding the
> > nfsd mutex in here?
> > 
> > Complete shutdown of a transport always involve some deferred
> > activity, and there's a reference count involved as well. I can't
> > see how the current destroy/re-insert mechanism can be made reliable.
> 
> One thought is that instead of using svc_xprt_destroy_all(), can we
> simply call svc_xprt_close() on the target transport?
> 
> The question about what to wait for to ensure the transport is gone
> is a different problem: the memory backing the svc_xprt will be
> freed, so we can't, say, "wait on bit" on one of the transport
> flags.

The standard solution would be to store a pointer to a completion in the
svc_xprt which, if not NULL, is completed before the memory is freed.
Then have the waiter set that to an on-stack completion before
triggering the shutdown.  Then wait_for_completion().

NeilBrown