On Sat, 23 Aug 2025, Chuck Lever wrote: > On 8/21/25 5:01 PM, Chuck Lever wrote: > > On 8/21/25 4:43 PM, Olga Kornievskaia wrote: > >> This patch tries to address the following failure: > >> nfsdctl threads 0 > >> nfsdctl listener +rdma::20049 > >> nfsdctl listener +tcp::2049 > >> nfsdctl listener -tcp::2049 > >> nfsdctl: Error: Cannot assign requested address > >> > >> The reason for the failure is due to the fact that socket cleanup only > >> happens in __svc_rdma_free() which is a deferred work triggers when an > >> rdma transport is destroyed. To remove a listener nfsdctl is forced to > >> first remove all transports via svc_xprt_destroy_all() and then re-add > >> the ones that are left. Due to the fact that there isn't a way to > >> delete a particular entry from a list where permanent sockets are > >> stored. > > > > The issue is specifically with llist, which does not permit the > > deletion of any entry other than the first on the list. > > > > > >> Going back to the deferred work done in __svc_rdma_free(), the > >> work might not get to run before nfsd_nl_listener_set_doit() creates > >> the new transports. As a result, it finds that something is still > >> listening of the rdma port and rdma_bind_addr() fails. > >> > >> Proposed solution is to add a delay after svc_xprt_destroy_all() to > >> allow for the deferred work to run. > >> > >> --- Is the chosen value of 1s enough to ensure socket goes away? > >> I can't guarantee that. > > > > Adding a sleep and hoping it works is ... not a proper fix. The > > msleep() in svc_xprt_destroy_all() is part of a polling loop, > > and it is only waiting for the xprt lists to become empty. You're > > not polling here (ie, checking for completion before sleeping). > > > > > >> --- Alternatives that i can think of: > >> (1) to go back to listener removal approach that added removal of > >> entry to the llist api. That would not require a removal of all > >> transport causing this problem to occur. Earlier it was preferred > >> not to change llist api. > >> (2) some method of checking that all deferred work occuring in > >> svc_xprt_destroy_all() completed. > > > > Jeff (and perhaps Lorenzo) need to go back to the original reasons why > > this was done and rework it. I think we were avoiding holding the > > nfsd mutex in here? > > > > Complete shutdown of a transport always involve some deferred > > activity, and there's a reference count involved as well. I can't > > see how the current destroy/re-insert mechanism can be made reliable. > > One thought is that instead of using svc_xprt_destroy_all(), can we > simply call svc_xprt_close() on the target transport? > > The question about what to wait for to ensure the transport is gone > is a different problem: the memory backing the svc_xprt will be > freed, so we can't, say, "wait on bit" on one of the transport > flags. The standard solution would be to store a pointer to a completion in the svc_xprt which, if not NULL, is completed before the memory is freed. Then have the waiter set that to an on-stack completion before triggering the shutdown. Then wait_for_completion(). NeilBrown