On 8/21/25 5:01 PM, Chuck Lever wrote: > On 8/21/25 4:43 PM, Olga Kornievskaia wrote: >> This patch tries to address the following failure: >> nfsdctl threads 0 >> nfsdctl listener +rdma::20049 >> nfsdctl listener +tcp::2049 >> nfsdctl listener -tcp::2049 >> nfsdctl: Error: Cannot assign requested address >> >> The reason for the failure is due to the fact that socket cleanup only >> happens in __svc_rdma_free() which is a deferred work triggers when an >> rdma transport is destroyed. To remove a listener nfsdctl is forced to >> first remove all transports via svc_xprt_destroy_all() and then re-add >> the ones that are left. Due to the fact that there isn't a way to >> delete a particular entry from a list where permanent sockets are >> stored. > > The issue is specifically with llist, which does not permit the > deletion of any entry other than the first on the list. > > >> Going back to the deferred work done in __svc_rdma_free(), the >> work might not get to run before nfsd_nl_listener_set_doit() creates >> the new transports. As a result, it finds that something is still >> listening of the rdma port and rdma_bind_addr() fails. >> >> Proposed solution is to add a delay after svc_xprt_destroy_all() to >> allow for the deferred work to run. >> >> --- Is the chosen value of 1s enough to ensure socket goes away? >> I can't guarantee that. > > Adding a sleep and hoping it works is ... not a proper fix. The > msleep() in svc_xprt_destroy_all() is part of a polling loop, > and it is only waiting for the xprt lists to become empty. You're > not polling here (ie, checking for completion before sleeping). > > >> --- Alternatives that i can think of: >> (1) to go back to listener removal approach that added removal of >> entry to the llist api. That would not require a removal of all >> transport causing this problem to occur. Earlier it was preferred >> not to change llist api. >> (2) some method of checking that all deferred work occuring in >> svc_xprt_destroy_all() completed. > > Jeff (and perhaps Lorenzo) need to go back to the original reasons why > this was done and rework it. I think we were avoiding holding the > nfsd mutex in here? > > Complete shutdown of a transport always involve some deferred > activity, and there's a reference count involved as well. I can't > see how the current destroy/re-insert mechanism can be made reliable. One thought is that instead of using svc_xprt_destroy_all(), can we simply call svc_xprt_close() on the target transport? The question about what to wait for to ensure the transport is gone is a different problem: the memory backing the svc_xprt will be freed, so we can't, say, "wait on bit" on one of the transport flags. -- Chuck Lever