On 8/28/25 1:29 PM, Jeff Layton wrote:
> On Thu, 2025-08-28 at 11:21 -0400, Chuck Lever wrote:
>> On 8/27/25 10:21 AM, Chuck Lever wrote:
>>> On 8/26/25 6:00 PM, Olga Kornievskaia wrote:
>>>> This patch tries to address the following failure:
>>>> nfsdctl threads 0
>>>> nfsdctl listener +rdma::20049
>>>> nfsdctl listener +tcp::2049
>>>> nfsdctl listener -tcp::2049
>>>> nfsdctl: Error: Cannot assign requested address
>>>>
>>>> The reason for the failure is that socket cleanup only happens in
>>>> __svc_rdma_free(), which is deferred work triggered when an rdma
>>>> transport is destroyed. To remove a listener, nfsdctl is forced to
>>>> first remove all transports via svc_xprt_destroy_all() and then
>>>> re-add the ones that are left, because there is no way to delete a
>>>> particular entry from the server's lwq sp_xprts that stores
>>>> transports. The deferred work done in __svc_rdma_free() might not
>>>> get to run before nfsd_nl_listener_set_doit() creates the new
>>>> transports. As a result, it finds that something is still listening
>>>> on the rdma port and rdma_bind_addr() fails.
>>>>
>>>> Instead of using svc_xprt_destroy_all() to manipulate sp_xprts,
>>>> introduce a function that just dequeues all transports. Then add the
>>>> non-removed transports back to the list.
>>>>
>>>> Removing a listener while the server is active is still not allowed.
>>>>
>>>> We need to make several passes over the list of existing/new list
>>>> entries. On the first pass we determine whether any of the entries
>>>> need to be removed. If so, we check that the server has no active
>>>> threads. Then we dequeue all the transports and go over the list
>>>> again to recreate both the permsocks and sp_xprts lists. Finally,
>>>> the deleted transports are closed.
>>>
>>>> --- Comments:
>>>> (1) There is still a restriction on removing an active listener, as
>>>> I don't know how to handle the case where the transport to be
>>>> removed is currently serving a request (it won't be on the sp_xprts
>>>> list, I believe?).
>>>
>>> This is a good reason why just setting a bit in the xprt and waiting for
>>> the close to complete is probably a better strategy than draining and
>>> refilling the permsock list.
>>>
>>> The idea of setting XPT_CLOSE and enqueuing the transport ... you know,
>>> like this:
>>>
>>> 151 /**
>>> 152  * svc_xprt_deferred_close - Close a transport
>>> 153  * @xprt: transport instance
>>> 154  *
>>> 155  * Used in contexts that need to defer the work of shutting down
>>> 156  * the transport to an nfsd thread.
>>> 157  */
>>> 158 void svc_xprt_deferred_close(struct svc_xprt *xprt)
>>> 159 {
>>> 160         trace_svc_xprt_close(xprt);
>>> 161         if (!test_and_set_bit(XPT_CLOSE, &xprt->xpt_flags))
>>> 162                 svc_xprt_enqueue(xprt);
>>> 163 }
>>> 164 EXPORT_SYMBOL_GPL(svc_xprt_deferred_close);
>>>
>>> I expect that eventually the xprt will show up at svc_handle_xprt() and
>>> get deleted there. But you might still need some serialization with
>>> ->xpo_accept ?
>>
>> It occurred to me why the deferred close mechanism doesn't work: it
>> relies on having an nfsd thread to pick up the deferred work.
>>
>> If listener removal requires all nfsd threads to be terminated, there
>> is no thread to pick up the xprt and close it.
>>
>
> Interesting. I guess that the old nfsdfs file just didn't allow you to
> get into this situation, since you couldn't remove a listener at all.
>
> It really sounds like we just need a more selective version of
> svc_clean_up_xprts(). Something that can dequeue everything, close the
> ones that need to be closed (synchronously) and then requeue the rest.

That is roughly what Olga has done here. Now that I understand the
problem space a little better, maybe we can iterate on this. A rough,
untested sketch of that kind of helper is inlined further down.

>>>> In general, I'm unsure if there are other things I'm not considering.
>>>> (2) I'm questioning whether svc_xprt_dequeue_all() is correct. I
>>>> used svc_clean_up_xprts() as the example.
>>>>
>>>> Fixes: d093c90892607 ("nfsd: fix management of listener transports")
>>>> Signed-off-by: Olga Kornievskaia <okorniev@xxxxxxxxxx>
>>>> ---
>>>>  fs/nfsd/nfsctl.c                | 123 +++++++++++++++++++-------------
>>>>  include/linux/sunrpc/svc_xprt.h |   1 +
>>>>  include/linux/sunrpc/svcsock.h  |   1 -
>>>>  net/sunrpc/svc_xprt.c           |  12 ++++
>>>>  4 files changed, 88 insertions(+), 49 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>>>> index dd3267b4c203..38aaaef4734e 100644
>>>> --- a/fs/nfsd/nfsctl.c
>>>> +++ b/fs/nfsd/nfsctl.c
>>>> @@ -1902,44 +1902,17 @@ int nfsd_nl_version_get_doit(struct sk_buff *skb, struct genl_info *info)
>>>>          return err;
>>>>  }
>>>>
>>>> -/**
>>>> - * nfsd_nl_listener_set_doit - set the nfs running sockets
>>>> - * @skb: reply buffer
>>>> - * @info: netlink metadata and command arguments
>>>> - *
>>>> - * Return 0 on success or a negative errno.
>>>> - */
>>>> -int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>> +static void _nfsd_walk_listeners(struct genl_info *info, struct svc_serv *serv,
>>>> +                                 struct list_head *permsocks, int modify_xprt)
>>>
>>> So this function looks for the one listener we need to remove.
>>>
>>> Should removing a listener also close down all active temporary sockets
>>> for the service, or should it kill only the ones that were established
>>> via the listener being removed, or should it leave all active temporary
>>> sockets in place?
>>>
>>> Perhaps this is why /all/ permanent and temporary sockets are currently
>>> being removed. Once the target listener is gone, clients can't
>>> re-establish new connections, and the service is effectively ready to
>>> be shut down cleanly.
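
To make "maybe we can iterate on this" a little more concrete, here is
a rough, untested sketch of the selective clean-up helper Jeff
describes above: drain each pool's sp_xprts queue, synchronously close
only the transports that a caller-supplied predicate flags for
removal, and put everything else straight back on the queue. The name
svc_xprt_close_matching() and the @match callback are made up for
illustration; lwq_dequeue(), lwq_enqueue() and svc_xprt_close() are the
existing APIs. It assumes the caller holds nfsd_mutex (or has otherwise
stopped new enqueues) and glosses over the XPT_BUSY handling that
Olga's patch has to deal with.

/* Untested sketch only; svc_xprt_close_matching() is a hypothetical name. */
static void svc_xprt_close_matching(struct svc_serv *serv,
                                    bool (*match)(struct svc_xprt *xprt, void *arg),
                                    void *arg)
{
        unsigned int i;

        for (i = 0; i < serv->sv_nrpools; i++) {
                struct svc_pool *pool = &serv->sv_pools[i];
                struct svc_xprt *xprt;

                /* Drain this pool's ready queue ... */
                while ((xprt = lwq_dequeue(&pool->sp_xprts,
                                           struct svc_xprt, xpt_ready)) != NULL) {
                        if (match(xprt, arg)) {
                                /* ... close the listeners being removed ... */
                                set_bit(XPT_CLOSE, &xprt->xpt_flags);
                                svc_xprt_close(xprt);
                        } else {
                                /* ... and requeue everything else as-is. */
                                lwq_enqueue(&xprt->xpt_ready, &pool->sp_xprts);
                        }
                }
        }
}

Whether requeueing like this preserves enough of the XPT_BUSY and
ordering expectations is exactly the kind of thing we would still need
to work out.
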
>>>
>>>
>>>>  {
>>>>          struct net *net = genl_info_net(info);
>>>> -        struct svc_xprt *xprt, *tmp;
>>>>          const struct nlattr *attr;
>>>> -        struct svc_serv *serv;
>>>> -        LIST_HEAD(permsocks);
>>>> -        struct nfsd_net *nn;
>>>> -        bool delete = false;
>>>> -        int err, rem;
>>>> -
>>>> -        mutex_lock(&nfsd_mutex);
>>>> -
>>>> -        err = nfsd_create_serv(net);
>>>> -        if (err) {
>>>> -                mutex_unlock(&nfsd_mutex);
>>>> -                return err;
>>>> -        }
>>>> -
>>>> -        nn = net_generic(net, nfsd_net_id);
>>>> -        serv = nn->nfsd_serv;
>>>> -
>>>> -        spin_lock_bh(&serv->sv_lock);
>>>> +        struct svc_xprt *xprt, *tmp;
>>>> +        int rem;
>>>>
>>>> -        /* Move all of the old listener sockets to a temp list */
>>>> -        list_splice_init(&serv->sv_permsocks, &permsocks);
>>>> +        if (modify_xprt)
>>>> +                svc_xprt_dequeue_all(serv);
>>>>
>>>> -        /*
>>>> -         * Walk the list of server_socks from userland and move any that match
>>>> -         * back to sv_permsocks
>>>> -         */
>>>>          nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
>>>>                  struct nlattr *tb[NFSD_A_SOCK_MAX + 1];
>>>>                  const char *xcl_name;
>>>> @@ -1962,7 +1935,7 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>>                  sa = nla_data(tb[NFSD_A_SOCK_ADDR]);
>>>>
>>>>                  /* Put back any matching sockets */
>>>> -                list_for_each_entry_safe(xprt, tmp, &permsocks, xpt_list) {
>>>> +                list_for_each_entry_safe(xprt, tmp, permsocks, xpt_list) {
>>>>                          /* This shouldn't be possible */
>>>>                          if (WARN_ON_ONCE(xprt->xpt_net != net)) {
>>>>                                  list_move(&xprt->xpt_list, &serv->sv_permsocks);
>>>> @@ -1971,35 +1944,89 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>>                          /* If everything matches, put it back */
>>>>                          if (!strcmp(xprt->xpt_class->xcl_name, xcl_name) &&
>>>> -                            rpc_cmp_addr_port(sa, (struct sockaddr *)&xprt->xpt_local)) {
>>>> +                            rpc_cmp_addr_port(sa,
>>>> +                                              (struct sockaddr *)&xprt->xpt_local)) {
>>>>                                  list_move(&xprt->xpt_list, &serv->sv_permsocks);
>>>> +                                if (modify_xprt)
>>>> +                                        svc_xprt_enqueue(xprt);
>>>>                                  break;
>>>>                          }
>>>>                  }
>>>>          }
>>>> +}
>>>> +
>>>> +/**
>>>> + * nfsd_nl_listener_set_doit - set the nfs running sockets
>>>> + * @skb: reply buffer
>>>> + * @info: netlink metadata and command arguments
>>>> + *
>>>> + * Return 0 on success or a negative errno.
>>>> + */
>>>> +int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>> +{
>>>> +        struct net *net = genl_info_net(info);
>>>> +        struct svc_xprt *xprt;
>>>> +        const struct nlattr *attr;
>>>> +        struct svc_serv *serv;
>>>> +        LIST_HEAD(permsocks);
>>>> +        struct nfsd_net *nn;
>>>> +        bool delete = false;
>>>> +        int err, rem;
>>>> +
>>>> +        mutex_lock(&nfsd_mutex);
>>>> +
>>>> +        err = nfsd_create_serv(net);
>>>> +        if (err) {
>>>> +                mutex_unlock(&nfsd_mutex);
>>>> +                return err;
>>>> +        }
>>>> +
>>>> +        nn = net_generic(net, nfsd_net_id);
>>>> +        serv = nn->nfsd_serv;
>>>> +
>>>> +        spin_lock_bh(&serv->sv_lock);
>>>> +
>>>> +        /* Move all of the old listener sockets to a temp list */
>>>> +        list_splice_init(&serv->sv_permsocks, &permsocks);
>>>>
>>>>          /*
>>>> -         * If there are listener transports remaining on the permsocks list,
>>>> -         * it means we were asked to remove a listener.
>>>> +         * Walk the list of server_socks from userland and move any that match
>>>> +         * back to sv_permsocks. Determine if anything needs to be removed so
>>>> +         * don't manipulate sp_xprts list.
>>>>          */
>>>> -        if (!list_empty(&permsocks)) {
>>>> -                list_splice_init(&permsocks, &serv->sv_permsocks);
>>>> -                delete = true;
>>>> -        }
>>>> -        spin_unlock_bh(&serv->sv_lock);
>>>> +        _nfsd_walk_listeners(info, serv, &permsocks, false);
>>>>
>>>> -        /* Do not remove listeners while there are active threads. */
>>>> -        if (serv->sv_nrthreads) {
>>>> +        /* For now, no removing old sockets while server is running */
>>>> +        if (serv->sv_nrthreads && !list_empty(&permsocks)) {
>>>> +                list_splice_init(&permsocks, &serv->sv_permsocks);
>>>> +                spin_unlock_bh(&serv->sv_lock);
>>>>                  err = -EBUSY;
>>>>                  goto out_unlock_mtx;
>>>>          }
>>>>
>>>>          /*
>>>> -         * Since we can't delete an arbitrary llist entry, destroy the
>>>> -         * remaining listeners and recreate the list.
>>>> +         * If there are listener transports remaining on the permsocks list,
>>>> +         * it means we were asked to remove a listener. Walk the list again,
>>>> +         * but this time also manage the sp_xprts but first removing all of
>>>> +         * them and only adding back the ones not being deleted. Then close
>>>> +         * the ones left on the list.
>>>>          */
>>>> -        if (delete)
>>>> -                svc_xprt_destroy_all(serv, net, false);
>>>> +        if (!list_empty(&permsocks)) {
>>>> +                list_splice_init(&permsocks, &serv->sv_permsocks);
>>>> +                list_splice_init(&serv->sv_permsocks, &permsocks);
>>>> +                _nfsd_walk_listeners(info, serv, &permsocks, true);
>>>> +                while (!list_empty(&permsocks)) {
>>>> +                        xprt = list_first_entry(&permsocks, struct svc_xprt, xpt_list);
>>>> +                        clear_bit(XPT_BUSY, &xprt->xpt_flags);
>>>> +                        set_bit(XPT_CLOSE, &xprt->xpt_flags);
>>>> +                        spin_unlock_bh(&serv->sv_lock);
>>>> +                        svc_xprt_close(xprt);
>>>> +                        spin_lock_bh(&serv->sv_lock);
>>>> +                }
>>>> +                spin_unlock_bh(&serv->sv_lock);
>>>> +                goto out_unlock_mtx;
>>>> +        }
>>>> +        spin_unlock_bh(&serv->sv_lock);
>>>>
>>>>          /* walk list of addrs again, open any that still don't exist */
>>>>          nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
>>>> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
>>>> index da2a2531e110..7038fd8ef20a 100644
>>>> --- a/include/linux/sunrpc/svc_xprt.h
>>>> +++ b/include/linux/sunrpc/svc_xprt.h
>>>> @@ -186,6 +186,7 @@ int svc_xprt_names(struct svc_serv *serv, char *buf, const int buflen);
>>>>  void svc_add_new_perm_xprt(struct svc_serv *serv, struct svc_xprt *xprt);
>>>>  void svc_age_temp_xprts_now(struct svc_serv *, struct sockaddr *);
>>>>  void svc_xprt_deferred_close(struct svc_xprt *xprt);
>>>> +void svc_xprt_dequeue_all(struct svc_serv *serv);
>>>>
>>>>  static inline void svc_xprt_get(struct svc_xprt *xprt)
>>>>  {
>>>> diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
>>>> index 963bbe251e52..4c1be01afdb7 100644
>>>> --- a/include/linux/sunrpc/svcsock.h
>>>> +++ b/include/linux/sunrpc/svcsock.h
>>>> @@ -65,7 +65,6 @@ int svc_addsock(struct svc_serv *serv, struct net *net,
>>>>                  const struct cred *cred);
>>>>  void svc_init_xprt_sock(void);
>>>>  void svc_cleanup_xprt_sock(void);
>>>> -
>>>>  /*
>>>>   * svc_makesock socket characteristics
>>>>   */
>>>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>>>> index 6973184ff667..2aa46b9468d4 100644
>>>> --- a/net/sunrpc/svc_xprt.c
>>>> +++ b/net/sunrpc/svc_xprt.c
>>>> @@ -890,6 +890,18 @@ void svc_recv(struct svc_rqst *rqstp)
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(svc_recv);
>>>>
>>>> +void svc_xprt_dequeue_all(struct svc_serv *serv)
>>>> +{
>>>> +        int i;
>>>> +
>>>> +        for (i = 0; i < serv->sv_nrpools; i++) {
>>>> +                struct svc_pool *pool = &serv->sv_pools[i];
>>>> +
>>>> +                lwq_dequeue_all(&pool->sp_xprts);
>>>> +        }
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(svc_xprt_dequeue_all);
>>>> +
>>>>  /**
>>>>   * svc_send - Return reply to client
>>>>   * @rqstp: RPC transaction context
>>>
>>>
>>
>

-- 
Chuck Lever