On Thu, 2025-08-28 at 11:21 -0400, Chuck Lever wrote:
> On 8/27/25 10:21 AM, Chuck Lever wrote:
> > On 8/26/25 6:00 PM, Olga Kornievskaia wrote:
> > > This patch tries to address the following failure:
> > > nfsdctl threads 0
> > > nfsdctl listener +rdma::20049
> > > nfsdctl listener +tcp::2049
> > > nfsdctl listener -tcp::2049
> > > nfsdctl: Error: Cannot assign requested address
> > >
> > > The reason for the failure is due to the fact that socket cleanup only
> > > happens in __svc_rdma_free(), which is deferred work triggered when an
> > > rdma transport is destroyed. To remove a listener, nfsdctl is forced to
> > > first remove all transports via svc_xprt_destroy_all() and then re-add
> > > the ones that are left, because there isn't a way to delete a
> > > particular entry from the server's lwq sp_xprts that stores the
> > > transports. Going back to the deferred work done in __svc_rdma_free(),
> > > the work might not get to run before nfsd_nl_listener_set_doit() creates
> > > the new transports. As a result, it finds that something is still
> > > listening on the rdma port and rdma_bind_addr() fails.
> > >
> > > Instead of using svc_xprt_destroy_all() to manipulate the sp_xprts,
> > > introduce a function that just dequeues all transports. Then,
> > > we add non-removed transports back to the list.
> > >
> > > Removing a listener while the server is active is still not allowed.
> > >
> > > We need to make several passes over the list of existing/new list
> > > entries. On the first pass we determine whether any of the entries need
> > > to be removed. If so, we then check that the server has no active
> > > threads. Then we dequeue all the transports, go over the
> > > list, and recreate both the permsocks and sp_xprts lists. Finally,
> > > the deleted transports are closed.
> > >
> > > --- Comments:
> > > (1) There is still a restriction on removing an active listener, as
> > > I don't know how to handle the case where the transport to be removed
> > > is currently serving a request (it won't be on the sp_xprts list, I
> > > believe?).
> >
> > This is a good reason why just setting a bit in the xprt and waiting for
> > the close to complete is probably a better strategy than draining and
> > refilling the permsock list.
> >
> > The idea of setting XPT_CLOSE and enqueuing the transport ... you know,
> > like this:
> >
> >  151 /**
> >  152  * svc_xprt_deferred_close - Close a transport
> >  153  * @xprt: transport instance
> >  154  *
> >  155  * Used in contexts that need to defer the work of shutting down
> >  156  * the transport to an nfsd thread.
> >  157  */
> >  158 void svc_xprt_deferred_close(struct svc_xprt *xprt)
> >  159 {
> >  160         trace_svc_xprt_close(xprt);
> >  161         if (!test_and_set_bit(XPT_CLOSE, &xprt->xpt_flags))
> >  162                 svc_xprt_enqueue(xprt);
> >  163 }
> >  164 EXPORT_SYMBOL_GPL(svc_xprt_deferred_close);
> >
> > I expect that eventually the xprt will show up at svc_handle_xprt() and
> > get deleted there. But you might still need some serialization with
> > ->xpo_accept?
>
> It occurred to me why the deferred close mechanism doesn't work: it
> relies on having an nfsd thread to pick up the deferred work.
>
> If listener removal requires all nfsd threads to be terminated, there
> is no thread to pick up the xprt and close it.

Interesting. I guess that the old nfsdfs file just didn't allow you to
get into this situation, since you couldn't remove a listener at all.

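To spell out the dependency Chuck describes: the XPT_CLOSE bit is only
ever acted on from a server thread's receive loop. Roughly, paraphrasing
the svc_recv()/svc_handle_xprt() path in net/sunrpc/svc_xprt.c from
memory (not the verbatim upstream code):

        /*
         * Sketch: why svc_xprt_deferred_close() depends on a running
         * nfsd thread. The XPT_CLOSE bit is only acted on after a
         * thread dequeues the xprt from its pool's queue.
         */
        static void nfsd_thread_loop_sketch(struct svc_rqst *rqstp)
        {
                struct svc_pool *pool = rqstp->rq_pool;
                struct svc_xprt *xprt;

                /* only runs in a woken-up nfsd thread; with zero
                 * threads, nothing ever performs this dequeue */
                xprt = lwq_dequeue(&pool->sp_xprts, struct svc_xprt, xpt_ready);
                if (!xprt)
                        return;

                if (test_bit(XPT_CLOSE, &xprt->xpt_flags)) {
                        /* this is where the transport is actually torn down */
                        svc_delete_xprt(xprt);
                        return;
                }
                /* otherwise handle ->xpo_accept / ->xpo_recvfrom as usual */
        }

So with threads == 0, a deferred close just sits on the queue forever.
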
It really sounds like we just need a more selective version of
svc_clean_up_xprts(). Something that can dequeue everything, close the
ones that need to be closed (synchronously) and then requeue the rest.

> > > In general, I'm unsure if there are other things I'm not considering.
> > > (2) I'm questioning whether svc_xprt_dequeue_all() is correct. I
> > > used svc_clean_up_xprts() as the example.
> > >
> > > Fixes: d093c90892607 ("nfsd: fix management of listener transports")
> > > Signed-off-by: Olga Kornievskaia <okorniev@xxxxxxxxxx>
> > > ---
> > >  fs/nfsd/nfsctl.c                | 123 +++++++++++++++++++-------------
> > >  include/linux/sunrpc/svc_xprt.h |   1 +
> > >  include/linux/sunrpc/svcsock.h  |   1 -
> > >  net/sunrpc/svc_xprt.c           |  12 ++++
> > >  4 files changed, 88 insertions(+), 49 deletions(-)
> > >
> > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > > index dd3267b4c203..38aaaef4734e 100644
> > > --- a/fs/nfsd/nfsctl.c
> > > +++ b/fs/nfsd/nfsctl.c
> > > @@ -1902,44 +1902,17 @@ int nfsd_nl_version_get_doit(struct sk_buff *skb, struct genl_info *info)
> > >          return err;
> > >  }
> > >
> > > -/**
> > > - * nfsd_nl_listener_set_doit - set the nfs running sockets
> > > - * @skb: reply buffer
> > > - * @info: netlink metadata and command arguments
> > > - *
> > > - * Return 0 on success or a negative errno.
> > > - */
> > > -int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
> > > +static void _nfsd_walk_listeners(struct genl_info *info, struct svc_serv *serv,
> > > +                                 struct list_head *permsocks, int modify_xprt)
> >
> > So this function looks for the one listener we need to remove.
> >
> > Should removing a listener also close down all active temporary sockets
> > for the service, or should it kill only the ones that were established
> > via the listener being removed, or should it leave all active temporary
> > sockets in place?
> >
> > Perhaps this is why /all/ permanent and temporary sockets are currently
> > being removed. Once the target listener is gone, clients can't
> > re-establish new connections, and the service is effectively ready to
> > be shut down cleanly.

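To make the "more selective svc_clean_up_xprts()" idea above a little
more concrete, here is a rough sketch (the helper name is invented, and
this deliberately ignores the sv_lock/XPT_BUSY serialization questions):

        /*
         * Hypothetical helper: drain each pool's ready queue in one
         * shot, close only the victim synchronously, and requeue
         * everything else untouched.
         */
        static void svc_xprt_close_one(struct svc_serv *serv,
                                       struct svc_xprt *victim)
        {
                int i;

                for (i = 0; i < serv->sv_nrpools; i++) {
                        struct svc_pool *pool = &serv->sv_pools[i];
                        struct llist_node *node, *next;

                        node = lwq_dequeue_all(&pool->sp_xprts);
                        for (; node; node = next) {
                                struct svc_xprt *xprt =
                                        llist_entry(node, struct svc_xprt,
                                                    xpt_ready);

                                /* save ->next before a requeue rewrites it */
                                next = node->next;
                                if (xprt == victim)
                                        svc_xprt_close(xprt);
                                else
                                        lwq_enqueue(&xprt->xpt_ready,
                                                    &pool->sp_xprts);
                        }
                }
        }

Whether the victim is guaranteed to be sitting on a pool queue at all
(rather than busy in ->xpo_accept, as Chuck notes above) is exactly the
serialization question that still needs an answer.
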
> >
> > >  {
> > >          struct net *net = genl_info_net(info);
> > > -        struct svc_xprt *xprt, *tmp;
> > >          const struct nlattr *attr;
> > > -        struct svc_serv *serv;
> > > -        LIST_HEAD(permsocks);
> > > -        struct nfsd_net *nn;
> > > -        bool delete = false;
> > > -        int err, rem;
> > > -
> > > -        mutex_lock(&nfsd_mutex);
> > > -
> > > -        err = nfsd_create_serv(net);
> > > -        if (err) {
> > > -                mutex_unlock(&nfsd_mutex);
> > > -                return err;
> > > -        }
> > > -
> > > -        nn = net_generic(net, nfsd_net_id);
> > > -        serv = nn->nfsd_serv;
> > > -
> > > -        spin_lock_bh(&serv->sv_lock);
> > > +        struct svc_xprt *xprt, *tmp;
> > > +        int rem;
> > >
> > > -        /* Move all of the old listener sockets to a temp list */
> > > -        list_splice_init(&serv->sv_permsocks, &permsocks);
> > > +        if (modify_xprt)
> > > +                svc_xprt_dequeue_all(serv);
> > >
> > > -        /*
> > > -         * Walk the list of server_socks from userland and move any that match
> > > -         * back to sv_permsocks
> > > -         */
> > >          nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
> > >                  struct nlattr *tb[NFSD_A_SOCK_MAX + 1];
> > >                  const char *xcl_name;
> > > @@ -1962,7 +1935,7 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
> > >                  sa = nla_data(tb[NFSD_A_SOCK_ADDR]);
> > >
> > >                  /* Put back any matching sockets */
> > > -                list_for_each_entry_safe(xprt, tmp, &permsocks, xpt_list) {
> > > +                list_for_each_entry_safe(xprt, tmp, permsocks, xpt_list) {
> > >                          /* This shouldn't be possible */
> > >                          if (WARN_ON_ONCE(xprt->xpt_net != net)) {
> > >                                  list_move(&xprt->xpt_list, &serv->sv_permsocks);
> > > @@ -1971,35 +1944,89 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
> > >
> > >                          /* If everything matches, put it back */
> > >                          if (!strcmp(xprt->xpt_class->xcl_name, xcl_name) &&
> > > -                            rpc_cmp_addr_port(sa, (struct sockaddr *)&xprt->xpt_local)) {
> > > +                            rpc_cmp_addr_port(sa,
> > > +                                              (struct sockaddr *)&xprt->xpt_local)) {
> > >                                  list_move(&xprt->xpt_list, &serv->sv_permsocks);
> > > +                                if (modify_xprt)
> > > +                                        svc_xprt_enqueue(xprt);
> > >                                  break;
> > >                          }
> > >                  }
> > >          }
> > > +}
> > > +
> > > +/**
> > > + * nfsd_nl_listener_set_doit - set the nfs running sockets
> > > + * @skb: reply buffer
> > > + * @info: netlink metadata and command arguments
> > > + *
> > > + * Return 0 on success or a negative errno.
> > > + */
> > > +int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
> > > +{
> > > +        struct net *net = genl_info_net(info);
> > > +        struct svc_xprt *xprt;
> > > +        const struct nlattr *attr;
> > > +        struct svc_serv *serv;
> > > +        LIST_HEAD(permsocks);
> > > +        struct nfsd_net *nn;
> > > +        bool delete = false;
> > > +        int err, rem;
> > > +
> > > +        mutex_lock(&nfsd_mutex);
> > > +
> > > +        err = nfsd_create_serv(net);
> > > +        if (err) {
> > > +                mutex_unlock(&nfsd_mutex);
> > > +                return err;
> > > +        }
> > > +
> > > +        nn = net_generic(net, nfsd_net_id);
> > > +        serv = nn->nfsd_serv;
> > > +
> > > +        spin_lock_bh(&serv->sv_lock);
> > > +
> > > +        /* Move all of the old listener sockets to a temp list */
> > > +        list_splice_init(&serv->sv_permsocks, &permsocks);
> > >
> > >          /*
> > > -         * If there are listener transports remaining on the permsocks list,
> > > -         * it means we were asked to remove a listener.
> > > +         * Walk the list of server_socks from userland and move any that match
> > > +         * back to sv_permsocks. Determine if anything needs to be removed so
> > > +         * don't manipulate sp_xprts list.
> > >          */
> > > -        if (!list_empty(&permsocks)) {
> > > -                list_splice_init(&permsocks, &serv->sv_permsocks);
> > > -                delete = true;
> > > -        }
> > > -        spin_unlock_bh(&serv->sv_lock);
> > > +        _nfsd_walk_listeners(info, serv, &permsocks, false);
> > >
> > > -        /* Do not remove listeners while there are active threads. */
> > > -        if (serv->sv_nrthreads) {
> > > +        /* For now, no removing old sockets while server is running */
> > > +        if (serv->sv_nrthreads && !list_empty(&permsocks)) {
> > > +                list_splice_init(&permsocks, &serv->sv_permsocks);
> > > +                spin_unlock_bh(&serv->sv_lock);
> > >                  err = -EBUSY;
> > >                  goto out_unlock_mtx;
> > >          }
> > >
> > >          /*
> > > -         * Since we can't delete an arbitrary llist entry, destroy the
> > > -         * remaining listeners and recreate the list.
> > > +         * If there are listener transports remaining on the permsocks list,
> > > +         * it means we were asked to remove a listener. Walk the list again,
> > > +         * but this time also manage the sp_xprts but first removing all of
> > > +         * them and only adding back the ones not being deleted. Then close
> > > +         * the ones left on the list.
> > >          */
> > > -        if (delete)
> > > -                svc_xprt_destroy_all(serv, net, false);
> > > +        if (!list_empty(&permsocks)) {
> > > +                list_splice_init(&permsocks, &serv->sv_permsocks);
> > > +                list_splice_init(&serv->sv_permsocks, &permsocks);
> > > +                _nfsd_walk_listeners(info, serv, &permsocks, true);
> > > +                while (!list_empty(&permsocks)) {
> > > +                        xprt = list_first_entry(&permsocks, struct svc_xprt, xpt_list);
> > > +                        clear_bit(XPT_BUSY, &xprt->xpt_flags);
> > > +                        set_bit(XPT_CLOSE, &xprt->xpt_flags);
> > > +                        spin_unlock_bh(&serv->sv_lock);
> > > +                        svc_xprt_close(xprt);
> > > +                        spin_lock_bh(&serv->sv_lock);
> > > +                }
> > > +                spin_unlock_bh(&serv->sv_lock);
> > > +                goto out_unlock_mtx;
> > > +        }
> > > +        spin_unlock_bh(&serv->sv_lock);
> > >
> > >          /* walk list of addrs again, open any that still don't exist */
> > >          nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
> > > diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
> > > index da2a2531e110..7038fd8ef20a 100644
> > > --- a/include/linux/sunrpc/svc_xprt.h
> > > +++ b/include/linux/sunrpc/svc_xprt.h
> > > @@ -186,6 +186,7 @@ int svc_xprt_names(struct svc_serv *serv, char *buf, const int buflen);
> > >  void svc_add_new_perm_xprt(struct svc_serv *serv, struct svc_xprt *xprt);
> > >  void svc_age_temp_xprts_now(struct svc_serv *, struct sockaddr *);
> > >  void svc_xprt_deferred_close(struct svc_xprt *xprt);
> > > +void svc_xprt_dequeue_all(struct svc_serv *serv);
> > >
> > >  static inline void svc_xprt_get(struct svc_xprt *xprt)
> > >  {
> > > diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
> > > index 963bbe251e52..4c1be01afdb7 100644
> > > --- a/include/linux/sunrpc/svcsock.h
> > > +++ b/include/linux/sunrpc/svcsock.h
> > > @@ -65,7 +65,6 @@ int svc_addsock(struct svc_serv *serv, struct net *net,
> > >                  const struct cred *cred);
> > >  void svc_init_xprt_sock(void);
> > >  void svc_cleanup_xprt_sock(void);
> > > -
> > >  /*
> > >   * svc_makesock socket characteristics
> > >   */
> > > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> > > index 6973184ff667..2aa46b9468d4 100644
> > > --- a/net/sunrpc/svc_xprt.c
> > > +++ b/net/sunrpc/svc_xprt.c
> > > @@ -890,6 +890,18 @@ void svc_recv(struct svc_rqst *rqstp)
> > >  }
> > >  EXPORT_SYMBOL_GPL(svc_recv);
> > >
> > > +void svc_xprt_dequeue_all(struct svc_serv *serv)
> > > +{
> > > +        int i;
> > > +
> > > +        for (i = 0; i < serv->sv_nrpools; i++) {
> > > +                struct svc_pool *pool = &serv->sv_pools[i];
> > > +
> > > +                lwq_dequeue_all(&pool->sp_xprts);
> > > +        }
> > > +}
> > > +EXPORT_SYMBOL_GPL(svc_xprt_dequeue_all);
> > > +
> > >  /**
> > >   * svc_send - Return reply to client
> > >   * @rqstp: RPC transaction context
> >
-- 
Jeff Layton <jlayton@xxxxxxxxxx>