Re: [RFC PATCH 1/1] nfsd: rework how a listener is removed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2025-08-28 at 11:21 -0400, Chuck Lever wrote:
> On 8/27/25 10:21 AM, Chuck Lever wrote:
> > On 8/26/25 6:00 PM, Olga Kornievskaia wrote:
> > > This patch tries to address the following failure:
> > > nfsdctl threads 0
> > > nfsdctl listener +rdma::20049
> > > nfsdctl listener +tcp::2049
> > > nfsdctl listener -tcp::2049
> > > nfsdctl: Error: Cannot assign requested address
> > > 
> > > The reason for the failure is due to the fact that socket cleanup only
> > > happens in __svc_rdma_free() which is a deferred work triggers when an
> > > rdma transport is destroyed. To remove a listener nfsdctl is forced to
> > > first remove all transports via svc_xprt_destroy_all() and then re-add
> > > the ones that are left. Due to the fact that there isn't a way to
> > > delete a particular entry from server's lwq sp_xprts that stores
> > > transports. Going back to the deferred work done in __svc_rdma_free(),
> > > the work might not get to run before nfsd_nl_listener_set_doit() creates
> > > the new transports. As a result, it finds that something is still
> > > listening of the rdma port and rdma_bind_addr() fails.
> > > 
> > > Instead of using svc_xprt_destroy_all() to manipulate the sp_xprt,
> > > instead introduce a function that just dequeues all transports. Then,
> > > we add non-removed transports back to the list.
> > > 
> > > Still not allowing to remove a listener while the server is active.
> > > 
> > > We need to make several passes over the list of existing/new list
> > > entries. On the first pass we determined if any of the entries need
> > > to be removed. If so, we then check if the server has no active
> > > threads. Then we dequeue all the transports and then go over the
> > > list and recreate both permsocks list and sp_xprts lists. Then,
> > > for the deleted transports, the transport is closed.
> > 
> > > --- Comments:
> > > (1) There is still a restriction on removing an active listener as
> > > I dont know how to handle if the transport to be remove is currently
> > > serving a request (it won't be on the sp_xprt list I believe?).
> > 
> > This is a good reason why just setting a bit in the xprt and waiting for
> > the close to complete is probably a better strategy than draining and
> > refilling the permsock list.
> > 
> > The idea of setting XPT_CLOSE and enqueuing the transport ... you know,
> > like this:
> > 
> >  151 /**
> > 
> >  152  * svc_xprt_deferred_close - Close a transport
> > 
> >  153  * @xprt: transport instance
> > 
> >  154  *
> > 
> >  155  * Used in contexts that need to defer the work of shutting down
> > 
> >  156  * the transport to an nfsd thread.
> > 
> >  157  */
> > 
> >  158 void svc_xprt_deferred_close(struct svc_xprt *xprt)
> > 
> >  159 {
> > 
> >  160         trace_svc_xprt_close(xprt);
> > 
> >  161         if (!test_and_set_bit(XPT_CLOSE, &xprt->xpt_flags))
> > 
> >  162                 svc_xprt_enqueue(xprt);
> > 
> >  163 }
> > 
> >  164 EXPORT_SYMBOL_GPL(svc_xprt_deferred_close);
> > 
> > I expect that eventually the xprt will show up to svc_handle_xprt() and
> > get deleted there. But you might still need some serialization with
> >   ->xpo_accept ?
> 
> It occurred to me why the deferred close mechanism doesn't work: it
> relies on having an nfsd thread to pick up the deferred work.
> 
> If listener removal requires all nfsd threads to be terminated, there
> is no thread to pick up the xprt and close it.
> 

Interesting. I guess that the old nfsdfs file just didn't allow you to
get into this situation, since you couldn't remove a listener at all.

It really sounds like we just need a more selective version of
svc_clean_up_xprts(). Something that can dequeue everything, close the
ones that need to be closed (synchronously) and then requeues the rest.


> 
> > > In general, I'm unsure if there are other things I'm not considering.
> > > (2) I'm questioning if in svc_xprt_dequeue_all() it is correct. I
> > > used svc_cleanup_up_xprts() as the example.
> > > > Fixes: d093c90892607 ("nfsd: fix management of listener transports")
> > > Signed-off-by: Olga Kornievskaia <okorniev@xxxxxxxxxx>
> > > ---
> > >  fs/nfsd/nfsctl.c                | 123 +++++++++++++++++++-------------
> > >  include/linux/sunrpc/svc_xprt.h |   1 +
> > >  include/linux/sunrpc/svcsock.h  |   1 -
> > >  net/sunrpc/svc_xprt.c           |  12 ++++
> > >  4 files changed, 88 insertions(+), 49 deletions(-)
> > > 
> > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > > index dd3267b4c203..38aaaef4734e 100644
> > > --- a/fs/nfsd/nfsctl.c
> > > +++ b/fs/nfsd/nfsctl.c
> > > @@ -1902,44 +1902,17 @@ int nfsd_nl_version_get_doit(struct sk_buff *skb, struct genl_info *info)
> > >  	return err;
> > >  }
> > >  
> > > -/**
> > > - * nfsd_nl_listener_set_doit - set the nfs running sockets
> > > - * @skb: reply buffer
> > > - * @info: netlink metadata and command arguments
> > > - *
> > > - * Return 0 on success or a negative errno.
> > > - */
> > > -int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
> > > +static void _nfsd_walk_listeners(struct genl_info *info, struct svc_serv *serv,
> > > +				 struct list_head *permsocks, int modify_xprt)
> > 
> > So this function looks for the one listener we need to remove.
> > 
> > Should removing a listener also close down all active temporary sockets
> > for the service, or should it kill only the ones that were established
> > via the listener being removed, or should it leave all active temporary
> > sockets in place?
> > 
> > Perhaps this is why /all/ permanent and temporary sockets are currently
> > being removed. Once the target listener is gone, clients can't
> > re-establish new connections, and the service is effectively ready to
> > be shut down cleanly.
> > 
> > 
> > >  {
> > >  	struct net *net = genl_info_net(info);
> > > -	struct svc_xprt *xprt, *tmp;
> > >  	const struct nlattr *attr;
> > > -	struct svc_serv *serv;
> > > -	LIST_HEAD(permsocks);
> > > -	struct nfsd_net *nn;
> > > -	bool delete = false;
> > > -	int err, rem;
> > > -
> > > -	mutex_lock(&nfsd_mutex);
> > > -
> > > -	err = nfsd_create_serv(net);
> > > -	if (err) {
> > > -		mutex_unlock(&nfsd_mutex);
> > > -		return err;
> > > -	}
> > > -
> > > -	nn = net_generic(net, nfsd_net_id);
> > > -	serv = nn->nfsd_serv;
> > > -
> > > -	spin_lock_bh(&serv->sv_lock);
> > > +	struct svc_xprt *xprt, *tmp;
> > > +	int rem;
> > >  
> > > -	/* Move all of the old listener sockets to a temp list */
> > > -	list_splice_init(&serv->sv_permsocks, &permsocks);
> > > +	if (modify_xprt)
> > > +		svc_xprt_dequeue_all(serv);
> > >  
> > > -	/*
> > > -	 * Walk the list of server_socks from userland and move any that match
> > > -	 * back to sv_permsocks
> > > -	 */
> > >  	nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
> > >  		struct nlattr *tb[NFSD_A_SOCK_MAX + 1];
> > >  		const char *xcl_name;
> > > @@ -1962,7 +1935,7 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
> > >  		sa = nla_data(tb[NFSD_A_SOCK_ADDR]);
> > >  
> > >  		/* Put back any matching sockets */
> > > -		list_for_each_entry_safe(xprt, tmp, &permsocks, xpt_list) {
> > > +		list_for_each_entry_safe(xprt, tmp, permsocks, xpt_list) {
> > >  			/* This shouldn't be possible */
> > >  			if (WARN_ON_ONCE(xprt->xpt_net != net)) {
> > >  				list_move(&xprt->xpt_list, &serv->sv_permsocks);
> > > @@ -1971,35 +1944,89 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
> > >  
> > >  			/* If everything matches, put it back */
> > >  			if (!strcmp(xprt->xpt_class->xcl_name, xcl_name) &&
> > > -			    rpc_cmp_addr_port(sa, (struct sockaddr *)&xprt->xpt_local)) {
> > > +			    rpc_cmp_addr_port(sa,
> > > +				    (struct sockaddr *)&xprt->xpt_local)) {
> > >  				list_move(&xprt->xpt_list, &serv->sv_permsocks);
> > > +				if (modify_xprt)
> > > +					svc_xprt_enqueue(xprt);
> > >  				break;
> > >  			}
> > >  		}
> > >  	}
> > > +}
> > > +
> > > +/**
> > > + * nfsd_nl_listener_set_doit - set the nfs running sockets
> > > + * @skb: reply buffer
> > > + * @info: netlink metadata and command arguments
> > > + *
> > > + * Return 0 on success or a negative errno.
> > > + */
> > > +int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
> > > +{
> > > +	struct net *net = genl_info_net(info);
> > > +	struct svc_xprt *xprt;
> > > +	const struct nlattr *attr;
> > > +	struct svc_serv *serv;
> > > +	LIST_HEAD(permsocks);
> > > +	struct nfsd_net *nn;
> > > +	bool delete = false;
> > > +	int err, rem;
> > > +
> > > +	mutex_lock(&nfsd_mutex);
> > > +
> > > +	err = nfsd_create_serv(net);
> > > +	if (err) {
> > > +		mutex_unlock(&nfsd_mutex);
> > > +		return err;
> > > +	}
> > > +
> > > +	nn = net_generic(net, nfsd_net_id);
> > > +	serv = nn->nfsd_serv;
> > > +
> > > +	spin_lock_bh(&serv->sv_lock);
> > > +
> > > +	/* Move all of the old listener sockets to a temp list */
> > > +	list_splice_init(&serv->sv_permsocks, &permsocks);
> > >  
> > >  	/*
> > > -	 * If there are listener transports remaining on the permsocks list,
> > > -	 * it means we were asked to remove a listener.
> > > +	 * Walk the list of server_socks from userland and move any that match
> > > +	 * back to sv_permsocks. Determine if anything needs to be removed so
> > > +	 * don't manipulate sp_xprts list.
> > >  	 */
> > > -	if (!list_empty(&permsocks)) {
> > > -		list_splice_init(&permsocks, &serv->sv_permsocks);
> > > -		delete = true;
> > > -	}
> > > -	spin_unlock_bh(&serv->sv_lock);
> > > +	_nfsd_walk_listeners(info, serv, &permsocks, false);
> > >  
> > > -	/* Do not remove listeners while there are active threads. */
> > > -	if (serv->sv_nrthreads) {
> > > +	/* For now, no removing old sockets while server is running */
> > > +	if (serv->sv_nrthreads && !list_empty(&permsocks)) {
> > > +		list_splice_init(&permsocks, &serv->sv_permsocks);
> > > +		spin_unlock_bh(&serv->sv_lock);
> > >  		err = -EBUSY;
> > >  		goto out_unlock_mtx;
> > >  	}
> > >  
> > >  	/*
> > > -	 * Since we can't delete an arbitrary llist entry, destroy the
> > > -	 * remaining listeners and recreate the list.
> > > +	 * If there are listener transports remaining on the permsocks list,
> > > +	 * it means we were asked to remove a listener. Walk the list again,
> > > +	 * but this time also manage the sp_xprts but first removing all of
> > > +	 * them and only adding back the ones not being deleted. Then close
> > > +	 * the ones left on the list.
> > >  	 */
> > > -	if (delete)
> > > -		svc_xprt_destroy_all(serv, net, false);
> > > +	if (!list_empty(&permsocks)) {
> > > +		list_splice_init(&permsocks, &serv->sv_permsocks);
> > > +		list_splice_init(&serv->sv_permsocks, &permsocks);
> > > +		_nfsd_walk_listeners(info, serv, &permsocks, true);
> > > +		while (!list_empty(&permsocks)) {
> > > +			xprt = list_first_entry(&permsocks, struct svc_xprt, xpt_list);
> > > +			clear_bit(XPT_BUSY, &xprt->xpt_flags);
> > > +			set_bit(XPT_CLOSE, &xprt->xpt_flags);
> > > +			spin_unlock_bh(&serv->sv_lock);
> > > +			svc_xprt_close(xprt);
> > > +			spin_lock_bh(&serv->sv_lock);
> > > +		}
> > > +		spin_unlock_bh(&serv->sv_lock);
> > > +		goto out_unlock_mtx;
> > > +	}
> > > +	spin_unlock_bh(&serv->sv_lock);
> > >  
> > >  	/* walk list of addrs again, open any that still don't exist */
> > >  	nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
> > > diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
> > > index da2a2531e110..7038fd8ef20a 100644
> > > --- a/include/linux/sunrpc/svc_xprt.h
> > > +++ b/include/linux/sunrpc/svc_xprt.h
> > > @@ -186,6 +186,7 @@ int	svc_xprt_names(struct svc_serv *serv, char *buf, const int buflen);
> > >  void	svc_add_new_perm_xprt(struct svc_serv *serv, struct svc_xprt *xprt);
> > >  void	svc_age_temp_xprts_now(struct svc_serv *, struct sockaddr *);
> > >  void	svc_xprt_deferred_close(struct svc_xprt *xprt);
> > > +void	svc_xprt_dequeue_all(struct svc_serv *serv);
> > >  
> > >  static inline void svc_xprt_get(struct svc_xprt *xprt)
> > >  {
> > > diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
> > > index 963bbe251e52..4c1be01afdb7 100644
> > > --- a/include/linux/sunrpc/svcsock.h
> > > +++ b/include/linux/sunrpc/svcsock.h
> > > @@ -65,7 +65,6 @@ int		svc_addsock(struct svc_serv *serv, struct net *net,
> > >  			    const struct cred *cred);
> > >  void		svc_init_xprt_sock(void);
> > >  void		svc_cleanup_xprt_sock(void);
> > > -
> > >  /*
> > >   * svc_makesock socket characteristics
> > >   */
> > > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> > > index 6973184ff667..2aa46b9468d4 100644
> > > --- a/net/sunrpc/svc_xprt.c
> > > +++ b/net/sunrpc/svc_xprt.c
> > > @@ -890,6 +890,18 @@ void svc_recv(struct svc_rqst *rqstp)
> > >  }
> > >  EXPORT_SYMBOL_GPL(svc_recv);
> > >  
> > > +void svc_xprt_dequeue_all(struct svc_serv *serv)
> > > +{
> > > +	int i;
> > > +
> > > +	for (i = 0; i < serv->sv_nrpools; i++) {
> > > +		struct svc_pool *pool = &serv->sv_pools[i];
> > > +
> > > +		lwq_dequeue_all(&pool->sp_xprts);
> > > +	}
> > > +}
> > > +EXPORT_SYMBOL_GPL(svc_xprt_dequeue_all);
> > > +
> > >  /**
> > >   * svc_send - Return reply to client
> > >   * @rqstp: RPC transaction context
> > 
> > 
> 

-- 
Jeff Layton <jlayton@xxxxxxxxxx>





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux