Re: [RFC PATCH 1/1] nfsd: rework how a listener is removed

On 8/28/25 1:29 PM, Jeff Layton wrote:
> On Thu, 2025-08-28 at 11:21 -0400, Chuck Lever wrote:
>> On 8/27/25 10:21 AM, Chuck Lever wrote:
>>> On 8/26/25 6:00 PM, Olga Kornievskaia wrote:
>>>> This patch tries to address the following failure:
>>>> nfsdctl threads 0
>>>> nfsdctl listener +rdma::20049
>>>> nfsdctl listener +tcp::2049
>>>> nfsdctl listener -tcp::2049
>>>> nfsdctl: Error: Cannot assign requested address
>>>>
>>>> The failure occurs because socket cleanup only happens in
>>>> __svc_rdma_free(), which is deferred work triggered when an rdma
>>>> transport is destroyed. To remove a listener, nfsd is forced to first
>>>> destroy all transports via svc_xprt_destroy_all() and then re-add the
>>>> ones that remain, because there is no way to delete a particular entry
>>>> from the server's lwq sp_xprts, which stores the transports. The
>>>> deferred work done in __svc_rdma_free() might not get to run before
>>>> nfsd_nl_listener_set_doit() creates the new transports. As a result,
>>>> it finds that something is still listening on the rdma port and
>>>> rdma_bind_addr() fails.
>>>>
>>>> Instead of using svc_xprt_destroy_all() to manipulate sp_xprts,
>>>> introduce a function that just dequeues all transports. Then add the
>>>> transports that are not being removed back to the list.
>>>>
>>>> Removing a listener while the server is active is still not allowed.
>>>>
>>>> We need to make several passes over the existing and new list entries.
>>>> On the first pass we determine whether any of the entries need to be
>>>> removed. If so, we check that the server has no active threads. We
>>>> then dequeue all the transports, walk the list again, and recreate
>>>> both the permsocks and sp_xprts lists. Finally, the transports that
>>>> were deleted are closed.
>>>
>>>> --- Comments:
>>>> (1) There is still a restriction on removing an active listener, as
>>>> I don't know how to handle the case where the transport to be removed
>>>> is currently serving a request (it won't be on the sp_xprts list, I
>>>> believe?).
>>>
>>> This is a good reason why just setting a bit in the xprt and waiting for
>>> the close to complete is probably a better strategy than draining and
>>> refilling the permsock list.
>>>
>>> The idea of setting XPT_CLOSE and enqueuing the transport ... you know,
>>> like this:
>>>
>>> /**
>>>  * svc_xprt_deferred_close - Close a transport
>>>  * @xprt: transport instance
>>>  *
>>>  * Used in contexts that need to defer the work of shutting down
>>>  * the transport to an nfsd thread.
>>>  */
>>> void svc_xprt_deferred_close(struct svc_xprt *xprt)
>>> {
>>>         trace_svc_xprt_close(xprt);
>>>         if (!test_and_set_bit(XPT_CLOSE, &xprt->xpt_flags))
>>>                 svc_xprt_enqueue(xprt);
>>> }
>>> EXPORT_SYMBOL_GPL(svc_xprt_deferred_close);
>>>
>>> I expect that eventually the xprt will show up to svc_handle_xprt() and
>>> get deleted there. But you might still need some serialization with
>>>   ->xpo_accept ?
>>
>> It occurred to me why the deferred close mechanism doesn't work: it
>> relies on having an nfsd thread to pick up the deferred work.
>>
>> If listener removal requires all nfsd threads to be terminated, there
>> is no thread to pick up the xprt and close it.
>>
> 
> Interesting. I guess that the old nfsdfs file just didn't allow you to
> get into this situation, since you couldn't remove a listener at all.
> 
> It really sounds like we just need a more selective version of
> svc_clean_up_xprts(). Something that can dequeue everything, close the
> ones that need to be closed (synchronously) and then requeue the rest.

That is roughly what Olga has done here. Now that I understand the
problem space a little better, maybe we can iterate on this.
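
To make that concrete, here is the rough shape I have in mind
(completely untested; the svc_clean_up_some_xprts() and xprt_unwanted()
names are made up, and I'm going from memory on the lwq helpers and
field names). The idea is to drain each pool's queue, synchronously
close the transports being removed, and put the rest back:

static void svc_clean_up_some_xprts(struct svc_serv *serv,
                                    bool (*xprt_unwanted)(struct svc_xprt *))
{
        int i;

        for (i = 0; i < serv->sv_nrpools; i++) {
                struct svc_pool *pool = &serv->sv_pools[i];
                struct svc_xprt *xprt, *tmp;
                struct llist_node *q;

                /* Snapshot everything queued on this pool */
                q = lwq_dequeue_all(&pool->sp_xprts);
                llist_for_each_entry_safe(xprt, tmp, q, xpt_lwq.node) {
                        if (xprt_unwanted(xprt)) {
                                /* synchronously close and free it */
                                svc_xprt_close(xprt);
                        } else {
                                /* requeue the transports we are keeping */
                                lwq_enqueue(&xprt->xpt_lwq, &pool->sp_xprts);
                        }
                }
        }
}

Serialization is assumed to come from the caller (nfsd_mutex held, and
either no nfsd threads running or some other way to keep threads from
racing with the drain). It also doesn't address the case Olga raised in
comment (1): a transport that is off the queue because a thread is
handling it at that moment.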


>>>> In general, I'm unsure if there are other things I'm not considering.
>>>> (2) I'm not sure whether svc_xprt_dequeue_all() is correct. I used
>>>> svc_clean_up_xprts() as the example.
>>>>
>>>> Fixes: d093c90892607 ("nfsd: fix management of listener transports")
>>>> Signed-off-by: Olga Kornievskaia <okorniev@xxxxxxxxxx>
>>>> ---
>>>>  fs/nfsd/nfsctl.c                | 123 +++++++++++++++++++-------------
>>>>  include/linux/sunrpc/svc_xprt.h |   1 +
>>>>  include/linux/sunrpc/svcsock.h  |   1 -
>>>>  net/sunrpc/svc_xprt.c           |  12 ++++
>>>>  4 files changed, 88 insertions(+), 49 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>>>> index dd3267b4c203..38aaaef4734e 100644
>>>> --- a/fs/nfsd/nfsctl.c
>>>> +++ b/fs/nfsd/nfsctl.c
>>>> @@ -1902,44 +1902,17 @@ int nfsd_nl_version_get_doit(struct sk_buff *skb, struct genl_info *info)
>>>>  	return err;
>>>>  }
>>>>  
>>>> -/**
>>>> - * nfsd_nl_listener_set_doit - set the nfs running sockets
>>>> - * @skb: reply buffer
>>>> - * @info: netlink metadata and command arguments
>>>> - *
>>>> - * Return 0 on success or a negative errno.
>>>> - */
>>>> -int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>> +static void _nfsd_walk_listeners(struct genl_info *info, struct svc_serv *serv,
>>>> +				 struct list_head *permsocks, int modify_xprt)
>>>
>>> So this function looks for the one listener we need to remove.
>>>
>>> Should removing a listener also close down all active temporary sockets
>>> for the service, or should it kill only the ones that were established
>>> via the listener being removed, or should it leave all active temporary
>>> sockets in place?
>>>
>>> Perhaps this is why /all/ permanent and temporary sockets are currently
>>> being removed. Once the target listener is gone, clients can't
>>> re-establish new connections, and the service is effectively ready to
>>> be shut down cleanly.
>>>
>>>
>>>>  {
>>>>  	struct net *net = genl_info_net(info);
>>>> -	struct svc_xprt *xprt, *tmp;
>>>>  	const struct nlattr *attr;
>>>> -	struct svc_serv *serv;
>>>> -	LIST_HEAD(permsocks);
>>>> -	struct nfsd_net *nn;
>>>> -	bool delete = false;
>>>> -	int err, rem;
>>>> -
>>>> -	mutex_lock(&nfsd_mutex);
>>>> -
>>>> -	err = nfsd_create_serv(net);
>>>> -	if (err) {
>>>> -		mutex_unlock(&nfsd_mutex);
>>>> -		return err;
>>>> -	}
>>>> -
>>>> -	nn = net_generic(net, nfsd_net_id);
>>>> -	serv = nn->nfsd_serv;
>>>> -
>>>> -	spin_lock_bh(&serv->sv_lock);
>>>> +	struct svc_xprt *xprt, *tmp;
>>>> +	int rem;
>>>>  
>>>> -	/* Move all of the old listener sockets to a temp list */
>>>> -	list_splice_init(&serv->sv_permsocks, &permsocks);
>>>> +	if (modify_xprt)
>>>> +		svc_xprt_dequeue_all(serv);
>>>>  
>>>> -	/*
>>>> -	 * Walk the list of server_socks from userland and move any that match
>>>> -	 * back to sv_permsocks
>>>> -	 */
>>>>  	nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
>>>>  		struct nlattr *tb[NFSD_A_SOCK_MAX + 1];
>>>>  		const char *xcl_name;
>>>> @@ -1962,7 +1935,7 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>>  		sa = nla_data(tb[NFSD_A_SOCK_ADDR]);
>>>>  
>>>>  		/* Put back any matching sockets */
>>>> -		list_for_each_entry_safe(xprt, tmp, &permsocks, xpt_list) {
>>>> +		list_for_each_entry_safe(xprt, tmp, permsocks, xpt_list) {
>>>>  			/* This shouldn't be possible */
>>>>  			if (WARN_ON_ONCE(xprt->xpt_net != net)) {
>>>>  				list_move(&xprt->xpt_list, &serv->sv_permsocks);
>>>> @@ -1971,35 +1944,89 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>>  
>>>>  			/* If everything matches, put it back */
>>>>  			if (!strcmp(xprt->xpt_class->xcl_name, xcl_name) &&
>>>> -			    rpc_cmp_addr_port(sa, (struct sockaddr *)&xprt->xpt_local)) {
>>>> +			    rpc_cmp_addr_port(sa,
>>>> +				    (struct sockaddr *)&xprt->xpt_local)) {
>>>>  				list_move(&xprt->xpt_list, &serv->sv_permsocks);
>>>> +				if (modify_xprt)
>>>> +					svc_xprt_enqueue(xprt);
>>>>  				break;
>>>>  			}
>>>>  		}
>>>>  	}
>>>> +}
>>>> +
>>>> +/**
>>>> + * nfsd_nl_listener_set_doit - set the nfs running sockets
>>>> + * @skb: reply buffer
>>>> + * @info: netlink metadata and command arguments
>>>> + *
>>>> + * Return 0 on success or a negative errno.
>>>> + */
>>>> +int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>> +{
>>>> +	struct net *net = genl_info_net(info);
>>>> +	struct svc_xprt *xprt;
>>>> +	const struct nlattr *attr;
>>>> +	struct svc_serv *serv;
>>>> +	LIST_HEAD(permsocks);
>>>> +	struct nfsd_net *nn;
>>>> +	bool delete = false;
>>>> +	int err, rem;
>>>> +
>>>> +	mutex_lock(&nfsd_mutex);
>>>> +
>>>> +	err = nfsd_create_serv(net);
>>>> +	if (err) {
>>>> +		mutex_unlock(&nfsd_mutex);
>>>> +		return err;
>>>> +	}
>>>> +
>>>> +	nn = net_generic(net, nfsd_net_id);
>>>> +	serv = nn->nfsd_serv;
>>>> +
>>>> +	spin_lock_bh(&serv->sv_lock);
>>>> +
>>>> +	/* Move all of the old listener sockets to a temp list */
>>>> +	list_splice_init(&serv->sv_permsocks, &permsocks);
>>>>  
>>>>  	/*
>>>> -	 * If there are listener transports remaining on the permsocks list,
>>>> -	 * it means we were asked to remove a listener.
>>>> +	 * Walk the list of server_socks from userland and move any that match
>>>> +	 * back to sv_permsocks. Determine if anything needs to be removed so
>>>> +	 * don't manipulate sp_xprts list.
>>>>  	 */
>>>> -	if (!list_empty(&permsocks)) {
>>>> -		list_splice_init(&permsocks, &serv->sv_permsocks);
>>>> -		delete = true;
>>>> -	}
>>>> -	spin_unlock_bh(&serv->sv_lock);
>>>> +	_nfsd_walk_listeners(info, serv, &permsocks, false);
>>>>  
>>>> -	/* Do not remove listeners while there are active threads. */
>>>> -	if (serv->sv_nrthreads) {
>>>> +	/* For now, no removing old sockets while server is running */
>>>> +	if (serv->sv_nrthreads && !list_empty(&permsocks)) {
>>>> +		list_splice_init(&permsocks, &serv->sv_permsocks);
>>>> +		spin_unlock_bh(&serv->sv_lock);
>>>>  		err = -EBUSY;
>>>>  		goto out_unlock_mtx;
>>>>  	}
>>>>  
>>>>  	/*
>>>> -	 * Since we can't delete an arbitrary llist entry, destroy the
>>>> -	 * remaining listeners and recreate the list.
>>>> +	 * If there are listener transports remaining on the permsocks list,
>>>> +	 * it means we were asked to remove a listener. Walk the list again,
>>>> +	 * but this time also manage the sp_xprts but first removing all of
>>>> +	 * them and only adding back the ones not being deleted. Then close
>>>> +	 * the ones left on the list.
>>>>  	 */
>>>> -	if (delete)
>>>> -		svc_xprt_destroy_all(serv, net, false);
>>>> +	if (!list_empty(&permsocks)) {
>>>> +		list_splice_init(&permsocks, &serv->sv_permsocks);
>>>> +		list_splice_init(&serv->sv_permsocks, &permsocks);
>>>> +		_nfsd_walk_listeners(info, serv, &permsocks, true);
>>>> +		while (!list_empty(&permsocks)) {
>>>> +			xprt = list_first_entry(&permsocks, struct svc_xprt, xpt_list);
>>>> +			clear_bit(XPT_BUSY, &xprt->xpt_flags);
>>>> +			set_bit(XPT_CLOSE, &xprt->xpt_flags);
>>>> +			spin_unlock_bh(&serv->sv_lock);
>>>> +			svc_xprt_close(xprt);
>>>> +			spin_lock_bh(&serv->sv_lock);
>>>> +		}
>>>> +		spin_unlock_bh(&serv->sv_lock);
>>>> +		goto out_unlock_mtx;
>>>> +	}
>>>> +	spin_unlock_bh(&serv->sv_lock);
>>>>  
>>>>  	/* walk list of addrs again, open any that still don't exist */
>>>>  	nlmsg_for_each_attr(attr, info->nlhdr, GENL_HDRLEN, rem) {
>>>> diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h
>>>> index da2a2531e110..7038fd8ef20a 100644
>>>> --- a/include/linux/sunrpc/svc_xprt.h
>>>> +++ b/include/linux/sunrpc/svc_xprt.h
>>>> @@ -186,6 +186,7 @@ int	svc_xprt_names(struct svc_serv *serv, char *buf, const int buflen);
>>>>  void	svc_add_new_perm_xprt(struct svc_serv *serv, struct svc_xprt *xprt);
>>>>  void	svc_age_temp_xprts_now(struct svc_serv *, struct sockaddr *);
>>>>  void	svc_xprt_deferred_close(struct svc_xprt *xprt);
>>>> +void	svc_xprt_dequeue_all(struct svc_serv *serv);
>>>>  
>>>>  static inline void svc_xprt_get(struct svc_xprt *xprt)
>>>>  {
>>>> diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
>>>> index 963bbe251e52..4c1be01afdb7 100644
>>>> --- a/include/linux/sunrpc/svcsock.h
>>>> +++ b/include/linux/sunrpc/svcsock.h
>>>> @@ -65,7 +65,6 @@ int		svc_addsock(struct svc_serv *serv, struct net *net,
>>>>  			    const struct cred *cred);
>>>>  void		svc_init_xprt_sock(void);
>>>>  void		svc_cleanup_xprt_sock(void);
>>>> -
>>>>  /*
>>>>   * svc_makesock socket characteristics
>>>>   */
>>>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>>>> index 6973184ff667..2aa46b9468d4 100644
>>>> --- a/net/sunrpc/svc_xprt.c
>>>> +++ b/net/sunrpc/svc_xprt.c
>>>> @@ -890,6 +890,18 @@ void svc_recv(struct svc_rqst *rqstp)
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(svc_recv);
>>>>  
>>>> +void svc_xprt_dequeue_all(struct svc_serv *serv)
>>>> +{
>>>> +	int i;
>>>> +
>>>> +	for (i = 0; i < serv->sv_nrpools; i++) {
>>>> +		struct svc_pool *pool = &serv->sv_pools[i];
>>>> +
>>>> +		lwq_dequeue_all(&pool->sp_xprts);
>>>> +	}
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(svc_xprt_dequeue_all);
>>>> +
>>>>  /**
>>>>   * svc_send - Return reply to client
>>>>   * @rqstp: RPC transaction context
>>>
>>>
>>
> 


-- 
Chuck Lever



