On 9/10/25 1:14 PM, Olga Kornievskaia wrote:
> On Thu, Sep 4, 2025 at 11:48 AM Chuck Lever <cel@xxxxxxxxxx> wrote:
>>
>> On 9/4/25 11:43 AM, Olga Kornievskaia wrote:
>>> On Wed, Sep 3, 2025 at 11:53 AM Chuck Lever <cel@xxxxxxxxxx> wrote:
>>>>
>>>> From: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>>
>>>> NFSD has always supported adding network listeners. The new netlink
>>>> protocol now enables the removal of listeners.
>>>>
>>>> Olga noticed that if an RDMA listener is removed and immediately
>>>> re-added, the deferred __svc_rdma_free() function might not have
>>>> run yet, so some or all of the old listener's RDMA resources
>>>> linger, which prevents a new listener on the same address from
>>>> being created.
>>>
>>> Does this mean that you prefer to go the way of RDMA synchronous
>>> release vs the patch I posted?
>>
>> We could do both. IMO we need to make "remove listener" work while
>> there are still nfsd threads running, and this RFC patch does
>> nothing about that.
>>
>> But as noted below, it looks like the svc_xprt_free() code path assumes
>> that ->xpo_free releases all transport resources synchronously, and
>> there can be consequences if that does not happen. That needs to be
>> addressed somehow.
>>
>>> I'm not against the approach, as I have previously noted it as an
>>> alternative which I tested and which also solves the problem. But I
>>> still don't grasp the consequences of making svc_rdma_free()
>>> synchronous, especially for active transports (not listening sockets).
>>
>> I've tested the synchronous approach a little, and there didn't seem
>> to be a problem with it. But I agree, the certainty level is not as
>> high as it ought to be.
>
> So what do you think about including this patch? I don't see it in
> your nfsd-testing branch.

As you mentioned earlier, it needs some aggressive testing to ensure it
does not introduce a deadlock.
> Either this patch or my patch fixes an existing problem, and I believe
> it would be beneficial to include now (and backport). A solution for
> removal of listeners on an active server can be worked on top of that.

Agreed, both changes are worth including.

>>>> Also, svc_xprt_free() does a module_put() just after calling
>>>> ->xpo_free(). That means if there is deferred work going on, the
>>>> module could be unloaded before that work is even started,
>>>> resulting in a UAF.
>>>>
>>>> Neil asks:
>>>>> What particular part of __svc_rdma_free() needs to run in order for a
>>>>> subsequent registration to succeed?
>>>>> Can that bit be run directly from svc_rdma_free() rather than be
>>>>> delayed?
>>>>> (I know almost nothing about rdma so forgive me if the answers to these
>>>>> questions seem obvious)
>>>>
>>>> The reasons I can recall are:
>>>>
>>>> - Some of the transport tear-down work can sleep
>>>> - Releasing a cm_id is tricky and can deadlock
>>>>
>>>> We might be able to mitigate the second issue with judicious
>>>> application of transport reference counting.
>>>> Reported-by: Olga Kornievskaia <okorniev@xxxxxxxxxx>
>>>> Suggested-by: NeilBrown <neil@xxxxxxxxxx>
>>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>> ---
>>>>  net/sunrpc/svc_xprt.c                    |  1 +
>>>>  net/sunrpc/xprtrdma/svc_rdma_transport.c | 19 ++++++++-----------
>>>>  2 files changed, 9 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>>>> index 8b1837228799..8526bfc3ab20 100644
>>>> --- a/net/sunrpc/svc_xprt.c
>>>> +++ b/net/sunrpc/svc_xprt.c
>>>> @@ -168,6 +168,7 @@ static void svc_xprt_free(struct kref *kref)
>>>>  	struct svc_xprt *xprt =
>>>>  		container_of(kref, struct svc_xprt, xpt_ref);
>>>>  	struct module *owner = xprt->xpt_class->xcl_owner;
>>>> +
>>>>  	if (test_bit(XPT_CACHE_AUTH, &xprt->xpt_flags))
>>>>  		svcauth_unix_info_release(xprt);
>>>>  	put_cred(xprt->xpt_cred);
>>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>>> index 3d7f1413df02..b7b318ad25c4 100644
>>>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>>> @@ -591,12 +591,18 @@ static void svc_rdma_detach(struct svc_xprt *xprt)
>>>>  		rdma_disconnect(rdma->sc_cm_id);
>>>>  }
>>>>
>>>> -static void __svc_rdma_free(struct work_struct *work)
>>>> +/**
>>>> + * svc_rdma_free - Release class-specific transport resources
>>>> + * @xprt: Generic svc transport object
>>>> + */
>>>> +static void svc_rdma_free(struct svc_xprt *xprt)
>>>>  {
>>>>  	struct svcxprt_rdma *rdma =
>>>> -		container_of(work, struct svcxprt_rdma, sc_work);
>>>> +		container_of(xprt, struct svcxprt_rdma, sc_xprt);
>>>>  	struct ib_device *device = rdma->sc_cm_id->device;
>>>>
>>>> +	might_sleep();
>>>> +
>>>>  	/* This blocks until the Completion Queues are empty */
>>>>  	if (rdma->sc_qp && !IS_ERR(rdma->sc_qp))
>>>>  		ib_drain_qp(rdma->sc_qp);
>>>> @@ -629,15 +635,6 @@ static void __svc_rdma_free(struct work_struct *work)
>>>>  	kfree(rdma);
>>>>  }
>>>>
>>>> -static void svc_rdma_free(struct svc_xprt *xprt)
>>>> -{
>>>> -	struct svcxprt_rdma *rdma =
>>>> -		container_of(xprt, struct svcxprt_rdma, sc_xprt);
>>>> -
>>>> -	INIT_WORK(&rdma->sc_work, __svc_rdma_free);
>>>> -	schedule_work(&rdma->sc_work);
>>>> -}
>>>> -
>>>>  static int svc_rdma_has_wspace(struct svc_xprt *xprt)
>>>>  {
>>>>  	struct svcxprt_rdma *rdma =
>>>> --
>>>> 2.50.0
>>>>
>>>
>>
>>
>> --
>> Chuck Lever

--
Chuck Lever