On 9/4/25 11:43 AM, Olga Kornievskaia wrote: > On Wed, Sep 3, 2025 at 11:53 AM Chuck Lever <cel@xxxxxxxxxx> wrote: >> >> From: Chuck Lever <chuck.lever@xxxxxxxxxx> >> >> NFSD has always supported added network listeners. The new netlink >> protocol now enables the removal of listeners. >> >> Olga noticed that if an RDMA listener is removed and immediately >> re-added, the deferred __svc_rdma_free() function might not have >> run yet, so some or all of the old listener's RDMA resources >> linger, which prevents a new listener on the same address from >> being created. > > Does this mean that you prefer to go the way of rdma synchronous > release vs the patch I posted? We could do both. IMO we need to make "remove listener" work while there are still nfsd threads running, and this RFC patch does nothing about that. But as noted below, it looks like the svc_xprt_free() code path assumes that ->xpo_free releases all transport resources synchronously, and there can be consequences if that does not happen. That needs to be addressed somehow. > I'm not against the approach as I have previously noted it as an > alternative which I tested and it also solves the problem. But I still > dont grasp the consequence of making svc_rdma_free() synchronous, > especially for active transports (not listening sockets). I've tested the synchronous approach a little, there didn't seem to be a problem with it. But I agree, the certainty level is not as high as it ought to be. >> Also, svc_xprt_free() does a module_put() just after calling >> ->xpo_free(). That means if there is deferred work going on, the >> module could be unloaded before that work is even started, >> resulting in a UAF. >> >> Neil asks: >>> What particular part of __svc_rdma_free() needs to run in order for a >>> subsequent registration to succeed? >>> Can that bit be run directory from svc_rdma_free() rather than be >>> delayed? >>> (I know almost nothing about rdma so forgive me if the answers to these >>> questions seems obvious) >> >> The reasons I can recall are: >> >> - Some of the transport tear-down work can sleep >> - Releasing a cm_id is tricky and can deadlock >> >> We might be able to mitigate the second issue with judicious >> application of transport reference counting. >> >> Reported-by: Olga Kornievskaia <okorniev@xxxxxxxxxx> >> Suggested-by: NeilBrown <neil@xxxxxxxxxx> >> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> >> --- >> net/sunrpc/svc_xprt.c | 1 + >> net/sunrpc/xprtrdma/svc_rdma_transport.c | 19 ++++++++----------- >> 2 files changed, 9 insertions(+), 11 deletions(-) >> >> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c >> index 8b1837228799..8526bfc3ab20 100644 >> --- a/net/sunrpc/svc_xprt.c >> +++ b/net/sunrpc/svc_xprt.c >> @@ -168,6 +168,7 @@ static void svc_xprt_free(struct kref *kref) >> struct svc_xprt *xprt = >> container_of(kref, struct svc_xprt, xpt_ref); >> struct module *owner = xprt->xpt_class->xcl_owner; >> + >> if (test_bit(XPT_CACHE_AUTH, &xprt->xpt_flags)) >> svcauth_unix_info_release(xprt); >> put_cred(xprt->xpt_cred); >> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c >> index 3d7f1413df02..b7b318ad25c4 100644 >> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c >> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c >> @@ -591,12 +591,18 @@ static void svc_rdma_detach(struct svc_xprt *xprt) >> rdma_disconnect(rdma->sc_cm_id); >> } >> >> -static void __svc_rdma_free(struct work_struct *work) >> +/** >> + * svc_rdma_free - Release class-specific transport resources >> + * @xprt: Generic svc transport object >> + */ >> +static void svc_rdma_free(struct svc_xprt *xprt) >> { >> struct svcxprt_rdma *rdma = >> - container_of(work, struct svcxprt_rdma, sc_work); >> + container_of(xprt, struct svcxprt_rdma, sc_xprt); >> struct ib_device *device = rdma->sc_cm_id->device; >> >> + might_sleep(); >> + >> /* This blocks until the Completion Queues are empty */ >> if (rdma->sc_qp && !IS_ERR(rdma->sc_qp)) >> ib_drain_qp(rdma->sc_qp); >> @@ -629,15 +635,6 @@ static void __svc_rdma_free(struct work_struct *work) >> kfree(rdma); >> } >> >> -static void svc_rdma_free(struct svc_xprt *xprt) >> -{ >> - struct svcxprt_rdma *rdma = >> - container_of(xprt, struct svcxprt_rdma, sc_xprt); >> - >> - INIT_WORK(&rdma->sc_work, __svc_rdma_free); >> - schedule_work(&rdma->sc_work); >> -} >> - >> static int svc_rdma_has_wspace(struct svc_xprt *xprt) >> { >> struct svcxprt_rdma *rdma = >> -- >> 2.50.0 >> > -- Chuck Lever