Re: [nfs-utils PATCH] rpc-statd.service: weaken the dependency on rpcbind.socket

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, 06 Sep 2025, NeilBrown wrote:

> On Sat, 06 Sep 2025, Scott Mayhew wrote:
> > In 91da135f ("systemd unit files: fix up dependencies on rpcbind"),
> > Neil laid out the rationale for how the nfs services should define their
> > dependencies on rpcbind.  In a nutshell:
> > 
> > 1. Dependencies should only be defined using rpcbind.socket
> > 2. Ordering for dependencies should only be defined usint "After="
> > 3. nfs-server.service should use "Wants=rpcbind.socket", to allow
> >    rpcbind.socket to be masked in NFSv4-only setups.
> > 4. rpc-statd.service should use "Requires=rpcbind.socket", as rpc.statd
> >    is useless if it can't register with rpcbind.
> > 
> > Then in https://bugzilla.redhat.com/show_bug.cgi?id=2100395, Ben noted
> > that due to the way the dependencies are ordered, when 'systemctl stop
> > rpcbind.socket' is run, systemd first sends SIGTERM to rpcbind, then
> > SIGTERM to rpc.statd.  On SIGTERM, rpcbind tears down /var/run/rpcbind.sock.
> > However, rpc-statd on SIGTERM attempts to unregister from rpcbind.  This
> > results in a long delay:
> > 
> > [root@rawhide ~]# time systemctl restart rpcbind.socket
> > 
> > real	1m0.147s
> > user	0m0.004s
> > sys	0m0.003s
> > 
> > 8a835ceb ("rpc-statd.service: Stop rpcbind and rpc.stat in an exit race")
> > fixed this by changing the dependency in rpc-statd.service to use
> > "After=rpcbind.service", bending rule #1 from above.
> 
> Thanks for the thorough and detailed explanation.
> 
> I'd like to suggest a different fix.  Change rpc-statd.service to
> declare:
> 
> After=network-online.target nss-lookup.target rpcbind.socket rpcbind.service
> 
> i.e. it is declared to be After both the socket and the service.
> 
> "After" declarations only have effect if the units are in the same
> transaction.  If the Unit is not being started or stopped, the After
> declaration has no effect.
> 
> So on startup, this will ensure rpcbind.socket is started before
> rpc-statd.service.
> On shutdown in a transaction that stops both rpc-statd.service and
> rpcbind.service, rpcbind.service won't be stopped until after
> rpc-statd.service is stopped.

That works too.

> 
> I agree that it isn't necessary to restart rpc-statd when rpcbind is
> restarted.
> Maybe that is a justification to use Wants instead of Requires.
> Or maybe Upholds would be even better.

I think Upholds is confusing.... especially since there aren't any
existing unit files using it, at least on a stock Fedora Rawhide
system.  I don't see it being used on OpenSUSE Tumbleweed or Debian
Trixie either.  I think it's going to confuse users if they try to stop
rpcbind.socket and then find that it's still running.  Finally, when I tested
it, it prevented me from stopping rpc-statd.  Eventually the shutdown
timer hit and systemd sent rpc-statd a SIGABRT, which in turned
triggered the systemd-coredump handler.  That's a whole mess of syslog
entries that's going to more bug reports.  I'd rather stick with Wants.
 
> 
> I wonder if putting
> 
>  ConditionPathIsSymbolisLink !/etc/systemd/system/rpcbind.socket

I'm lost.  What what cause the rpcbind.socket symlink to be created
directly in /etc/systemd/system?  I've seen it get created in
/etc/systemd/system/sockets.target.wants or
/etc/systemd/system/multi-user.target.wants, but never directly in
/etc/systemd/system.

-Scott
> 
> in rpc-statd.service would be a suitable way to stop rpc-statd from
> starting if rpcbind.socket is masked.
> 
> In any case I think there are two separate issues here which deserve two
> separate patch.
> 1/ shutdown ordering isn't handled correctly.  Adding the extra After
>    directive should fix that
> 2/ rpc.statd is restarted unnecessarily.  Wants or Upholds might be part
>    of the solution.
> 
> Thanks,
> NeilBrown
> 
>    
> 
> > 
> > Yongcheng recently noted that when runnnig the following test:
> > 
> > [root@rawhide ~]# for i in `seq 10`; do systemctl reset-failed; \
> > 	systemctl stop rpcbind rpcbind.socket ; systemctl restart nfs-server ; \
> > 	systemctl status rpc-statd; done
> > 
> > rpc-statd.service would often fail to start:
> > 
> > × rpc-statd.service - NFS status monitor for NFSv2/3 locking.
> >      Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; enabled-runtime; preset: disabled)
> >     Drop-In: /usr/lib/systemd/system/service.d
> >              └─10-timeout-abort.conf
> >      Active: failed (Result: exit-code) since Fri 2025-09-05 18:01:15 EDT; 229ms ago
> >    Duration: 228ms
> >  Invocation: bafb2bb00761439ebc348000704e8fbb
> >        Docs: man:rpc.statd(8)
> >     Process: 29937 ExecStart=/usr/sbin/rpc.statd (code=exited, status=1/FAILURE)
> >    Mem peak: 1.5M
> >         CPU: 7ms
> > 
> > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Version 2.8.2 starting
> > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Flags: TI-RPC
> > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, udp): svc_reg() err: RPC: Remote system error - Connection refused
> > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, tcp): svc_reg() err: RPC: Success
> > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, udp6): svc_reg() err: RPC: Success
> > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, tcp6): svc_reg() err: RPC: Success
> > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: failed to create RPC listeners, exiting
> > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: rpc-statd.service: Control process exited, code=exited, status=1/FAILURE
> > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: rpc-statd.service: Failed with result 'exit-code'.
> > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: Failed to start rpc-statd.service - NFS status monitor for NFSv2/3 locking..
> > 
> > I propose we revert the change from 8a835ceb and instead turn the
> > dependency into a weak dependency by using "Wants=rpcbind.socket"
> > instead of "Requires=rpcbind.socket".  This bends rule #4 above and will
> > make it so that systemd will try to start rpcbind.socket if it isn't
> > already running when rpc-statd.service starts, but it won't restart
> > rpc-statd.service whenever rpcbind is restarted.  Frankly, we shouldn't
> > need to restart services whenever rpcbind is restarted (thats why
> > rpcbind has the warmstart feature).  The only drawback is that now if an
> > admin wants to set up an NFSv4-only server by masking rpcbind.socket,
> > they'll need to mask rpc-statd.service as well.  I don't think that's
> > too much to ask, so the nfs.systemd man page has been updated
> > accordingly.
> > 
> > Signed-off-by: Scott Mayhew <smayhew@xxxxxxxxxx>
> > ---
> >  systemd/nfs.systemd.man   | 10 +++++++---
> >  systemd/rpc-statd.service |  5 +++--
> >  2 files changed, 10 insertions(+), 5 deletions(-)
> > 
> > diff --git a/systemd/nfs.systemd.man b/systemd/nfs.systemd.man
> > index a8476038..93fb87cd 100644
> > --- a/systemd/nfs.systemd.man
> > +++ b/systemd/nfs.systemd.man
> > @@ -137,7 +137,9 @@ NFSv2) and does not want
> >  .I rpcbind
> >  to be running, the correct approach is to run
> >  .RS
> > -.B systemctl mask rpcbind
> > +.B systemctl mask rpcbind.socket
> > +.br
> > +.B systemctl mask rpc-statd.service
> >  .RE
> >  This will disable
> >  .IR rpcbind ,
> > @@ -145,9 +147,11 @@ and the various NFS services which depend on it (and are only needed
> >  for NFSv3) will refuse to start, without interfering with the
> >  operation of NFSv4 services.  In particular,
> >  .I rpc.statd
> > -will not run when
> > +will fail to start when
> >  .I rpcbind
> > -is masked.
> > +is masked, so
> > +.I rpc-statd.service
> > +should be masked as well.
> >  .PP
> >  .I idmapd
> >  is only needed for NFSv4, and even then is not needed when the client
> > diff --git a/systemd/rpc-statd.service b/systemd/rpc-statd.service
> > index 660ed861..4e138f69 100644
> > --- a/systemd/rpc-statd.service
> > +++ b/systemd/rpc-statd.service
> > @@ -3,10 +3,11 @@ Description=NFS status monitor for NFSv2/3 locking.
> >  Documentation=man:rpc.statd(8)
> >  DefaultDependencies=no
> >  Conflicts=umount.target
> > -Requires=nss-lookup.target rpcbind.socket
> > +Requires=nss-lookup.target
> > +Wants=rpcbind.socket
> >  Wants=network-online.target
> >  Wants=rpc-statd-notify.service
> > -After=network-online.target nss-lookup.target rpcbind.service
> > +After=network-online.target nss-lookup.target rpcbind.socket
> >  
> >  PartOf=nfs-utils.service
> >  IgnoreOnIsolate=yes
> > -- 
> > 2.50.1
> > 
> > 
> 
> 






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux