Re: [nfs-utils PATCH] rpc-statd.service: weaken the dependency on rpcbind.socket

On Tue, 09 Sep 2025, Scott Mayhew wrote:
> On Sat, 06 Sep 2025, NeilBrown wrote:
> 
> > On Sat, 06 Sep 2025, Scott Mayhew wrote:
> > > In 91da135f ("systemd unit files: fix up dependencies on rpcbind"),
> > > Neil laid out the rationale for how the nfs services should define their
> > > dependencies on rpcbind.  In a nutshell:
> > > 
> > > 1. Dependencies should only be defined using rpcbind.socket
> > > 2. Ordering for dependencies should only be defined using "After="
> > > 3. nfs-server.service should use "Wants=rpcbind.socket", to allow
> > >    rpcbind.socket to be masked in NFSv4-only setups.
> > > 4. rpc-statd.service should use "Requires=rpcbind.socket", as rpc.statd
> > >    is useless if it can't register with rpcbind.
> > > 
> > > Then in https://bugzilla.redhat.com/show_bug.cgi?id=2100395, Ben noted
> > > that due to the way the dependencies are ordered, when 'systemctl stop
> > > rpcbind.socket' is run, systemd first sends SIGTERM to rpcbind, then
> > > SIGTERM to rpc.statd.  On SIGTERM, rpcbind tears down /var/run/rpcbind.sock.
> > > However, rpc-statd on SIGTERM attempts to unregister from rpcbind.  This
> > > results in a long delay:
> > > 
> > > [root@rawhide ~]# time systemctl restart rpcbind.socket
> > > 
> > > real	1m0.147s
> > > user	0m0.004s
> > > sys	0m0.003s
> > > 
> > > 8a835ceb ("rpc-statd.service: Stop rpcbind and rpc.stat in an exit race")
> > > fixed this by changing the dependency in rpc-statd.service to use
> > > "After=rpcbind.service", bending rule #1 from above.
> > 
> > Thanks for the thorough and detailed explanation.
> > 
> > I'd like to suggest a different fix.  Change rpc-statd.service to
> > declare:
> > 
> > After=network-online.target nss-lookup.target rpcbind.socket rpcbind.service
> > 
> > i.e. it is declared to be After both the socket and the service.
> > 
> > "After" declarations only have effect if the units are in the same
> > transaction.  If the Unit is not being started or stopped, the After
> > declaration has no effect.
> > 
> > So on startup, this will ensure rpcbind.socket is started before
> > rpc-statd.service.
> > On shutdown in a transaction that stops both rpc-statd.service and
> > rpcbind.service, rpcbind.service won't be stopped until after
> > rpc-statd.service is stopped.
> 
> That works too.
> 
> > 
> > I agree that it isn't necessary to restart rpc-statd when rpcbind is
> > restarted.
> > Maybe that is a justification to use Wants instead of Requires.
> > Or maybe Upholds would be even better.
> 
> I think Upholds is confusing.... especially since there aren't any
> existing unit files using it, at least on a stock Fedora Rawhide
> system.  I don't see it being used on OpenSUSE Tumbleweed or Debian
> Trixie either.  I think it's going to confuse users if they try to stop
> rpcbind.socket and then find that it's still running.  Finally, when I tested
> it, it prevented me from stopping rpc-statd.  Eventually the shutdown
> timer hit and systemd sent rpc-statd a SIGABRT, which in turn
> triggered the systemd-coredump handler.  That's a whole mess of syslog
> entries that's going to lead to more bug reports.  I'd rather stick with Wants.

Thanks for testing.  I've never used Upholds myself but I was reading
the man page and wondered if it might be a good fit.  Apparently it
isn't.


>  
> > 
> > I wonder if putting
> > 
> >  ConditionPathIsSymbolicLink=!/etc/systemd/system/rpcbind.socket
> 
> I'm lost.  What would cause the rpcbind.socket symlink to be created
> directly in /etc/systemd/system?  I've seen it get created in
> /etc/systemd/system/sockets.target.wants or
> /etc/systemd/system/multi-user.target.wants, but never directly in
> /etc/systemd/system.

$ sudo systemctl mask rpcbind.socket
Created symlink '/etc/systemd/system/rpcbind.socket' → '/dev/null'.

I'm not sure it's a good idea.  I was mostly thinking aloud.

I think the After line should be changed to include both .service and
.socket, and that this cleanly fixes just the observed problem with shutdown.

And I don't think any other change should be made without a clear
demonstrated need.
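A minimal sketch of that ordering change, written as a local drop-in so it can be tried without rebuilding the packaged unit. The drop-in file name is hypothetical; After= entries in a drop-in are appended to the unit's existing After= list, so together with the shipped "After=... rpcbind.service" line the unit ends up ordered after both rpcbind.socket and rpcbind.service:

```ini
# /etc/systemd/system/rpc-statd.service.d/50-rpcbind-ordering.conf
# (hypothetical file name)
# Ordering only: no Requires=/Wants= is added here, so this has an effect
# only when rpcbind.socket or rpcbind.service is started or stopped in the
# same transaction as rpc-statd.service.  On stop, rpc-statd.service is
# then stopped before rpcbind.service, so statd can still unregister.
[Unit]
After=rpcbind.socket rpcbind.service
```

After "systemctl daemon-reload", "systemctl show -p After rpc-statd.service" should list both rpcbind units.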

Thanks,
NeilBrown


> 
> -Scott
> > 
> > in rpc-statd.service would be a suitable way to stop rpc-statd from
> > starting if rpcbind.socket is masked.
> > 
> > In any case I think there are two separate issues here which deserve two
> > separate patches.
> > 1/ shutdown ordering isn't handled correctly.  Adding the extra After
> >    directive should fix that
> > 2/ rpc.statd is restarted unnecessarily.  Wants or Upholds might be part
> >    of the solution.
> > 
> > Thanks,
> > NeilBrown
> > 
> >    
> > 
> > > 
> > > Yongcheng recently noted that when running the following test:
> > > 
> > > [root@rawhide ~]# for i in `seq 10`; do systemctl reset-failed; \
> > > 	systemctl stop rpcbind rpcbind.socket ; systemctl restart nfs-server ; \
> > > 	systemctl status rpc-statd; done
> > > 
> > > rpc-statd.service would often fail to start:
> > > 
> > > × rpc-statd.service - NFS status monitor for NFSv2/3 locking.
> > >      Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; enabled-runtime; preset: disabled)
> > >     Drop-In: /usr/lib/systemd/system/service.d
> > >              └─10-timeout-abort.conf
> > >      Active: failed (Result: exit-code) since Fri 2025-09-05 18:01:15 EDT; 229ms ago
> > >    Duration: 228ms
> > >  Invocation: bafb2bb00761439ebc348000704e8fbb
> > >        Docs: man:rpc.statd(8)
> > >     Process: 29937 ExecStart=/usr/sbin/rpc.statd (code=exited, status=1/FAILURE)
> > >    Mem peak: 1.5M
> > >         CPU: 7ms
> > > 
> > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Version 2.8.2 starting
> > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Flags: TI-RPC
> > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, udp): svc_reg() err: RPC: Remote system error - Connection refused
> > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, tcp): svc_reg() err: RPC: Success
> > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, udp6): svc_reg() err: RPC: Success
> > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, tcp6): svc_reg() err: RPC: Success
> > > Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: failed to create RPC listeners, exiting
> > > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: rpc-statd.service: Control process exited, code=exited, status=1/FAILURE
> > > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: rpc-statd.service: Failed with result 'exit-code'.
> > > Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: Failed to start rpc-statd.service - NFS status monitor for NFSv2/3 locking..
> > > 
> > > I propose we revert the change from 8a835ceb and instead turn the
> > > dependency into a weak dependency by using "Wants=rpcbind.socket"
> > > instead of "Requires=rpcbind.socket".  This bends rule #4 above and will
> > > make it so that systemd will try to start rpcbind.socket if it isn't
> > > already running when rpc-statd.service starts, but it won't restart
> > > rpc-statd.service whenever rpcbind is restarted.  Frankly, we shouldn't
> > > need to restart services whenever rpcbind is restarted (that's why
> > > rpcbind has the warmstart feature).  The only drawback is that now if an
> > > admin wants to set up an NFSv4-only server by masking rpcbind.socket,
> > > they'll need to mask rpc-statd.service as well.  I don't think that's
> > > too much to ask, so the nfs.systemd man page has been updated
> > > accordingly.
> > > 
> > > Signed-off-by: Scott Mayhew <smayhew@xxxxxxxxxx>
> > > ---
> > >  systemd/nfs.systemd.man   | 10 +++++++---
> > >  systemd/rpc-statd.service |  5 +++--
> > >  2 files changed, 10 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/systemd/nfs.systemd.man b/systemd/nfs.systemd.man
> > > index a8476038..93fb87cd 100644
> > > --- a/systemd/nfs.systemd.man
> > > +++ b/systemd/nfs.systemd.man
> > > @@ -137,7 +137,9 @@ NFSv2) and does not want
> > >  .I rpcbind
> > >  to be running, the correct approach is to run
> > >  .RS
> > > -.B systemctl mask rpcbind
> > > +.B systemctl mask rpcbind.socket
> > > +.br
> > > +.B systemctl mask rpc-statd.service
> > >  .RE
> > >  This will disable
> > >  .IR rpcbind ,
> > > @@ -145,9 +147,11 @@ and the various NFS services which depend on it (and are only needed
> > >  for NFSv3) will refuse to start, without interfering with the
> > >  operation of NFSv4 services.  In particular,
> > >  .I rpc.statd
> > > -will not run when
> > > +will fail to start when
> > >  .I rpcbind
> > > -is masked.
> > > +is masked, so
> > > +.I rpc-statd.service
> > > +should be masked as well.
> > >  .PP
> > >  .I idmapd
> > >  is only needed for NFSv4, and even then is not needed when the client
> > > diff --git a/systemd/rpc-statd.service b/systemd/rpc-statd.service
> > > index 660ed861..4e138f69 100644
> > > --- a/systemd/rpc-statd.service
> > > +++ b/systemd/rpc-statd.service
> > > @@ -3,10 +3,11 @@ Description=NFS status monitor for NFSv2/3 locking.
> > >  Documentation=man:rpc.statd(8)
> > >  DefaultDependencies=no
> > >  Conflicts=umount.target
> > > -Requires=nss-lookup.target rpcbind.socket
> > > +Requires=nss-lookup.target
> > > +Wants=rpcbind.socket
> > >  Wants=network-online.target
> > >  Wants=rpc-statd-notify.service
> > > -After=network-online.target nss-lookup.target rpcbind.service
> > > +After=network-online.target nss-lookup.target rpcbind.socket
> > >  
> > >  PartOf=nfs-utils.service
> > >  IgnoreOnIsolate=yes
> > > -- 
> > > 2.50.1
> > > 
> > > 
> > 
> > 
> 
> 





