Re: [nfs-utils PATCH] rpc-statd.service: weaken the dependency on rpcbind.socket

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, 06 Sep 2025, Scott Mayhew wrote:
> In 91da135f ("systemd unit files: fix up dependencies on rpcbind"),
> Neil laid out the rationale for how the nfs services should define their
> dependencies on rpcbind.  In a nutshell:
> 
> 1. Dependencies should only be defined using rpcbind.socket
> 2. Ordering for dependencies should only be defined usint "After="
> 3. nfs-server.service should use "Wants=rpcbind.socket", to allow
>    rpcbind.socket to be masked in NFSv4-only setups.
> 4. rpc-statd.service should use "Requires=rpcbind.socket", as rpc.statd
>    is useless if it can't register with rpcbind.
> 
> Then in https://bugzilla.redhat.com/show_bug.cgi?id=2100395, Ben noted
> that due to the way the dependencies are ordered, when 'systemctl stop
> rpcbind.socket' is run, systemd first sends SIGTERM to rpcbind, then
> SIGTERM to rpc.statd.  On SIGTERM, rpcbind tears down /var/run/rpcbind.sock.
> However, rpc-statd on SIGTERM attempts to unregister from rpcbind.  This
> results in a long delay:
> 
> [root@rawhide ~]# time systemctl restart rpcbind.socket
> 
> real	1m0.147s
> user	0m0.004s
> sys	0m0.003s
> 
> 8a835ceb ("rpc-statd.service: Stop rpcbind and rpc.stat in an exit race")
> fixed this by changing the dependency in rpc-statd.service to use
> "After=rpcbind.service", bending rule #1 from above.

Thanks for the thorough and detailed explanation.

I'd like to suggest a different fix.  Change rpc-statd.service to
declare:

After=network-online.target nss-lookup.target rpcbind.socket rpcbind.service

i.e. it is declared to be After both the socket and the service.

"After" declarations only have effect if the units are in the same
transaction.  If the Unit is not being started or stopped, the After
declaration has no effect.

So on startup, this will ensure rpcbind.socket is started before
rpc-statd.service.
On shutdown in a transaction that stops both rpc-statd.service and
rpcbind.service, rpcbind.service won't be stopped until after
rpc-statd.service is stopped.

I agree that it isn't necessary to restart rpc-statd when rpcbind is
restarted.
Maybe that is a justification to use Wants instead of Requires.
Or maybe Upholds would be even better.

I wonder if putting

 ConditionPathIsSymbolisLink !/etc/systemd/system/rpcbind.socket

in rpc-statd.service would be a suitable way to stop rpc-statd from
starting if rpcbind.socket is masked.

In any case I think there are two separate issues here which deserve two
separate patch.
1/ shutdown ordering isn't handled correctly.  Adding the extra After
   directive should fix that
2/ rpc.statd is restarted unnecessarily.  Wants or Upholds might be part
   of the solution.

Thanks,
NeilBrown

   

> 
> Yongcheng recently noted that when runnnig the following test:
> 
> [root@rawhide ~]# for i in `seq 10`; do systemctl reset-failed; \
> 	systemctl stop rpcbind rpcbind.socket ; systemctl restart nfs-server ; \
> 	systemctl status rpc-statd; done
> 
> rpc-statd.service would often fail to start:
> 
> × rpc-statd.service - NFS status monitor for NFSv2/3 locking.
>      Loaded: loaded (/usr/lib/systemd/system/rpc-statd.service; enabled-runtime; preset: disabled)
>     Drop-In: /usr/lib/systemd/system/service.d
>              └─10-timeout-abort.conf
>      Active: failed (Result: exit-code) since Fri 2025-09-05 18:01:15 EDT; 229ms ago
>    Duration: 228ms
>  Invocation: bafb2bb00761439ebc348000704e8fbb
>        Docs: man:rpc.statd(8)
>     Process: 29937 ExecStart=/usr/sbin/rpc.statd (code=exited, status=1/FAILURE)
>    Mem peak: 1.5M
>         CPU: 7ms
> 
> Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Version 2.8.2 starting
> Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Flags: TI-RPC
> Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, udp): svc_reg() err: RPC: Remote system error - Connection refused
> Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, tcp): svc_reg() err: RPC: Success
> Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, udp6): svc_reg() err: RPC: Success
> Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: Failed to register (statd, 1, tcp6): svc_reg() err: RPC: Success
> Sep 05 18:01:15 rawhide.smayhew.test rpc.statd[29938]: failed to create RPC listeners, exiting
> Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: rpc-statd.service: Control process exited, code=exited, status=1/FAILURE
> Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: rpc-statd.service: Failed with result 'exit-code'.
> Sep 05 18:01:15 rawhide.smayhew.test systemd[1]: Failed to start rpc-statd.service - NFS status monitor for NFSv2/3 locking..
> 
> I propose we revert the change from 8a835ceb and instead turn the
> dependency into a weak dependency by using "Wants=rpcbind.socket"
> instead of "Requires=rpcbind.socket".  This bends rule #4 above and will
> make it so that systemd will try to start rpcbind.socket if it isn't
> already running when rpc-statd.service starts, but it won't restart
> rpc-statd.service whenever rpcbind is restarted.  Frankly, we shouldn't
> need to restart services whenever rpcbind is restarted (thats why
> rpcbind has the warmstart feature).  The only drawback is that now if an
> admin wants to set up an NFSv4-only server by masking rpcbind.socket,
> they'll need to mask rpc-statd.service as well.  I don't think that's
> too much to ask, so the nfs.systemd man page has been updated
> accordingly.
> 
> Signed-off-by: Scott Mayhew <smayhew@xxxxxxxxxx>
> ---
>  systemd/nfs.systemd.man   | 10 +++++++---
>  systemd/rpc-statd.service |  5 +++--
>  2 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/systemd/nfs.systemd.man b/systemd/nfs.systemd.man
> index a8476038..93fb87cd 100644
> --- a/systemd/nfs.systemd.man
> +++ b/systemd/nfs.systemd.man
> @@ -137,7 +137,9 @@ NFSv2) and does not want
>  .I rpcbind
>  to be running, the correct approach is to run
>  .RS
> -.B systemctl mask rpcbind
> +.B systemctl mask rpcbind.socket
> +.br
> +.B systemctl mask rpc-statd.service
>  .RE
>  This will disable
>  .IR rpcbind ,
> @@ -145,9 +147,11 @@ and the various NFS services which depend on it (and are only needed
>  for NFSv3) will refuse to start, without interfering with the
>  operation of NFSv4 services.  In particular,
>  .I rpc.statd
> -will not run when
> +will fail to start when
>  .I rpcbind
> -is masked.
> +is masked, so
> +.I rpc-statd.service
> +should be masked as well.
>  .PP
>  .I idmapd
>  is only needed for NFSv4, and even then is not needed when the client
> diff --git a/systemd/rpc-statd.service b/systemd/rpc-statd.service
> index 660ed861..4e138f69 100644
> --- a/systemd/rpc-statd.service
> +++ b/systemd/rpc-statd.service
> @@ -3,10 +3,11 @@ Description=NFS status monitor for NFSv2/3 locking.
>  Documentation=man:rpc.statd(8)
>  DefaultDependencies=no
>  Conflicts=umount.target
> -Requires=nss-lookup.target rpcbind.socket
> +Requires=nss-lookup.target
> +Wants=rpcbind.socket
>  Wants=network-online.target
>  Wants=rpc-statd-notify.service
> -After=network-online.target nss-lookup.target rpcbind.service
> +After=network-online.target nss-lookup.target rpcbind.socket
>  
>  PartOf=nfs-utils.service
>  IgnoreOnIsolate=yes
> -- 
> 2.50.1
> 
> 






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux