Re: [PATCH] sunrpc: don't fail immediately in rpc_wait_bit_killable()

On Thu, 2025-08-28 at 18:12 +0530, Harshvardhan Jha wrote:
> Hi there,
> 
> On 20/08/25 3:08 AM, NeilBrown wrote:
> > rpc_wait_bit_killable() is called when it is appropriate for a fatal
> > signal to abort the wait.
> > 
> > If it is called late during process exit, after exit_signals() is
> > called (and when PF_EXITING is set), it cannot receive a fatal signal,
> > so waiting indefinitely is not safe.
> > 
> > However, aborting immediately, as it currently does, is not ideal, as
> > it means that the related NFS request cannot succeed, even if the
> > network and server are working properly.
> > 
> > One of the causes of filesystem IO when PF_EXITING is set is
> > acct_process(), which may access the process accounting file.  For an
> > NFS-root configuration, this can be accessed over NFS.
> > 
> > In this configuration LTP test "acct02" fails.
> > 
> > Though waiting indefinitely is not appropriate, aborting immediately
> > is also not desirable.  This patch aims for a middle ground of waiting
> > at most 5 seconds.  This should be enough when NFS service is working,
> > but not so much as to delay process exit excessively when NFS service
> > is not functioning.
> > 
> > Reported-by: Mark Brown <broonie@xxxxxxxxxx>
> > Reported-and-tested-by: Harshvardhan Jha <harshvardhan.j.jha@xxxxxxxxxx>
> > Link: https://urldefense.com/v3/__https://lore.kernel.org/linux-nfs/7d4d57b0-39a3-49f1-8ada-60364743e3b4@xxxxxxxxxxxxx/__;!!ACWV5N9M2RV99hQ!LaRJdjZulcG71nHFWdEAszB9mJEhezxPsDxHO8xeQJ7P8a9UfYNRIm1ziuuHU5wxgEXW14vAqC1dlpSQraWaxA$
> > Fixes: 14e41b16e8cb ("SUNRPC: Don't allow waiting for exiting tasks")
> > Signed-off-by: NeilBrown <neil@xxxxxxxxxx>
> > ---
> >  net/sunrpc/sched.c | 14 +++++++++-----
> >  1 file changed, 9 insertions(+), 5 deletions(-)
> > 
> > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > index 73bc39281ef5..92f39e828fbe 100644
> > --- a/net/sunrpc/sched.c
> > +++ b/net/sunrpc/sched.c
> > @@ -276,11 +276,15 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
> >  
> >  static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
> >  {
> > -	if (unlikely(current->flags & PF_EXITING))
> > -		return -EINTR;
> > -	schedule();
> > -	if (signal_pending_state(mode, current))
> > -		return -ERESTARTSYS;
> > +	if (unlikely(current->flags & PF_EXITING)) {
> > +		/* Cannot be killed by a signal, so don't wait indefinitely */
> > +		if (schedule_timeout(5 * HZ) == 0)
> > +			return -EINTR;
> > +	} else {
> > +		schedule();
> > +		if (signal_pending_state(mode, current))
> > +			return -ERESTARTSYS;
> > +	}
> >  	return 0;
> >  }
> >  
> Is it possible to get this merged in 6.17? I have tested this and the
> LTP tests pass.

After much thought, I think I'd rather just revert the commit that
caused the issue. I'll work on an alternative for the 6.18 timeframe
instead.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@xxxxxxxxxx, trond.myklebust@xxxxxxxxxxxxxxx
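
For readers following the return-value logic of the quoted patch, the
decision it makes can be sketched as a small userspace model. This is only
an illustration under stated assumptions, not kernel code: the function
name wait_action_model() and its three parameters are hypothetical
stand-ins for (current->flags & PF_EXITING), the value returned by
schedule_timeout(5 * HZ), and signal_pending_state(mode, current).
MODEL_ERESTARTSYS mirrors the kernel-internal ERESTARTSYS value (512),
which userspace <errno.h> does not provide.

```c
#include <assert.h>
#include <errno.h>

/* Kernel-internal errno value; defined locally because userspace lacks it. */
#define MODEL_ERESTARTSYS 512

/*
 * Hypothetical userspace model of the patched rpc_wait_bit_killable().
 *
 * exiting        - nonzero if the task has PF_EXITING set
 * remaining      - models the schedule_timeout() return value:
 *                  0 means the full timeout elapsed,
 *                  > 0 means the task was woken before it expired
 * signal_pending - models signal_pending_state(mode, current)
 */
static int wait_action_model(int exiting, long remaining, int signal_pending)
{
	if (exiting) {
		/* An exiting task cannot be killed, so the wait is bounded. */
		if (remaining == 0)
			return -EINTR;
	} else {
		/* Normal path: an unbounded wait that a fatal signal may abort. */
		if (signal_pending)
			return -MODEL_ERESTARTSYS;
	}
	return 0;
}
```

In this model an exiting task that is woken before the timeout returns 0
and the wait loop proceeds as usual, while one that sleeps for the whole
timeout gets -EINTR. Signals are not consulted at all on the exiting path.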




