Re: [PATCH] sunrpc: don't fail immediately in rpc_wait_bit_killable()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2025-08-28 at 18:12 +0530, Harshvardhan Jha wrote:
> Hi there,
> 
> On 20/08/25 3:08 AM, NeilBrown wrote:
> > rpc_wait_bit_killable() is called when it is appropriate for a
> > fatal
> > signal to abort the wait.
> > 
> > If it is called late during process exit after exit_signals() is
> > called
> > (and when PF_EXITING is set), it cannot receive a fatal signal so
> > waiting indefinitely is not safe.
> > 
> > However aborting immediately, as it currently does, is not ideal as
> > it
> > mean that the related NFS request cannot succeed, even if the
> > network
> > and server are working properly.
> > 
> > One of the causes of filesystem IO when PF_EXITING is set is
> > acct_process() which may access the process accounting file.  For a
> > NFS-root configuration, this can be accessed over NFS.
> > 
> > In this configuration LTP test "acct02" fails.
> > 
> > Though waiting indefinitely is not appropriate, aborting
> > immediately is
> > also not desirable.  This patch aims for a middle ground of waiting
> > at
> > most 5 seconds.  This should be enough when NFS service is working,
> > but
> > not so much as to delay process exit excessively when NFS service
> > is not
> > functioning.
> > 
> > Reported-by: Mark Brown <broonie@xxxxxxxxxx>
> > Reported-and-tested-by: Harshvardhan Jha
> > <harshvardhan.j.jha@xxxxxxxxxx>
> > Link:
> > https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Furldefense.com%2Fv3%2F__https%3A%2F%2Flore.kernel.org%2Flinux-nfs%2F7d4d57b0-39a3-49f1-8ada-60364743e3b4%40sirena.org.uk%2F__%3B!!ACWV5N9M2RV99hQ!LaRJdjZulcG71nHFWdEAszB9mJEhezxPsDxHO8xeQJ7P8a9UfYNRIm1ziuuHU5wxgEXW14vAqC1dlpSQraWaxA%24&data=05%7C02%7Ctrondmy%40hammerspace.com%7C5463825a86c248e5766c08dde6306312%7C0d4fed5c3a7046fe9430ece41741f59e%7C0%7C0%7C638919817692991630%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=GBRM89CFMk2vKyeN4yKBvjiV9IrODC4tbwMfRSk4Cfc%3D&reserved=0
> >  
> > Fixes: 14e41b16e8cb ("SUNRPC: Don't allow waiting for exiting
> > tasks")
> > Signed-off-by: NeilBrown <neil@xxxxxxxxxx>
> > ---
> >  net/sunrpc/sched.c | 14 +++++++++-----
> >  1 file changed, 9 insertions(+), 5 deletions(-)
> > 
> > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > index 73bc39281ef5..92f39e828fbe 100644
> > --- a/net/sunrpc/sched.c
> > +++ b/net/sunrpc/sched.c
> > @@ -276,11 +276,15 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
> >  
> >  static int rpc_wait_bit_killable(struct wait_bit_key *key, int
> > mode)
> >  {
> > -	if (unlikely(current->flags & PF_EXITING))
> > -		return -EINTR;
> > -	schedule();
> > -	if (signal_pending_state(mode, current))
> > -		return -ERESTARTSYS;
> > +	if (unlikely(current->flags & PF_EXITING)) {
> > +		/* Cannot be killed by a signal, so don't wait
> > indefinitely */
> > +		if (schedule_timeout(5 * HZ) == 0)
> > +			return -EINTR;
> > +	} else {
> > +		schedule();
> > +		if (signal_pending_state(mode, current))
> > +			return -ERESTARTSYS;
> > +	}
> >  	return 0;
> >  }
> >  
> Is it possible to get this merged in 6.17? I have tested this and the
> LTP tests pass

If we are in this situation, it is because some signal has already
killed the parent task. That throws any data integrity guarantees right
out of the window.

So what problem is this patch trying to fix?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@xxxxxxxxxx, trond.myklebust@xxxxxxxxxxxxxxx





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux