On Thu, 2025-08-28 at 18:12 +0530, Harshvardhan Jha wrote:
> Hi there,
>
> On 20/08/25 3:08 AM, NeilBrown wrote:
> > rpc_wait_bit_killable() is called when it is appropriate for a fatal
> > signal to abort the wait.
> >
> > If it is called late during process exit after exit_signals() is called
> > (and when PF_EXITING is set), it cannot receive a fatal signal so
> > waiting indefinitely is not safe.
> >
> > However aborting immediately, as it currently does, is not ideal as it
> > mean that the related NFS request cannot succeed, even if the network
> > and server are working properly.
> >
> > One of the causes of filesystem IO when PF_EXITING is set is
> > acct_process() which may access the process accounting file.  For a
> > NFS-root configuration, this can be accessed over NFS.
> >
> > In this configuration LTP test "acct02" fails.
> >
> > Though waiting indefinitely is not appropriate, aborting immediately is
> > also not desirable.  This patch aims for a middle ground of waiting at
> > most 5 seconds.  This should be enough when NFS service is working, but
> > not so much as to delay process exit excessively when NFS service is not
> > functioning.
> >
> > Reported-by: Mark Brown <broonie@xxxxxxxxxx>
> > Reported-and-tested-by: Harshvardhan Jha <harshvardhan.j.jha@xxxxxxxxxx>
> > Link: https://urldefense.com/v3/__https://lore.kernel.org/linux-nfs/7d4d57b0-39a3-49f1-8ada-60364743e3b4@xxxxxxxxxxxxx/__;!!ACWV5N9M2RV99hQ!LaRJdjZulcG71nHFWdEAszB9mJEhezxPsDxHO8xeQJ7P8a9UfYNRIm1ziuuHU5wxgEXW14vAqC1dlpSQraWaxA$
> >
> > Fixes: 14e41b16e8cb ("SUNRPC: Don't allow waiting for exiting tasks")
> > Signed-off-by: NeilBrown <neil@xxxxxxxxxx>
> > ---
> >  net/sunrpc/sched.c | 14 +++++++++-----
> >  1 file changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> > index 73bc39281ef5..92f39e828fbe 100644
> > --- a/net/sunrpc/sched.c
> > +++ b/net/sunrpc/sched.c
> > @@ -276,11 +276,15 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
> >
> >  static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
> >  {
> > -	if (unlikely(current->flags & PF_EXITING))
> > -		return -EINTR;
> > -	schedule();
> > -	if (signal_pending_state(mode, current))
> > -		return -ERESTARTSYS;
> > +	if (unlikely(current->flags & PF_EXITING)) {
> > +		/* Cannot be killed by a signal, so don't wait indefinitely */
> > +		if (schedule_timeout(5 * HZ) == 0)
> > +			return -EINTR;
> > +	} else {
> > +		schedule();
> > +		if (signal_pending_state(mode, current))
> > +			return -ERESTARTSYS;
> > +	}
> >  	return 0;
> >  }
> >
> Is it possible to get this merged in 6.17? I have tested this and the
> LTP tests pass.

After much thought, I think I'd rather just revert the commit that
caused the issue. I'll work on an alternative for the 6.18 timeframe
instead.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@xxxxxxxxxx, trond.myklebust@xxxxxxxxxxxxxxx