On Sat, 2025-09-06 at 08:45 +1000, NeilBrown wrote:
> On Sat, 06 Sep 2025, Trond Myklebust wrote:
> > On Thu, 2025-08-28 at 18:12 +0530, Harshvardhan Jha wrote:
> > > Is it possible to get this merged in 6.17? I have tested this and
> > > the LTP tests pass.
> >
> > After much thought, I think I'd rather just revert the commit that
> > caused the issue. I'll work on an alternative for the 6.18 timeframe
> > instead.
>
> That seems reasonable - thanks. I'd be curious to know what the
> original issue was. I'm guessing it was CLOSE blocking ??
>

A customer of ours was seeing the NFS flush-on-close hanging after the
process itself had been killed. So the patch was really intended to
address the problem of signalled threads. The fact that it also
affected ordinary processes that rely on exit() to close the file
descriptors was due to a brain fart on my part.

> If you do revert, would you consider the following? I wrote it a
> while ago but it became irrelevant with the patch that you might now
> revert.
>
> I wonder if it would make sense for some part of bit_wait() (or
> rpc_wait_bit_killable()) to warn if waiting in TASK_KILLABLE when
> PF_EXITING is set.

What I've been thinking is that for a process in the PF_EXITING state,
we might automatically set RPC_TASK_TIMEOUT on any new RPC calls. The
main problem with that is that it could cause loss of data, since
ETIMEDOUT is a fatal error. So perhaps a new flag similar to
RPC_TASK_TIMEOUT, but one that instead sets EINTR?

The reason for doing that rather than changing the wait is that when
everything is working correctly, we might want to allow the exiting
process to wait for a very large file flush to complete. It's only when
the RPC calls themselves start to hang due to the server being
unavailable that we want to actually pull the plug.

>
> Thanks,
> NeilBrown
>

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@xxxxxxxxxx, trond.myklebust@xxxxxxxxxxxxxxx
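
For illustration only, the flag idea discussed above could be modelled in
userspace roughly as follows. This is a sketch, not kernel code and not
a proposed patch: every sk_* name is hypothetical, the flag values are
arbitrary, SK_RPC_TASK_INTR_TIMEOUT stands in for the new flag that does
not exist yet, and the handling of RPC_TASK_TIMEOUT is simplified to
"timeout yields ETIMEDOUT" as described in the mail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative stand-ins; the values are NOT the kernel's. */
#define SK_PF_EXITING            0x1u /* models PF_EXITING on the process */
#define SK_RPC_TASK_TIMEOUT      0x1u /* models RPC_TASK_TIMEOUT (simplified) */
#define SK_RPC_TASK_INTR_TIMEOUT 0x2u /* hypothetical new flag: timeout -> EINTR */

struct sk_process  { unsigned int flags; };
struct sk_rpc_task { unsigned int flags; };

/*
 * Set up a new RPC call on behalf of 'owner'. An exiting owner gets the
 * hypothetical flag automatically, so the wait itself stays unchanged.
 */
static void sk_rpc_task_init(struct sk_rpc_task *task,
                             const struct sk_process *owner)
{
	assert(task != NULL && owner != NULL);
	task->flags = 0;
	if (owner->flags & SK_PF_EXITING)
		task->flags |= SK_RPC_TASK_INTR_TIMEOUT;
}

/*
 * Status to report once the RPC call itself times out (server
 * unavailable). Returns 0 when the call should simply keep retrying.
 */
static int sk_rpc_timeout_status(const struct sk_rpc_task *task)
{
	assert(task != NULL);
	if (task->flags & SK_RPC_TASK_TIMEOUT)
		return -ETIMEDOUT;	/* fatal: risks losing data */
	if (task->flags & SK_RPC_TASK_INTR_TIMEOUT)
		return -EINTR;		/* give up without a fatal error */
	return 0;
}
```

The point of the model is that nothing happens while the server keeps
answering: a large flush from an exiting process still runs to
completion, and the flag only matters once an RPC call times out.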