Thanks for the Cc.

Just replying quickly without looking at anything.

Oleg Nesterov wrote on Tue, Mar 25, 2025 at 01:15:26PM +0100:
> All I can say right now is that the "sigpending" logic in p9_client_rpc()
> looks wrong. If nothing else:
>
>   - clear_thread_flag(TIF_SIGPENDING) is not enough, it won't make
>     signal_pending() false if TIF_NOTIFY_SIGNAL is set.
>
>   - otoh, if signal_pending() was true because of pending SIGKILL,
>     then after clear_thread_flag(TIF_SIGPENDING) wait_event_killable()
>     will act as uninterruptible wait_event().

Yeah, this is effectively an unkillable event loop once a flush has been
sent; this is a known issue.

I've tried to address this with async rpc (so we could send the flush
and forget about it), but that caused other regressions and I never had
time to dig into these...

The patches date back to 2018 and probably won't even apply cleanly
anymore, but if anyone cares they are here:
https://lore.kernel.org/all/1544532108-21689-3-git-send-email-asmadeus@xxxxxxxxxxxxx/T/#u

(the hard work of refcounting was done just before that in order to kill
this pattern, I just pretty much ran out of free time at that point,
hobbies are hard...)

So: sorry, it's probably possible to improve this, but it won't be easy
nor immediate.

> > c->trans_mod->request() calls p9_fd_request() in net/9p/trans_fd.c
> > which basically does a p9_fd_poll().
> >
> > Previously, the above would fail with err as -EIO which would
> > cause the client to "Disconnect" and the retry logic would make
> > progress. Now however, the err returned is -ERESTARTSYS which
> > will not cause a disconnect and the retry logic will hang
> > somewhere in p9_client_rpc() later.

Now, if you got this far I think it'll be easier to make whatever
changed error out with EIO again instead; I'll try to check the rest of
the thread later this week as I didn't follow this thread at all.

Thanks,
-- 
Dominique