Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter

Oleg Nesterov <oleg@xxxxxxxxxx> · Wed, 26 Mar 2025 13:19:47 +0100

On 03/25, Dominique Martinet wrote:
>
> Thanks for the traces.
>
> w/ revert
> K Prateek Nayak wrote on Tue, Mar 25, 2025 at 08:19:26PM +0530:
> >    kworker/100:1-1803    [100] .....   286.618822: p9_fd_poll: p9_fd_poll rd poll
> >    kworker/100:1-1803    [100] .....   286.618822: p9_fd_poll: p9_fd_request wr poll
> >    kworker/100:1-1803    [100] .....   286.618823: p9_read_work: Data read wait 7
>
> new behavior
> >            repro-4076    [031] .....    95.011394: p9_fd_poll: p9_fd_poll rd poll
> >            repro-4076    [031] .....    95.011394: p9_fd_poll: p9_fd_request wr poll
> >            repro-4076    [031] .....    99.731970: p9_client_rpc: Wait event killable (-512)
>
> For me the problem isn't so much that this gets ERESTARTSYS but that it
> nevers gets to read the 7 bytes that are available?

Yes...

OK, lets first recall what the commit aaec5a95d59615523 ("pipe_read:
don't wake up the writer if the pipe is still full") does.
It simply removes the unnecessary/spurious wakeups when the writer
can't add more data to the pipe.

See the "stupid test-cas" in
https://lore.kernel.org/all/20250120144338.GC7432@xxxxxxxxxx/

In particular this note:

	As you can see, without this patch pipe_read() wakes the writer up
	4095 times for no reason, the writer burns a bit of CPU and blocks
	again after wakeup until the last read(fd[0], &c, 1).

in this test-case the writer sleeps in pipe_write(), but the same is true
for the task sleeping in poll( { .fd = pipe_fd, .events = POLLOUT}, ...).

Now, after some grepping I have found

	static void p9_conn_create(struct p9_client *client)
	{
		...

		init_poll_funcptr(&m->pt, p9_pollwait);

		n = p9_fd_poll(client, &m->pt, NULL);

		...
	}

So, iiuc, in this case p9_fd_poll(&m->pt /* != NULL */) -> p9_pollwait()
paths will add the "dummy" pwait->wait entries with ->func = p9_pollwake
to pipe_inode_info.rd_wait and pipe_inode_info.wr_wait.

Hmm... I don't understand why the 2nd vfs_poll(ts->wr) depends on the
ret from vfs_poll(ts->rd), but I assume this is correct.

This means that every time pipe_read() does wake_up(&pipe->wr_wait)
p9_pollwake() is called. This function kicks p9_poll_workfn() which
calls p9_poll_mux() which calls p9_fd_poll() again with pt == NULL.

In this case the conditional vfs_poll(ts->wr) looks more understandable...

So. Without the commit above, p9_poll_mux()->p9_fd_poll() can be called
much more often and, in particular, can report the "additional" EPOLLIN.

Can this somehow explain the problem?

Oleg.