Re: [PATCH] 9p/trans_fd: p9_fd_request: kick rx thread if EPOLLIN

K Prateek Nayak <kprateek.nayak@xxxxxxx> · Wed, 20 Aug 2025 11:59:20 +0530

Hello Oleg,

On 8/19/2025 9:40 PM, Oleg Nesterov wrote:
> p9_read_work() doesn't set Rworksched and doesn't do schedule_work(&m->rq)
> if list_empty(&m->req_list).
> 
> However, if the pipe is full, we need to read more data and this used to
> work prior to commit aaec5a95d59615 ("pipe_read: don't wake up the writer
> if the pipe is still full").
> 
> p9_read_work() does p9_fd_read() -> ... -> anon_pipe_read() which (before
> the commit above) triggered the unnecessary wakeup. This wakeup calls
> p9_pollwake() which kicks p9_poll_workfn() -> p9_poll_mux(), p9_poll_mux()
> will notice EPOLLIN and schedule_work(&m->rq).
> 
> This no longer happens after the optimization above; change p9_fd_request()
> to use p9_poll_mux() instead of only checking for EPOLLOUT.
> 
> Reported-by: syzbot+d1b5dace43896bc386c3@xxxxxxxxxxxxxxxxxxxxxxxxx
> Tested-by: syzbot+d1b5dace43896bc386c3@xxxxxxxxxxxxxxxxxxxxxxxxx
> Closes: https://lore.kernel.org/all/68a2de8f.050a0220.e29e5.0097.GAE@xxxxxxxxxx/
> Link: https://lore.kernel.org/all/67dedd2f.050a0220.31a16b.003f.GAE@xxxxxxxxxx/
> Co-developed-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>

A "Debugged-by:" or equivalent would have been fine too, since you did
most of the heavy lifting by finding p9_poll_mux(), but I don't mind
standing behind this since it is doing the right thing :)

I tested this on top of v6.17-rc2, and plain upstream runs into a hang
instantly with syzbot's reproducer. The dmesg logs:

    INFO: task repro:4150 blocked for more than 120 seconds.
          Not tainted 6.17.0-rc2-upstream #34
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:repro           state:D stack:0     pid:4150  tgid:4150  ppid:1      task_flags:0x400140 flags:0x00004006
    Call Trace:
     <TASK>
     __schedule+0x474/0x1620
     ? __wb_update_bandwidth+0x37/0x1d0
     schedule+0x27/0xd0
     io_schedule+0x46/0x70
     folio_wait_bit_common+0x112/0x300
     ? filemap_get_folios_tag+0x232/0x2a0
     ? __pfx_wake_page_function+0x10/0x10
     folio_wait_writeback+0x2b/0x80
     __filemap_fdatawait_range+0x7c/0xe0
     file_write_and_wait_range+0x89/0xb0
     v9fs_file_fsync+0x2d/0x90 [9p]
     netfs_file_write_iter+0xec/0x120 [netfs]
     vfs_write+0x305/0x420
     ksys_write+0x65/0xe0
     do_syscall_64+0x85/0xb30
     ? do_syscall_64+0x223/0xb30
     ? count_memcg_events+0xd9/0x1c0
     ? handle_mm_fault+0x1af/0x290
     ? do_user_addr_fault+0x2d0/0x8c0
     entry_SYSCALL_64_after_hwframe+0x76/0x7e
    RIP: 0033:0x7f3b26d1e88d
    RSP: 002b:00007ffe581fa348 EFLAGS: 00000213 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3b26d1e88d
    RDX: 0000000000007fec RSI: 0000200000000300 RDI: 0000000000000007
    RBP: 00007ffe581fa360 R08: 00007ffe581fa360 R09: 00007ffe581fa360
    R10: 00007ffe581fa360 R11: 0000000000000213 R12: 00007ffe581fa4b8
    R13: 0000558168a6de12 R14: 0000558168a6fd10 R15: 00007f3b26f03040
     </TASK>

With this patch applied on top, I haven't seen a hang yet, and I've been
running it for 30 minutes now, so feel free to also include:

Tested-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>

> Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx>
> ---
>  net/9p/trans_fd.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index 339ec4e54778..474fe67f72ac 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -666,7 +666,6 @@ static void p9_poll_mux(struct p9_conn *m)
>  
>  static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
>  {
> -	__poll_t n;
>  	int err;
>  	struct p9_trans_fd *ts = client->trans;
>  	struct p9_conn *m = &ts->conn;
> @@ -686,13 +685,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
>  	list_add_tail(&req->req_list, &m->unsent_req_list);
>  	spin_unlock(&m->req_lock);
>  
> -	if (test_and_clear_bit(Wpending, &m->wsched))
> -		n = EPOLLOUT;
> -	else
> -		n = p9_fd_poll(m->client, NULL, NULL);
> -
> -	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
> -		schedule_work(&m->wq);
> +	p9_poll_mux(m);
>  
>  	return 0;
>  }

-- 
Thanks and Regards,
Prateek