Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Oleg, Dominique,

On 3/25/2025 6:34 PM, Oleg Nesterov wrote:
On 03/25, K Prateek Nayak wrote:

I chased this down to p9_client_rpc() net/9p/client.c specifically:

         err = c->trans_mod->request(c, req);
         if (err < 0) {
                 /* write won't happen */
                 p9_req_put(c, req);
                 if (err != -ERESTARTSYS && err != -EFAULT)
                         c->status = Disconnected;
                 goto recalc_sigpending;
         }

c->trans_mod->request() calls p9_fd_request() in net/9p/trans_fd.c
which basically does a p9_fd_poll().

Again, I know nothing about 9p... but if p9_fd_request() returns
an err < 0, then it comes from p9_conn->err and p9_fd_request()
does nothing else.

Previously, the above would fail with err as -EIO which would
cause the client to "Disconnect" and the retry logic would make
progress. Now however, the err returned is -ERESTARTSYS

OK... So p9_conn->err = -ERESTARTSYS was set by p9_conn_cancel()
called by p9_write_work() because pipe_write() returns ERESTARTSYS?

But I don't understand -EIO with the reverted commit aaec5a95d59615

Okay it is a long explanation. These are the relevant bits of
traces with aaec5a95d59615 reverted on mainline and I'll break it
down to the best of my abilities:

   kworker/100:1-1803    [100] .....   286.618784: p9_write_work: Data write wait 32770
   kworker/100:1-1803    [100] .....   286.618792: p9_write_work: Data written 28672
   kworker/100:1-1803    [100] .....   286.618793: p9_fd_poll: p9_fd_poll rd poll
   kworker/100:1-1803    [100] .....   286.618793: p9_fd_poll: p9_fd_request wr poll
   kworker/100:1-1803    [100] .....   286.618793: p9_fd_poll: p9_fd_poll rd poll
   kworker/100:1-1803    [100] .....   286.618793: p9_fd_poll: p9_fd_request wr poll
   kworker/100:1-1803    [100] .....   286.618795: p9_read_work: Data read wait 7
   kworker/100:1-1803    [100] .....   286.618796: p9_read_work: Data read 7
   kworker/100:1-1803    [100] .....   286.618796: p9_fd_poll: p9_fd_poll rd poll
   kworker/100:1-1803    [100] .....   286.618796: p9_fd_poll: p9_fd_request wr poll
   kworker/100:1-1803    [100] .....   286.618797: p9_fd_poll: p9_fd_poll rd poll
   kworker/100:1-1803    [100] .....   286.618797: p9_fd_poll: p9_fd_request wr poll
   kworker/100:1-1803    [100] .....   286.618797: p9_read_work: Data read wait 55
   kworker/100:1-1803    [100] .....   286.618798: p9_read_work: Data read 55
           repro-4102    [100] .....   286.618802: p9_client_write: p9_client_rpc done
           repro-4102    [100] .....   286.618804: p9_client_write: p9_pdup
           repro-4102    [100] .....   286.618804: p9_client_write: iter revert

  ==== the above bits are common between the two ===

           repro-4102    [100] .....   286.618805: p9_client_write: p9_client_rpc
           repro-4102    [100] .....   286.618817: p9_fd_request: p9_fd_request
           repro-4102    [100] .....   286.618818: p9_fd_request: p9_fd_request EPOLL
           repro-4102    [100] .....   286.618818: p9_fd_poll: p9_fd_poll rd poll
           repro-4102    [100] .....   286.618818: p9_fd_poll: p9_fd_request wr poll
   kworker/100:1-1803    [100] .....   286.618822: p9_fd_poll: p9_fd_poll rd poll
   kworker/100:1-1803    [100] .....   286.618822: p9_fd_poll: p9_fd_request wr poll
   kworker/100:1-1803    [100] .....   286.618823: p9_read_work: Data read wait 7
   kworker/100:1-1803    [100] .....   286.618825: p9_read_work: Data read 7
   kworker/100:1-1803    [100] .....   286.618825: p9_read_work: p9_tag_lookup error

                                                                 ^^^^^^^^^^^^^^^^^^^
  ==== In this case the  read succeeds but p9_tag_lookup() later will error out ====
  ==== Connection is cancelled with m->err as -EIO                              ====

   kworker/100:1-1803    [100] .....   286.618825: p9_read_work: Connection cancel
   kworker/100:1-1803    [100] ...1.   286.618826: p9_conn_cancel: Connection cancelled (-5)
           repro-4102    [100] .....   286.618831: p9_client_write: p9_client_rpc done
           repro-4102    [100] .....   286.618832: p9_client_write: p9_client_rpc error (-5)

                                                                                         ^^^
  ==== This error from kworker/100 is received by the reproducer                         ====

           repro-4102    [100] .....   286.618833: v9fs_issue_write: Issue write done 2 err(-5)
           repro-4102    [100] .....   286.618833: netfs_write_subrequest_terminated: Collector woken up from netfs_write_subrequest_terminated
           repro-4102    [100] .....   286.618834: netfs_wake_write_collector: Wake collector
           repro-4102    [100] .....   286.618834: netfs_wake_write_collector: Queuing collector work
           repro-4102    [100] .....   286.618837: v9fs_issue_write: Issue write subrequest terminated 2
           repro-4102    [100] .....   286.618838: netfs_unbuffered_write: netfs_unbuffered_write
           repro-4102    [100] .....   286.618838: netfs_end_issue_write: netfs_end_issue_write
           repro-4102    [100] .....   286.618838: netfs_end_issue_write: Write collector need poke 0
           repro-4102    [100] .....   286.618839: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS!

  ==== A reissue mechanism triggers below ====

 kworker/u1037:2-3232    [096] .....   286.618845: netfs_retry_writes: netfs_reissue_write 1
 kworker/u1037:2-3232    [096] .....   286.618846: v9fs_issue_write: v9fs_issue_write
 kworker/u1037:2-3232    [096] .....   286.618847: p9_client_write: p9_client_rpc
 kworker/u1037:2-3232    [096] .....   286.618873: p9_fd_request: p9_fd_request
 kworker/u1037:2-3232    [096] ...1.   286.618874: p9_fd_request: p9_fd_request error
 kworker/u1037:2-3232    [096] .....   286.618874: p9_client_rpc: Client disconnected (no write)

  ==== Connection with client is disconnected ====

 kworker/u1037:2-3232    [096] .....   286.618877: p9_client_write: p9_client_rpc done
 kworker/u1037:2-3232    [096] .....   286.618880: p9_client_write: p9_client_rpc error (-5)
 kworker/u1037:2-3232    [096] .....   286.618881: v9fs_issue_write: Issue write done 0 err(-5)
 kworker/u1037:2-3232    [096] .....   286.618882: netfs_write_subrequest_terminated: Collector woken up from netfs_write_subrequest_terminated
 kworker/u1037:2-3232    [096] .....   286.618882: netfs_wake_write_collector: Wake collector
 kworker/u1037:2-3232    [096] .....   286.618882: netfs_wake_write_collector: Queuing collector work
 kworker/u1037:2-3232    [096] .....   286.618887: v9fs_issue_write: Issue write subrequest terminated 0
 kworker/u1037:2-3232    [096] .....   286.618890: netfs_write_collection_worker: Write collect clearing and waking up!

  ==== The retry worker will clear the NETFS_RREQ_IN_PROGRESS bit and issue a wakeup ====

           repro-4102    [100] .....   286.619051: p9_client_prepare_req: p9_client_prepare_req eio 1
           repro-4102    [100] .....   286.619052: p9_client_rpc: p9_client_rpc early err return
           repro-4108    [240] .....   286.621325: p9_fd_create: Client connected

  ==== The reproducer will then re-establish the connection and the pattern repeats ====

---

So the pattern is with the optimization reverted is:

o The RPC thread reads "7" bytes.
o p9_tag_lookup() fails giving -EIO.
o connection is cancelled from rpc thread.

o The reproducer thread receives the packet and finds -EIO.
o The reproducer starts a Disconnect.
o The reproducer wakes up collector and sleeps on bit
  NETFS_RREQ_IN_PROGRESS

o A retry worker wakes up.
o Finds connection has been disconnected
o Wakes up the reproducer thread.

o Reproducer wakes up and establishes connection.

I don't know 9p bits enough to understand the intermediates steps but
this is the rough flow.

---

Now for the traces on vanilla mainline:

    kworker/31:1-1723    [031] .....    95.011362: p9_write_work: Data write wait 32770
    kworker/31:1-1723    [031] .....    95.011372: p9_write_work: Data written 28672
    kworker/31:1-1723    [031] .....    95.011373: p9_fd_poll: p9_fd_poll rd poll
    kworker/31:1-1723    [031] .....    95.011373: p9_fd_poll: p9_fd_request wr poll
    kworker/31:1-1723    [031] .....    95.011374: p9_fd_poll: p9_fd_poll rd poll
    kworker/31:1-1723    [031] .....    95.011374: p9_fd_poll: p9_fd_request wr poll
    kworker/31:1-1723    [031] .....    95.011374: p9_read_work: Data read wait 7
    kworker/31:1-1723    [031] .....    95.011375: p9_read_work: Data read 7
    kworker/31:1-1723    [031] .....    95.011375: p9_fd_poll: p9_fd_poll rd poll
    kworker/31:1-1723    [031] .....    95.011375: p9_fd_poll: p9_fd_request wr poll
    kworker/31:1-1723    [031] .....    95.011376: p9_read_work: Data read wait 55
    kworker/31:1-1723    [031] .....    95.011376: p9_read_work: Data read 55
           repro-4076    [031] .....    95.011381: p9_client_rpc: Wait event killable (0)
           repro-4076    [031] .....    95.011382: p9_client_rpc: Check error (0)
           repro-4076    [031] .....    95.011382: p9_client_write: p9_client_rpc done
           repro-4076    [031] .....    95.011382: p9_client_write: p9_pdup
           repro-4076    [031] .....    95.011382: p9_client_write: iter revert

  ==== Above bits are same as the previous traces ====

           repro-4076    [031] .....    95.011383: p9_client_write: p9_client_rpc
           repro-4076    [031] .....    95.011393: p9_fd_request: p9_fd_request
           repro-4076    [031] .....    95.011394: p9_fd_request: p9_fd_request EPOLL
           repro-4076    [031] .....    95.011394: p9_fd_poll: p9_fd_poll rd poll
           repro-4076    [031] .....    95.011394: p9_fd_poll: p9_fd_request wr poll
           repro-4076    [031] .....    99.731970: p9_client_rpc: Wait event killable (-512)

  ==== The wait event here fails with -ERESTARTSYS ====

           repro-4076    [031] .....    99.731979: p9_client_rpc: Flushed (-512)
           repro-4076    [031] .....    99.731983: p9_client_rpc: Final error (-512)
           repro-4076    [031] .....    99.731983: p9_client_write: p9_client_rpc done
           repro-4076    [031] .....    99.731984: p9_client_write: p9_client_rpc error (-512)
           repro-4076    [031] .....    99.731985: v9fs_issue_write: Issue write done 2 err(-512)
           repro-4076    [031] .....    99.731987: netfs_write_subrequest_terminated: Collector woken up from netfs_write_subrequest_terminated
           repro-4076    [031] .....    99.731987: netfs_wake_write_collector: Wake collector
           repro-4076    [031] .....    99.731988: netfs_wake_write_collector: Queuing collector work
           repro-4076    [031] .....    99.731991: v9fs_issue_write: Issue write subrequest terminated 2
           repro-4076    [031] .....    99.731991: netfs_unbuffered_write: netfs_unbuffered_write
           repro-4076    [031] .....    99.731991: netfs_end_issue_write: netfs_end_issue_write
           repro-4076    [031] .....    99.731992: netfs_end_issue_write: Write collector need poke 0
           repro-4076    [031] .....    99.731993: netfs_unbuffered_write_iter_locked: Waiting on NETFS_RREQ_IN_PROGRESS!

  ==== The reproducer waits on bit NETFS_RREQ_IN_PROGRESS ====

 kworker/u1028:3-3860    [030] .....    99.732028: netfs_retry_writes: netfs_reissue_write 1
 kworker/u1028:3-3860    [030] .....    99.732030: v9fs_issue_write: v9fs_issue_write
 kworker/u1028:3-3860    [030] .....    99.732031: p9_client_write: p9_client_rpc
 kworker/u1028:3-3860    [030] .....    99.732051: p9_fd_request: p9_fd_request
 kworker/u1028:3-3860    [030] .....    99.732052: p9_fd_request: p9_fd_request EPOLL
 kworker/u1028:3-3860    [030] .....    99.732052: p9_fd_poll: p9_fd_poll rd poll
 kworker/u1028:3-3860    [030] .....    99.732053: p9_fd_poll: p9_fd_request wr poll

  ==== The retry worker seemingly gets stuck post p9_fd_poll() waiting for wakeup ====

---

That is my analysis so far.


Oleg.


--
Thanks and Regards,
Prateek
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index 32619d146cbc..3fbfc6882c35 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -56,10 +56,13 @@ static void v9fs_issue_write(struct netfs_io_subrequest *subreq)
 	struct p9_fid *fid = subreq->rreq->netfs_priv;
 	int err, len;
 
+	trace_printk("v9fs_issue_write\n");
 	len = p9_client_write(fid, subreq->start, &subreq->io_iter, &err);
+	trace_printk("Issue write done %d err(%d)\n", len, err);
 	if (len > 0)
 		__set_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
 	netfs_write_subrequest_terminated(subreq, len ?: err, false);
+	trace_printk("Issue write subrequest terminated %d\n", len);
 }
 
 /**
@@ -74,6 +77,7 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq)
 	int total, err;
 
 	total = p9_client_read(fid, pos, &subreq->io_iter, &err);
+	trace_printk("v9fs_issue_read %d err(%d)\n", total, err);
 
 	/* if we just extended the file size, any portion not in
 	 * cache won't be on server and is zeroes */
diff --git a/fs/netfs/direct_write.c b/fs/netfs/direct_write.c
index 42ce53cc216e..6333b8bb4008 100644
--- a/fs/netfs/direct_write.c
+++ b/fs/netfs/direct_write.c
@@ -105,6 +105,7 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
 
 	if (!async) {
 		trace_netfs_rreq(wreq, netfs_rreq_trace_wait_ip);
+		trace_printk("Waiting on NETFS_RREQ_IN_PROGRESS!\n");
 		wait_on_bit(&wreq->flags, NETFS_RREQ_IN_PROGRESS,
 			    TASK_UNINTERRUPTIBLE);
 		ret = wreq->error;
diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index 23c75755ad4e..a609063db8a7 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -446,6 +446,7 @@ static void netfs_read_collection(struct netfs_io_request *rreq)
 	task_io_account_read(rreq->transferred);
 
 	trace_netfs_rreq(rreq, netfs_rreq_trace_wake_ip);
+	trace_printk("Read collect clearing and waking up!\n");
 	clear_and_wake_up_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
 
 	trace_netfs_rreq(rreq, netfs_rreq_trace_done);
diff --git a/fs/netfs/write_collect.c b/fs/netfs/write_collect.c
index 3fca59e6475d..a54aa2bcc786 100644
--- a/fs/netfs/write_collect.c
+++ b/fs/netfs/write_collect.c
@@ -429,6 +429,7 @@ void netfs_write_collection_worker(struct work_struct *work)
 
 	_debug("finished");
 	trace_netfs_rreq(wreq, netfs_rreq_trace_wake_ip);
+	trace_printk("Write collect clearing and waking up!\n");
 	clear_and_wake_up_bit(NETFS_RREQ_IN_PROGRESS, &wreq->flags);
 
 	if (wreq->iocb) {
@@ -449,10 +450,14 @@ void netfs_write_collection_worker(struct work_struct *work)
  */
 void netfs_wake_write_collector(struct netfs_io_request *wreq, bool was_async)
 {
+	trace_printk("Wake collector\n");
 	if (!work_pending(&wreq->work)) {
 		netfs_get_request(wreq, netfs_rreq_trace_get_work);
-		if (!queue_work(system_unbound_wq, &wreq->work))
+		trace_printk("Queuing collector work\n");
+		if (!queue_work(system_unbound_wq, &wreq->work)) {
+			trace_printk("Put request\n");
 			netfs_put_request(wreq, was_async, netfs_rreq_trace_put_work_nq);
+		}
 	}
 }
 
@@ -542,8 +547,10 @@ void netfs_write_subrequest_terminated(void *_op, ssize_t transferred_or_error,
 	/* If we are at the head of the queue, wake up the collector,
 	 * transferring a ref to it if we were the ones to do so.
 	 */
-	if (list_is_first(&subreq->rreq_link, &stream->subrequests))
+	if (list_is_first(&subreq->rreq_link, &stream->subrequests)) {
+		trace_printk("Collector woken up from netfs_write_subrequest_terminated\n");
 		netfs_wake_write_collector(wreq, was_async);
+	}
 
 	netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated);
 }
diff --git a/fs/netfs/write_issue.c b/fs/netfs/write_issue.c
index 77279fc5b5a7..22a4b793e789 100644
--- a/fs/netfs/write_issue.c
+++ b/fs/netfs/write_issue.c
@@ -232,8 +232,10 @@ static void netfs_do_issue_write(struct netfs_io_stream *stream,
 
 	_enter("R=%x[%x],%zx", wreq->debug_id, subreq->debug_index, subreq->len);
 
-	if (test_bit(NETFS_SREQ_FAILED, &subreq->flags))
+	if (test_bit(NETFS_SREQ_FAILED, &subreq->flags)) {
+		trace_printk("netfs_do_issue_write failed!\n");
 		return netfs_write_subrequest_terminated(subreq, subreq->error, false);
+	}
 
 	trace_netfs_sreq(subreq, netfs_sreq_trace_submit);
 	stream->issue_write(subreq);
@@ -264,6 +266,7 @@ void netfs_issue_write(struct netfs_io_request *wreq,
 
 	if (!subreq)
 		return;
+	trace_printk("netfs_issue_write!\n");
 	stream->construct = NULL;
 	subreq->io_iter.count = subreq->len;
 	netfs_do_issue_write(stream, subreq);
@@ -290,6 +293,7 @@ size_t netfs_advance_write(struct netfs_io_request *wreq,
 	_enter("R=%x[%x]", wreq->debug_id, subreq ? subreq->debug_index : 0);
 
 	if (subreq && start != subreq->start + subreq->len) {
+		trace_printk("netfs_issue_write from advance write!\n");
 		netfs_issue_write(wreq, stream);
 		subreq = NULL;
 	}
@@ -307,6 +311,7 @@ size_t netfs_advance_write(struct netfs_io_request *wreq,
 	if (subreq->len >= stream->sreq_max_len ||
 	    subreq->nr_segs >= stream->sreq_max_segs ||
 	    to_eof) {
+		trace_printk("netfs_issue_write from advance write! 2\n");
 		netfs_issue_write(wreq, stream);
 		subreq = NULL;
 	}
@@ -397,6 +402,7 @@ static int netfs_write_folio(struct netfs_io_request *wreq,
 	 *     when they were read.  Note that these appear as a special
 	 *     write-back group.
 	 */
+	trace_printk("netfs_issue_write in folio copy\n");
 	if (fgroup == NETFS_FOLIO_COPY_TO_CACHE) {
 		netfs_issue_write(wreq, upload);
 	} else if (fgroup != wreq->group) {
@@ -526,6 +532,7 @@ static int netfs_write_folio(struct netfs_io_request *wreq,
  */
 static void netfs_end_issue_write(struct netfs_io_request *wreq)
 {
+	trace_printk("netfs_end_issue_write\n");
 	bool needs_poke = true;
 
 	smp_wmb(); /* Write subreq lists before ALL_QUEUED. */
@@ -541,6 +548,7 @@ static void netfs_end_issue_write(struct netfs_io_request *wreq)
 		netfs_issue_write(wreq, stream);
 	}
 
+	trace_printk("Write collector need poke %d\n", needs_poke);
 	if (needs_poke)
 		netfs_wake_write_collector(wreq, false);
 }
@@ -730,6 +738,7 @@ int netfs_unbuffered_write(struct netfs_io_request *wreq, bool may_wait, size_t
 			break;
 	}
 
+	trace_printk("netfs_unbuffered_write\n");
 	netfs_end_issue_write(wreq);
 	_leave(" = %d", error);
 	return error;
@@ -908,6 +917,7 @@ int netfs_writeback_single(struct address_space *mapping,
 	}
 
 stop:
+	trace_printk("netfs_issue_write writeback single\n");
 	for (int s = 0; s < NR_IO_STREAMS; s++)
 		netfs_issue_write(wreq, &wreq->io_streams[s]);
 	smp_wmb(); /* Write lists before ALL_QUEUED. */
diff --git a/fs/netfs/write_retry.c b/fs/netfs/write_retry.c
index 545d33079a77..60a03079766b 100644
--- a/fs/netfs/write_retry.c
+++ b/fs/netfs/write_retry.c
@@ -43,6 +43,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
 
 				iov_iter_revert(&source, subreq->len - source.count);
 				netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
+				trace_printk("netfs_reissue_write 1\n");
 				netfs_reissue_write(stream, subreq, &source);
 			}
 		}
@@ -116,6 +117,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
 				boundary = true;
 
 			netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
+			trace_printk("netfs_reissue_write 2\n");
 			netfs_reissue_write(stream, subreq, &source);
 			if (subreq == to)
 				break;
@@ -183,6 +185,7 @@ static void netfs_retry_write_stream(struct netfs_io_request *wreq,
 				boundary = false;
 			}
 
+			trace_printk("netfs_reissue_write 3\n");
 			netfs_reissue_write(stream, subreq, &source);
 			if (!len)
 				break;
diff --git a/net/9p/client.c b/net/9p/client.c
index 09f8ced9f8bb..909c83b0e015 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -633,12 +633,16 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
 	p9_debug(P9_DEBUG_MUX, "client %p op %d\n", c, type);
 
 	/* we allow for any status other than disconnected */
-	if (c->status == Disconnected)
+	if (c->status == Disconnected) {
+		trace_printk("p9_client_prepare_req eio 1\n");
 		return ERR_PTR(-EIO);
+	}
 
 	/* if status is begin_disconnected we allow only clunk request */
-	if (c->status == BeginDisconnect && type != P9_TCLUNK)
+	if (c->status == BeginDisconnect && type != P9_TCLUNK) {
+		trace_printk("p9_client_prepare_req eio 2\n");
 		return ERR_PTR(-EIO);
+	}
 
 	va_copy(apc, ap);
 	req = p9_tag_alloc(c, type, t_size, r_size, fmt, apc);
@@ -690,8 +694,10 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 	va_start(ap, fmt);
 	req = p9_client_prepare_req(c, type, tsize, rsize, fmt, ap);
 	va_end(ap);
-	if (IS_ERR(req))
+	if (IS_ERR(req)) {
+		trace_printk("p9_client_rpc early err return\n");
 		return req;
+	}
 
 	req->tc.zc = false;
 	req->rc.zc = false;
@@ -706,9 +712,12 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 	err = c->trans_mod->request(c, req);
 	if (err < 0) {
 		/* write won't happen */
+		trace_printk("write won't happnen (%d)\n", err);
 		p9_req_put(c, req);
-		if (err != -ERESTARTSYS && err != -EFAULT)
+		if (err != -ERESTARTSYS && err != -EFAULT) {
+			trace_printk("Client disconnected (no write)\n");
 			c->status = Disconnected;
+		}
 		goto recalc_sigpending;
 	}
 again:
@@ -721,6 +730,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 	 */
 	smp_rmb();
 
+	trace_printk("Wait event killable (%d) (%d) (%d) (%d)\n", err, READ_ONCE(req->status), c->status, type);
 	if (err == -ERESTARTSYS && c->status == Connected &&
 	    type == P9_TFLUSH) {
 		sigpending = 1;
@@ -731,6 +741,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 	if (READ_ONCE(req->status) == REQ_STATUS_ERROR) {
 		p9_debug(P9_DEBUG_ERROR, "req_status error %d\n", req->t_err);
 		err = req->t_err;
+		trace_printk("REQ_STATUS_ERROR (%d)\n", err);
 	}
 	if (err == -ERESTARTSYS && c->status == Connected) {
 		p9_debug(P9_DEBUG_MUX, "flushing\n");
@@ -743,6 +754,8 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 		/* if we received the response anyway, don't signal error */
 		if (READ_ONCE(req->status) == REQ_STATUS_RCVD)
 			err = 0;
+
+		trace_printk("Flushed (%d)\n", err);
 	}
 recalc_sigpending:
 	if (sigpending) {
@@ -754,11 +767,13 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 		goto reterr;
 
 	err = p9_check_errors(c, req);
+	trace_printk("Check error (%d)\n", err);
 	trace_9p_client_res(c, type, req->rc.tag, err);
 	if (!err)
 		return req;
 reterr:
 	p9_req_put(c, req);
+	trace_printk("Final error (%d)\n", err);
 	return ERR_PTR(safe_errno(err));
 }
 
@@ -808,8 +823,10 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
 	err = c->trans_mod->zc_request(c, req, uidata, uodata,
 				       inlen, olen, in_hdrlen);
 	if (err < 0) {
-		if (err == -EIO)
+		if (err == -EIO) {
+			trace_printk("Client disconnected (zero_copy)\n");
 			c->status = Disconnected;
+		}
 		if (err != -ERESTARTSYS)
 			goto recalc_sigpending;
 	}
@@ -1093,6 +1110,7 @@ EXPORT_SYMBOL(p9_client_destroy);
 void p9_client_disconnect(struct p9_client *clnt)
 {
 	p9_debug(P9_DEBUG_9P, "clnt %p\n", clnt);
+	trace_printk("Client disconnected\n");
 	clnt->status = Disconnected;
 }
 EXPORT_SYMBOL(p9_client_disconnect);
@@ -1100,6 +1118,7 @@ EXPORT_SYMBOL(p9_client_disconnect);
 void p9_client_begin_disconnect(struct p9_client *clnt)
 {
 	p9_debug(P9_DEBUG_9P, "clnt %p\n", clnt);
+	trace_printk("Client begin disconnect\n");
 	clnt->status = BeginDisconnect;
 }
 EXPORT_SYMBOL(p9_client_begin_disconnect);
@@ -1572,11 +1591,13 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to,
 				       offset, rsize);
 	} else {
 		non_zc = 1;
+		trace_printk("RPC issue\n");
 		req = p9_client_rpc(clnt, P9_TREAD, "dqd", fid->fid, offset,
 				    rsize);
 	}
 	if (IS_ERR(req)) {
 		*err = PTR_ERR(req);
+		trace_printk("read RPC issue error(%d)\n", *err);
 		if (!non_zc)
 			iov_iter_revert(to, count - iov_iter_count(to));
 		return 0;
@@ -1587,12 +1608,14 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to,
 	if (*err) {
 		if (!non_zc)
 			iov_iter_revert(to, count - iov_iter_count(to));
+		trace_printk("p9pdu_readf error(%d)\n", *err);
 		trace_9p_protocol_dump(clnt, &req->rc);
 		p9_req_put(clnt, req);
 		return 0;
 	}
 	if (rsize < received) {
 		pr_err("bogus RREAD count (%d > %d)\n", received, rsize);
+		trace_printk("bogus read\n");
 		received = rsize;
 	}
 
@@ -1602,6 +1625,7 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to,
 		int n = copy_to_iter(dataptr, received, to);
 
 		if (n != received) {
+			trace_printk("copy_to_iter fault\n");
 			*err = -EFAULT;
 			p9_req_put(clnt, req);
 			return n;
@@ -1642,23 +1666,29 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
 					       rsize, P9_ZC_HDR_SZ, "dqd",
 					       fid->fid, offset, rsize);
 		} else {
+			trace_printk("p9_client_rpc\n");
 			req = p9_client_rpc(clnt, P9_TWRITE, "dqV", fid->fid,
 					    offset, rsize, from);
+			trace_printk("p9_client_rpc done\n");
 		}
 		if (IS_ERR(req)) {
 			iov_iter_revert(from, count - iov_iter_count(from));
 			*err = PTR_ERR(req);
+			trace_printk("p9_client_rpc error (%d)\n", *err);
 			break;
 		}
 
+		trace_printk("p9_pdup\n");
 		*err = p9pdu_readf(&req->rc, clnt->proto_version, "d", &written);
 		if (*err) {
+			trace_printk("p9_pdup err(%d)\n", *err);
 			iov_iter_revert(from, count - iov_iter_count(from));
 			trace_9p_protocol_dump(clnt, &req->rc);
 			p9_req_put(clnt, req);
 			break;
 		}
 		if (rsize < written) {
+			trace_printk("p9_pdup bogus %d %d\n", written, rsize);
 			pr_err("bogus RWRITE count (%d > %d)\n", written, rsize);
 			written = rsize;
 		}
@@ -1666,6 +1696,7 @@ p9_client_write(struct p9_fid *fid, u64 offset, struct iov_iter *from, int *err)
 		p9_debug(P9_DEBUG_9P, "<<< RWRITE count %d\n", written);
 
 		p9_req_put(clnt, req);
+		trace_printk("iter revert\n");
 		iov_iter_revert(from, count - written - iov_iter_count(from));
 		total += written;
 		offset += written;
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 196060dc6138..3c83b3855e82 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -196,6 +196,7 @@ static void p9_conn_cancel(struct p9_conn *m, int err)
 		return;
 	}
 
+	trace_printk("Connection cancelled (%d)\n", err);
 	m->err = err;
 
 	list_for_each_entry_safe(req, rtmp, &m->req_list, req_list) {
@@ -233,9 +234,12 @@ p9_fd_poll(struct p9_client *client, struct poll_table_struct *pt, int *err)
 		return EPOLLERR;
 	}
 
+	trace_printk("p9_fd_poll rd poll\n");
 	ret = vfs_poll(ts->rd, pt);
-	if (ts->rd != ts->wr)
+	if (ts->rd != ts->wr) {
+		trace_printk("p9_fd_request wr poll\n");
 		ret = (ret & ~EPOLLOUT) | (vfs_poll(ts->wr, pt) & ~EPOLLIN);
+	}
 	return ret;
 }
 
@@ -263,9 +267,13 @@ static int p9_fd_read(struct p9_client *client, void *v, int len)
 		p9_debug(P9_DEBUG_ERROR, "blocking read ...\n");
 
 	pos = ts->rd->f_pos;
+	trace_printk("Data read wait %d\n", len);
 	ret = kernel_read(ts->rd, v, len, &pos);
-	if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN)
+	trace_printk("Data read %d\n", ret);
+	if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN) {
+		trace_printk("Client disconnected (fd_read)\n");
 		client->status = Disconnected;
+	}
 	return ret;
 }
 
@@ -330,6 +338,7 @@ static void p9_read_work(struct work_struct *work)
 		if (!m->rreq || (m->rreq->status != REQ_STATUS_SENT)) {
 			p9_debug(P9_DEBUG_ERROR, "Unexpected packet tag %d\n",
 				 m->rc.tag);
+			trace_printk("p9_tag_lookup error\n");
 			err = -EIO;
 			goto error;
 		}
@@ -338,6 +347,7 @@ static void p9_read_work(struct work_struct *work)
 			p9_debug(P9_DEBUG_ERROR,
 				 "requested packet size too big: %d for tag %d with capacity %zd\n",
 				 m->rc.size, m->rc.tag, m->rreq->rc.capacity);
+			trace_printk("capacity error\n");
 			err = -EIO;
 			goto error;
 		}
@@ -348,6 +358,7 @@ static void p9_read_work(struct work_struct *work)
 				 m->rc.tag, m->rreq);
 			p9_req_put(m->client, m->rreq);
 			m->rreq = NULL;
+			trace_printk("sdata error\n");
 			err = -EIO;
 			goto error;
 		}
@@ -375,6 +386,7 @@ static void p9_read_work(struct work_struct *work)
 			p9_debug(P9_DEBUG_ERROR,
 				 "Request tag %d errored out while we were reading the reply\n",
 				 m->rc.tag);
+			trace_printk("request tag error %d status(%d)\n", m->rc.tag, m->rreq->status);
 			err = -EIO;
 			goto error;
 		}
@@ -403,6 +415,7 @@ static void p9_read_work(struct work_struct *work)
 
 	return;
 error:
+	trace_printk("Connection cancel\n");
 	p9_conn_cancel(m, err);
 	clear_bit(Rworksched, &m->wsched);
 }
@@ -429,9 +442,13 @@ static int p9_fd_write(struct p9_client *client, void *v, int len)
 	if (!(ts->wr->f_flags & O_NONBLOCK))
 		p9_debug(P9_DEBUG_ERROR, "blocking write ...\n");
 
+	trace_printk("Data write wait %d\n",len);
 	ret = kernel_write(ts->wr, v, len, &ts->wr->f_pos);
-	if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN)
+	trace_printk("Data written %ld\n", ret);
+	if (ret <= 0 && ret != -ERESTARTSYS && ret != -EAGAIN) {
+		trace_printk("Client disconnected (fd_write)\n");
 		client->status = Disconnected;
+	}
 	return ret;
 }
 
@@ -519,6 +536,7 @@ static void p9_write_work(struct work_struct *work)
 	return;
 
 error:
+	trace_printk("Connection cancel\n");
 	p9_conn_cancel(m, err);
 	clear_bit(Wworksched, &m->wsched);
 }
@@ -628,6 +646,7 @@ static void p9_poll_mux(struct p9_conn *m)
 	n = p9_fd_poll(m->client, NULL, &err);
 	if (n & (EPOLLERR | EPOLLHUP | EPOLLNVAL)) {
 		p9_debug(P9_DEBUG_TRANS, "error mux %p err %d\n", m, n);
+		trace_printk("Connection cancel\n");
 		p9_conn_cancel(m, err);
 	}
 
@@ -668,12 +687,14 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 	struct p9_trans_fd *ts = client->trans;
 	struct p9_conn *m = &ts->conn;
 
+	trace_printk("p9_fd_request\n");
 	p9_debug(P9_DEBUG_TRANS, "mux %p task %p tcall %p id %d\n",
 		 m, current, &req->tc, req->tc.id);
 
 	spin_lock(&m->req_lock);
 
 	if (m->err < 0) {
+		trace_printk("p9_fd_request error\n");
 		spin_unlock(&m->req_lock);
 		return m->err;
 	}
@@ -682,13 +703,16 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
 	list_add_tail(&req->req_list, &m->unsent_req_list);
 	spin_unlock(&m->req_lock);
 
+	trace_printk("p9_fd_request EPOLL\n");
 	if (test_and_clear_bit(Wpending, &m->wsched))
 		n = EPOLLOUT;
 	else
 		n = p9_fd_poll(m->client, NULL, NULL);
 
-	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
+	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched)) {
+		trace_printk("p9_fd_request schedule work\n");
 		schedule_work(&m->wq);
+	}
 
 	return 0;
 }
@@ -852,6 +876,7 @@ static int p9_fd_open(struct p9_client *client, int rfd, int wfd)
 	data_race(ts->wr->f_flags |= O_NONBLOCK);
 
 	client->trans = ts;
+	trace_printk("Client connected\n");
 	client->status = Connected;
 
 	return 0;
@@ -889,6 +914,7 @@ static int p9_socket_open(struct p9_client *client, struct socket *csocket)
 	get_file(file);
 	p->wr = p->rd = file;
 	client->trans = p;
+	trace_printk("Client connected\n");
 	client->status = Connected;
 
 	p->rd->f_flags |= O_NONBLOCK;
@@ -920,6 +946,7 @@ static void p9_conn_destroy(struct p9_conn *m)
 		m->wreq = NULL;
 	}
 
+	trace_printk("Connection destroy\n");
 	p9_conn_cancel(m, -ECONNRESET);
 
 	m->client = NULL;
@@ -942,6 +969,7 @@ static void p9_fd_close(struct p9_client *client)
 	if (!ts)
 		return;
 
+	trace_printk("Client disconnected (fd_close)\n");
 	client->status = Disconnected;
 
 	p9_conn_destroy(&ts->conn);

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux