Re: [PATCH net-next v3] vsock/test: Add test for null ptr deref when transport changes

Luigi Leonardi <leonardi@xxxxxxxxxx> · Mon, 16 Jun 2025 16:23:40 +0200




Hi Stefano,

On Wed, Jun 11, 2025 at 04:53:11PM +0200, Stefano Garzarella wrote:
On Wed, Jun 11, 2025 at 04:07:25PM +0200, Luigi Leonardi wrote:
Add a new test to ensure that when the transport changes a null pointer
dereference does not occur. The bug was reported upstream [1] and fixed
with commit 2cb7c756f605 ("vsock/virtio: discard packets if the
transport changes").

KASAN: null-ptr-deref in range [0x0000000000000060-0x0000000000000067]
CPU: 2 UID: 0 PID: 463 Comm: kworker/2:3 Not tainted
Workqueue: vsock-loopback vsock_loopback_work
RIP: 0010:vsock_stream_has_data+0x44/0x70
Call Trace:
virtio_transport_do_close+0x68/0x1a0
virtio_transport_recv_pkt+0x1045/0x2ae4
vsock_loopback_work+0x27d/0x3f0
process_one_work+0x846/0x1420
worker_thread+0x5b3/0xf80
kthread+0x35a/0x700
ret_from_fork+0x2d/0x70
ret_from_fork_asm+0x1a/0x30

Note that this test may not fail in a kernel without the fix, but it may
hang on the client side if it triggers a kernel oops.

This works by creating a socket, trying to connect to a server, and then
executing a second connect operation on the same socket but to a
different CID (0). This triggers a transport change. If the connect
operation is interrupted by a signal, this could cause a null-ptr-deref.

Since this bug is non-deterministic, we need to try several times. It
is reasonable to assume that the bug will show up within the timeout
period.

If there is a G2H transport loaded in the system, the bug is not
triggered and this test will always pass.

Should we re-use what Michal is doing in https://lore.kernel.org/virtualization/20250528-vsock-test-inc-cov-v2-0-8f655b40d57c@xxxxxxx/
to print a warning?

Yes, good idea! IMHO we can skip the test if a G2H transport is loaded.  
I'll wait for Michal's patch to be merged before sending v4.


[1]https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/

Suggested-by: Hyunwoo Kim <v4bel@xxxxxxxxx>

#include "vsock_test_zerocopy.h"
#include "timeout.h"
@@ -1811,6 +1813,168 @@ static void test_stream_connect_retry_server(const struct test_opts *opts)
	close(fd);
+		/* Set CID to 0 cause a transport change. */
+		sa.svm_cid = 0;
+		/* This connect must fail. No-one listening on CID 0
+		 * This connect can also be interrupted, ignore this error.
+		 */
+		ret = connect(s, (struct sockaddr *)&sa, sizeof(sa));
+		if (ret != -1 && errno != EINTR) {

Should this condition be `ret != -1 || errno != EINTR` ?

Right, good catch.

+			fprintf(stderr,
+				"connect: expected a failure because of unused CID: %d\n", errno);
+			exit(EXIT_FAILURE);
+		}
+
+		close(s);
+
+		control_writeulong(CONTROL_CONTINUE);
+
+	} while (current_nsec() < tout);
+
+	control_writeulong(CONTROL_DONE);
+
+	ret = pthread_cancel(thread_id);
+	if (ret) {
+		fprintf(stderr, "pthread_cancel: %d\n", ret);
+		exit(EXIT_FAILURE);
+	}
+
+	/* Wait for the thread to terminate */
+	ret = pthread_join(thread_id, NULL);
+	if (ret) {
+		fprintf(stderr, "pthread_join: %d\n", ret);
+		exit(EXIT_FAILURE);
+	}
+
+	/* Restore the old handler */
+	if (signal(SIGUSR1, old_handler) == SIG_ERR) {
+		perror("signal");
+		exit(EXIT_FAILURE);
+	}
+}
+
+static void test_stream_transport_change_server(const struct test_opts *opts)
+{
+	int ret, s;
+
+	s = vsock_stream_listen(VMADDR_CID_ANY, opts->peer_port);
+
+	/* Set the socket to be nonblocking because connects that have been interrupted
+	 * (EINTR) can fill the receiver's accept queue anyway, leading to connect failure.
+	 * As of today (6.15) in such situation there is no way to understand, from the
+	 * client side, if the connection has been queued in the server or not.
+	 */
+	ret = fcntl(s, F_SETFL, fcntl(s, F_GETFL, 0) | O_NONBLOCK);
+	if (ret < 0) {

nit: If you need to resend, I'd remove `ret` and check fcntl directly:
	if (fcntl(...) < 0) {
Ok!

+		perror("fcntl");
+		exit(EXIT_FAILURE);
+	}
+	control_writeln("LISTENING");
+
+	while (control_readulong() == CONTROL_CONTINUE) {
+		struct sockaddr_vm sa_client;
+		socklen_t socklen_client = sizeof(sa_client);
+
+		/* Must accept the connection, otherwise the `listen`
+		 * queue will fill up and new connections will fail.
+		 * There can be more than one queued connection,
+		 * clear them all.
+		 */
+		while (true) {
+			int client = accept(s, (struct sockaddr *)&sa_client, &socklen_client);
+
+			if (client < 0 && errno != EAGAIN) {
+				perror("accept");
+				exit(EXIT_FAILURE);
+			} else if (client > 0) {

0 in theory is a valid fd, so here we should check `client >= 0`.
Right!

+				close(client);
+			}
+
+			if (errno == EAGAIN)
+				break;

I think you can refactor in this way:
			if (client < 0) {
				if (errno == EAGAIN)
					break;

				perror("accept");
				exit(EXIT_FAILURE);
			}

			close(client);

Will do.
Thanks,
Stefano


Thanks for the review.
Luigi