Re: [PATCH 2/3] vsock/virtio: Add SIOCINQ support for all virtio based transports

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, May 21, 2025 at 10:06:13AM +0800, Xuewei Niu wrote:
On Mon, May 19, 2025 at 03:06:48PM +0800, Xuewei Niu wrote:
>The virtio_vsock_sock has a new field called bytes_unread as the return
>value of the SIOCINQ ioctl.
>
>Though the rx_bytes exists, we introduce a bytes_unread field to the
>virtio_vsock_sock struct. The reason is that it will not be updated
>until the skbuff is fully consumed, which causes inconsistency.
>
>The byte_unread is increased by the length of the skbuff when skbuff is
>enqueued, and it is decreased when dequeued.
>
>Signed-off-by: Xuewei Niu <niuxuewei.nxw@xxxxxxxxxxxx>
>---
> drivers/vhost/vsock.c                   |  1 +
> include/linux/virtio_vsock.h            |  2 ++
> net/vmw_vsock/virtio_transport.c        |  1 +
> net/vmw_vsock/virtio_transport_common.c | 17 +++++++++++++++++
> net/vmw_vsock/vsock_loopback.c          |  1 +
> 5 files changed, 22 insertions(+)
>
>diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
>index 802153e23073..0f20af6e5036 100644
>--- a/drivers/vhost/vsock.c
>+++ b/drivers/vhost/vsock.c
>@@ -452,6 +452,7 @@ static struct virtio_transport vhost_transport = {
> 		.notify_set_rcvlowat      = virtio_transport_notify_set_rcvlowat,
>
> 		.unsent_bytes             = virtio_transport_unsent_bytes,
>+		.unread_bytes             = virtio_transport_unread_bytes,
>
> 		.read_skb = virtio_transport_read_skb,
> 	},
>diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h
>index 0387d64e2c66..0a7bd240113a 100644
>--- a/include/linux/virtio_vsock.h
>+++ b/include/linux/virtio_vsock.h
>@@ -142,6 +142,7 @@ struct virtio_vsock_sock {
> 	u32 buf_alloc;
> 	struct sk_buff_head rx_queue;
> 	u32 msg_count;
>+	size_t bytes_unread;

Can we just use `rx_bytes` field we already have?

Thanks,
Stefano

I perfer not. The `rx_bytes` won't be updated until the skbuff is fully
consumed, causing inconsistency issues. If it is acceptable to you, I'll
reuse the field instead.

I think here we found a little pre-existing issue that should be related
also to what Arseniy (CCed) is trying to fix (low_rx_bytes).

We basically have 2 counters:
- rx_bytes, which we use internally to see if there are bytes to read
  and for sock_rcvlowat
- fwd_cnt, which we use instead for the credit mechanism and informing
  the other peer whether we have space or not

These are updated with virtio_transport_dec_rx_pkt() and virtio_transport_inc_rx_pkt()

As far as I can see, from the beginning, we call virtio_transport_dec_rx_pkt() only when we consume the entire packet. This makes sense for `fwd_cnt`, because we still have occupied space in memory and we don't want to update the credit until we free all the space, but I think it makes no sense for `rx_bytes`, which is only used internally and should reflect the current situation of bytes to read.

So in my opinion we should fix it this way (untested):

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 11eae88c60fc..ee70cb114328 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -449,10 +449,10 @@ static bool virtio_transport_inc_rx_pkt(struct virtio_vsock_sock *vvs,
 }

 static void virtio_transport_dec_rx_pkt(struct virtio_vsock_sock *vvs,
-					u32 len)
+					u32 bytes_read, u32 bytes_dequeued)
 {
-	vvs->rx_bytes -= len;
-	vvs->fwd_cnt += len;
+	vvs->rx_bytes -= bytes_read;
+	vvs->fwd_cnt += bytes_dequeued;
 }

 void virtio_transport_inc_tx_pkt(struct virtio_vsock_sock *vvs, struct sk_buff *skb)
@@ -581,11 +581,11 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 				   size_t len)
 {
 	struct virtio_vsock_sock *vvs = vsk->trans;
-	size_t bytes, total = 0;
 	struct sk_buff *skb;
 	u32 fwd_cnt_delta;
 	bool low_rx_bytes;
 	int err = -EFAULT;
+	size_t total = 0;
 	u32 free_space;

 	spin_lock_bh(&vvs->rx_lock);
@@ -597,6 +597,8 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	}

 	while (total < len && !skb_queue_empty(&vvs->rx_queue)) {
+		size_t bytes, dequeued = 0;
+
 		skb = skb_peek(&vvs->rx_queue);

 		bytes = min_t(size_t, len - total,
@@ -620,12 +622,12 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 		VIRTIO_VSOCK_SKB_CB(skb)->offset += bytes;

 		if (skb->len == VIRTIO_VSOCK_SKB_CB(skb)->offset) {
-			u32 pkt_len = le32_to_cpu(virtio_vsock_hdr(skb)->len);
-
-			virtio_transport_dec_rx_pkt(vvs, pkt_len);
+			dequeued = le32_to_cpu(virtio_vsock_hdr(skb)->len);
 			__skb_unlink(skb, &vvs->rx_queue);
 			consume_skb(skb);
 		}
+
+		virtio_transport_dec_rx_pkt(vvs, bytes, dequeued);
 	}

 	fwd_cnt_delta = vvs->fwd_cnt - vvs->last_fwd_cnt;
@@ -782,7 +784,7 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk,
 				msg->msg_flags |= MSG_EOR;
 		}

-		virtio_transport_dec_rx_pkt(vvs, pkt_len);
+		virtio_transport_dec_rx_pkt(vvs, pkt_len, pkt_len);
 		vvs->bytes_unread -= pkt_len;
 		kfree_skb(skb);
 	}
@@ -1752,6 +1754,7 @@ int virtio_transport_read_skb(struct vsock_sock *vsk, skb_read_actor_t recv_acto
 	struct sock *sk = sk_vsock(vsk);
 	struct virtio_vsock_hdr *hdr;
 	struct sk_buff *skb;
+	u32 pkt_len;
 	int off = 0;
 	int err;

@@ -1769,7 +1772,8 @@ int virtio_transport_read_skb(struct vsock_sock *vsk, skb_read_actor_t recv_acto
 	if (le32_to_cpu(hdr->flags) & VIRTIO_VSOCK_SEQ_EOM)
 		vvs->msg_count--;

-	virtio_transport_dec_rx_pkt(vvs, le32_to_cpu(hdr->len));
+	pkt_len = le32_to_cpu(hdr->len);
+	virtio_transport_dec_rx_pkt(vvs, pkt_len, pkt_len);
 	spin_unlock_bh(&vvs->rx_lock);

 	virtio_transport_send_credit_update(vsk);

@Arseniy WDYT?
I will test it and send a proper patch.

@Xuewei with that fixed, I think you can use `rx_bytes`, right?

Also because you missed for example `virtio_transport_read_skb()` used by ebpf (see commit 3543152f2d33 ("vsock: Update rx_bytes on read_skb()")).

Thanks,
Stefano





[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux