Jason Wang wrote:
> On Tue, Sep 2, 2025 at 4:10 PM Simon Schippers
> <simon.schippers@xxxxxxxxxxxxxx> wrote:
>>
>> The netdev queue is stopped in tun_net_xmit after inserting an SKB into
>> the ring buffer if the ring buffer became full because of that. If the
>> insertion into the ptr_ring fails, the netdev queue is also stopped and
>> the SKB is dropped. However, this never happened in my testing.
>
> You can reach this by using pktgen on TUN.
>

Yes, and I think it could also be reached after ptr_ring_unconsume is
called in vhost_net.

>> To ensure
>> that the ptr_ring change is available to the consumer before the netdev
>> queue stop, an smp_wmb() is used.
>>
>> Then in tun_ring_recv, the new helper wake_netdev_queue is called in the
>> blocking wait queue and after consuming an SKB from the ptr_ring. This
>> helper first checks if the netdev queue has stopped. Then with the paired
>> smp_rmb() it is known that tun_net_xmit will not produce SKBs anymore.
>> With that knowledge, the helper can then wake the netdev queue if there is
>> at least a single spare slot in the ptr_ring by calling ptr_ring_spare
>> with cnt=1.
>>
>> Co-developed-by: Tim Gebauer <tim.gebauer@xxxxxxxxxxxxxx>
>> Signed-off-by: Tim Gebauer <tim.gebauer@xxxxxxxxxxxxxx>
>> Signed-off-by: Simon Schippers <simon.schippers@xxxxxxxxxxxxxx>
>> ---
>>  drivers/net/tun.c | 33 ++++++++++++++++++++++++++++++---
>>  1 file changed, 30 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index cc6c50180663..735498e221d8 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1060,13 +1060,21 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>
>>         nf_reset_ct(skb);
>>
>> -       if (ptr_ring_produce(&tfile->tx_ring, skb)) {
>> +       queue = netdev_get_tx_queue(dev, txq);
>> +       if (unlikely(ptr_ring_produce(&tfile->tx_ring, skb))) {
>> +               /* Paired with smp_rmb() in wake_netdev_queue. */
>> +               smp_wmb();
>> +               netif_tx_stop_queue(queue);
>
> The barrier looks odd since it requires the driver to care about the
> ordering, can you elaborate more on this?
>
> There's a WRITE_ONCE + mb() in netif_tx_stop_queue already:
>
> static __always_inline void netif_tx_stop_queue(struct netdev_queue *dev_queue)
> {
>         /* Paired with READ_ONCE() from dev_watchdog() */
>         WRITE_ONCE(dev_queue->trans_start, jiffies);
>
>         /* This barrier is paired with smp_mb() from dev_watchdog() */
>         smp_mb__before_atomic();
>
>         /* Must be an atomic op see netif_txq_try_stop() */
>         set_bit(__QUEUE_STATE_DRV_XOFF, &dev_queue->state);
> }
>
>>                 drop_reason = SKB_DROP_REASON_FULL_RING;
>>                 goto drop;
>>         }
>> +       if (ptr_ring_full(&tfile->tx_ring)) {
>> +               /* Paired with smp_rmb() in wake_netdev_queue. */
>> +               smp_wmb();
>> +               netif_tx_stop_queue(queue);
>> +       }
>>
>>         /* dev->lltx requires to do our own update of trans_start */
>> -       queue = netdev_get_tx_queue(dev, txq);
>>         txq_trans_cond_update(queue);
>>
>>         /* Notify and wake up reader process */
>> @@ -2110,6 +2118,24 @@ static ssize_t tun_put_user(struct tun_struct *tun,
>>         return total;
>>  }
>>
>> +static inline void wake_netdev_queue(struct tun_file *tfile)
>
> Let's rename this to tun_wake_xxx.
>

Okay. I will rename wake_netdev_queue to tun_wake_netdev_queue and
tap_wake_netdev_queue respectively and remove inline. Then vhost_net
just calls these methods in vhost_net_buf_produce with
file->private_data.

>> +{
>> +       struct netdev_queue *txq;
>> +       struct net_device *dev;
>> +
>> +       rcu_read_lock();
>> +       dev = rcu_dereference(tfile->tun)->dev;
>> +       txq = netdev_get_tx_queue(dev, tfile->queue_index);
>> +
>> +       if (netif_tx_queue_stopped(txq)) {
>> +               /* Paired with smp_wmb() in tun_net_xmit. */
>> +               smp_rmb();
>> +               if (ptr_ring_spare(&tfile->tx_ring, 1))
>
> I wonder if there would be a case that will use cnt > 1. If not a
> ptr_ring_can_produce() should be sufficient.
>
>> +                       netif_tx_wake_queue(txq);
>> +       }
>> +       rcu_read_unlock();
>> +}
>> +
>>  static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
>>  {
>>         DECLARE_WAITQUEUE(wait, current);
>> @@ -2139,7 +2165,7 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
>>                         error = -EFAULT;
>>                         break;
>>                 }
>> -
>> +               wake_netdev_queue(tfile);
>>                 schedule();
>>         }
>>
>> @@ -2147,6 +2173,7 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
>>         remove_wait_queue(&tfile->socket.wq.wait, &wait);
>>
>> out:
>> +       wake_netdev_queue(tfile);
>>         *err = error;
>>         return ptr;
>>  }
>> --
>> 2.43.0
>>
>
> Thanks
>
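
For reference, a rough sketch of what the renamed, non-inline helper could
look like after the changes discussed above. It keeps the
ptr_ring_spare(.., 1) check from the posted patch; per the review comment,
a ptr_ring_can_produce()-style check could replace it if cnt is never
greater than 1. The final name, signature and the way vhost_net ends up
calling it via file->private_data are still open, so treat this as an
illustration rather than the actual next version:

void tun_wake_netdev_queue(struct tun_file *tfile)
{
        struct netdev_queue *txq;
        struct net_device *dev;

        rcu_read_lock();
        /* As in the hunk above, tfile is assumed to still be attached
         * to a tun device here.
         */
        dev = rcu_dereference(tfile->tun)->dev;
        txq = netdev_get_tx_queue(dev, tfile->queue_index);

        if (netif_tx_queue_stopped(txq)) {
                /* Paired with the barrier issued before
                 * netif_tx_stop_queue() in tun_net_xmit, so the ring
                 * state written by the producer is visible before the
                 * spare-slot check below.
                 */
                smp_rmb();
                /* A single spare slot is enough to restart the
                 * producer; a ptr_ring_can_produce()-style check may be
                 * sufficient here if cnt is always 1.
                 */
                if (ptr_ring_spare(&tfile->tx_ring, 1))
                        netif_tx_wake_queue(txq);
        }
        rcu_read_unlock();
}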