On 6/10/25 2:40 PM, Jesper Dangaard Brouer wrote:
On 10/06/2025 20.26, Ihor Solodrai wrote:
On 6/10/25 8:56 AM, Jesper Dangaard Brouer wrote:
On 10/06/2025 13.43, Jesper Dangaard Brouer wrote:
On 10/06/2025 00.09, Ihor Solodrai wrote:
[...]
Can you give me the output from below command (on your compiled
kernel):
./scripts/faddr2line drivers/net/veth.o veth_xdp_rcv.constprop.0+0x6b
Still need above data/info please.
root@devvm7589:/ci/workspace# ./scripts/faddr2line ./kout.gcc/drivers/net/veth.o veth_xdp_rcv.constprop.0+0x6b
veth_xdp_rcv.constprop.0+0x6b/0x390:
netdev_get_tx_queue at /ci/workspace/kout.gcc/../include/linux/netdevice.h:2637
(inlined by) veth_xdp_rcv at /ci/workspace/kout.gcc/../drivers/net/veth.c:912
Which is:
veth.c:912
     struct veth_priv *priv = netdev_priv(rq->dev);
     int queue_idx = rq->xdp_rxq.queue_index;
     struct netdev_queue *peer_txq;
     struct net_device *peer_dev;
     int i, done = 0, n_xdpf = 0;
     void *xdpf[VETH_XDP_BATCH];
     /* NAPI functions as RCU section */
     peer_dev = rcu_dereference_check(priv->peer, rcu_read_lock_bh_held());
---> peer_txq = netdev_get_tx_queue(peer_dev, queue_idx);
netdevice.h:2637
     static inline
     struct netdev_queue *netdev_get_tx_queue(const struct net_device *dev,
                                              unsigned int index)
     {
             DEBUG_NET_WARN_ON_ONCE(index >= dev->num_tx_queues);
--->         return &dev->_tx[index];
     }
So the suspect is peer_dev (priv->peer)?
Yes, this is the problem!
So, it seems that peer_dev (priv->peer) can become a NULL pointer.
Managed to reproduce - via manually deleting the peer device:
- ip link delete dev veth42
- while overloading veth41 via XDP redirecting packets into it.
Managed to trigger concurrent crashes on two CPUs (C0 + C3),
so the output below is interleaved a bit:
[...]
A fix could look like this:
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index e58a0f1b5c5b..a3046142cb8e 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -909,7 +909,7 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget,
 	/* NAPI functions as RCU section */
 	peer_dev = rcu_dereference_check(priv->peer, rcu_read_lock_bh_held());
-	peer_txq = netdev_get_tx_queue(peer_dev, queue_idx);
+	peer_txq = peer_dev ? netdev_get_tx_queue(peer_dev, queue_idx) : NULL;
 	for (i = 0; i < budget; i++) {
 		void *ptr = __ptr_ring_consume(&rq->xdp_ring);
@@ -959,7 +959,7 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget,
 	rq->stats.vs.xdp_packets += done;
 	u64_stats_update_end(&rq->stats.syncp);
-	if (unlikely(netif_tx_queue_stopped(peer_txq)))
+	if (peer_txq && unlikely(netif_tx_queue_stopped(peer_txq)))
 		netif_tx_wake_queue(peer_txq);
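
The two guards in the diff above can be modeled in plain userspace C for
anyone who wants to poke at the logic outside a kernel build. This is only
a sketch under loud assumptions: struct txq, struct dev, get_tx_queue(),
peer_txq_or_null() and wake_if_stopped() are hypothetical stand-ins made up
for illustration, not kernel APIs, and nothing here models RCU, NAPI or the
actual veth teardown path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct netdev_queue / struct net_device. */
struct txq {
	bool stopped;   /* models a stopped TX queue */
	int wakeups;    /* counts how often the queue was woken */
};

struct dev {
	unsigned int num_tx_queues;
	struct txq *_tx;
};

/*
 * Same shape as netdev_get_tx_queue(): it only range-checks the index and
 * then dereferences dev, so a NULL dev must be filtered out by the caller.
 */
static struct txq *get_tx_queue(const struct dev *dev, unsigned int index)
{
	assert(index < dev->num_tx_queues);
	return &dev->_tx[index];
}

/* First hunk: a missing peer yields a NULL queue instead of a crash. */
static struct txq *peer_txq_or_null(const struct dev *peer, unsigned int index)
{
	return peer ? get_tx_queue(peer, index) : NULL;
}

/* Second hunk: only touch the queue if there is one. Returns true if woken. */
static bool wake_if_stopped(struct txq *q)
{
	if (q && q->stopped) {
		q->stopped = false;
		q->wakeups++;
		return true;
	}
	return false;
}
```

Calling wake_if_stopped(peer_txq_or_null(NULL, 0)) is then a harmless no-op,
which is the behaviour the patch is after when the peer has been deleted.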
Great! I presume you will send a patch separately?
--Jesper