On Fri, Jun 27, 2025 at 04:57:44PM +0800, Jason Xing wrote: > From: Jason Xing <kernelxing@xxxxxxxxxxx> > > For afxdp, the return value of sendto() syscall doesn't reflect how many > descs handled in the kernel. One of use cases is that when user-space > application tries to know the number of transmitted skbs and then decides > if it continues to send, say, is it stopped due to max tx budget? > > The following formular can be used after sending to learn how many > skbs/descs the kernel takes care of: > > tx_queue.consumers_before - tx_queue.consumers_after > > Prior to the current patch, in non-zc mode, the consumer of tx queue is > not immediately updated at the end of each sendto syscall when error > occurs, which leads to the consumer value out-of-dated from the perspective > of user space. So this patch requires store operation to pass the cached > value to the shared value to handle the problem. > > More than those explicit errors appearing in the while() loop in > __xsk_generic_xmit(), there are a few possible error cases that might > be neglected in the following call trace: > __xsk_generic_xmit() > xskq_cons_peek_desc() > xskq_cons_read_desc() > xskq_cons_is_valid_desc() > It will also cause the premature exit in the while() loop even if not > all the descs are consumed. > > Based on the above analysis, using @sent_frame could cover all the possible > cases where it might lead to out-of-dated global state of consumer after > finishing __xsk_generic_xmit(). > > The patch also adds a common helper __xsk_tx_release() to keep align > with the zc mode usage in xsk_tx_release(). > > Signed-off-by: Jason Xing <kernelxing@xxxxxxxxxxx> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@xxxxxxxxx> > --- > v4 > Link: https://lore.kernel.org/all/20250625101014.45066-1-kerneljasonxing@xxxxxxxxx/ > 1. use the common helper > 2. keep align with the zc mode usage in xsk_tx_release() > > v3 > Link: https://lore.kernel.org/all/20250623073129.23290-1-kerneljasonxing@xxxxxxxxx/ > 1. use xskq_has_descs helper. > 2. add selftest > > V2 > Link: https://lore.kernel.org/all/20250619093641.70700-1-kerneljasonxing@xxxxxxxxx/ > 1. filter out those good cases because only those that return error need > updates. > Side note: > 1. in non-batched zero copy mode, at the end of every caller of > xsk_tx_peek_desc(), there is always a xsk_tx_release() function that used > to update the local consumer to the global state of consumer. So for the > zero copy mode, no need to change at all. > 2. Actually I have no strong preference between v1 (see the above link) > and v2 because smp_store_release() shouldn't cause side effect. > Considering the exactitude of writing code, v2 is a more preferable > one. > --- > net/xdp/xsk.c | 17 ++++++++++------- > 1 file changed, 10 insertions(+), 7 deletions(-) > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c > index 72c000c0ae5f..bd61b0bc9c24 100644 > --- a/net/xdp/xsk.c > +++ b/net/xdp/xsk.c > @@ -300,6 +300,13 @@ static bool xsk_tx_writeable(struct xdp_sock *xs) > return true; > } > > +static void __xsk_tx_release(struct xdp_sock *xs) > +{ > + __xskq_cons_release(xs->tx); > + if (xsk_tx_writeable(xs)) > + xs->sk.sk_write_space(&xs->sk); > +} > + > static bool xsk_is_bound(struct xdp_sock *xs) > { > if (READ_ONCE(xs->state) == XSK_BOUND) { > @@ -407,11 +414,8 @@ void xsk_tx_release(struct xsk_buff_pool *pool) > struct xdp_sock *xs; > > rcu_read_lock(); > - list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) { > - __xskq_cons_release(xs->tx); > - if (xsk_tx_writeable(xs)) > - xs->sk.sk_write_space(&xs->sk); > - } > + list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) > + __xsk_tx_release(xs); > rcu_read_unlock(); > } > EXPORT_SYMBOL(xsk_tx_release); > @@ -858,8 +862,7 @@ static int __xsk_generic_xmit(struct sock *sk) > > out: > if (sent_frame) > - if (xsk_tx_writeable(xs)) > - sk->sk_write_space(sk); > + __xsk_tx_release(xs); > > mutex_unlock(&xs->mutex); > return err; > -- > 2.41.3 >