Hi Maciej,

On Wed, Jun 25, 2025 at 7:09 PM Maciej Fijalkowski
<maciej.fijalkowski@xxxxxxxxx> wrote:
>
> On Wed, Jun 25, 2025 at 06:10:13PM +0800, Jason Xing wrote:
> > From: Jason Xing <kernelxing@xxxxxxxxxxx>
> >
> > For afxdp, the return value of sendto() syscall doesn't reflect how many
> > descs handled in the kernel. One of use cases is that when user-space
> > application tries to know the number of transmitted skbs and then decides
> > if it continues to send, say, is it stopped due to max tx budget?
> >
> > The following formular can be used after sending to learn how many
> > skbs/descs the kernel takes care of:
> >
> >   tx_queue.consumers_before - tx_queue.consumers_after
> >
> > Prior to the current patch, in non-zc mode, the consumer of tx queue is
> > not immediately updated at the end of each sendto syscall when error
> > occurs, which leads to the consumer value out-of-dated from the perspective
> > of user space. So this patch requires store operation to pass the cached
> > value to the shared value to handle the problem.
> >
> > More than those explicit errors appearing in the while() loop in
> > __xsk_generic_xmit(), there are a few possible error cases that might
> > be neglected in the following call trace:
> > __xsk_generic_xmit()
> >   xskq_cons_peek_desc()
> >     xskq_cons_read_desc()
> >       xskq_cons_is_valid_desc()
> > It will also cause the premature exit in the while() loop even if not
> > all the descs are consumed.
> >
> > Based on the above analysis, using 'cached_prod != cached_cons' could
> > cover all the possible cases because it represents there are remaining
> > descs that are not handled and cached_cons are not updated to the global
> > state of consumer at this time.
> >
> > Signed-off-by: Jason Xing <kernelxing@xxxxxxxxxxx>
> > ---
> > v3
> > Link: https://lore.kernel.org/all/20250623073129.23290-1-kerneljasonxing@xxxxxxxxx/
> > 1. use xskq_has_descs helper.
> > 2. add selftest
> >
> > V2
> > Link: https://lore.kernel.org/all/20250619093641.70700-1-kerneljasonxing@xxxxxxxxx/
> > 1. filter out those good cases because only those that return error need
> > updates.
> > Side note:
> > 1. in non-batched zero copy mode, at the end of every caller of
> > xsk_tx_peek_desc(), there is always a xsk_tx_release() function that used
> > to update the local consumer to the global state of consumer. So for the
> > zero copy mode, no need to change at all.
> > 2. Actually I have no strong preference between v1 (see the above link)
> > and v2 because smp_store_release() shouldn't cause side effect.
> > Considering the exactitude of writing code, v2 is a more preferable
> > one.
> > ---
> >  net/xdp/xsk.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 5542675dffa9..ab6351b24ac8 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -856,6 +856,9 @@ static int __xsk_generic_xmit(struct sock *sk)
> >  	}
> >
> >  out:
> > +	if (xskq_has_descs(xs->tx))
> > +		__xskq_cons_release(xs->tx);
> > +
> >  	if (sent_frame)
> >  		if (xsk_tx_writeable(xs))
> >  			sk->sk_write_space(sk);
>
> Hi Jason,
> IMHO below should be enough to address the issue:

Sure, it can. Can I ask one more thing? Technically it's not considered a
bug, right? I'm not sure if it's worth asking the stable team to backport
it to older versions.

>
> 	if (sent_frame) {

Using this condition means the consumer is updated in the majority of
cases, including the good ones [1]. The intention of the current patch is
to update the consumer only when an error occurs, because in the other
cases xskq_cons_peek_desc() already does it.

[1]: https://lore.kernel.org/all/aFVr60tw3QJopcOo@mini-arch/

> 		__xskq_cons_release(xs->tx);
> 		if (xsk_tx_writeable(xs))
> 			sk->sk_write_space(sk);
> 	}
>
> which basically is what xsk_tx_release() does for each tx socket in list.
> zc drivers call it whenever there was a single descriptor produced to HW
> ring. So should we on generic xmit side, based on @sent_frame.

As you said, they would be the same :)

>
> We could even wrap these 3 lines onto internal function, say
> __xsk_tx_release() and use it in xsk_tx_release() as well.

I can do it in the next respin. I have no strong opinion on how to write
it. If no one objects to the style of the patch, I will follow your advice.

Thanks.

Thanks,
Jason

>
> > --
> > 2.41.3
> >
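
To make the accounting discussed in this thread easier to follow, here is a
minimal user-space sketch of a TX ring with a private (cached) consumer and
a shared consumer. It is a toy model under my own assumptions, not the
kernel code: struct toy_txq, enum toy_policy and every toy_*() function are
hypothetical names, the error is simulated with a simple counter, and
invalid descriptors are not modelled. The observed progress is computed as
consumer after minus consumer before, since the index only grows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: field names echo the thread, not the real struct xsk_queue. */
struct toy_txq {
	uint32_t producer;    /* shared index, advanced by user space */
	uint32_t consumer;    /* shared index, the value user space can read */
	uint32_t cached_prod; /* private snapshot of producer */
	uint32_t cached_cons; /* private progress, not yet published */
};

/* Hypothetical policies for what happens at the out: label. */
enum toy_policy {
	TOY_NO_RELEASE,      /* behaviour before the patch */
	TOY_RELEASE_IF_LEFT, /* the patch: release when descs remain */
	TOY_RELEASE_IF_SENT, /* the suggested variant: release when a frame was sent */
};

/* Publish private progress; plays the role of __xskq_cons_release(). */
static void toy_cons_release(struct toy_txq *q)
{
	q->consumer = q->cached_cons;
}

/* Plays the role of xskq_has_descs(): unprocessed descs remain. */
static bool toy_has_descs(const struct toy_txq *q)
{
	return q->cached_cons != q->cached_prod;
}

/* Peek: once the cached window is drained, publish and refresh it. */
static bool toy_peek(struct toy_txq *q)
{
	if (q->cached_prod == q->cached_cons) {
		toy_cons_release(q);
		q->cached_prod = q->producer;
	}
	return toy_has_descs(q);
}

/*
 * Send until the ring is empty, or stop early once fail_after descs were
 * sent (a simulated error such as an exhausted TX budget).
 * Returns the progress user space observes: consumer after minus before.
 */
static uint32_t toy_xmit(struct toy_txq *q, uint32_t fail_after,
			 enum toy_policy policy)
{
	uint32_t before = q->consumer;
	uint32_t sent = 0;

	while (toy_peek(q)) {
		if (sent == fail_after)
			break; /* simulated error: leave the loop early */
		q->cached_cons++;
		sent++;
	}

	/* out: */
	if ((policy == TOY_RELEASE_IF_LEFT && toy_has_descs(q)) ||
	    (policy == TOY_RELEASE_IF_SENT && sent > 0))
		toy_cons_release(q);

	return q->consumer - before;
}
```

In this toy model, with five descs queued and a simulated failure after
three, TOY_NO_RELEASE leaves user space observing 0 consumed descs, while
both release policies make it observe 3. When nothing fails, the last peek
already publishes the consumer, so all three policies report 5.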