On Wed, Jun 18, 2025 at 8:29 AM Jason Xing <kerneljasonxing@xxxxxxxxx> wrote:
>
> On Wed, Jun 18, 2025 at 1:46 AM Stanislav Fomichev <stfomichev@xxxxxxxxx> wrote:
> >
> > On 06/17, Jason Xing wrote:
> > > Hi Stanislav,
> > >
> > > On Tue, Jun 17, 2025 at 9:11 AM Stanislav Fomichev <stfomichev@xxxxxxxxx> wrote:
> > > >
> > > > On 06/17, Jason Xing wrote:
> > > > > From: Jason Xing <kernelxing@xxxxxxxxxxx>
> > > > >
> > > > > Introduce a control method in the xsk path to let users have the chance
> > > > > to tune it manually.
> > > >
> > > > Can you expand more on why the defaults don't work for you?
> > >
> > > We use a user-level TCP stack with xsk to transmit packets that have
> > > higher priority than normal kernel TCP flows. It turns out that
> > > enlarging the number reduces how many sendto syscalls we have to
> > > issue, which contributes to faster transmission. It's very easy to
> > > hit the upper bound (namely, 32) if you log the return value of
> > > sendto. I mentioned a bit about this in the second patch: a similar
> > > knob already exists in the qdisc layer. Furthermore, exposing
> > > important parameters like this helps applications auto-tune and
> > > judge which value is the best fit for their production workload,
> > > which is also a promising trend :)
> > >
> > > >
> > > > Also, can we put these settings into the socket instead of (global/ns)
> > > > sysctl?
> > >
> > > As to MAX_PER_SOCKET_BUDGET, it seems not easy to get its
> > > corresponding netns? I have no strong opinion on this point for now.
>
> To add to that, after digging into this part, I realized that we're
> able to use sock_net(sk)->core.max_tx_budget directly to handle the
> namespace part, because xsk_create() calls sk_alloc(), which ties the
> netns to sk->sk_net. Sounds reasonable?

Updated patch here:
https://lore.kernel.org/all/20250618065553.96822-1-kerneljasonxing@xxxxxxxxx/

Please review :)

Thanks,
Jason

>
> >
> > I'm suggesting something along these lines (see below). And then add
> > some way to configure it (plus, obviously, set the default value
> > on init). There is also a question on whether you need separate
> > values for MAX_PER_SOCKET_BUDGET and TX_BATCH_SIZE, and if yes,
>
> For now, I actually don't see a specific reason to separate them, so
> let me use a single one in V2. My use case only needs the
> TX_BATCH_SIZE adjustment.
>
> > then why.
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 72c000c0ae5f..fb2caec9914d 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -424,7 +424,7 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
> >          rcu_read_lock();
> >  again:
> >          list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
> > -                if (xs->tx_budget_spent >= MAX_PER_SOCKET_BUDGET) {
> > +                if (xs->tx_budget_spent >= xs->max_tx_budget) {
>
> If we implement it like this, xs->max_tx_budget has to be initialized
> from a per-netns value somewhere. The core problem still remains:
> where do we store the per-netns value?
>
> Do you think using the aforementioned sock_net(sk)->core.max_tx_budget
> is possible?
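To make that concrete, the initialization I have in mind for V2 would be
roughly the following. This is an untested sketch: max_tx_budget on
struct xdp_sock and on netns_core is what this thread proposes, not
something already in the tree, and the exact init point may differ in
the posted patch:

        /* In xsk_create(), after sk_alloc() has tied the socket to its
         * netns: snapshot the per-netns default once, so the datapath
         * only ever reads the per-socket copy.
         */
        struct xdp_sock *xs = xdp_sk(sk);

        xs->max_tx_budget = READ_ONCE(sock_net(sk)->core.max_tx_budget);

That keeps the netns lookup off the hot path and would also leave room
for a per-socket override later, along the lines of what you suggested.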
>
> Thanks,
> Jason
>
> >                          budget_exhausted = true;
> >                          continue;
> >                  }
> > @@ -779,7 +779,6 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> >  static int __xsk_generic_xmit(struct sock *sk)
> >  {
> >          struct xdp_sock *xs = xdp_sk(sk);
> > -        u32 max_batch = TX_BATCH_SIZE;
> >          bool sent_frame = false;
> >          struct xdp_desc desc;
> >          struct sk_buff *skb;
> > @@ -797,7 +796,7 @@ static int __xsk_generic_xmit(struct sock *sk)
> >                  goto out;
> >
> >          while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
> > -                if (max_batch-- == 0) {
> > +                if (xs->max_tx_budget-- == 0) {
> >                          err = -EAGAIN;
> >                          goto out;
> >                  }
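One thing I'd want to fix when folding this sketch into V2: with the
budget stored on the socket, __xsk_generic_xmit() shouldn't decrement
xs->max_tx_budget in place, since xs->max_tx_budget-- permanently
consumes (and eventually underflows) the configured value after the
first call. Keeping the original local variable, initialized from the
socket field, avoids that; roughly (untested, elided parts unchanged):

        static int __xsk_generic_xmit(struct sock *sk)
        {
                struct xdp_sock *xs = xdp_sk(sk);
                /* Work on a snapshot so the configured per-socket
                 * budget survives across calls instead of being
                 * decremented away.
                 */
                u32 max_batch = READ_ONCE(xs->max_tx_budget);
                bool sent_frame = false;
                struct xdp_desc desc;
                struct sk_buff *skb;
                int err = 0;

                /* ... unchanged ... */

                while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
                        if (max_batch-- == 0) {
                                err = -EAGAIN;
                                goto out;
                        }
                /* ... unchanged ... */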