Re: [PATCH net-next v3] net: xsk: introduce XDP_MAX_TX_BUDGET set/getsockopt

Jason Xing <kerneljasonxing@xxxxxxxxx> · Fri, 20 Jun 2025 23:03:50 +0800

On Fri, Jun 20, 2025 at 9:50 PM Willem de Bruijn
<willemdebruijn.kernel@xxxxxxxxx> wrote:
>
> Jason Xing wrote:
> > On Thu, Jun 19, 2025 at 11:09 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> > >
> > > On Thu, 19 Jun 2025 17:04:40 +0800 Jason Xing wrote:
> > > > @@ -424,7 +421,9 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
> > > >       rcu_read_lock();
> > > >  again:
> > > >       list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
> > > > -             if (xs->tx_budget_spent >= MAX_PER_SOCKET_BUDGET) {
> > > > +             int max_budget = READ_ONCE(xs->max_tx_budget);
> > > > +
> > > > +             if (xs->tx_budget_spent >= max_budget) {
> > > >                       budget_exhausted = true;
> > > >                       continue;
> > > >               }
> > > > @@ -779,7 +778,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> > > >  static int __xsk_generic_xmit(struct sock *sk)
> > > >  {
> > > >       struct xdp_sock *xs = xdp_sk(sk);
> > > > -     u32 max_batch = TX_BATCH_SIZE;
> > > > +     u32 max_budget = READ_ONCE(xs->max_tx_budget);
> > >
> > > Hm, maybe a question to Stan / Willem & other XSK experts but are these
> > > two max values / code paths really related? Question 2 -- is generic
> > > XSK a legit optimization target, legit enough to add uAPI?
> >
> > I'm not an expert but my take is:
> > #1: I don't actually see a correlation, but I also don't see any
> > reason to use different values for the two of them.
> > #2: These two definitions are worth improving because the actual
> > transmission is driven by calls to sendto(). Raising this value a
> > little could save many sendto() calls. As for the uAPI, I don't
> > know if it's worth it, sorry. If not, the previous version 2 patch
> > (the per-netns policy) will be revived.
> >
> > So I will leave those two questions to XSK experts as well.
>
> You're proposing the code change, so I think it's on you to make
> this argument?
>
> > #2 quantification
> > It's really hard to quantify, mainly because of the variety of
> > stacks implemented in user space. AF_XDP provides only a
> > fundamental mechanism, and the ecosystem built on top of it is
> > diverse.
>
> I think it's a hard sell to argue adding a tunable, if no plausible
> recommendation can be given on how the tunable is to be used.

Actually, I mentioned it in the commit message: one of the advantages
is fewer sendto() calls and higher overall transmission speed.

>
> It's not necessary, and in most cases infeasible, to give a heuristic
> that fits all possible users. But at a minimum the one workload that
> prompted the patch. What value do you set it to and how did you
> arrive at that number?

One naive question from me: why must the number of packets sent per
call be limited to such a small number by default? Take TCP as an
example: a single sendmsg() call is not cut short because of a
hardcoded limit like this one.

For one application I looked at, I suggested using 128. With the
default configuration, I saw two limits: 1) XDP_MAX_TX_BUDGET, and
2) the socket sndbuf, which is 212992 bytes as set by
net.core.wmem_default. For XDP_MAX_TX_BUDGET, I counted how many
descriptors were handed to the driver per sendto() call, using the
patch in [1], and then calculated the probability of hitting the upper
bound. I chose 128 because 1) it covers most of the cases, and 2) a
higher value brought no noticeable improvement.

[1]: https://lore.kernel.org/all/20250619093641.70700-1-kerneljasonxing@xxxxxxxxx/

Thanks,
Jason