From: Maciej Fijalkowski <maciej.fijalkowski@xxxxxxxxx>
Date: Tue, 11 Mar 2025 16:50:07 +0100

> On Wed, Mar 05, 2025 at 05:21:30PM +0100, Alexander Lobakin wrote:
>> Use libeth XDP infra to support running XDP program on Rx polling.
>> This includes all of the possible verdicts/actions.
>> XDP Tx queues are cleaned only in "lazy" mode when there are less than
>> 1/4 free descriptors left on the ring. libeth helper macros to define
>> driver-specific XDP functions make sure the compiler can uninline
>> them when needed.

[...]

>> +/**
>> + * idpf_clean_xdp_irq - Reclaim a batch of TX resources from completed XDP_TX
>> + * @_xdpq: XDP Tx queue
>> + * @budget: maximum number of descriptors to clean
>> + *
>> + * Returns number of cleaned descriptors.
>> + */
>> +static u32 idpf_clean_xdp_irq(void *_xdpq, u32 budget)
>> +{
>> +	struct libeth_xdpsq_napi_stats ss = { };
>> +	struct idpf_tx_queue *xdpq = _xdpq;
>> +	u32 tx_ntc = xdpq->next_to_clean;
>> +	u32 tx_cnt = xdpq->desc_count;
>> +	struct xdp_frame_bulk bq;
>> +	struct libeth_cq_pp cp = {
>> +		.dev = xdpq->dev,
>> +		.bq = &bq,
>> +		.xss = &ss,
>> +		.napi = true,
>> +	};
>> +	u32 done_frames;
>> +
>> +	done_frames = idpf_xdpsq_poll(xdpq, budget);
>
> nit: maybe pass {tx_ntc, tx_cnt} to the above?

Not following... =\

>
>> +	if (unlikely(!done_frames))
>> +		return 0;
>> +
>> +	xdp_frame_bulk_init(&bq);
>> +
>> +	for (u32 i = 0; likely(i < done_frames); i++) {
>> +		libeth_xdp_complete_tx(&xdpq->tx_buf[tx_ntc], &cp);
>> +
>> +		if (unlikely(++tx_ntc == tx_cnt))
>> +			tx_ntc = 0;
>> +	}
>> +
>> +	xdp_flush_frame_bulk(&bq);
>> +
>> +	xdpq->next_to_clean = tx_ntc;
>> +	xdpq->pending -= done_frames;
>> +	xdpq->xdp_tx -= cp.xdp_tx;
>
> not following this variable. __libeth_xdp_complete_tx() decreases
> libeth_cq_pp::xdp_tx by libeth_sqe::nr_frags. can you shed more light
> what's going on here?

libeth_sqe::nr_frags is not the same as skb_shared_info::nr_frags, it
equals 1 when there's only 1 fragment. Basically, the xdp_tx field is
the number of pending XDP-non-XSk descriptors. When it's zero, we don't
traverse Tx descriptors at all on XSk completion (thx to splitq).
(A stand-alone sketch of this bookkeeping is appended after the
sign-off.)

>
>> +
>> +	return done_frames;
>> +}
>> +
>> +static u32 idpf_xdp_tx_prep(void *_xdpq, struct libeth_xdpsq *sq)
>> +{
>> +	struct idpf_tx_queue *xdpq = _xdpq;
>> +	u32 free;
>> +
>> +	libeth_xdpsq_lock(&xdpq->xdp_lock);
>> +
>> +	free = xdpq->desc_count - xdpq->pending;
>> +	if (free <= xdpq->thresh)
>> +		free += idpf_clean_xdp_irq(xdpq, xdpq->thresh);
>> +
>> +	*sq = (struct libeth_xdpsq){
>
> could you have libeth_xdpsq embedded in idpf_tx_queue and avoid that
> initialization?

Not really. &libeth_xdpsq, same as &libeth_fq et al., has only a few
fields grouped together, while in the driver's queue structure they can
(and likely will) be scattered across cachelines.
This initialization is cheap anyway; &libeth_xdpsq exists only inside
__always_inline helpers, so it might not even be present in the
bytecode. (The second sketch after the sign-off illustrates this
pattern.)

>
>> +		.sqes = xdpq->tx_buf,
>> +		.descs = xdpq->desc_ring,
>> +		.count = xdpq->desc_count,
>> +		.lock = &xdpq->xdp_lock,
>> +		.ntu = &xdpq->next_to_use,
>> +		.pending = &xdpq->pending,
>> +		.xdp_tx = &xdpq->xdp_tx,
>> +	};
>> +
>> +	return free;
>> +}

Thanks,
Olek
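
---

A minimal, stand-alone sketch of the bookkeeping described in the xdp_tx
reply above. The types and names here (sqe, xdp_txq, complete_one,
clean_irq) are made up for illustration and are not the real
libeth_sqe/libeth_cq_pp/idpf_tx_queue definitions; it only shows how
summing per-frame nr_frags during a completion pass keeps a queue-level
xdp_tx counter equal to the number of still-pending XDP-non-XSk
descriptors:

#include <stdio.h>

/* Stand-in for the SQE: the entry that owns a frame carries nr_frags,
 * the total number of descriptors that frame consumed (1 for a
 * single-buffer frame); continuation entries carry nothing.
 */
enum sqe_type {
	SQE_FRAG,	/* continuation buffer of a multi-buffer frame */
	SQE_XDP_TX,	/* entry owning an XDP_TX frame */
};

struct sqe {
	enum sqe_type type;
	unsigned int nr_frags;
};

struct xdp_txq {
	struct sqe buf[8];
	unsigned int next_to_clean;
	unsigned int desc_count;
	unsigned int pending;	/* all in-flight descriptors */
	unsigned int xdp_tx;	/* in-flight XDP-non-XSk descriptors */
};

/* Roughly the accounting the quoted question attributes to
 * __libeth_xdp_complete_tx(): credit nr_frags when a frame completes.
 */
static unsigned int complete_one(const struct sqe *sqe)
{
	return sqe->type == SQE_XDP_TX ? sqe->nr_frags : 0;
}

static void clean_irq(struct xdp_txq *q, unsigned int done_descs)
{
	unsigned int ntc = q->next_to_clean, freed = 0;

	for (unsigned int i = 0; i < done_descs; i++) {
		freed += complete_one(&q->buf[ntc]);
		if (++ntc == q->desc_count)
			ntc = 0;
	}

	q->next_to_clean = ntc;
	q->pending -= done_descs;
	q->xdp_tx -= freed;	/* 0 => XSk completion can skip scanning */
}

int main(void)
{
	struct xdp_txq q = {
		/* a 3-buffer frame followed by a single-buffer frame */
		.buf = {
			{ SQE_FRAG, 0 }, { SQE_FRAG, 0 }, { SQE_XDP_TX, 3 },
			{ SQE_XDP_TX, 1 },
		},
		.desc_count = 8,
		.pending = 4,
		.xdp_tx = 4,
	};

	clean_irq(&q, 4);
	printf("pending=%u xdp_tx=%u\n", q.pending, q.xdp_tx); /* 0 0 */

	return 0;
}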
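
And a rough sketch of the on-stack "view" pattern from the libeth_xdpsq
reply. The struct and field names (drv_txq, xdpsq_view, txq_prep) are
invented for illustration, not the actual idpf/libeth layout; the point
is that gathering pointers to scattered queue fields into a short-lived
stack struct inside an always-inline helper is essentially free, so
embedding an equivalent struct in the queue itself would not buy
anything:

#include <stdio.h>

/* Hot fields sit wherever the driver's cacheline layout wants them... */
struct drv_txq {
	void *desc_ring;
	unsigned int next_to_use;
	char pad0[48];		/* other Tx-path fields */
	void *tx_buf;
	unsigned int desc_count;
	char pad1[52];		/* cold/config fields */
	unsigned int pending;
	unsigned int xdp_tx;
};

/* ...and the "view" only groups pointers to them for the duration of
 * one call.  Built and consumed inside an always-inline helper, it can
 * usually live entirely in registers.
 */
struct xdpsq_view {
	void *sqes;
	void *descs;
	unsigned int count;
	unsigned int *ntu;
	unsigned int *pending;
	unsigned int *xdp_tx;
};

static inline __attribute__((always_inline)) unsigned int
txq_prep(struct drv_txq *q, struct xdpsq_view *sq)
{
	*sq = (struct xdpsq_view){
		.sqes = q->tx_buf,
		.descs = q->desc_ring,
		.count = q->desc_count,
		.ntu = &q->next_to_use,
		.pending = &q->pending,
		.xdp_tx = &q->xdp_tx,
	};

	/* number of free descriptors the caller may use */
	return q->desc_count - q->pending;
}

int main(void)
{
	struct drv_txq q = { .desc_count = 512, .pending = 100 };
	struct xdpsq_view sq;

	printf("free=%u\n", txq_prep(&q, &sq)); /* 412 */

	return 0;
}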