Re: [PATCH bpf-next V2 0/7] xdp: Allow BPF to set RX hints for XDP_REDIRECTed packets

Jesper Dangaard Brouer <hawk@xxxxxxxxxx> · Tue, 29 Jul 2025 13:15:53 +0200

On 28/07/2025 18.29, Jakub Kicinski wrote:
> On Mon, 28 Jul 2025 12:53:01 +0200 Lorenzo Bianconi wrote:
>>>> I can see why you might think that, but from my perspective, the
>>>> xdp_frame *is* the implementation of the mini-SKB concept. We've been
>>>> building it incrementally for years. It started as the most minimal
>>>> structure possible and has gradually gained more context (e.g. dev_rx,
>>>> mem_info/rxq_info, flags, and also uses skb_shared_info with the same
>>>> layout as the SKB).

>>> My understanding was that just adding all the fields to xdp_frame was
>>> considered too wasteful. Otherwise we would have done something along
>>> those lines ~10 years ago :S

>> Hi Jakub,
>>
>> sorry for the late reply.

> Same, back from vacation.

>> I am completely fine with redesigning the solution to overcome the
>> problem, but I guess this feature will allow us to improve XDP
>> performance in a common/real use case. Let's consider that we want to
>> redirect a packet into a veth and then into a container. Preserving the
>> hw metadata when performing XDP_REDIRECT will allow us to avoid
>> recalculating the checksum when creating the skb. This will result in a
>> very nice performance improvement.
>> So I guess we should really come up with some idea to add this missing
>> feature.

> Martin mentioned to me that he had proposed in the past that we allow
> allocating the skb at the XDP level if the program needs "skb-level
> metadata". That actually seems pretty clean to me. Was it ever
> explored?

That idea has been considered before, but it unfortunately doesn't work
from a performance angle. The performance model of XDP_REDIRECT into
CPUMAP relies on moving the expensive SKB allocation+init to a remote
CPU. This keeps the ingress CPU free to process packets at near line
rate (our DDoS use-case). If we allocate the SKB on the ingress-CPU
before the redirect, we destroy this load-balancing model and create the
exact bottleneck we designed CPUMAP to avoid.
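
To illustrate the split, here is a rough userspace toy model, not kernel code. The names frame_desc, fake_skb and spsc_ring are hypothetical, and the real CPUMAP uses a per-CPU ptr_ring drained by a kthread. In this model the ingress side only enqueues a small descriptor, and the expensive "SKB" build happens on the consumer side:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified frame descriptor: only what the ingress CPU
 * has cheaply at hand. This is NOT the kernel's struct xdp_frame. */
struct frame_desc {
	void *data;
	uint16_t len;
};

/* Hypothetical stand-in for the SKB built on the remote CPU. */
struct fake_skb {
	void *data;
	uint16_t len;
	uint32_t hash;	/* stays 0: frame_desc has no field to carry it */
};

#define RING_SIZE 256u	/* power of two so indices can be masked */

/* Single-producer (ingress CPU) / single-consumer (remote CPU) ring. */
struct spsc_ring {
	_Atomic size_t head;	/* advanced only by the producer */
	_Atomic size_t tail;	/* advanced only by the consumer */
	struct frame_desc slots[RING_SIZE];
};

/* Ingress CPU: cheap enqueue, no SKB allocation at all. */
bool ring_enqueue(struct spsc_ring *r, struct frame_desc f)
{
	size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return false;	/* ring full: caller would drop the frame */
	r->slots[head & (RING_SIZE - 1)] = f;
	/* Publish the slot before making it visible to the consumer. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Remote CPU: dequeue one descriptor and do the expensive build here. */
bool ring_dequeue_build(struct spsc_ring *r, struct fake_skb *skb)
{
	size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (tail == head)
		return false;	/* ring empty */
	struct frame_desc f = r->slots[tail & (RING_SIZE - 1)];
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);

	assert(skb != NULL);
	skb->data = f.data;
	skb->len = f.len;
	skb->hash = 0;	/* nothing was carried across, so no hash */
	return true;
}
```

The only point of the model is where the cost lands. Moving the fake_skb construction before ring_enqueue would put it back on the ingress CPU, which is exactly the bottleneck CPUMAP is meant to avoid.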

To bring the focus back to the specific problem this series solves,
let's review the concrete use case. Our IPsec scenario is a key example:
on the ingress CPU, an XDP program calculates a hash from inner packet
headers to load-balance traffic via CPUMAP. When the packet arrives on
the remote CPU, this hash is lost, so the new SKB is created with a hash
of zero. This, in turn, causes poor load-balancing when the packet is
forwarded to a multi-queue device like veth, as traffic often collapses
to a single queue. The purpose of this patchset is simply to provide a
standard way to carry that hash to the remote CPU within the xdp_frame.
(The same goes for a standard way to carry VLAN tags.)
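
The hash hand-off can be sketched as follows. This sketch assumes a hypothetical frame_meta with an rx_hash field, which is not the layout or API this series adds. It uses a toy FNV-1a hash in place of whatever the XDP program really computes. It also uses a toy TX queue selector that maps hash 0 straight to queue 0. The kernel's real queue selection is more involved, so the selector only illustrates the collapse:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inner 5-tuple of the decapsulated packet. */
struct inner_tuple {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t proto;
};

/* Hypothetical per-frame metadata modelling the "standard way to carry
 * the hash" this series wants; not the real xdp_frame layout. */
struct frame_meta {
	uint32_t rx_hash;
	bool hash_valid;
};

struct toy_skb {
	uint32_t hash;
};

/* Toy FNV-1a step over the low nbytes of v (little-endian order). */
static uint32_t fnv1a_step(uint32_t h, uint32_t v, int nbytes)
{
	for (int i = 0; i < nbytes; i++) {
		h ^= (v >> (8 * i)) & 0xff;
		h *= 16777619u;
	}
	return h;
}

/* Hash over the inner headers, computed on the ingress CPU. */
uint32_t inner_hash(const struct inner_tuple *t)
{
	uint32_t h = 2166136261u;

	h = fnv1a_step(h, t->saddr, 4);
	h = fnv1a_step(h, t->daddr, 4);
	h = fnv1a_step(h, t->sport, 2);
	h = fnv1a_step(h, t->dport, 2);
	h = fnv1a_step(h, t->proto, 1);
	return h;
}

/* Ingress CPU: pick the CPUMAP slot and, if supported, store the hash. */
unsigned ingress_pick_cpu(const struct inner_tuple *t, unsigned n_cpus,
			  struct frame_meta *meta, bool carry_hash)
{
	assert(n_cpus > 0);
	uint32_t h = inner_hash(t);

	meta->rx_hash = carry_hash ? h : 0;
	meta->hash_valid = carry_hash;
	return h % n_cpus;
}

/* Remote CPU: build the SKB; without a carried hash it stays 0. */
void remote_build_skb(const struct frame_meta *meta, struct toy_skb *skb)
{
	skb->hash = meta->hash_valid ? meta->rx_hash : 0;
}

/* Toy multi-queue selection: with hash == 0 every flow hits queue 0. */
unsigned pick_txq(const struct toy_skb *skb, unsigned n_queues)
{
	assert(n_queues > 0);
	return skb->hash % n_queues;
}
```

With carry_hash set to false, every flow ends up on queue 0. With it set to true, the queue follows the inner-header hash that was already computed on the ingress CPU, so the remote CPU does no rehashing.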

Given this specific problem, is there a better approach to solving it
than what this patchset proposes?

--Jesper