On Jul 28, Jakub Kicinski wrote:
> On Mon, 28 Jul 2025 12:53:01 +0200 Lorenzo Bianconi wrote:
> > > > I can see why you might think that, but from my perspective, the
> > > > xdp_frame *is* the implementation of the mini-SKB concept. We've been
> > > > building it incrementally for years. It started as the most minimal
> > > > structure possible and has gradually gained more context (e.g. dev_rx,
> > > > mem_info/rxq_info, flags, and also uses skb_shared_info with same layout
> > > > as SKB).
> > >
> > > My understanding was that just adding all the fields to xdp_frame was
> > > considered too wasteful. Otherwise we would have done something along
> > > those lines ~10 years ago :S
> >
> > Hi Jakub,
> >
> > sorry for the late reply.
> > I am completely fine to redesign the solution to overcome the problem but I
> > guess this feature will allow us to improve XDP performance in a common/real
> > use-case. Let's consider we want to redirect a packet into a veth and then into
> > a container. Preserving the hw metadata performing XDP_REDIRECT will allow us
> > to avoid recalculating the checksum creating the skb. This will result in a
> > very nice performance improvement.
> > So I guess we should really come up with some idea to add this missing feature.
>
> I don't think the counter-proposal prevents that. As long as veth
> supports "set" callbacks the program can transfer the metadata over
> to the veth and the second program at veth can communicate them to
> the driver.

IIUC, with the 'set' proposal (please correct me if I am wrong) the eBPF
program running on the NIC that receives the packet from the wire is supposed
to set (or update) the hw metadata info (e.g. RX HASH or RX checksum) in the
RX DMA descriptor associated with the packet, to be consumed later. Am I right?
I think this approach works fine if the SKB is created locally in the NAPI
loop of the receiving driver (e.g. if the eBPF program bound to the NIC
returns XDP_PASS), but I guess it does not work if the packet is redirected
to a remote CPU or a remote device (e.g. veth).
Considering the veth use-case, veth_ndo_xdp_xmit() enqueues the packet into a
ptr_ring and schedules a NAPI. By the time the NAPI runs, I guess the DMA
descriptor originally associated with the packet has already been queued back
to the hw ring to be used for a following packet.
In order to be able to easily consume this hw metadata, I guess we should
store it in the same packet buffer. Am I missing something?

Regards,
Lorenzo

>
> Martin mentioned to me that he had proposed in the past that we allow
> allocating the skb at the XDP level, if the program needs "skb-level
> metadata". That actually seems pretty clean to me.. Was it ever
> explored?
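
To make the descriptor-recycling concern above concrete, here is a minimal
userspace sketch. It is not kernel code: the structs, the two-entry ring and
every function name are hypothetical and exist only for illustration. It
models a NIC that reuses RX descriptors while a redirect target (think of the
veth ptr_ring) still holds an earlier packet, and compares reading the RX hash
through the descriptor with reading a copy saved in the packet buffer.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 2 /* tiny on purpose so descriptors get recycled quickly */
#define NUM_BUFS  4

/* Hypothetical RX DMA descriptor: hw metadata as written by the NIC. */
struct rx_desc {
	uint32_t rx_hash;
};

/* Hypothetical packet buffer with a metadata area next to the payload. */
struct pkt_buf {
	uint32_t saved_rx_hash;
	uint8_t data[64];
};

/* What the redirect target holds until its own NAPI gets to run. */
struct queued_pkt {
	const struct rx_desc *desc; /* option A: keep pointing at the descriptor */
	const struct pkt_buf *buf;  /* option B: metadata travels with the buffer */
};

static struct rx_desc ring[RING_SIZE];
static struct pkt_buf bufs[NUM_BUFS];

/* Packet n arrives: the NIC fills a descriptor, then the packet is redirected. */
static struct queued_pkt rx_and_redirect(unsigned int n, uint32_t hw_hash)
{
	struct rx_desc *d = &ring[n % RING_SIZE];
	struct queued_pkt q;

	assert(n < NUM_BUFS);

	d->rx_hash = hw_hash;                /* slot is overwritten when the ring wraps */
	bufs[n].saved_rx_hash = d->rx_hash;  /* copy made while the descriptor is valid */

	q.desc = d;
	q.buf = &bufs[n];
	return q;
}

static uint32_t hash_via_descriptor(const struct queued_pkt *q)
{
	return q->desc->rx_hash;
}

static uint32_t hash_via_buffer(const struct queued_pkt *q)
{
	return q->buf->saved_rx_hash;
}

/*
 * Returns 1 when the deferred consumer of packet 0 reads a stale hash through
 * the descriptor but the correct hash through the packet buffer.
 */
int deferred_consumer_demo(void)
{
	struct queued_pkt q0 = rx_and_redirect(0, 0x1111);

	rx_and_redirect(1, 0x2222);
	rx_and_redirect(2, 0x3333); /* wraps and reuses packet 0's descriptor */

	return hash_via_descriptor(&q0) != 0x1111 &&
	       hash_via_buffer(&q0) == 0x1111;
}
```

Calling deferred_consumer_demo() from a small main() returns 1 in this model:
by the time packet 0 is consumed, its descriptor slot already describes
packet 2, while the copy kept in the buffer is still the original value. A
real driver ring is much larger and refilled differently, so this only shows
the ordering problem, not actual driver behaviour.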