Re: Performance impact of disabling VLAN offload [was: Re: [PATCH bpf-next V1 7/7] net: xdp: update documentation for xdp-rx-metadata.rst]

Jesper Dangaard Brouer <hawk@xxxxxxxxxx> writes:

> On 17/06/2025 17.10, Toke Høiland-Jørgensen wrote:
>>> Later we will look at using the vlan tag. Today we have disabled HW
>>> vlan-offloading, because XDP originally didn't support accessing HW vlan
>>> tags.
>> 
>> Side note (with changed subject to disambiguate): Do you have any data
>> on the performance impact of disabling VLAN offload that you can share?
>> I've been sort of wondering whether saving those couple of bytes has any
>> measurable impact on real workloads (where you end up looking at the
>> headers anyway, so saving the cache miss doesn't matter so much)?
>> 
>
> Our production setup has two different VLAN IDs, one for INTERNAL-ID
> and one for EXTERNAL-ID (Internet) traffic.  On (many) servers this is
> on the same physical net_device.
>
> Our Unimog XDP load-balancer *only* handles EXTERNAL-ID.  Thus, the very
> first thing Unimog does is check the VLAN ID.  If this doesn't match
> EXTERNAL-ID it returns XDP_PASS.  This is the first time the packet data
> area is read, which (due to our AMD CPUs) will be a cache miss.
>
> If the packet is INTERNAL-ID traffic, we have caused a cache miss
> earlier than needed.  The NIC driver has already started a
> net_prefetch.  Thus, if we can return XDP_PASS without touching packet
> data, then we can hide part of the cache-miss latency (behind
> SKB-zero-ing). (We could also CPUMAP redirect the INTERNAL-ID traffic
> to a remote CPU for further gains).
> Using the kfunc (bpf_xdp_metadata_rx_vlan_tag[1]) for reading the VLAN
> ID doesn't touch/read packet data.
>
> I hope this makes it clear why reading the HW offloaded VLAN tag from
> the RX-descriptor is a performance benefit?

Right, I can certainly see the argument, but I was hoping you'd have
some data to quantify exactly how much of a difference this makes? :)

Also, I guess this XDP-based early demux is a bit special as far as this
use case is concerned? For regular net-stack usage of the VLAN field,
we'll already have touched the packet data while building the skb; so
the difference will be less, as it shouldn't be a cache miss. Which
doesn't invalidate your use case, of course, it just makes it different...
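
To make the two paths concrete, here is a rough userspace model of the
early-demux decision described above. It is only a sketch, not actual
XDP code: struct rx_meta stands in for whatever the RX-descriptor
metadata kfunc would return, and EXTERNAL_VLAN_ID, early_demux() and the
verdict names are all made up for illustration. The point is just that
the metadata path can reach a PASS verdict without ever dereferencing
packet data, while the fallback has to read the Ethernet header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_8021Q      0x8100  /* 802.1Q VLAN ethertype */
#define VLAN_VID_MASK    0x0fff  /* low 12 bits of the TCI hold the VLAN ID */
#define EXTERNAL_VLAN_ID 100     /* hypothetical value, for illustration only */

enum verdict { VERDICT_PASS, VERDICT_HANDLE };

/* Hypothetical stand-in for VLAN info taken from the RX descriptor. */
struct rx_meta {
	bool     has_vlan;  /* true if HW stripped and reported a tag */
	uint16_t vlan_tci;  /* tag control info: PCP/DEI/VID */
};

/* Fallback: read the VLAN ID from the frame itself (touches packet data). */
static int vlan_id_from_packet(const uint8_t *data, size_t len)
{
	uint16_t proto, tci;

	if (!data || len < 18)  /* dst(6) + src(6) + 802.1Q tag(4) + ethertype(2) */
		return -1;
	proto = (uint16_t)((data[12] << 8) | data[13]);
	if (proto != ETH_P_8021Q)
		return -1;
	tci = (uint16_t)((data[14] << 8) | data[15]);
	return tci & VLAN_VID_MASK;
}

/* Decide PASS vs HANDLE; *touched_packet records whether data was read. */
static enum verdict early_demux(const struct rx_meta *meta,
				const uint8_t *data, size_t len,
				bool *touched_packet)
{
	int vid;

	assert(meta && touched_packet);
	*touched_packet = false;

	if (meta->has_vlan) {
		/* Metadata path: no packet data access needed. */
		vid = meta->vlan_tci & VLAN_VID_MASK;
	} else {
		/* No HW tag available: must look at the headers. */
		*touched_packet = true;
		vid = vlan_id_from_packet(data, len);
	}
	return vid == EXTERNAL_VLAN_ID ? VERDICT_HANDLE : VERDICT_PASS;
}
```

What this cannot show, of course, is how much the avoided read is worth
in practice, which is the number I'm curious about.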

-Toke





