Hi,
when decapsulating VXLAN packets with bpf_skb_adjust_room and
redirecting to a tap device I observed unexpected segmentation.
In my setup there is a sched_cls program attached at the ingress path of
a physical NIC with GRO enabled. Plain traffic is redirected directly;
VXLAN traffic is decapsulated first and then redirected.
Decapsulation is done by bpf_skb_adjust_room with
BPF_F_ADJ_ROOM_DECAP_L3_IPV4.
For both kinds of traffic, GRO on the physical NIC works as expected,
resulting in merged packets.
Large non-decapsulated packets are transmitted directly on the tap
interface as expected. But surprisingly, decapsulated packets are being
segmented again before transmission.
When analyzing and comparing the call chains I observed that
netif_skb_features returns different values for the two kinds of
traffic.
The tap devices have the following features set:
dev->features = 0x1558c9
dev->hw_enc_features = 0x10000001
For the non-decapsulated traffic netif_skb_features returns 0x1558c9, but
for the decapsulated traffic it returns 0x1. This is the same value as the
result of "dev->features & dev->hw_enc_features".
netif_skb_features applies exactly this mask when skb->encapsulation is
set. Inspecting the skb in both cases showed that after decapsulation
the skb->encapsulation flag was indeed still set.
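
To make the masking concrete, here is a minimal userspace sketch that
replays the numbers above. It models only the skb->encapsulation branch of
netif_skb_features; the struct and function names (model_dev, model_skb,
model_skb_features, check_reported_values) are made up for this
illustration and are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel structures; only the
 * fields relevant to this report are modelled. */
typedef uint64_t features_t;

struct model_dev {
	features_t features;
	features_t hw_enc_features;
};

struct model_skb {
	bool encapsulation;
};

/* Models only the encapsulation branch of netif_skb_features(): when
 * skb->encapsulation is set, the device features are masked with
 * hw_enc_features. All other checks of the real function are omitted. */
static features_t model_skb_features(const struct model_dev *dev,
				     const struct model_skb *skb)
{
	features_t features = dev->features;

	if (skb->encapsulation)
		features &= dev->hw_enc_features;
	return features;
}

/* Replays the values observed on the tap device. */
static void check_reported_values(void)
{
	const struct model_dev tap = {
		.features = 0x1558c9,
		.hw_enc_features = 0x10000001,
	};
	const struct model_skb plain = { .encapsulation = false };
	const struct model_skb decapped = { .encapsulation = true };

	/* Plain traffic keeps the full feature set. */
	assert(model_skb_features(&tap, &plain) == 0x1558c9);
	/* A stale encapsulation flag leaves only bit 0. */
	assert(model_skb_features(&tap, &decapped) == 0x1);
}
```

Calling check_reported_values() from a small main() passes both
assertions, which matches the two return values seen in the traces.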
I wonder if there is a reason that the skb->encapsulation flag is not
unset in bpf_skb_net_shrink when BPF_F_ADJ_ROOM_DECAP_* flags are
present? Since skb->encapsulation is set in bpf_skb_net_grow when adding
space for encapsulation my expectation would be that the flag is also
unset when doing the opposite operation.
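
The symmetry I have in mind can be sketched as a small userspace model.
This is only an illustration of the expectation, not a patch and not the
current kernel behaviour: the flag bits (SYM_F_ENCAP, SYM_F_DECAP), the
struct sym_skb and both functions are hypothetical names, and a real change
would presumably also have to consider packets that still carry another
encapsulation layer after one has been removed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this model only; they are not the values
 * of the real BPF_F_ADJ_ROOM_* flags. */
#define SYM_F_ENCAP (1u << 0)
#define SYM_F_DECAP (1u << 1)

struct sym_skb {
	bool encapsulation;
};

/* Models the behaviour described for bpf_skb_net_grow(): adding room
 * with an encap flag marks the skb as encapsulated. */
static void sym_net_grow(struct sym_skb *skb, uint32_t flags)
{
	if (flags & SYM_F_ENCAP)
		skb->encapsulation = true;
}

/* Models the expected counterpart for bpf_skb_net_shrink(): removing
 * room with a decap flag clears the mark again. This is the behaviour
 * asked about in this mail, not what was observed. */
static void sym_net_shrink(struct sym_skb *skb, uint32_t flags)
{
	if (flags & SYM_F_DECAP)
		skb->encapsulation = false;
}
```

In this model a grow with SYM_F_ENCAP followed by a shrink with SYM_F_DECAP
leaves the flag cleared, while a shrink without a decap flag leaves it
untouched.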
Thanks,
Tobias