Stanislav Fomichev wrote:
> On 09/10, Tobias Böhm wrote:
> > Hi,
> >
> > when decapsulating VXLAN packets with bpf_skb_adjust_room and redirecting to
> > a tap device I observed unexpected segmentation.
> >
> > In my setup there is a sched_cls program attached at the ingress path of a
> > physical NIC with GRO enabled. Packets are redirected either directly for
> > plain traffic, or decapsulated beforehand in case of VXLAN. Decapsulation is
> > done by bpf_skb_adjust_room with BPF_F_ADJ_ROOM_DECAP_L3_IPV4.
> >
> > For both kinds of traffic GRO on the physical NIC works as expected
> > resulting in merged packets.
> >
> > Large non-decapsulated packets are transmitted directly on the tap interface
> > as expected. But surprisingly, decapsulated packets are being segmented
> > again before transmission.
> >
> > When analyzing and comparing the call chains I observed that
> > netif_skb_features returns different values for the different kind of
> > traffic.
> >
> > The tap devices have the following features set:
> >
> >     dev->features = 0x1558c9
> >     dev->hw_enc_features = 0x10000001
> >
> > For the non-decapsulated traffic netif_skb_features returns 0x1558c9 but for
> > the decapsulated traffic it returns 0x1. This is same value as the result of
> > "dev->features & dev->hw_enc_features".
> >
> > In netif_skb_features this operation effectively happens in case
> > skb->encapsulation is set. Inspecting the skb in both cases showed that in
> > case of decapsulation the skb->encapsulation flag was indeed still set.
> >
> > I wonder if there is a reason that the skb->encapsulation flag is not unset
> > in bpf_skb_net_shrink when BPF_F_ADJ_ROOM_DECAP_* flags are present? Since
> > skb->encapsulation is set in bpf_skb_net_grow when adding space for
> > encapsulation my expectation would be that the flag is also unset when doing
> > the opposite operation.
>
> + Willem and netdev for visibility.

I think it just has not been implemented before.

The encap path is more strict.
Besides setting skb->encapsulation, it also initializes the inner_.. helpers. The decap path does not do this: it expects IPIP packets to arrive from the network, without the stack detecting them as such or setting skb->encapsulation. We must preserve that behavior. But we can additionally detect skbs with encapsulation fields configured, and convert those.

The encap path also has explicit UDP_L4 and GRE flags to update GSO packets. For VXLAN decap, we probably need the same?
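
For reference, the feature masking described in the report can be reproduced with a small userspace model. This is only a sketch: struct model_dev, struct model_skb and model_skb_features() are hypothetical stand-ins, not kernel code, and they model only the single step where netif_skb_features ANDs the device features with hw_enc_features when skb->encapsulation is set. The feature values are the ones quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct net_device and struct sk_buff.
 * Only the fields relevant to this discussion are modelled. */
struct model_dev {
	uint64_t features;
	uint64_t hw_enc_features;
};

struct model_skb {
	bool encapsulation;
};

/* Simplified model of the masking step in netif_skb_features():
 * an skb that is still marked as encapsulated only gets the features
 * that the device also advertises in hw_enc_features. */
static uint64_t model_skb_features(const struct model_skb *skb,
				   const struct model_dev *dev)
{
	uint64_t features = dev->features;

	if (skb->encapsulation)
		features &= dev->hw_enc_features;

	return features;
}
```

With features = 0x1558c9 and hw_enc_features = 0x10000001, an skb with encapsulation still set ends up with 0x1, while the same skb with the flag cleared keeps 0x1558c9. That matches the two values observed in the report.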