On 5/27/25 11:48 AM, Paul Chaignon wrote: > During ILA address translations, the L4 checksums can be handled in > different ways. One of them, adj-transport, consist in parsing the > transport layer and updating any found checksum. This logic relies on > inet_proto_csum_replace_by_diff and produces an incorrect skb->csum when > in state CHECKSUM_COMPLETE. > > This bug can be reproduced with a simple ILA to SIR mapping, assuming > packets are received with CHECKSUM_COMPLETE: > > $ ip a show dev eth0 > 14: eth0@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 > link/ether 62:ae:35:9e:0f:8d brd ff:ff:ff:ff:ff:ff link-netnsid 0 > inet6 3333:0:0:1::c078/64 scope global > valid_lft forever preferred_lft forever > inet6 fd00:10:244:1::c078/128 scope global nodad > valid_lft forever preferred_lft forever > inet6 fe80::60ae:35ff:fe9e:f8d/64 scope link proto kernel_ll > valid_lft forever preferred_lft forever > $ ip ila add loc_match fd00:10:244:1 loc 3333:0:0:1 \ > csum-mode adj-transport ident-type luid dev eth0 > > Then I hit [fd00:10:244:1::c078]:8000 with a server listening only on > [3333:0:0:1::c078]:8000. With the bug, the SYN packet is dropped with > SKB_DROP_REASON_TCP_CSUM after inet_proto_csum_replace_by_diff changed > skb->csum. The translation and drop are visible on pwru [1] traces: > > IFACE TUPLE FUNC > eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) ipv6_rcv > eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) ip6_rcv_core > eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) nf_hook_slow > eth0:9 [fd00:10:244:3::3d8]:51420->[fd00:10:244:1::c078]:8000(tcp) inet_proto_csum_replace_by_diff > eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) tcp_v6_early_demux > eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_route_input > eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_input > eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_input_finish > eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ip6_protocol_deliver_rcu > eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) raw6_local_deliver > eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) ipv6_raw_deliver > eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) tcp_v6_rcv > eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) __skb_checksum_complete > eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) kfree_skb_reason(SKB_DROP_REASON_TCP_CSUM) > eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) skb_release_head_state > eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) skb_release_data > eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) skb_free_head > eth0:9 [fd00:10:244:3::3d8]:51420->[3333:0:0:1::c078]:8000(tcp) kfree_skbmem > > This is happening because inet_proto_csum_replace_by_diff is updating > skb->csum when it shouldn't. The L4 checksum is updated such that it > "cancels" the IPv6 address change in terms of checksum computation, so > the impact on skb->csum is null. > > Note this would be different for an IPv4 packet since three fields > would be updated: the IPv4 address, the IP checksum, and the L4 > checksum. Two would cancel each other and skb->csum would still need > to be updated to take the L4 checksum change into account. > > This patch fixes it by passing an ipv6 flag to > inet_proto_csum_replace_by_diff, to skip the skb->csum update if we're > in the IPv6 case. Note the behavior of the only other user of > inet_proto_csum_replace_by_diff, the BPF subsystem, is left as is in > this patch and fixed in the subsequent patch. > > With the fix, using the reproduction from above, I can confirm > skb->csum is not touched by inet_proto_csum_replace_by_diff and the TCP > SYN proceeds to the application after the ILA translation. > > 1 - https://github.com/cilium/pwru > Fixes: 65d7ab8de582 ("net: Identifier Locator Addressing module") > Signed-off-by: Paul Chaignon <paul.chaignon@xxxxxxxxx> Patch 2 does not apply cleanly anymore, please rebase. While at it, please also replace: 1 - https://github.com/cilium/pwru with a more customary tag: Link: https://github.com/cilium/pwru [1] Thanks, Paolo