Maciej Żenczykowski wrote: > On Sat, Jun 7, 2025 at 10:47 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote: > > > > A not-so-careful NAT46 BPF program can crash the kernel > > if it indiscriminately flips ingress packets from v4 to v6: > > > > BUG: kernel NULL pointer dereference, address: 0000000000000000 > > ip6_rcv_core (net/ipv6/ip6_input.c:190:20) > > ipv6_rcv (net/ipv6/ip6_input.c:306:8) > > process_backlog (net/core/dev.c:6186:4) > > napi_poll (net/core/dev.c:6906:9) > > net_rx_action (net/core/dev.c:7028:13) > > do_softirq (kernel/softirq.c:462:3) > > netif_rx (net/core/dev.c:5326:3) > > dev_loopback_xmit (net/core/dev.c:4015:2) > > ip_mc_finish_output (net/ipv4/ip_output.c:363:8) > > NF_HOOK (./include/linux/netfilter.h:314:9) > > ip_mc_output (net/ipv4/ip_output.c:400:5) > > dst_output (./include/net/dst.h:459:9) > > ip_local_out (net/ipv4/ip_output.c:130:9) > > ip_send_skb (net/ipv4/ip_output.c:1496:8) > > udp_send_skb (net/ipv4/udp.c:1040:8) > > udp_sendmsg (net/ipv4/udp.c:1328:10) > > > > The output interface has a 4->6 program attached at ingress. > > We try to loop the multicast skb back to the sending socket. > > Ingress BPF runs as part of netif_rx(), pushes a valid v6 hdr > > and changes skb->protocol to v6. We enter ip6_rcv_core which > > tries to use skb_dst(). But the dst is still an IPv4 one left > > after IPv4 mcast output. > > > > Clear the dst in all BPF helpers which change the protocol. > > Also clear the dst if we did an encap or decap as those > > will most likely make the dst stale. > > Try to preserve metadata dsts, those may carry non-routing > > metadata. > > > > Reviewed-by: Maciej Żenczykowski <maze@xxxxxxxxxx> > > Acked-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx> > > Fixes: d219df60a70e ("bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()") > > Fixes: 1b00e0dfe7d0 ("bpf: update skb->protocol in bpf_skb_net_grow") > > Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper") > > Signed-off-by: Jakub Kicinski <kuba@xxxxxxxxxx> Reviewed-by: Willem de Bruijn <willemb@xxxxxxxxxx> > > --- > > v2: > > - drop on encap/decap > > - fix typo (protcol) > > - add the test to the Makefile > > v1: https://lore.kernel.org/20250604210604.257036-1-kuba@xxxxxxxxxx > > > > I wonder if we should not skip ingress (tc_skip_classify?) > > for looped back packets in the first place. But that doesn't > > seem robust enough vs multiple redirections to solve the crash. > > > > Ignoring LOOPBACK packets (like the NAT46 prog should) doesn't > > work either, since BPF can change pkt_type arbitrarily. > > > > CC: martin.lau@xxxxxxxxx > > CC: daniel@xxxxxxxxxxxxx > > CC: john.fastabend@xxxxxxxxx > > CC: eddyz87@xxxxxxxxx > > CC: sdf@xxxxxxxxxxx > > CC: haoluo@xxxxxxxxxx > > CC: willemb@xxxxxxxxxx > > CC: william.xuanziyang@xxxxxxxxxx > > CC: alan.maguire@xxxxxxxxxx > > CC: bpf@xxxxxxxxxxxxxxx > > CC: edumazet@xxxxxxxxxx > > CC: maze@xxxxxxxxxx > > CC: shuah@xxxxxxxxxx > > CC: linux-kselftest@xxxxxxxxxxxxxxx > > CC: yonghong.song@xxxxxxxxx > > --- > > tools/testing/selftests/net/Makefile | 1 + > > net/core/filter.c | 31 +++++++++++++++++++------- > > tools/testing/selftests/net/nat6to4.sh | 15 +++++++++++++ > > 3 files changed, 39 insertions(+), 8 deletions(-) > > create mode 100755 tools/testing/selftests/net/nat6to4.sh > > > > diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile > > index ea84b88bcb30..ab996bd22a5f 100644 > > --- a/tools/testing/selftests/net/Makefile > > +++ b/tools/testing/selftests/net/Makefile > > @@ -27,6 +27,7 @@ TEST_PROGS += amt.sh > > TEST_PROGS += unicast_extensions.sh > > TEST_PROGS += udpgro_fwd.sh > > TEST_PROGS += udpgro_frglist.sh > > +TEST_PROGS += nat6to4.sh > > TEST_PROGS += veth.sh > > TEST_PROGS += ioam6.sh > > TEST_PROGS += gro.sh > > diff --git a/net/core/filter.c b/net/core/filter.c > > index 327ca73f9cd7..d5917d6446f2 100644 > > --- a/net/core/filter.c > > +++ b/net/core/filter.c > > @@ -3406,8 +3406,14 @@ BPF_CALL_3(bpf_skb_change_proto, struct sk_buff *, skb, __be16, proto, > > * need to be verified first. > > */ > > ret = bpf_skb_proto_xlat(skb, proto); > > + if (ret) > > + return ret; > > + > > bpf_compute_data_pointers(skb); > > - return ret; I wonder whether that unconditional call to bpf_compute_data_pointers even if ret was there for a reason.