Re: [PATCH net v2] net: clear the dst when changing skb protocol

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Maciej Żenczykowski wrote:
> On Sat, Jun 7, 2025 at 10:47 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> >
> > A not-so-careful NAT46 BPF program can crash the kernel
> > if it indiscriminately flips ingress packets from v4 to v6:
> >
> >   BUG: kernel NULL pointer dereference, address: 0000000000000000
> >     ip6_rcv_core (net/ipv6/ip6_input.c:190:20)
> >     ipv6_rcv (net/ipv6/ip6_input.c:306:8)
> >     process_backlog (net/core/dev.c:6186:4)
> >     napi_poll (net/core/dev.c:6906:9)
> >     net_rx_action (net/core/dev.c:7028:13)
> >     do_softirq (kernel/softirq.c:462:3)
> >     netif_rx (net/core/dev.c:5326:3)
> >     dev_loopback_xmit (net/core/dev.c:4015:2)
> >     ip_mc_finish_output (net/ipv4/ip_output.c:363:8)
> >     NF_HOOK (./include/linux/netfilter.h:314:9)
> >     ip_mc_output (net/ipv4/ip_output.c:400:5)
> >     dst_output (./include/net/dst.h:459:9)
> >     ip_local_out (net/ipv4/ip_output.c:130:9)
> >     ip_send_skb (net/ipv4/ip_output.c:1496:8)
> >     udp_send_skb (net/ipv4/udp.c:1040:8)
> >     udp_sendmsg (net/ipv4/udp.c:1328:10)
> >
> > The output interface has a 4->6 program attached at ingress.
> > We try to loop the multicast skb back to the sending socket.
> > Ingress BPF runs as part of netif_rx(), pushes a valid v6 hdr
> > and changes skb->protocol to v6. We enter ip6_rcv_core which
> > tries to use skb_dst(). But the dst is still an IPv4 one left
> > after IPv4 mcast output.
> >
> > Clear the dst in all BPF helpers which change the protocol.
> > Also clear the dst if we did an encap or decap as those
> > will most likely make the dst stale.
> > Try to preserve metadata dsts, those may carry non-routing
> > metadata.
> >
> > Reviewed-by: Maciej Żenczykowski <maze@xxxxxxxxxx>
> > Acked-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
> > Fixes: d219df60a70e ("bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()")
> > Fixes: 1b00e0dfe7d0 ("bpf: update skb->protocol in bpf_skb_net_grow")
> > Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
> > Signed-off-by: Jakub Kicinski <kuba@xxxxxxxxxx>

Reviewed-by: Willem de Bruijn <willemb@xxxxxxxxxx>

> > ---
> > v2:
> >  - drop on encap/decap
> >  - fix typo (protcol)
> >  - add the test to the Makefile
> > v1: https://lore.kernel.org/20250604210604.257036-1-kuba@xxxxxxxxxx
> >
> > I wonder if we should not skip ingress (tc_skip_classify?)
> > for looped back packets in the first place. But that doesn't
> > seem robust enough vs multiple redirections to solve the crash.
> >
> > Ignoring LOOPBACK packets (like the NAT46 prog should) doesn't
> > work either, since BPF can change pkt_type arbitrarily.
> >
> > CC: martin.lau@xxxxxxxxx
> > CC: daniel@xxxxxxxxxxxxx
> > CC: john.fastabend@xxxxxxxxx
> > CC: eddyz87@xxxxxxxxx
> > CC: sdf@xxxxxxxxxxx
> > CC: haoluo@xxxxxxxxxx
> > CC: willemb@xxxxxxxxxx
> > CC: william.xuanziyang@xxxxxxxxxx
> > CC: alan.maguire@xxxxxxxxxx
> > CC: bpf@xxxxxxxxxxxxxxx
> > CC: edumazet@xxxxxxxxxx
> > CC: maze@xxxxxxxxxx
> > CC: shuah@xxxxxxxxxx
> > CC: linux-kselftest@xxxxxxxxxxxxxxx
> > CC: yonghong.song@xxxxxxxxx
> > ---
> >  tools/testing/selftests/net/Makefile   |  1 +
> >  net/core/filter.c                      | 31 +++++++++++++++++++-------
> >  tools/testing/selftests/net/nat6to4.sh | 15 +++++++++++++
> >  3 files changed, 39 insertions(+), 8 deletions(-)
> >  create mode 100755 tools/testing/selftests/net/nat6to4.sh
> >
> > diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
> > index ea84b88bcb30..ab996bd22a5f 100644
> > --- a/tools/testing/selftests/net/Makefile
> > +++ b/tools/testing/selftests/net/Makefile
> > @@ -27,6 +27,7 @@ TEST_PROGS += amt.sh
> >  TEST_PROGS += unicast_extensions.sh
> >  TEST_PROGS += udpgro_fwd.sh
> >  TEST_PROGS += udpgro_frglist.sh
> > +TEST_PROGS += nat6to4.sh
> >  TEST_PROGS += veth.sh
> >  TEST_PROGS += ioam6.sh
> >  TEST_PROGS += gro.sh
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 327ca73f9cd7..d5917d6446f2 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -3406,8 +3406,14 @@ BPF_CALL_3(bpf_skb_change_proto, struct sk_buff *, skb, __be16, proto,
> >          * need to be verified first.
> >          */
> >         ret = bpf_skb_proto_xlat(skb, proto);
> > +       if (ret)
> > +               return ret;
> > +
> >         bpf_compute_data_pointers(skb);
> > -       return ret;

I wonder whether that unconditional call to bpf_compute_data_pointers
even if ret was there for a reason.


[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux