On 2025-08-14 20:24:37, Jesper Dangaard Brouer wrote:
> When running an XDP bpf_prog on the remote CPU in cpumap code
> then we must disable the direct return optimization that
> xdp_return can perform for mem_type page_pool. This optimization
> assumes code is still executing under RX-NAPI of the original
> receiving CPU, which isn't true on this remote CPU.
>
> The cpumap code already disabled this via helpers
> xdp_set_return_frame_no_direct() and xdp_clear_return_frame_no_direct(),
> but the scope didn't include xdp_do_flush().
>
> When doing XDP_REDIRECT towards e.g devmap this causes the
> function bq_xmit_all() to run with direct return optimization
> enabled. This can lead to hard to find bugs. The issue
> only happens when bq_xmit_all() cannot ndo_xdp_xmit all
> frames and them frees them via xdp_return_frame_rx_napi().
>
> Fix by expanding scope to include xdp_do_flush().
>
> Fixes: 11941f8a8536 ("bpf: cpumap: Implement generic cpumap")
> Found-by: Dragos Tatulea <dtatulea@xxxxxxxxxx>
> Reported-by: Chris Arges <carges@xxxxxxxxxxxxxx>
> Signed-off-by: Jesper Dangaard Brouer <hawk@xxxxxxxxxx>
> ---
>  kernel/bpf/cpumap.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
> index b2b7b8ec2c2a..c46360b27871 100644
> --- a/kernel/bpf/cpumap.c
> +++ b/kernel/bpf/cpumap.c
> @@ -186,7 +186,6 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
>  	struct xdp_buff xdp;
>  	int i, nframes = 0;
>
> -	xdp_set_return_frame_no_direct();
>  	xdp.rxq = &rxq;
>
>  	for (i = 0; i < n; i++) {
> @@ -231,7 +230,6 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
>  		}
>  	}
>
> -	xdp_clear_return_frame_no_direct();
>  	stats->pass += nframes;
>
>  	return nframes;
> @@ -255,6 +253,7 @@ static void cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames,
>
>  	rcu_read_lock();
>  	bpf_net_ctx = bpf_net_ctx_set(&__bpf_net_ctx);
> +	xdp_set_return_frame_no_direct();
>
>  	ret->xdp_n = cpu_map_bpf_prog_run_xdp(rcpu, frames, ret->xdp_n, stats);
>  	if (unlikely(ret->skb_n))
> @@ -264,6 +263,7 @@ static void cpu_map_bpf_prog_run(struct bpf_cpu_map_entry *rcpu, void **frames,
>  	if (stats->redirect)
>  		xdp_do_flush();
>
> +	xdp_clear_return_frame_no_direct();
>  	bpf_net_ctx_clear(bpf_net_ctx);
>  	rcu_read_unlock();
>
>

FWIW, I tested this patch and could no longer reproduce the original issue.

Tested-By: Chris Arges <carges@xxxxxxxxxxxxxx>

--chris
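
For anyone following along, here is a rough user-space sketch of why the scope of the flag matters. It is not kernel code: every name in it (no_direct, return_frame(), flush(), run_batch()) is a hypothetical stand-in, and it only assumes what the commit message says, namely that frames the flush step fails to transmit are freed via a path that takes the direct-return shortcut unless the "no direct" flag is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the "no direct return" state. */
static bool no_direct;
/* Counts frames freed through the shortcut that is unsafe on a remote CPU. */
static int direct_frees;

static void set_no_direct(void)   { no_direct = true; }
static void clear_no_direct(void) { no_direct = false; }

/* Models freeing one frame: the shortcut is taken unless the flag is set. */
static void return_frame(void)
{
	if (!no_direct)
		direct_frees++;
}

/* Models the flush step: `unsent` frames could not be transmitted and are freed. */
static void flush(int unsent)
{
	for (int i = 0; i < unsent; i++)
		return_frame();
}

/*
 * Runs one batch and returns how many frames were freed via the shortcut.
 * flag_covers_flush == false models the old scope (flag cleared before the
 * flush); true models the scope after the patch (flag cleared after it).
 */
static int run_batch(bool flag_covers_flush, int unsent)
{
	direct_frees = 0;

	set_no_direct();
	/* ... the XDP program would run over the frames here ... */
	if (!flag_covers_flush)
		clear_no_direct();

	flush(unsent);

	clear_no_direct();
	return direct_frees;
}
```

With the old scope, run_batch(false, 3) reports all three unsent frames going through the shortcut; with the expanded scope, run_batch(true, 3) reports none. When nothing is left unsent the two scopes behave the same, which fits the commit message's note that the issue only shows up when not all frames can be transmitted.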