On 8/19/2025 9:44 AM, Jesper Dangaard Brouer wrote:
>
> On 19/08/2025 02.38, Jacob Keller wrote:
>>
>> On 8/18/2025 4:05 AM, Jesper Dangaard Brouer wrote:
>>> On 15/08/2025 22.41, Tony Nguyen wrote:
>>>> This has the advantage that we also no longer need to track or cache
>>>> the number of fragments in the rx_ring, which saves a few bytes in
>>>> the ring.
>>>>
>>>
>>> Has anyone tested the performance impact for XDP_DROP?
>>> (with standard non-multi-buffer frames)
>>>
>>> Below code change will always touch cache-lines in the shared_info
>>> area. Before it was guarded with a xdp_buff_has_frags() check.
>>>
>>
>> I did some basic testing with XDP_DROP previously using the xdp-bench
>> tool, and do not recall noticing an issue. I don't recall the actual
>> numbers now though, so I did some quick tests again.
>>
>> without patch...
>>
>> Client:
>> $ iperf3 -u -c 192.168.93.1 -t86400 -l1200 -P20 -b5G
>> ...
>> [SUM]  10.00-10.33 sec   626 MBytes  16.0 Gbits/sec  546909
>>
>> $ iperf3 -s -B 192.168.93.1%ens260f0
>> [SUM]   0.00-10.00 sec  17.7 GBytes  15.2 Gbits/sec  0.011 ms  9712/15888183 (0.061%)  receiver
>>
>> $ xdp-bench drop ens260f0
>> Summary      1,778,935 rx/s      0 err/s
>> Summary      2,041,087 rx/s      0 err/s
>> Summary      2,005,052 rx/s      0 err/s
>> Summary      1,918,967 rx/s      0 err/s
>>
>> with patch...
>>
>> Client:
>> $ iperf3 -u -c 192.168.93.1 -t86400 -l1200 -P20 -b5G
>> ...
>> [SUM]  78.00-78.90 sec  2.01 GBytes  19.1 Gbits/sec  1801284
>>
>> Server:
>> $ iperf3 -s -B 192.168.93.1%ens260f0
>> [SUM]  77.00-78.00 sec  2.14 GBytes  18.4 Gbits/sec  0.012 ms  9373/1921186 (0.49%)
>>
>> xdp-bench:
>> $ xdp-bench drop ens260f0
>> Dropping packets on ens260f0 (ifindex 8; driver ice)
>> Summary      1,910,918 rx/s      0 err/s
>> Summary      1,866,562 rx/s      0 err/s
>> Summary      1,901,233 rx/s      0 err/s
>> Summary      1,859,854 rx/s      0 err/s
>> Summary      1,593,493 rx/s      0 err/s
>> Summary      1,891,426 rx/s      0 err/s
>> Summary      1,880,673 rx/s      0 err/s
>> Summary      1,866,043 rx/s      0 err/s
>> Summary      1,872,845 rx/s      0 err/s
>>
>> I ran a few times and it seemed to waffle a bit around 15 Gbit/sec to
>> 20 Gbit/sec, with throughput varying regardless of whether the patch
>> was applied. I actually tended to see slightly higher numbers with
>> this fix applied, but it was not consistent and hard to measure.
>>
>
> Above testing is not a valid XDP_DROP test.
>

Fair. I'm no XDP expert, so I have a lot to learn here :)

> The packet generator needs to be much, much faster, as XDP_DROP is for
> DDoS protection use-cases (one of Cloudflare's main products).
>
> I recommend using the script for pktgen in the kernel tree:
>  samples/pktgen/pktgen_sample03_burst_single_flow.sh
>
> Example:
>  ./pktgen_sample03_burst_single_flow.sh -vi mlx5p2 -d 198.18.100.1 -m b4:96:91:ad:0b:09 -t $(nproc)
>
>
>> without the patch:
>
> On my testlab with CPU: AMD EPYC 9684X (SRSO=IBPB) running:
>  - sudo ./xdp-bench drop ice4   # (defaults to no-touch)
>
> XDP_DROP (with no-touch)
>  Without patch : 54,052,300 rx/s = 18.50 nanosec/packet
>  With the patch: 33,420,619 rx/s = 29.92 nanosec/packet
>  Diff: 11.42 nanosec
>

Oof. Yeah, that's not good.

> Using perf stat I can see an increase in cache-misses.
>
> The difference is less if we read packet data, running:
>  - sudo ./xdp-bench drop ice4 --packet-operation read-data
>
> XDP_DROP (with read-data)
>  Without patch : 27,200,683 rx/s = 36.76 nanosec/packet
>  With the patch: 24,348,751 rx/s = 41.07 nanosec/packet
>  Diff: 4.31 nanosec
>
> On this CPU we don't have DDIO/DCA, so we take a big hit reading the
> packet data in XDP. This will be needed by our DDoS bpf_prog.
> The nanosec diff isn't the same, so it seems this change can hide a
> little behind the cache-miss in the XDP bpf_prog.
>
>
>> Without xdp-bench running the XDP program, top showed a CPU usage of
>> 740% and an ~86 idle score.
>>
>
> We don't want a scaling test for this. For this XDP_DROP/DDoS test we
> want to target a single CPU. This is easiest done by generating a
> single flow (hint: the pktgen script is called _single_flow). We want
> to see a single CPU running ksoftirqd 100% of the time.
>

Ok.

>>
>> I'm not certain, but reading the helpers it might be correct to do
>> something like this:
>>
>>     if (unlikely(xdp_buff_has_frags(xdp)))
>>         nr_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
>>     else
>>         nr_frags = 1;
>
> Yes, that looks like a correct pattern.
>
>> either in the driver code or by adding a new xdp helper function.
>>
>> I'm not sure it's worth it, though. We have pending work from our
>> development team to refactor ice to use page pool and switch to the
>> libeth XDP helpers, which eliminates all of this driver-specific
>> logic.
>
> Please do proper testing of the XDP_DROP case when doing this change.
>
>> I don't personally think it's worth holding up this series and this
>> important memory leak fix for a minor potential code change that I
>> can't measure an obvious improvement from.
>
> IMHO you included an optimization (that wasn't a gain) in a fix patch.
> I think you can fix the memory leak without the "optimization" part.
>

It wasn't intended as an optimization in any case, but rather an attempt
to make it easier to keep track of what the driver was doing; it
obviously ended up regressing here.

@Jakub, @Tony, I guess we'll have to drop this patch from the series,
and I'll work on a v2 to avoid this regression.

> pw-bot: cr
>
> --Jesper
>
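For completeness, a minimal sketch of the guarded pattern discussed above,
as it might look wrapped in a small helper instead of open-coded in the hot
path. xdp_buff_has_frags() and xdp_get_shared_info_from_buff() are the
existing kernel helpers quoted in the thread; the wrapper name
ice_rx_nr_frags() and the fallback of 1 (counting only the linear buffer)
simply mirror the snippet above and are illustrative, not the actual v2 fix.

#include <linux/types.h>
#include <net/xdp.h>

/* Only touch the skb_shared_info cache lines when the xdp_buff actually
 * carries fragments; ordinary single-buffer frames take the cheap path.
 */
static inline u32 ice_rx_nr_frags(struct xdp_buff *xdp)
{
	u32 nr_frags = 1;	/* linear part only, per the snippet above */

	if (unlikely(xdp_buff_has_frags(xdp)))
		nr_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;

	return nr_frags;
}

Keeping xdp_buff_has_frags() as the first test is what avoids pulling the
shared_info cache lines in for non-multi-buffer frames, which is the
XDP_DROP regression measured above.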