On Tue, Jul 15, 2025 at 12:03 AM Stanislav Fomichev <stfomichev@xxxxxxxxx> wrote: > > On 07/13, Jason Xing wrote: > > From: Jason Xing <kernelxing@xxxxxxxxxxx> > > > > For xsk, it's not needed to validate and check the skb in > > validate_xmit_skb_list() in copy mode because xsk_build_skb() doesn't > > and doesn't need to prepare those requisites to validate. Xsk is just > > responsible for delivering raw data from userspace to the driver. > > So the __dev_direct_xmit was taken out of af_packet in commit 865b03f21162 > ("dev: packet: make packet_direct_xmit a common function"). And a call > to validate_xmit_skb_list was added in 104ba78c9880 ("packet: on direct_xmit, > limit tso and csum to supported devices") to support TSO. Since we don't > support tso/vlan offloads in xsk_build_skb, removing validate_xmit_skb_list > seems fair. Right, if you don't mind, I think I will copy&paste your words in the next respin. > > Although, again, if you care about performance, why not use zerocopy > mode? I attached the performance impact because I'm working on the different modes in xsk to see how it really behaves. You can take it as a kind of investigation :) I like zc mode, but the fact is that: 1) with ixgbe driver, my machine could totally lose connection as long as the xsk tries to send packets. I'm still debugging it :( 2) some customers using virtio_net don't have a supported host, so copy mode is the only one choice. > > > Skipping numerous checks somehow contributes to the transmission > > especially in the extremely hot path. > > > > Performance-wise, I used './xdpsock -i enp2s0f0np0 -t -S -s 64' to verify > > the guess and then measured on the machine with ixgbe driver. It stably > > goes up by 5.48%, which can be seen in the shown below: > > Before: > > sock0@enp2s0f0np0:0 txonly xdp-skb > > pps pkts 1.00 > > rx 0 0 > > tx 1,187,410 3,513,536 > > After: > > sock0@enp2s0f0np0:0 txonly xdp-skb > > pps pkts 1.00 > > rx 0 0 > > tx 1,252,590 2,459,456 > > > > This patch also removes total ~4% consumption which can be observed > > by perf: > > |--2.97%--validate_xmit_skb > > | | > > | --1.76%--netif_skb_features > > | | > > | --0.65%--skb_network_protocol > > | > > |--1.06%--validate_xmit_xfrm > > It is a bit surprising that mostly no-op validate_xmit_skb_list takes > 4% of the cycles. netif_skb_features taking ~2%? Any idea why? Is > it unoptimized kernel? Which driver is it? No idea on this one, sorry. I tested with different drivers (like i40e) and it turned out to be nearly the same result. One of my machines looks like this: # lspci -vv | grep -i ether 02:00.0 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01) 02:00.1 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01) # lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 46 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 48 On-line CPU(s) list: 0-47 Vendor ID: GenuineIntel BIOS Vendor ID: Intel(R) Corporation Model name: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz BIOS Model name: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz CPU @ 2.3GHz BIOS CPU family: 179 CPU family: 6 Model: 63 Thread(s) per core: 2 Core(s) per socket: 12 Socket(s): 2 Stepping: 2 CPU(s) scaling MHz: 96% CPU max MHz: 3100.0000 CPU min MHz: 1200.0000 BogoMIPS: 4589.31 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_ts c arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm p cid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb intel_ppin ssbd ibrs ibpb stibp tpr_shadow fl expriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts vnmi md_clear flush_l 1d > > > Signed-off-by: Jason Xing <kernelxing@xxxxxxxxxxx> > > --- > > include/linux/netdevice.h | 4 ++-- > > net/core/dev.c | 10 ++++++---- > > net/xdp/xsk.c | 2 +- > > 3 files changed, 9 insertions(+), 7 deletions(-) > > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > > index a80d21a14612..2df44c22406c 100644 > > --- a/include/linux/netdevice.h > > +++ b/include/linux/netdevice.h > > @@ -3351,7 +3351,7 @@ u16 dev_pick_tx_zero(struct net_device *dev, struct sk_buff *skb, > > struct net_device *sb_dev); > > > > int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev); > > -int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id); > > +int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id, bool validate); > > > > static inline int dev_queue_xmit(struct sk_buff *skb) > > { > > @@ -3368,7 +3368,7 @@ static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id) > > { > > int ret; > > > > - ret = __dev_direct_xmit(skb, queue_id); > > + ret = __dev_direct_xmit(skb, queue_id, true); > > if (!dev_xmit_complete(ret)) > > kfree_skb(skb); > > return ret; > > Implementation wise, will it be better if we move a call to validate_xmit_skb_list > from __dev_direct_xmit to dev_direct_xmit (and a few other callers of > __dev_direct_xmit)? This will avoid the new flag. __dev_direct_xmit() helper was developed to serve the xsk type a few years ago. For now it has only two callers. If we expect to avoid a new parameter, we will also move the dev check[1] as below to the callers of __dev_direct_xmit(). Then move validate_xmit_skb_list to __dev_direct_xmit(). It's not that concise, I assume? I'm not sure if I miss your point. [1] if (unlikely(!netif_running(dev) || !netif_carrier_ok(dev))) goto drop; Thanks, Jason