On Thu, Jul 17, 2025 at 10:52 AM Willem de Bruijn <willemdebruijn.kernel@xxxxxxxxx> wrote:
>
> Jason Xing wrote:
> > On Thu, Jul 17, 2025 at 8:52 AM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> > >
> > > On Thu, 17 Jul 2025 08:06:48 +0800 Jason Xing wrote:
> > > > To be honest, this patch really only does one thing as the commit
> > > > says. It might look very complex, but if readers take a deep look they
> > > > will find only one removal of that validation for xsk in the hot path.
> > > > Nothing more and nothing less. So IMHO, it doesn't bring more complex
> > > > codes here.
> > > >
> > > > And removal of one validation indeed contributes to the transmission.
> > > > I believe there remain a number of applications using copy mode
> > > > currently. And maintainers of xsk don't regard copy mode as orphaned,
> > > > right?
> > >
> > > First of all, I'm not sure the patch is correct. The XSK skbs can have
> > > frags, if device doesn't support or clears _SG we should linearize,
> > > right?
> >
> > But note that there is one more function __skb_linearize() after
> > skb_needs_linearize() in the validate_xmit_skb(). __skb_linearize()
> > tests many members of skbs, which are not used to check the skbs from
> > xsk. For xsk, it's very simple (please see xsk_build_skb())
>
> For single frame xsk skb_needs_linearize will be false and thus
> __skb_linearize is not called?
>
> More generally, I would also think that the cost of the
> validate_xmit_skb checks are quite cheap in the xsk case where they
> are all false. On the assumption that the touched cachelines are
> likely warm.
>
> > >
> > > Second, we don't understand where the win is coming from, the numbers
> > > you share are a bit vague. What's so expensive about a few skbs
> > To be more accurate, it's not "a few" but "so many" because of the
> > high pps reaching more than 1,000,000. So if people run the xdpsock to
> > test it, it's not hard to see most of time is spent during the skb
> > allocation process.
>
> Right, the alloc or memcpy more than the validate?

Thanks for chiming in. Sure thing. Validation only takes 4% of the total
time, which can easily be observed with perf.

The story behind the patch is that I was scanning the code and found,
based on theory rather than experiment, that the validation is not
necessary. _Then_ I ran xdpsock to see whether there was any performance
impact and used perf to capture the hot spots.

So far, after finishing my investigation, I don't think I can find any
other useful improvement in either copy mode or zc mode. Last time, I
mentioned that I tried two out-of-thin-air approaches [1], but neither
went as well as expected...

[1]: https://lore.kernel.org/all/CAL+tcoAn8ADUGARSzZB=5dGoa+Kh7HnNBLxyqTa3W6tOhUK-sg@xxxxxxxxxxxxxx/

Thanks,
Jason
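
For readers following the linearize question above, the condition under
discussion can be modelled in user space. This is only a sketch, not the
kernel code: struct model_skb, the MODEL_F_* bits and the model_* helpers
are hypothetical stand-ins for struct sk_buff, NETIF_F_SG/NETIF_F_FRAGLIST
and skb_needs_linearize(), and "nonlinear" is approximated as "has frags
or a frag list". It illustrates why a single-frame skb skips
__skb_linearize() while a multi-frag skb on a device without SG does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical feature bits; the values are illustrative only. */
#define MODEL_F_SG       (1u << 0)  /* device supports scatter-gather */
#define MODEL_F_FRAGLIST (1u << 1)  /* device supports frag lists */

/* Hypothetical, heavily reduced stand-in for struct sk_buff. */
struct model_skb {
	unsigned int nr_frags;  /* number of page fragments attached */
	bool has_frag_list;     /* true if further skbs are chained */
};

/* Approximation: any fragment or frag list makes the skb nonlinear. */
static bool model_is_nonlinear(const struct model_skb *skb)
{
	return skb->nr_frags > 0 || skb->has_frag_list;
}

/*
 * Linearization is needed only when the skb is nonlinear AND the device
 * lacks the feature matching the kind of fragmentation present.
 */
static bool model_needs_linearize(const struct model_skb *skb,
				  unsigned int features)
{
	return model_is_nonlinear(skb) &&
	       ((skb->has_frag_list && !(features & MODEL_F_FRAGLIST)) ||
		(skb->nr_frags > 0 && !(features & MODEL_F_SG)));
}

/* Usage example covering the cases raised in the thread. */
void model_selfcheck(void)
{
	struct model_skb single = { .nr_frags = 0, .has_frag_list = false };
	struct model_skb multi  = { .nr_frags = 2, .has_frag_list = false };

	/* Single frame: never linearized, whatever the device supports. */
	assert(!model_needs_linearize(&single, 0));
	/* Multiple frags on an SG-capable device: no linearize needed. */
	assert(!model_needs_linearize(&multi, MODEL_F_SG));
	/* Multiple frags, device without SG: linearize is required. */
	assert(model_needs_linearize(&multi, 0));
}
```

In this model the single-frame case costs only a couple of always-false
comparisons, while the multi-frag case without SG is the one where
dropping the check would matter.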