On Tue, Jul 08, 2025 at 10:49:36AM -0700, Stanislav Fomichev wrote: > On 07/08, Alexander Lobakin wrote: > > From: Maciej Fijalkowski <maciej.fijalkowski@xxxxxxxxx> > > Date: Tue, 8 Jul 2025 16:14:39 +0200 > > > > > On Mon, Jul 07, 2025 at 11:40:48AM -0700, Stanislav Fomichev wrote: > > >> On 07/07, Alexander Lobakin wrote: > > > > [...] > > > > >>> BTW isn't num_descs from that new structure would be the same as > > >>> shinfo->nr_frags + 1 (or just nr_frags for xsk_build_skb_zerocopy())? > > >> > > >> So you're saying we don't need to store it? Agreed. But storing the rest > > >> in cb still might be problematic with kconfig-configurable MAX_SKB_FRAGS? > > > > For sure skb->cb is too small for 17+ u64s. > > > > > > > > Hi Stan & Olek, > > > > > > no, as said in v1 drivers might linearize the skb and all frags will be > > > lost. This storage is needed unfortunately. > > > > Aaah sorry. In this case yeah, you need this separate frag count. > > > > > > > >> > > >>>> Can we pre-allocate an array of xsk_addrs during xsk_bind (the number of > > >>>> xsk_addrs is bound by the tx ring size)? Then we can remove the alloc on tx > > >>>> and replace it with some code to manage that pool of xsk_addrs.. > > > > > > That would be pool-bound which makes it a shared resource so I believe > > > that we would repeat the problem being fixed here ;) > > > > Except the system Page Pool idea right below maybe :> > > It doesn't have to be a shared resource, the pool (in whatever form) can be > per xsk. (unless I'm missing something) It can not. when you attach multiple xsks to same {dev,qid} tuple pool is shared. https://elixir.bootlin.com/linux/v6.16-rc5/source/net/xdp/xsk.c#L1249 Not sure right now if we could store the pointer to xsk_addrs in xdp_sock maybe. Let me get back to this after my time off. BTW I didn't realize you added yourself as xsk reviewer. Would be nice to have CCed Magnus or me when doing so :P > > > >>> Nice idea BTW. > > >>> > > >>> We could even use system per-cpu Page Pools to allocate these structs* > > >>> :D It wouldn't waste 1 page per one struct as PP is frag-aware and has > > >>> API for allocating only a small frag. > > >>> > > >>> Headroom stuff was also ok to me: we either way allocate a new skb, so > > >>> we could allocate it with a bit bigger headroom and put that table there > > >>> being sure that nobody will overwrite it (some drivers insert special > > >>> headers or descriptors in front of the actual skb->data). > > > > > > headroom approach was causing one of bpf selftests to fail, but I didn't > > > check in-depth the reason. I didn't really like the check in destructor if > > > addr array was corrupted in v1 and I came up with v2 which seems to me a > > > cleaner fix. > > > > > >>> > > >>> [*] Offtop: we could also use system PP to allocate skbs in > > >>> xsk_build_skb() just like it's done in xdp_build_skb_from_zc() + > > >>> xdp_copy_frags_from_zc() -- no way to avoid memcpy(), but the payload > > >>> buffers would be recycled then. > > >> > > >> Or maybe kmem_cache_alloc_node with a custom cache is good enough? > > >> Headroom also feels ok if we store the whole xsk_addrs struct in it. > > > > > > Yep both of these approaches was something I considered, but keep in mind > > > it's a bugfix so I didn't want to go with something flashy. I have not > > > observed big performance impact but I checked only MAX_SKB_FRAGS being set > > > to standard value. > > > > > > Would you guys be ok if I do the follow-up with possible optimization > > > after my vacation which would be a -next candidate? > > > > As a fix, it's totally fine for me to go in the current form, sure. > > +1