On Thu, Aug 07, 2025 at 09:37:20AM -0700, Stanislav Fomichev wrote:
> On 08/07, Maciej Fijalkowski wrote:
> > On Wed, Aug 06, 2025 at 10:42:58PM +0200, Maciej Fijalkowski wrote:
> > > On Wed, Aug 06, 2025 at 09:43:53AM -0700, Stanislav Fomichev wrote:
> > > > On 08/06, Maciej Fijalkowski wrote:
> > > > > Eryk reported an issue that I have put under Closes: tag, related to
> > > > > umem addrs being prematurely produced onto pool's completion queue.
> > > > > Let us make the skb's destructor responsible for producing all addrs
> > > > > that given skb used.
> > > > >
> > > > > Introduce struct xsk_addrs which will carry descriptor count with array
> > > > > of addresses taken from processed descriptors that will be carried via
> > > > > skb_shared_info::destructor_arg. This way we can refer to it within
> > > > > xsk_destruct_skb(). In order to mitigate the overhead that will be
> > > > > coming from memory allocations, let us introduce kmem_cache of xsk_addrs
> > > > > onto xdp_sock. Utilize the existing struct hole in xdp_sock for that.
> > > > >
> > > > > Commit from fixes tag introduced the buggy behavior, it was not broken
> > > > > from day 1, but rather when xsk multi-buffer got introduced.
> > > > >
> > > > > Fixes: b7f72a30e9ac ("xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path")
> > > > > Reported-by: Eryk Kubanski <e.kubanski@xxxxxxxxxxxxxxxxxxx>
> > > > > Closes: https://lore.kernel.org/netdev/20250530103456.53564-1-e.kubanski@xxxxxxxxxxxxxxxxxxx/
> > > > > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@xxxxxxxxx>
> > > > > ---
> > > > > v1:
> > > > > https://lore.kernel.org/bpf/20250702101648.1942562-1-maciej.fijalkowski@xxxxxxxxx/
> > > > > v2:
> > > > > https://lore.kernel.org/bpf/20250705135512.1963216-1-maciej.fijalkowski@xxxxxxxxx/
> > > > >
> > > > > v1->v2:
> > > > > * store addrs in array carried via destructor_arg instead having them
> > > > >   stored in skb headroom; cleaner and less hacky approach;
> > > > > v2->v3:
> > > > > * use kmem_cache for xsk_addrs allocation (Stan/Olek)
> > > > > * set err when xsk_addrs allocation fails (Dan)
> > > > > * change xsk_addrs layout to avoid holes
> > > > > * free xsk_addrs on error path
> > > > > * rebase
> > > > > ---
> > > > >  include/net/xdp_sock.h |  1 +
> > > > >  net/xdp/xsk.c          | 94 ++++++++++++++++++++++++++++++++++--------
> > > > >  net/xdp/xsk_queue.h    | 12 ++++++
> > > > >  3 files changed, 89 insertions(+), 18 deletions(-)
> > > > >
> > > > (...)
> > > > > +	xs->xsk_addrs_cache = kmem_cache_create("xsk_generic_xmit_cache",
> > > > > +						sizeof(struct xsk_addrs), 0,
> > > > > +						SLAB_HWCACHE_ALIGN, NULL);
> > > > > +
> > > > > +	if (!xs->xsk_addrs_cache) {
> > > > > +		sk_free(sk);
> > > > > +		return -ENOMEM;
> > > > > +	}
> > > >
> > > > Should we move this up to happen before sk_add_node_rcu? Otherwise we
> > > > also have to do sk_del_node_init_rcu on !xs->xsk_addrs_cache here?
> > > >
> > > > Btw, alternatively, why not make this happen at bind time when we know
> > > > whether the socket is gonna be copy or zc? And do it only for the copy
> > > > mode?
> > >
> > > thanks for quick review Stan. makes sense to do it for copy mode only.
> > > i'll send next revision tomorrow.
> >
> > FWIW syzbot reported an issue that "xsk_generic_xmit_cache" exists, so
> > probably we should include queue id within name so that each socket gets
> > its own cache with unique name.
>
> Interesting. I was wondering whether it's gonna be confusing to see
> multiple "xsk_generic_xmit_cache" entries in /proc/slabinfo, but looks
> like it's not allowed :-)

I played with this a bit more. As a side note, I have not seen these
entries in /proc/slabinfo unless I passed SLAB_POISON to
kmem_cache_create().

Besides, I think a solution where each socket adds its own kmem_cache is
not really scalable. In theory, if someone had a use case that required a
copy mode xsk socket on each queue of a NIC, and there were multiple such
NICs on the system, you would end up with a pretty massive count of
kmem_caches. I am not sure what the consequences of that would be.

I came up with the idea of having these kmem_caches as percpu vars with
embedded refcounting. xsk will index them by queue id, and at bind time,
if the kmem_cache under a given id was already created, we just bump the
refcnt.

I'll send a v4 with this implemented and I would appreciate input on
this :)

>
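
For illustration only, here is a minimal userspace sketch of the
refcounting idea described above. It is not the v4 patch. All names
(sketch_cache, sketch_slots, SKETCH_MAX_QUEUES, sketch_cache_get/put) are
hypothetical, malloc()/free() stand in for kmem_cache_create() and
kmem_cache_destroy(), a plain global array stands in for the percpu
variables, and locking is left out entirely, which real kernel code could
not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_QUEUES 64	/* hypothetical upper bound on queue ids */

/* Stand-in for struct kmem_cache; purely illustrative. */
struct sketch_cache {
	size_t obj_size;
};

/* One slot per queue id: the shared cache plus its embedded refcount. */
struct sketch_cache_slot {
	struct sketch_cache *cache;
	unsigned int refcnt;
};

static struct sketch_cache_slot sketch_slots[SKETCH_MAX_QUEUES];

/*
 * "Bind time": create the cache for this queue id on first use,
 * otherwise reuse the existing one. Either way, bump the refcount.
 */
struct sketch_cache *sketch_cache_get(unsigned int queue_id, size_t obj_size)
{
	struct sketch_cache_slot *slot;

	if (queue_id >= SKETCH_MAX_QUEUES)
		return NULL;

	slot = &sketch_slots[queue_id];
	if (!slot->cache) {
		slot->cache = malloc(sizeof(*slot->cache));
		if (!slot->cache)
			return NULL;
		slot->cache->obj_size = obj_size;
	}
	slot->refcnt++;
	return slot->cache;
}

/* "Release time": drop one reference, destroy the cache on the last put. */
void sketch_cache_put(unsigned int queue_id)
{
	struct sketch_cache_slot *slot;

	if (queue_id >= SKETCH_MAX_QUEUES)
		return;

	slot = &sketch_slots[queue_id];
	assert(slot->refcnt > 0);
	if (--slot->refcnt == 0) {
		free(slot->cache);
		slot->cache = NULL;
	}
}

/* Helper for inspecting the current refcount of a queue id. */
unsigned int sketch_cache_refcnt(unsigned int queue_id)
{
	return queue_id < SKETCH_MAX_QUEUES ? sketch_slots[queue_id].refcnt : 0;
}
```

With this shape, two sockets bound to the same queue id get the same
cache pointer back, and the cache goes away only when the last of them
calls sketch_cache_put(). The number of caches is then bounded by the
number of distinct queue ids rather than by the number of sockets.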