On Mon, Jun 30, 2025 at 06:12 PM -07, Cong Wang wrote:
> From: Zijian Zhang <zijianzhang@xxxxxxxxxxxxx>
>
> The TCP_BPF ingress redirection path currently lacks the message corking
> mechanism found in standard TCP. This causes the sender to wake up the
> receiver for every message, even when messages are small, resulting in
> reduced throughput compared to regular TCP in certain scenarios.

I'm curious which scenarios you are referring to. Is it send-to-local or
ingress-to-local? [1]

If the sender is emitting small messages, that's probably intended -
that is, they likely want to get the message across as soon as possible,
because they must have disabled the Nagle algo (set TCP_NODELAY) to do
that.

Otherwise, you get small segment merging on the sender side by default.
And if MTU is a limiting factor, you should also be getting batching
from GRO.

What I'm getting at is that I don't quite follow why you don't see
sufficient batching before the sockmap redirect today.

> This change introduces a kernel worker-based intermediate layer to provide
> automatic message corking for TCP_BPF. While this adds a slight latency
> overhead, it significantly improves overall throughput by reducing
> unnecessary wake-ups and reducing the sock lock contention.

"Slight" for a +5% increase in latency is an understatement :-)

IDK about this being always on for every socket. For send-to-local [1],
sk_msg redirs can be viewed as a form of IPC, where latency matters.

I do understand that you're trying to optimize for bulk-transfer
workloads, but please also consider request-response workloads.

[1] https://github.com/jsitnicki/kubecon-2024-sockmap/blob/main/cheatsheet-sockmap-redirect.png

> Reviewed-by: Amery Hung <amery.hung@xxxxxxxxxxxxx>
> Co-developed-by: Cong Wang <cong.wang@xxxxxxxxxxxxx>
> Signed-off-by: Cong Wang <cong.wang@xxxxxxxxxxxxx>
> Signed-off-by: Zijian Zhang <zijianzhang@xxxxxxxxxxxxx>
> ---
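
To make the TCP_NODELAY point concrete, here is a minimal userspace sketch
(Python, purely illustrative and not taken from the patch) of how a sender
opts out of Nagle. The helper names make_low_latency_socket() and
nagle_disabled() are made up for this example. A socket that never sets the
option keeps Nagle enabled, so small writes can be merged on the sender side
before they reach the sockmap redirect.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle disabled (hypothetical helper)."""
    sk = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Opt out of small-segment merging: each small write may go out
    # as its own segment instead of waiting to be coalesced.
    sk.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sk


def nagle_disabled(sk: socket.socket) -> bool:
    """Return True if TCP_NODELAY is set on the socket (hypothetical helper)."""
    return sk.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    default_sk = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    fast_sk = make_low_latency_socket()
    print("default socket, Nagle disabled:", nagle_disabled(default_sk))
    print("TCP_NODELAY socket, Nagle disabled:", nagle_disabled(fast_sk))
    default_sk.close()
    fast_sk.close()
```

A sender that went out of its way to set this option is asking for each
message to be delivered promptly, which is the case where always-on corking
in the redirect path would work against the application.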