Re: [Patch bpf-next v4 4/4] tcp_bpf: improve ingress redirection performance with message corking

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jun 30, 2025 at 06:12 PM -07, Cong Wang wrote:
> From: Zijian Zhang <zijianzhang@xxxxxxxxxxxxx>
>
> The TCP_BPF ingress redirection path currently lacks the message corking
> mechanism found in standard TCP. This causes the sender to wake up the
> receiver for every message, even when messages are small, resulting in
> reduced throughput compared to regular TCP in certain scenarios.

I'm curious what scenarios are you referring to? Is it send-to-local or
ingress-to-local? [1]

If the sender is emitting small messages, that's probably intended -
that is they likely want to get the message across as soon as possible,
because They must have disabled the Nagle algo (set TCP_NODELAY) to do
that.

Otherwise, you get small segment merging on the sender side by default.
And if MTU is a limiting factor, you should also be getting batching
from GRO.

What I'm getting at is that I don't quite follow why you don't see
sufficient batching before the sockmap redirect today?

> This change introduces a kernel worker-based intermediate layer to provide
> automatic message corking for TCP_BPF. While this adds a slight latency
> overhead, it significantly improves overall throughput by reducing
> unnecessary wake-ups and reducing the sock lock contention.

"Slight" for a +5% increase in latency is an understatement :-)

IDK about this being always on for every socket. For send-to-local
[1], sk_msg redirs can be viewed as a form of IPC, where latency
matters.

I do understand that you're trying to optimize for bulk-transfer
workloads, but please consider also request-response workloads.

[1] https://github.com/jsitnicki/kubecon-2024-sockmap/blob/main/cheatsheet-sockmap-redirect.png

> Reviewed-by: Amery Hung <amery.hung@xxxxxxxxxxxxx>
> Co-developed-by: Cong Wang <cong.wang@xxxxxxxxxxxxx>
> Signed-off-by: Cong Wang <cong.wang@xxxxxxxxxxxxx>
> Signed-off-by: Zijian Zhang <zijianzhang@xxxxxxxxxxxxx>
> ---




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux