On Fri, Aug 15, 2025 at 1:40 AM <chia-yu.chang@xxxxxxxxxxxxxxxxxxx> wrote:
>
> From: Chia-Yu Chang <chia-yu.chang@xxxxxxxxxxxxxxxxxxx>
>
> Instead of sending the option in every ACK, limit sending to
> those ACKs where the option is necessary:
> - Handshake
> - "Change-triggered ACK" + the ACK following it. The
>   2nd ACK is necessary to unambiguously indicate which
>   of the ECN byte counters is increasing. The first
>   ACK has two counters increasing due to the ecnfield
>   edge.
> - ACKs with CE to allow CEP delta validations to take
>   advantage of the option.
> - Force option to be sent at least once per 2^22
>   bytes. The check is done using the bit edges of the
>   byte counters (avoids need for extra variables).
> - AccECN option beacon to send a few times per RTT even if
>   nothing in the ECN state requires that. The default is 3
>   times per RTT, and its period can be set via
>   sysctl_tcp_ecn_option_beacon.
>
> Below are the pahole outcomes before and after this patch,
> in which the group size of tcp_sock_write_tx is increased
> from 89 to 97 due to the new u64 accecn_opt_tstamp member:
>
> [BEFORE THIS PATCH]
> struct tcp_sock {
>     [...]
>     u64                        tcp_wstamp_ns;        /*  2488     8 */
>     struct list_head           tsorted_sent_queue;   /*  2496    16 */
>
>     [...]
>     __cacheline_group_end__tcp_sock_write_tx[0];     /*  2521     0 */
>     __cacheline_group_begin__tcp_sock_write_txrx[0]; /*  2521     0 */
>     u8                         nonagle:4;            /*  2521: 0  1 */
>     u8                         rate_app_limited:1;   /*  2521: 4  1 */
>     /* XXX 3 bits hole, try to pack */
>
>     /* Force alignment to the next boundary: */
>     u8                         :0;
>     u8                         received_ce_pending:4;/*  2522: 0  1 */
>     u8                         unused2:4;            /*  2522: 4  1 */
>     u8                         accecn_minlen:2;      /*  2523: 0  1 */
>     u8                         est_ecnfield:2;       /*  2523: 2  1 */
>     u8                         unused3:4;            /*  2523: 4  1 */
>
>     [...]
>     __cacheline_group_end__tcp_sock_write_txrx[0];   /*  2628     0 */
>
>     [...]
>     /* size: 3200, cachelines: 50, members: 171 */
> }
>
> [AFTER THIS PATCH]
> struct tcp_sock {
>     [...]
>     u64                        tcp_wstamp_ns;        /*  2488     8 */
>     u64                        accecn_opt_tstamp;    /*  2496     8 */
>     struct list_head           tsorted_sent_queue;   /*  2504    16 */
>
>     [...]
>     __cacheline_group_end__tcp_sock_write_tx[0];     /*  2529     0 */
>     __cacheline_group_begin__tcp_sock_write_txrx[0]; /*  2529     0 */
>     u8                         nonagle:4;            /*  2529: 0  1 */
>     u8                         rate_app_limited:1;   /*  2529: 4  1 */
>     /* XXX 3 bits hole, try to pack */
>
>     /* Force alignment to the next boundary: */
>     u8                         :0;
>     u8                         received_ce_pending:4;/*  2530: 0  1 */
>     u8                         unused2:4;            /*  2530: 4  1 */
>     u8                         accecn_minlen:2;      /*  2531: 0  1 */
>     u8                         est_ecnfield:2;       /*  2531: 2  1 */
>     u8                         accecn_opt_demand:2;  /*  2531: 4  1 */
>     u8                         prev_ecnfield:2;      /*  2531: 6  1 */
>
>     [...]
>     __cacheline_group_end__tcp_sock_write_txrx[0];   /*  2636     0 */
>
>     [...]
>     /* size: 3200, cachelines: 50, members: 173 */
> }
>
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@xxxxxxxxxxxxxxxxxxx>
> Co-developed-by: Ilpo Järvinen <ij@xxxxxxxxxx>
> Signed-off-by: Ilpo Järvinen <ij@xxxxxxxxxx>
> Reviewed-by: Eric Dumazet <edumazet@xxxxxxxxxx>
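
For readers following the "at least once per 2^22 bytes" rule in the quoted
commit message, the bit-edge check might be sketched roughly as below. This
is a stand-alone user-space illustration, not the code from the patch: the
helper name crosses_2_22_edge() and its parameters are hypothetical, and it
assumes a 32-bit byte counter that wraps modulo 2^32.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): report whether adding
 * `len` bytes to a 32-bit byte counter changes any bit at position
 * 22 or above, i.e. whether the counter crossed a 2^22-byte edge.
 * Comparing the old and new values needs no extra state variable.
 */
static bool crosses_2_22_edge(uint32_t old_bytes, uint32_t len)
{
	/* Mask selecting bits 31..22 of the counter. */
	const uint32_t mask = ~((UINT32_C(1) << 22) - 1);
	/* Unsigned addition wraps modulo 2^32, like the counter itself. */
	uint32_t new_bytes = old_bytes + len;

	/* XOR exposes the bits that differ between old and new values. */
	return ((old_bytes ^ new_bytes) & mask) != 0;
}
```

A caller would evaluate this each time a byte counter is updated and, when
it returns true, request that the option be included in an upcoming ACK.
With this formulation a wrap from 0xFFFFFFFF to 0 also counts as an edge.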