> -----Original Message----- > From: Paolo Abeni <pabeni@xxxxxxxxxx> > Sent: Tuesday, June 17, 2025 10:03 AM > To: Chia-Yu Chang (Nokia) <chia-yu.chang@xxxxxxxxxxxxxxxxxxx>; edumazet@xxxxxxxxxx; linux-doc@xxxxxxxxxxxxxxx; corbet@xxxxxxx; horms@xxxxxxxxxx; dsahern@xxxxxxxxxx; kuniyu@xxxxxxxxxx; bpf@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; dave.taht@xxxxxxxxx; jhs@xxxxxxxxxxxx; kuba@xxxxxxxxxx; stephen@xxxxxxxxxxxxxxxxxx; xiyou.wangcong@xxxxxxxxx; jiri@xxxxxxxxxxx; davem@xxxxxxxxxxxxx; andrew+netdev@xxxxxxx; donald.hunter@xxxxxxxxx; ast@xxxxxxxxxxx; liuhangbin@xxxxxxxxx; shuah@xxxxxxxxxx; linux-kselftest@xxxxxxxxxxxxxxx; ij@xxxxxxxxxx; ncardwell@xxxxxxxxxx; Koen De Schepper (Nokia) <koen.de_schepper@xxxxxxxxxxxxxxxxxxx>; g.white@xxxxxxxxxxxxx; ingemar.s.johansson@xxxxxxxxxxxx; mirja.kuehlewind@xxxxxxxxxxxx; cheshire@xxxxxxxxx; rs.ietf@xxxxxx; Jason_Livingood@xxxxxxxxxxx; vidhi_goel@xxxxxxxxx > Cc: Olivier Tilmans (Nokia) <olivier.tilmans@xxxxxxxxx> > Subject: Re: [PATCH v8 net-next 04/15] tcp: AccECN core > > > CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information. > > > > On 6/10/25 2:53 PM, chia-yu.chang@xxxxxxxxxxxxxxxxxxx wrote: > > From: Ilpo Järvinen <ij@xxxxxxxxxx> > > > > This change implements Accurate ECN without negotiation and AccECN > > Option (that will be added by later changes). Based on AccECN > > specifications: > > https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt > > > > Accurate ECN allows feeding back the number of CE (congestion > > experienced) marks accurately to the sender in contrast to > > RFC3168 ECN that can only signal one marks-seen-yes/no per RTT. > > Congestion control algorithms can take advantage of the accurate ECN > > information to fine-tune their congestion response to avoid drastic > > rate reduction when only mild congestion is encountered. > > > > With Accurate ECN, tp->received_ce (r.cep in AccECN spec) keeps track > > of how many segments have arrived with a CE mark. Accurate ECN uses > > ACE field (ECE, CWR, AE) to communicate the value back to the sender > > which updates tp->delivered_ce (s.cep) based on the feedback. This > > signalling channel is lossy when ACE field overflow occurs. > > > > Conservative strategy is selected here to deal with the ACE overflow, > > however, some strategies using the AccECN option later in the overall > > patchset mitigate against false overflows detected. > > > > The ACE field values on the wire are offset by > > TCP_ACCECN_CEP_INIT_OFFSET. Delivered_ce/received_ce count the real CE > > marks rather than forcing all downstream users to adapt to the wire > > offset. > > > > This patch uses the first 1-byte hole and the last 4-byte hole of the > > tcp_sock_write_txrx for 'received_ce_pending' and 'received_ce'. > > Also, the group size of tcp_sock_write_txrx is increased from > > 91 + 4 to 95 + 4 due to the new u32 received_ce member. Below are the > > trimmed pahole outcomes before and after this patch. > > AFAICS 'received_ce' fills the existing 4 bytes hole, so tcp_sock_write_txrx size should be now (95 + 0), am I missreading something? Hi Paolo, First, thanks for the feedback. Regarding such "+ 4" is due to the byte alignemnt needed for ARM arch. You can see below tha pahole outputs BEFORE and AFTER this patch, in which a 4-byte hole is added before "tcp_clock_cache" to make it aligns from the multiple of 8. And the comment in net/ipv4/tcp.c also explain it: "32bit arches with 8byte alignment on u64 fields might need padding before tcp_clock_cache." Does it make sense to you? [BEFORE PATCH] __u8 __cacheline_group_begin__tcp_sock_write_txrx[0]; /* 1869 0 */ u8 nonagle:4; /* 1869: 0 1 */ u8 rate_app_limited:1; /* 1869: 4 1 */ /* XXX 3 bits hole, try to pack */ /* XXX 2 bytes hole, try to pack */ __be32 pred_flags; /* 1872 4 */ /* XXX 4 bytes hole, try to pack */ u64 tcp_clock_cache; /* 1880 8 */ u64 tcp_mstamp; /* 1888 8 */ u32 rcv_nxt; /* 1896 4 */ u32 snd_nxt; /* 1900 4 */ u32 snd_una; /* 1904 4 */ u32 window_clamp; /* 1908 4 */ u32 srtt_us; /* 1912 4 */ u32 packets_out; /* 1916 4 */ /* --- cacheline 30 boundary (1920 bytes) --- */ u32 snd_up; /* 1920 4 */ u32 delivered; /* 1924 4 */ u32 delivered_ce; /* 1928 4 */ u32 app_limited; /* 1932 4 */ u32 rcv_wnd; /* 1936 4 */ struct tcp_options_received rx_opt; /* 1940 24 */ __u8 __cacheline_group_end__tcp_sock_write_txrx[0]; /* 1964 0 */ [AFTER PATCH] __u8 __cacheline_group_begin__tcp_sock_write_txrx[0]; /* 1869 0 */ u8 nonagle:4; /* 1869: 0 1 */ u8 rate_app_limited:1; /* 1869: 4 1 */ /* XXX 3 bits hole, try to pack */ /* Force alignment to the next boundary: */ u8 :0; u8 received_ce_pending:4; /* 1870: 0 1 */ u8 unused2:4; /* 1870: 4 1 */ /* XXX 1 byte hole, try to pack */ __be32 pred_flags; /* 1872 4 */ /* XXX 4 bytes hole, try to pack */ u64 tcp_clock_cache; /* 1880 8 */ u64 tcp_mstamp; /* 1888 8 */ u32 rcv_nxt; /* 1896 4 */ u32 snd_nxt; /* 1900 4 */ u32 snd_una; /* 1904 4 */ u32 window_clamp; /* 1908 4 */ u32 srtt_us; /* 1912 4 */ u32 packets_out; /* 1916 4 */ /* --- cacheline 30 boundary (1920 bytes) --- */ u32 snd_up; /* 1920 4 */ u32 delivered; /* 1924 4 */ u32 delivered_ce; /* 1928 4 */ u32 received_ce; /* 1932 4 */ u32 app_limited; /* 1936 4 */ u32 rcv_wnd; /* 1940 4 */ struct tcp_options_received rx_opt; /* 1944 24 */ __u8 __cacheline_group_end__tcp_sock_write_txrx[0]; /* 1968 0 */ > > > @@ -384,17 +387,16 @@ static void tcp_data_ecn_check(struct sock *sk, const struct sk_buff *skb) > > if (tcp_ca_needs_ecn(sk)) > > tcp_ca_event(sk, CA_EVENT_ECN_IS_CE); > > > > - if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) { > > + if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR) && > > + tcp_ecn_mode_rfc3168(tp)) { > > /* Better not delay acks, sender can have a very low cwnd */ > > tcp_enter_quickack_mode(sk, 2); > > tp->ecn_flags |= TCP_ECN_DEMAND_CWR; > > } > > - tp->ecn_flags |= TCP_ECN_SEEN; > > It's not clear why you need to move this statement earlier in the code path even for ecn_mode_rfc3168(). Either a comment or > > if (!tcp_ecn_mode_rfc3168(tp)) > break; > > a few lines aboved could help. > > > break; > > default: > > if (tcp_ca_needs_ecn(sk)) > > tcp_ca_event(sk, CA_EVENT_ECN_NO_CE); > > - tp->ecn_flags |= TCP_ECN_SEEN; > > Same here. > > Thanks, > > Paolo OK, will apply the above feedback in the next version and thanks. BRs, Chia-Yu