> -----Original Message----- > From: Paolo Abeni <pabeni@xxxxxxxxxx> > Sent: Monday, July 14, 2025 4:32 PM > To: Chia-Yu Chang (Nokia) <chia-yu.chang@xxxxxxxxxxxxxxxxxxx>; edumazet@xxxxxxxxxx; linux-doc@xxxxxxxxxxxxxxx; corbet@xxxxxxx; horms@xxxxxxxxxx; dsahern@xxxxxxxxxx; kuniyu@xxxxxxxxxx; bpf@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; dave.taht@xxxxxxxxx; jhs@xxxxxxxxxxxx; kuba@xxxxxxxxxx; stephen@xxxxxxxxxxxxxxxxxx; xiyou.wangcong@xxxxxxxxx; jiri@xxxxxxxxxxx; davem@xxxxxxxxxxxxx; andrew+netdev@xxxxxxx; donald.hunter@xxxxxxxxx; ast@xxxxxxxxxxx; liuhangbin@xxxxxxxxx; shuah@xxxxxxxxxx; linux-kselftest@xxxxxxxxxxxxxxx; ij@xxxxxxxxxx; ncardwell@xxxxxxxxxx; Koen De Schepper (Nokia) <koen.de_schepper@xxxxxxxxxxxxxxxxxxx>; g.white@xxxxxxxxxxxxx; ingemar.s.johansson@xxxxxxxxxxxx; mirja.kuehlewind@xxxxxxxxxxxx; cheshire@xxxxxxxxx; rs.ietf@xxxxxx; Jason_Livingood@xxxxxxxxxxx; vidhi_goel@xxxxxxxxx > Subject: Re: [PATCH v12 net-next 11/15] tcp: accecn: AccECN option > > > CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information. > > > > On 7/4/25 10:53 AM, chia-yu.chang@xxxxxxxxxxxxxxxxxxx wrote: [...] > > +} > > + > > +/* Handles AccECN option ECT and CE 24-bit byte counters update into > > + * the u32 value in tcp_sock. As we're processing TCP options, it is > > + * safe to access from - 1. > > + */ > > +static inline s32 tcp_update_ecn_bytes(u32 *cnt, const char *from, > > + u32 init_offset) { > > + u32 truncated = (get_unaligned_be32(from - 1) - init_offset) & > > + 0xFFFFFFU; > > + u32 delta = (truncated - *cnt) & 0xFFFFFFU; > > + > > + /* If delta has the highest bit set (24th bit) indicating > > + * negative, sign extend to correct an estimation using > > + * sign_extend32(delta, 24 - 1) > > + */ > > + delta = sign_extend32(delta, 23); > > I'm under the impression that delta could be simply: > > delta = (truncated - *cnt) > > What am I missing? Hi Paolo, I think this code is necessary to ensure delta will not a super large value in case of wrap adound. For instance, if truncated = 0x0000001F and *cnt = 0x00FFFFFF, then (truncated - *cnt) = 0xFF000020 But sign_extend32(((truncated - *cnt) & 0xFFFFFFU, 23) = 0x00000020, which shall be corrrect. Another example, if truncated = 0x0000001F and *cnt = 0x0000003F, then (truncated - *cnt) = 0xFFFFFFE0 And sign_extend32(((truncated - *cnt) & 0xFFFFFFU, 23) = 0xFFFFFFE0. In this latter example, both are correct. [...] > > a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index > > d98a1a17eb52..2169fd28594e 100644 > > --- a/net/ipv4/tcp_output.c > > +++ b/net/ipv4/tcp_output.c > > @@ -385,6 +385,7 @@ static inline bool tcp_urg_mode(const struct tcp_sock *tp) > > #define OPTION_SMC BIT(9) > > #define OPTION_MPTCP BIT(10) > > #define OPTION_AO BIT(11) > > +#define OPTION_ACCECN BIT(12) > > > > static void smc_options_write(__be32 *ptr, u16 *options) { @@ -406,6 > > +407,8 @@ struct tcp_out_options { > > u16 mss; /* 0 to disable */ > > u8 ws; /* window scale, 0 to disable */ > > u8 num_sack_blocks; /* number of SACK blocks to include */ > > + u8 num_accecn_fields:7, /* number of AccECN fields needed */ > > + use_synack_ecn_bytes:1; /* Use synack_ecn_bytes or not */ > > u8 hash_size; /* bytes in hash_location */ > > u8 bpf_opt_len; /* length of BPF hdr option */ > > __u8 *hash_location; /* temporary pointer, overloaded */ > > @@ -621,6 +624,8 @@ static void tcp_options_write(struct tcphdr *th, struct tcp_sock *tp, > > struct tcp_out_options *opts, > > struct tcp_key *key) { > > + u8 leftover_highbyte = TCPOPT_NOP; /* replace 1st NOP if avail */ > > + u8 leftover_lowbyte = TCPOPT_NOP; /* replace 2nd NOP in > > + succession */ > > __be32 *ptr = (__be32 *)(th + 1); > > u16 options = opts->options; /* mungable copy */ > > > > @@ -656,15 +661,79 @@ static void tcp_options_write(struct tcphdr *th, struct tcp_sock *tp, > > *ptr++ = htonl(opts->tsecr); > > } > > > > + if (OPTION_ACCECN & options) { > > + /* Initial values for AccECN option, ordered is based on ECN field bits > > + * similar to received_ecn_bytes. Used for SYN/ACK AccECN option. > > + */ > > + static u32 synack_ecn_bytes[3] = { 0, 0, 0 }; > > I think this does not address Eric's concern on v9 WRT global variable, as every CPU will still touch the same memory while accessing the above array. > > > + const u8 ect0_idx = INET_ECN_ECT_0 - 1; > > + const u8 ect1_idx = INET_ECN_ECT_1 - 1; > > + const u8 ce_idx = INET_ECN_CE - 1; > > + u32 e0b; > > + u32 e1b; > > + u32 ceb; > > + u8 len; > > + > > + if (opts->use_synack_ecn_bytes) { > > + e0b = synack_ecn_bytes[ect0_idx] + TCP_ACCECN_E0B_INIT_OFFSET; > > + e1b = synack_ecn_bytes[ect1_idx] + TCP_ACCECN_E1B_INIT_OFFSET; > > + ceb = synack_ecn_bytes[ce_idx] + > > + TCP_ACCECN_CEB_INIT_OFFSET; > > On the flip side I don't see such array modified here, not in later patches?!? If so you could make it const and a global variable would be ok. Sure, I will make it as static const global variable, which I hope this is ok for you. > > +/* Calculates how long AccECN option will fit to @remaining option space. > > + * > > + * AccECN option can sometimes replace NOPs used for alignment of > > +other > > + * TCP options (up to @max_combine_saving available). > > + * > > + * Only solutions with at least @required AccECN fields are accepted. > > + * > > + * Returns: The size of the AccECN option excluding space repurposed > > +from > > + * the alignment of the other options. > > + */ > > +static int tcp_options_fit_accecn(struct tcp_out_options *opts, int required, > > + int remaining) { > > + int size = TCP_ACCECN_MAXSIZE; > > + int max_combine_saving; > > + > > + if (opts->use_synack_ecn_bytes) > > + max_combine_saving = tcp_synack_options_combine_saving(opts); > > + else > > + max_combine_saving = opts->num_sack_blocks > 0 ? 2 : 0; > > + opts->num_accecn_fields = TCP_ACCECN_NUMFIELDS; > > + while (opts->num_accecn_fields >= required) { > > + int leftover_size = size & 0x3; > > + /* Pad to dword if cannot combine */ > > + if (leftover_size > max_combine_saving) > > + leftover_size = -((4 - leftover_size) & 0x3); > > I *think* that with the above you mean something alike: > > size = ALIGN(size, 4); > leftover_size = 0 > > ? > > The used code looks quite obscure to me. > > /P Indeed, I will make below changes in the next version by using ALIGN() and ALIGN_DOWN() Here the aim is to pad up (if max_combine_saving is not enough) or trim down (if max_combine saving is enough) to DWORD. And the final return size will be the the a multiple of DWORD. Would it be more readable? /* Pad to DWORD if cannot combine. Align_size represents * the final size to be used by AccECN options. * +======+=============+====================+============+ * | size | size exceed | max_combine_saving | align_size | * | | DWORD | | | * +======+=============+====================+============+ * | 2 | 2 | < 2 | 4 | * | 2 | 2 | >=2 | 0 | * | 5 | 1 | < 1 | 8 | * | 5 | 1 | >=1 | 4 | * | 8 | 0 | Any | 8 | * | 11 | 3 | < 3 | 12 | * | 11 | 3 | >=3 | 8 | * +======+=============+====================+============+ */ if ((size & 0x3) > max_combine_saving) align_size = ALIGN(size, 4); else align_size = ALIGN_DOWN(size, 4); if (remaining >= align_size) { size = align_size; break; }