RE: [PATCH v12 net-next 11/15] tcp: accecn: AccECN option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> -----Original Message-----
> From: Paolo Abeni <pabeni@xxxxxxxxxx> 
> Sent: Monday, July 14, 2025 4:32 PM
> To: Chia-Yu Chang (Nokia) <chia-yu.chang@xxxxxxxxxxxxxxxxxxx>; edumazet@xxxxxxxxxx; linux-doc@xxxxxxxxxxxxxxx; corbet@xxxxxxx; horms@xxxxxxxxxx; dsahern@xxxxxxxxxx; kuniyu@xxxxxxxxxx; bpf@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; dave.taht@xxxxxxxxx; jhs@xxxxxxxxxxxx; kuba@xxxxxxxxxx; stephen@xxxxxxxxxxxxxxxxxx; xiyou.wangcong@xxxxxxxxx; jiri@xxxxxxxxxxx; davem@xxxxxxxxxxxxx; andrew+netdev@xxxxxxx; donald.hunter@xxxxxxxxx; ast@xxxxxxxxxxx; liuhangbin@xxxxxxxxx; shuah@xxxxxxxxxx; linux-kselftest@xxxxxxxxxxxxxxx; ij@xxxxxxxxxx; ncardwell@xxxxxxxxxx; Koen De Schepper (Nokia) <koen.de_schepper@xxxxxxxxxxxxxxxxxxx>; g.white@xxxxxxxxxxxxx; ingemar.s.johansson@xxxxxxxxxxxx; mirja.kuehlewind@xxxxxxxxxxxx; cheshire@xxxxxxxxx; rs.ietf@xxxxxx; Jason_Livingood@xxxxxxxxxxx; vidhi_goel@xxxxxxxxx
> Subject: Re: [PATCH v12 net-next 11/15] tcp: accecn: AccECN option
> 
> 
> CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
> 
> 
> 
> On 7/4/25 10:53 AM, chia-yu.chang@xxxxxxxxxxxxxxxxxxx wrote:
[...]
> > +}
> > +
> > +/* Handles AccECN option ECT and CE 24-bit byte counters update into
> > + * the u32 value in tcp_sock. As we're processing TCP options, it is
> > + * safe to access from - 1.
> > + */
> > +static inline s32 tcp_update_ecn_bytes(u32 *cnt, const char *from,
> > +                                    u32 init_offset) {
> > +     u32 truncated = (get_unaligned_be32(from - 1) - init_offset) &
> > +                     0xFFFFFFU;
> > +     u32 delta = (truncated - *cnt) & 0xFFFFFFU;
> > +
> > +     /* If delta has the highest bit set (24th bit) indicating
> > +      * negative, sign extend to correct an estimation using
> > +      * sign_extend32(delta, 24 - 1)
> > +      */
> > +     delta = sign_extend32(delta, 23);
> 
> I'm under the impression that delta could be simply:
> 
>         delta = (truncated - *cnt)
> 
> What am I missing?

Hi Paolo,

I think this code is necessary to ensure delta will not a super large value in case of wrap adound.

For instance, if truncated = 0x0000001F and *cnt = 0x00FFFFFF, then (truncated - *cnt) = 0xFF000020

But sign_extend32(((truncated - *cnt) & 0xFFFFFFU, 23) = 0x00000020, which shall be corrrect.

Another example, if truncated = 0x0000001F and *cnt = 0x0000003F, then (truncated - *cnt) = 0xFFFFFFE0

And sign_extend32(((truncated - *cnt) & 0xFFFFFFU, 23) = 0xFFFFFFE0.

In this latter example, both are correct.

[...]
> > a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 
> > d98a1a17eb52..2169fd28594e 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -385,6 +385,7 @@ static inline bool tcp_urg_mode(const struct tcp_sock *tp)
> >  #define OPTION_SMC           BIT(9)
> >  #define OPTION_MPTCP         BIT(10)
> >  #define OPTION_AO            BIT(11)
> > +#define OPTION_ACCECN                BIT(12)
> >
> >  static void smc_options_write(__be32 *ptr, u16 *options)  { @@ -406,6 
> > +407,8 @@ struct tcp_out_options {
> >       u16 mss;                /* 0 to disable */
> >       u8 ws;                  /* window scale, 0 to disable */
> >       u8 num_sack_blocks;     /* number of SACK blocks to include */
> > +     u8 num_accecn_fields:7, /* number of AccECN fields needed */
> > +        use_synack_ecn_bytes:1; /* Use synack_ecn_bytes or not */
> >       u8 hash_size;           /* bytes in hash_location */
> >       u8 bpf_opt_len;         /* length of BPF hdr option */
> >       __u8 *hash_location;    /* temporary pointer, overloaded */
> > @@ -621,6 +624,8 @@ static void tcp_options_write(struct tcphdr *th, struct tcp_sock *tp,
> >                             struct tcp_out_options *opts,
> >                             struct tcp_key *key)  {
> > +     u8 leftover_highbyte = TCPOPT_NOP; /* replace 1st NOP if avail */
> > +     u8 leftover_lowbyte = TCPOPT_NOP;  /* replace 2nd NOP in 
> > + succession */
> >       __be32 *ptr = (__be32 *)(th + 1);
> >       u16 options = opts->options;    /* mungable copy */
> >
> > @@ -656,15 +661,79 @@ static void tcp_options_write(struct tcphdr *th, struct tcp_sock *tp,
> >               *ptr++ = htonl(opts->tsecr);
> >       }
> >
> > +     if (OPTION_ACCECN & options) {
> > +             /* Initial values for AccECN option, ordered is based on ECN field bits
> > +              * similar to received_ecn_bytes. Used for SYN/ACK AccECN option.
> > +              */
> > +             static u32 synack_ecn_bytes[3] = { 0, 0, 0 };
> 
> I think this does not address Eric's concern on v9 WRT global variable, as every CPU will still touch the same memory while accessing the above array.
> 
> > +             const u8 ect0_idx = INET_ECN_ECT_0 - 1;
> > +             const u8 ect1_idx = INET_ECN_ECT_1 - 1;
> > +             const u8 ce_idx = INET_ECN_CE - 1;
> > +             u32 e0b;
> > +             u32 e1b;
> > +             u32 ceb;
> > +             u8 len;
> > +
> > +             if (opts->use_synack_ecn_bytes) {
> > +                     e0b = synack_ecn_bytes[ect0_idx] + TCP_ACCECN_E0B_INIT_OFFSET;
> > +                     e1b = synack_ecn_bytes[ect1_idx] + TCP_ACCECN_E1B_INIT_OFFSET;
> > +                     ceb = synack_ecn_bytes[ce_idx] + 
> > + TCP_ACCECN_CEB_INIT_OFFSET;
> 
> On the flip side I don't see such array modified here, not in later patches?!? If so you could make it const and a global variable would be ok.

Sure, I will make it as static const global variable, which I hope this is ok for you.


> > +/* Calculates how long AccECN option will fit to @remaining option space.
> > + *
> > + * AccECN option can sometimes replace NOPs used for alignment of 
> > +other
> > + * TCP options (up to @max_combine_saving available).
> > + *
> > + * Only solutions with at least @required AccECN fields are accepted.
> > + *
> > + * Returns: The size of the AccECN option excluding space repurposed 
> > +from
> > + * the alignment of the other options.
> > + */
> > +static int tcp_options_fit_accecn(struct tcp_out_options *opts, int required,
> > +                               int remaining) {
> > +     int size = TCP_ACCECN_MAXSIZE;
> > +     int max_combine_saving;
> > +
> > +     if (opts->use_synack_ecn_bytes)
> > +             max_combine_saving = tcp_synack_options_combine_saving(opts);
> > +     else
> > +             max_combine_saving = opts->num_sack_blocks > 0 ? 2 : 0;
> > +     opts->num_accecn_fields = TCP_ACCECN_NUMFIELDS;
> > +     while (opts->num_accecn_fields >= required) {
> > +             int leftover_size = size & 0x3;
> > +             /* Pad to dword if cannot combine */
> > +             if (leftover_size > max_combine_saving)
> > +                     leftover_size = -((4 - leftover_size) & 0x3);
> 
> I *think* that with the above you mean something alike:
> 
>                         size = ALIGN(size, 4);
>                         leftover_size = 0
> 
> ?
> 
> The used code looks quite obscure to me.
> 
> /P

Indeed, I will make below changes in the next version by using ALIGN() and ALIGN_DOWN()

Here the aim is to pad up (if max_combine_saving is not enough) or trim down (if max_combine saving is enough) to DWORD.

And the final return size will be the the a multiple of DWORD.

Would it be more readable?

/* Pad to DWORD if cannot combine. Align_size represents
 * the final size to be used by AccECN options.
 * +======+=============+====================+============+
 * | size | size exceed | max_combine_saving | align_size |
 * |      |    DWORD    |                    |            |
 * +======+=============+====================+============+
 * |   2  |       2     |         < 2        |      4     |
 * |   2  |       2     |         >=2        |      0     |
 * |   5  |       1     |         < 1        |      8     |
 * |   5  |       1     |         >=1        |      4     |
 * |   8  |       0     |         Any        |      8     |
 * |  11  |       3     |         < 3        |     12     |
 * |  11  |       3     |         >=3        |      8     |
 * +======+=============+====================+============+
 */
if ((size & 0x3) > max_combine_saving)
        align_size = ALIGN(size, 4);
else
        align_size = ALIGN_DOWN(size, 4);

if (remaining >= align_size) {
        size = align_size;
        break;
}




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux