Re: [PATCH v2 02/10] net: add skb_crc32c()

On 5/19/25 19:50, Eric Biggers wrote:
From: Eric Biggers <ebiggers@xxxxxxxxxx>

Add skb_crc32c(), which calculates the CRC32C of an sk_buff.  It will
replace __skb_checksum(), which unnecessarily supports arbitrary
checksums.  Compared to __skb_checksum(), skb_crc32c():

    - Uses the correct type for CRC32C values (u32, not __wsum).

    - Does not require the caller to provide a skb_checksum_ops struct.

    - Is faster because it does not use indirect calls and does not use
      the very slow crc32c_combine().

According to commit 2817a336d4d5 ("net: skb_checksum: allow custom
update/combine for walking skb") which added __skb_checksum(), the
original motivation for the abstraction layer was to avoid code
duplication for CRC32C and other checksums in the future.  However:

    - No additional checksums showed up after CRC32C.  __skb_checksum()
      is only used with the "regular" net checksum and CRC32C.

    - Indirect calls are expensive.  Commit 2544af0344ba ("net: avoid
      indirect calls in L4 checksum calculation") worked around this
      using the INDIRECT_CALL_1 macro. But that only avoided the indirect
      call for the net checksum, and at the cost of an extra branch.

    - The checksums use different types (__wsum and u32), causing casts
      to be needed.

    - It made the checksums of fragments be combined (rather than
      chained) for both checksums, despite this being highly
      counterproductive for CRC32C due to how slow crc32c_combine() is.
      This can clearly be seen in commit 4c2f24549644 ("sctp: linearize
      early if it's not GSO") which tried to work around this performance
      bug.  With a dedicated function for each checksum, we can instead
      just use the proper strategy for each checksum.

As shown by the following tables, the new function skb_crc32c() is
faster than __skb_checksum(), with the improvement varying greatly from
5% to 2500% depending on the case.  The largest improvements come from
fragmented packets, mainly due to eliminating the inefficient
crc32c_combine().  But linear packets are improved too, especially
shorter ones, mainly due to eliminating indirect calls.  These
benchmarks were done on AMD Zen 5.  On that CPU, Linux uses IBRS instead
of retpoline; an even greater improvement might be seen with retpoline:

     Linear sk_buffs

         Length in bytes    __skb_checksum cycles    skb_crc32c cycles
         ===============    =====================    =================
                      64                       43                   18
                     256                       94                   77
                    1420                      204                  161
                   16384                     1735                 1642

     Nonlinear sk_buffs (even split between head and one fragment)

         Length in bytes    __skb_checksum cycles    skb_crc32c cycles
         ===============    =====================    =================
                      64                      579                   22
                     256                      829                   77
                    1420                     1506                  194
                   16384                     4365                 1682

Signed-off-by: Eric Biggers <ebiggers@xxxxxxxxxx>
---
  include/linux/skbuff.h |  1 +
  net/core/skbuff.c      | 73 ++++++++++++++++++++++++++++++++++++++++++
  2 files changed, 74 insertions(+)

Reviewed-by: Hannes Reinecke <hare@xxxxxxx>

Cheers,

Hannes
--
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@xxxxxxx                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



