Re: [PATCH net-next v10 1/8] hinic3: Async Event Queue interfaces

Simon Horman <horms@xxxxxxxxxx> · Thu, 31 Jul 2025 15:04:04 +0100

On Thu, Jul 31, 2025 at 03:58:39PM +0300, Gur Stavi wrote:
> > On Thu, Jul 24, 2025 at 09:45:51PM +0800, Fan Gong wrote:
> > > > > +
> > > > > +/* Data provided to/by cmdq is arranged in structs with little endian fields but
> > > > > + * every dword (32bits) should be swapped since HW swaps it again when it
> > > > > + * copies it from/to host memory.
> > > > > + */
> > > >
> > > > This scheme may work on little endian hosts.
> > > > But if so it seems unlikely to work on big endian hosts.
> > > >
> > > > I expect you want be32_to_cpu_array() for data coming from hw,
> > > > with a source buffer as an array of __be32 while
> > > > the destination buffer is an array of u32.
> > > >
> > > > And cpu_to_be32_array() for data going to the hw,
> > > > with the types of the source and destination buffers reversed.
> > > >
> > > > If those types don't match your data, then we have
> > > > a framework to have that discussion.
> > > >
> > > >
> > > > That said, it is more usual for drivers to keep structures in the byte
> > > > order they are received. Stored in structures with members with types, in
> > > > this case it seems that would be __be32, and accessed using a combination
> > > > of BIT/GENMASK, FIELD_PREP/FIELD_GET, and cpu_to_be*/be*_to_cpu (in this
> > > > case cpu_to_be32/be32_to_cpu).
> > > >
> > > > An advantage of this approach is that the byte order of
> > > > data is only changed when needed. Another is that it is clear
> > > > what the byte order of data is.
> > >
> > > There is a simplified example:
> > >
> > > Here is a 64 bit little endian that may appear in cmdq:
> > > __le64 x
> > > After the swap it will become:
> > > __be32 x_lo
> > > __be32 x_hi
> > > This is NOT __be64.
> > > __be64 looks like this:
> > > __be32 x_hi
> > > __be32 x_lo
> >
> > Sure, byte swapping 64 bit entities is different to byte swapping two
> > consecutive 32 bit entities. I completely agree.
> >
> > >
> > > So the swapped data by HW is neither BE or LE. In this case, we should use
> > > swab32 to obtain the correct LE data because our driver currently supports LE.
> > > This is for compensating for bad HW decisions.
> >
> > Let us assume that the host is reading data provided by HW.
> >
> > If the swab32 approach works on a little endian host
> > to allow the host to access 32-bit values in host byte order.
> > Then this is because it outputs a 32-bit little endian values
> 
> Values can be any size. 32 bit is arbitrary.
> .
> >
> > But, given the same input, it will not work on a big endian host.
> > This is because the same little endian output will be produced,
> > while the host byte order is big endian.
> >
> > I think you need something based on be32_to_cpu()/cpu_to_be32().
> > This will effectively be swab32 on little endian hosts (no change!).
> > And a no-op on big endian hosts (addressing my point above).
> >
> > More specifically, I think you should use be32_to_cpu_array() and
> > cpu_to_be32_array() instead of swab32_array().
> >
> 
> Lets define a "coherent struct" as a structure made of fields that makes sense
> to human beings. Every field endianity is defined and fields are arranged in
> order that "makes sense". Fields can be of any integer size 8,16,32,64 and not
> necessarily naturally aligned.
> 
> swab32_array transforms a coherent struct into "byte jumble". Small fields are
> reordered and larger (misaligned) fields may be split into 2 (or even 3) parts.
> swab32_array is reversible so a 2nd call with byte jumble as input will produce
> the original coherent struct.
> 
> hinic3 dma has "swab32_array" built in.
> On send-to-device it expects a byte jubmle so the DMA engine will transform it
> into a coherent struct.
> On receive-from-device it provides a byte jumble so the driver needs
> to call swab32_array to transform it into a coherent struct.
> 
> The hinic3_cmdq_buf_swab32 function will work correctly, producing byte jumble,
> on little endian and big endian hosts.
> 
> The code that runs prior to hinic3_cmdq_buf_swab32 that initializes a coherent
> struct is endianity sensitive. It needs to initialize fields based on their
> coherent endianity with or without byte swap. Practically use cpu_to_le or
> cpu_to_be based on the coherent definition.
> 
> Specifically, cmdq "coherent structs" in hinic3 use little endian and since
> Kconfig currently declares that big endian hosts are not supported then
> coherent structs are initialized without explicit cpu_to_le macros.
> 
> And this is what the comment says:
> 
> /* Data provided to/by cmdq is arranged in structs with little endian fields but
>  * every dword (32bits) should be swapped since HW swaps it again when it
>  * copies it from/to host memory.
>  */
> 

Thanks, I think I am closer to understanding things now.

Let me try and express things in my own words:

1. On the hardware side, things are stored in a way that may be represented
   as structures with little-endian values. The members of the structures may
   have different sizes: 8-bit, 16-bit, 32-bit, ...

2. The hardware runs the equivalent of swab32_array() over this data
   when writing it to (or reading it from) the host. So we get a
   "byte jumble".

3. In this patch, the hinic3_cmdq_buf_swab32 reverses this jumbling
   by running he equivalent of swab32_array() over this data again.

   As 3 exactly reverses 2, what is left are structures exactly as in 1.

If so, I agree this makes sense and I am sorry for missing this before.

And if so, is the intention for the cmdq "coherent structs" in the driver
to look something like this.

   struct {
	u8 a;
	u8 b;
	__le16 c;
	__le32 d;
   };

If so, this seems sensible to me.

But I think it would be best so include some code in this patchset
that makes use of such structures - sorry if it is there, I couldn't find
it just now.

And, although there is no intention for the driver to run on big endian
systems, the __le* fields should be accessed using cpu_to_le*/le*_to_cpu
helpers.