On Thu, Jul 31, 2025 at 03:58:39PM +0300, Gur Stavi wrote: > > On Thu, Jul 24, 2025 at 09:45:51PM +0800, Fan Gong wrote: > > > > > + > > > > > +/* Data provided to/by cmdq is arranged in structs with little endian fields but > > > > > + * every dword (32bits) should be swapped since HW swaps it again when it > > > > > + * copies it from/to host memory. > > > > > + */ > > > > > > > > This scheme may work on little endian hosts. > > > > But if so it seems unlikely to work on big endian hosts. > > > > > > > > I expect you want be32_to_cpu_array() for data coming from hw, > > > > with a source buffer as an array of __be32 while > > > > the destination buffer is an array of u32. > > > > > > > > And cpu_to_be32_array() for data going to the hw, > > > > with the types of the source and destination buffers reversed. > > > > > > > > If those types don't match your data, then we have > > > > a framework to have that discussion. > > > > > > > > > > > > That said, it is more usual for drivers to keep structures in the byte > > > > order they are received. Stored in structures with members with types, in > > > > this case it seems that would be __be32, and accessed using a combination > > > > of BIT/GENMASK, FIELD_PREP/FIELD_GET, and cpu_to_be*/be*_to_cpu (in this > > > > case cpu_to_be32/be32_to_cpu). > > > > > > > > An advantage of this approach is that the byte order of > > > > data is only changed when needed. Another is that it is clear > > > > what the byte order of data is. > > > > > > There is a simplified example: > > > > > > Here is a 64 bit little endian that may appear in cmdq: > > > __le64 x > > > After the swap it will become: > > > __be32 x_lo > > > __be32 x_hi > > > This is NOT __be64. > > > __be64 looks like this: > > > __be32 x_hi > > > __be32 x_lo > > > > Sure, byte swapping 64 bit entities is different to byte swapping two > > consecutive 32 bit entities. I completely agree. > > > > > > > > So the swapped data by HW is neither BE or LE. In this case, we should use > > > swab32 to obtain the correct LE data because our driver currently supports LE. > > > This is for compensating for bad HW decisions. > > > > Let us assume that the host is reading data provided by HW. > > > > If the swab32 approach works on a little endian host > > to allow the host to access 32-bit values in host byte order. > > Then this is because it outputs a 32-bit little endian values > > Values can be any size. 32 bit is arbitrary. > . > > > > But, given the same input, it will not work on a big endian host. > > This is because the same little endian output will be produced, > > while the host byte order is big endian. > > > > I think you need something based on be32_to_cpu()/cpu_to_be32(). > > This will effectively be swab32 on little endian hosts (no change!). > > And a no-op on big endian hosts (addressing my point above). > > > > More specifically, I think you should use be32_to_cpu_array() and > > cpu_to_be32_array() instead of swab32_array(). > > > > Lets define a "coherent struct" as a structure made of fields that makes sense > to human beings. Every field endianity is defined and fields are arranged in > order that "makes sense". Fields can be of any integer size 8,16,32,64 and not > necessarily naturally aligned. > > swab32_array transforms a coherent struct into "byte jumble". Small fields are > reordered and larger (misaligned) fields may be split into 2 (or even 3) parts. > swab32_array is reversible so a 2nd call with byte jumble as input will produce > the original coherent struct. > > hinic3 dma has "swab32_array" built in. > On send-to-device it expects a byte jubmle so the DMA engine will transform it > into a coherent struct. > On receive-from-device it provides a byte jumble so the driver needs > to call swab32_array to transform it into a coherent struct. > > The hinic3_cmdq_buf_swab32 function will work correctly, producing byte jumble, > on little endian and big endian hosts. > > The code that runs prior to hinic3_cmdq_buf_swab32 that initializes a coherent > struct is endianity sensitive. It needs to initialize fields based on their > coherent endianity with or without byte swap. Practically use cpu_to_le or > cpu_to_be based on the coherent definition. > > Specifically, cmdq "coherent structs" in hinic3 use little endian and since > Kconfig currently declares that big endian hosts are not supported then > coherent structs are initialized without explicit cpu_to_le macros. > > And this is what the comment says: > > /* Data provided to/by cmdq is arranged in structs with little endian fields but > * every dword (32bits) should be swapped since HW swaps it again when it > * copies it from/to host memory. > */ > Thanks, I think I am closer to understanding things now. Let me try and express things in my own words: 1. On the hardware side, things are stored in a way that may be represented as structures with little-endian values. The members of the structures may have different sizes: 8-bit, 16-bit, 32-bit, ... 2. The hardware runs the equivalent of swab32_array() over this data when writing it to (or reading it from) the host. So we get a "byte jumble". 3. In this patch, the hinic3_cmdq_buf_swab32 reverses this jumbling by running he equivalent of swab32_array() over this data again. As 3 exactly reverses 2, what is left are structures exactly as in 1. If so, I agree this makes sense and I am sorry for missing this before. And if so, is the intention for the cmdq "coherent structs" in the driver to look something like this. struct { u8 a; u8 b; __le16 c; __le32 d; }; If so, this seems sensible to me. But I think it would be best so include some code in this patchset that makes use of such structures - sorry if it is there, I couldn't find it just now. And, although there is no intention for the driver to run on big endian systems, the __le* fields should be accessed using cpu_to_le*/le*_to_cpu helpers.