Re: [PATCH v1 2/2] ARM: dts: samsung: Add cache information to the Exynos542x SoC

Hi Anand,

Thanks for working on this!

On Tue, Sep 09, 2025 at 07:29:31PM +0530, Anand Moon wrote:
[ ... ]
> > > >>>> On 30.07.2024 11:13, Anand Moon wrote:
> > > >>>>> As per the Exynos 5422 user manual add missing cache information to
> > > >>>>> the Exynos542x SoC.
> > > >>>>>
> > > >>>>> - Each Cortex-A7 core has 32 KB of instruction cache and
> > > >>>>>       32 KB of L1 data cache available.
> > > >>>>> - Each Cortex-A15 core has 32 KB of L1 instruction cache and
> > > >>>>>       32 KB of L1 data cache available.
> > > >>>>> - The little (A7) cluster has 512 KB of unified L2 cache available.
> > > >>>>> - The big (A15) cluster has 2 MB of unified L2 cache available.
> > > >>>>>
> > > >>>>> Features:
> > > >>>>> - Exynos 5422 support cache coherency interconnect (CCI) bus with
> > > >>>>>    L2 cache snooping capability. This hardware automatic L2 cache
> > > >>>>>    snooping removes the efforts of synchronizing the contents of the
> > > >>>>>    two L2 caches in core switching event.
> > > >>>>>
> > > >>>>> Signed-off-by: Anand Moon <linux.amoon@xxxxxxxxx>
> > > >>>>
> > > >>>>
> > > >>>> The provided values are not correct. Please refer to commit 5f41f9198f29
> > > >>>> ("ARM: 8864/1: Add workaround for I-Cache line size mismatch between CPU
> > > >>>> cores"), which adds workaround for different l1 icache line size between
> > > >>>> big and little CPUs. This workaround gets enabled on all Exynos542x/5800
> > > >>>> boards.
> > > >>>>
> > > >>> Ok, I have just referred to the Exynos 5422 user manual for this patch,
> > > >>> This patch is just updating the cache size for CPU for big.litle architecture..

I do not have access to the 5422 manual unfortunately, but if I add
some prints in the code from the commit Marek referenced:

```diff
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -173,6 +173,7 @@ void check_cpu_icache_size(int cpuid)
        asm("mrc p15, 0, %0, c0, c0, 1" : "=r" (ctr));
 
        size = 1 << ((ctr & 0xf) + 2);
+       pr_warn("CPU%u: icache line size: %u, size %u\n", cpuid, icache_size, size);
        if (cpuid != 0 && icache_size != size)
                pr_info("CPU%u: detected I-Cache line size mismatch, workaround enabled\n",
                        cpuid);
```

Then we get in dmesg:

CPU0: icache line size: 64, size 32
CPU1: icache line size: 32, size 32
CPU2: icache line size: 32, size 32
CPU3: icache line size: 32, size 32
CPU4: icache line size: 32, size 64
CPU5: icache line size: 32, size 64
CPU6: icache line size: 32, size 64
CPU7: icache line size: 32, size 64

The last value on each line is the line size decoded from that CPU's
cache type register. I interpret this to mean that the
i-cache-line-size property of CPU4, 5, 6 and 7 (i.e. cpu@0, cpu@1,
cpu@2 and cpu@3) should be 64 instead of 32.
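
For reference, the decode in that function can be pulled out into a
small standalone helper. This is only a sketch: ctr_iminline_bytes() is
a name I made up, not something in the kernel, and it assumes the ARMv7
CTR layout where bits [3:0] (IminLine) hold log2 of the smallest
I-cache line length in 32-bit words.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): return the smallest
 * I-cache line size in bytes encoded in an ARMv7 CTR value.
 * CTR[3:0] is log2(words per line); a word is 4 bytes, hence the "+ 2".
 */
static inline uint32_t ctr_iminline_bytes(uint32_t ctr)
{
	return UINT32_C(1) << ((ctr & 0xf) + 2);
}
```

A low nibble of 3 gives 32 bytes and a low nibble of 4 gives 64 bytes,
which matches the two sizes seen in the dmesg output above.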

I am not sure about the other properties.

> Here's an article that provides detailed insights into the cache feature.
> [0] http://jake.dothome.co.kr/cache4/
> 
> The values associated with L1 and L2 caches indicate their respective sizes,
> as specified in the ARM Technical Reference Manual (TRM) below.
> 
> Cortex-A15 L2 cache controller
> [0] https://developer.arm.com/documentation/ddi0503/i/programmers-model/programmable-peripherals-and-interfaces/cortex-a15-l2-cache-controller
> 
> Cortex-A7 L2 cache controller
> [1] https://developer.arm.com/documentation/ddi0503/i/programmers-model/programmable-peripherals-and-interfaces/cortex-a7-l2-cache-controller
> 
> These changes help define a fixed cache size, ensuring that active pages
> are mapped correctly within the expected cache boundaries.
> 
> Here is the small test case using perf
> Before
> 
> $ sudo perf stat -e L1-dcache-loads,L1-dcache-load-misses ./fact
> 
> Simulated Cache Miss Time (avg): 4766632 ns
> Factorial(10) = 3628800
> 
>  Performance counter stats for './fact':
> 
>             926328      armv7_cortex_a15/L1-dcache-loads/
>      <not counted>      armv7_cortex_a7/L1-dcache-loads/
>                          (0.00%)
>              16510      armv7_cortex_a15/L1-dcache-load-misses/ #
> 1.78% of all L1-dcache accesses
>      <not counted>      armv7_cortex_a7/L1-dcache-load-misses/
>                                (0.00%)
> 
>        0.008970031 seconds time elapsed
> 
>        0.000000000 seconds user
>        0.009673000 seconds sys
> 
> After
> $ sudo perf stat -e L1-dcache-loads,L1-dcache-load-misses ./fact
> Simulated Cache Miss Time (avg): 4623272 ns
> Factorial(10) = 3628800
> 
>  Performance counter stats for './fact':
> 
>             930570      armv7_cortex_a15/L1-dcache-loads/
>      <not counted>      armv7_cortex_a7/L1-dcache-loads/
>                          (0.00%)
>               4755      armv7_cortex_a15/L1-dcache-load-misses/ #
> 0.51% of all L1-dcache accesses
>      <not counted>      armv7_cortex_a7/L1-dcache-load-misses/
>                                (0.00%)
> 
>        0.011068250 seconds time elapsed
> 
>        0.000000000 seconds user
>        0.010793000 seconds sys

I tried the same test on my odroid-xu4, but was not able to reliably
reproduce the improvement. The cache miss rate varied between roughly
0.8 % and 2.8 % from run to run. This was with a desktop UI installed,
though. I will try it on a headless installation in the next few days,
and perhaps on exynos5800 as well.
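
When comparing runs it may help to compute the miss rate directly from
the raw counters instead of reading it off the perf annotation. A
minimal sketch, where l1d_miss_pct() is a made-up helper name and the
inputs are the L1-dcache-load-misses and L1-dcache-loads counts:

```c
#include <stdint.h>

/*
 * Hypothetical helper: L1 data cache miss rate in percent,
 * computed as misses / loads * 100. Returns 0.0 when no loads
 * were counted, to avoid dividing by zero.
 */
static inline double l1d_miss_pct(uint64_t misses, uint64_t loads)
{
	if (loads == 0)
		return 0.0;
	return 100.0 * (double)misses / (double)loads;
}
```

With the counts quoted above, 16510 misses out of 926328 loads comes to
about 1.78 % and 4755 out of 930570 to about 0.51 %, the same figures
perf printed.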

It might also be worth testing on the LITTLE and big cores separately, like:

$ sudo taskset -c 0,1,2,3 perf stat -e L1-dcache-loads,L1-dcache-load-misses ./fact
$ sudo taskset -c 4,5,6,7 perf stat -e L1-dcache-loads,L1-dcache-load-misses ./fact

Best regards,
Henrik Grimler



