Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx> writes:

> On 2025/4/14 11:46, Ankur Arora wrote:
>> Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx> writes:
>>
>>> On 2025/4/12 04:57, Ankur Arora wrote:
>>>> Shuai Xue <xueshuai@xxxxxxxxxxxxxxxxx> writes:
>>>>
>>>>> On 2025/2/19 05:33, Ankur Arora wrote:
>>>>>> Needed for cpuidle-haltpoll.
>>>>>> Acked-by: Will Deacon <will@xxxxxxxxxx>
>>>>>> Signed-off-by: Ankur Arora <ankur.a.arora@xxxxxxxxxx>
>>>>>> ---
>>>>>> arch/arm64/kernel/idle.c | 1 +
>>>>>> 1 file changed, 1 insertion(+)
>>>>>> diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
>>>>>> index 05cfb347ec26..b85ba0df9b02 100644
>>>>>> --- a/arch/arm64/kernel/idle.c
>>>>>> +++ b/arch/arm64/kernel/idle.c
>>>>>> @@ -43,3 +43,4 @@ void __cpuidle arch_cpu_idle(void)
>>>>>> */
>>>>>> cpu_do_idle();
>>>>>
>>>>> Hi, Ankur,
>>>>>
>>>>> With haltpoll_driver registered, arch_cpu_idle() on x86 can select
>>>>> mwait_idle() in idle threads.
>>>>>
>>>>> It uses MONITOR to set up an effective address range that is monitored
>>>>> for write-to-memory activity; MWAIT places the processor in
>>>>> an optimized state (this may vary between different implementations)
>>>>> until a write to the monitored address range occurs.
>>>> MWAIT is more capable than WFE -- it allows selection of deeper idle
>>>> states. IIRC C2/C3.
>>>>
>>>>> Should arch_cpu_idle() on arm64 also use LDXR/WFE
>>>>> to avoid the wakeup IPI, like x86 monitor/mwait?
>>>> Avoiding the wakeup IPI needs TIF_NR_POLLING and polling in idle support
>>>> that this series adds.
>>>> As Haris notes, the negative with only using WFE is that it only allows
>>>> a single idle state, one that is fairly shallow because the event-stream
>>>> causes a wakeup every 100us.
>>>> --
>>>> ankur
>>>
>>> Hi, Ankur and Haris
>>>
>>> Got it, thanks for the explanation :)
>>>
>>> Comparing sched-pipe performance on Rund with Yitian 710, *IPC improved 35%*:
>> Thanks for testing, Shuai. I wasn't expecting the IPC to improve by quite
>> that much :). The reduced instructions make sense since we don't have to
>> handle the IRQ anymore but we would spend some of the saved cycles
>> waiting in WFE instead.
>> I'm not familiar with the Yitian 710. Can you check if you are running
>> with WFE? That's the __smp_cond_load_relaxed_timewait() path vs the
>> __smp_cond_load_relaxed_spinwait() path in [0]. Same question for the
>> Kunpeng 920.
>
> Yes, it is running with __smp_cond_load_relaxed_timewait().
>
> I used perf-probe to check whether WFE is available in the guest:
>
> perf probe 'arch_timer_evtstrm_available%return r=$retval'
> perf record -e probe:arch_timer_evtstrm_available__return -aR sleep 1
> perf script
> swapper 0 [000] 1360.063049: probe:arch_timer_evtstrm_available__return: (ffff800080a5c640 <- ffff800080d42764) r=0x1
>
> arch_timer_evtstrm_available returns true, so
> __smp_cond_load_relaxed_timewait() is used.

Great. Thanks for checking.

>> Also, I'm working on a new version of the series in [1]. Would you be
>> okay trying that out?
>
> Sure. Please cc me when you send out a new version.

Will do. Thanks!

--
ankur
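
For readers following the thread, the general shape of "poll a flag until it
changes or a timeout expires" can be sketched in userspace C. This is only an
illustration under stated assumptions, not the kernel code referenced as [0]:
poll_flag_timeout(), now_ns() and cpu_wait_hint() are hypothetical names, and
the yield/pause hint is merely a stand-in for the WFE-based wait, which a
userspace sketch cannot reproduce faithfully.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: monotonic clock in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical stand-in for the low-power wait. The thread discusses
 * WFE on arm64; here we only issue a spin hint (yield/pause), which
 * does not wait for an event.
 */
static inline void cpu_wait_hint(void)
{
#if defined(__aarch64__)
	__asm__ volatile("yield" ::: "memory");
#elif defined(__x86_64__) || defined(__i386__)
	__asm__ volatile("pause" ::: "memory");
#endif
}

/*
 * Poll *flag with relaxed loads until (value & mask) is nonzero or
 * timeout_ns has elapsed. Returns the last value observed, so the
 * caller can tell a wakeup (mask bit set) from a timeout.
 */
static unsigned poll_flag_timeout(_Atomic unsigned *flag, unsigned mask,
				  uint64_t timeout_ns)
{
	uint64_t deadline = now_ns() + timeout_ns;
	unsigned v;

	for (;;) {
		v = atomic_load_explicit(flag, memory_order_relaxed);
		if (v & mask)
			break;		/* condition met: no IPI needed */
		if (now_ns() >= deadline)
			break;		/* poll window expired */
		cpu_wait_hint();
	}
	return v;
}
```

A waker on another CPU would simply store to the flag; the poller notices the
new value on its next load instead of being interrupted.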