On Fri, 30 May 2025 17:27:28 +0800 Bo Li <libo.gcs85@xxxxxxxxxxxxx> wrote:

> During testing, the client transmitted 1 million 32-byte messages, and we
> computed the per-message average latency. The results are as follows:
>
> *****************
> Without RPAL: Message length: 32 bytes, Total TSC cycles: 19616222534,
> Message count: 1000000, Average latency: 19616 cycles
> With RPAL: Message length: 32 bytes, Total TSC cycles: 1703459326,
> Message count: 1000000, Average latency: 1703 cycles
> *****************
>
> These results confirm that RPAL delivers substantial latency improvements
> over the current epoll implementation—achieving a 17,913-cycle reduction
> (an ~91.3% improvement) for 32-byte messages.

Noted ;)

Quick question:

>  arch/x86/Kbuild                      |   2 +
>  arch/x86/Kconfig                     |   2 +
>  arch/x86/entry/entry_64.S            | 160 ++
>  arch/x86/events/amd/core.c           |  14 +
>  arch/x86/include/asm/pgtable.h       |  25 +
>  arch/x86/include/asm/pgtable_types.h |  11 +
>  arch/x86/include/asm/tlbflush.h      |  10 +
>  arch/x86/kernel/asm-offsets.c        |   3 +
>  arch/x86/kernel/cpu/common.c         |   8 +-
>  arch/x86/kernel/fpu/core.c           |   8 +-
>  arch/x86/kernel/nmi.c                |  20 +
>  arch/x86/kernel/process.c            |  25 +-
>  arch/x86/kernel/process_64.c         | 118 +
>  arch/x86/mm/fault.c                  | 271 ++
>  arch/x86/mm/mmap.c                   |  10 +
>  arch/x86/mm/tlb.c                    | 172 ++
>  arch/x86/rpal/Kconfig                |  21 +
>  arch/x86/rpal/Makefile               |   6 +
>  arch/x86/rpal/core.c                 | 477 ++++
>  arch/x86/rpal/internal.h             |  69 +
>  arch/x86/rpal/mm.c                   | 426 +++
>  arch/x86/rpal/pku.c                  | 196 ++
>  arch/x86/rpal/proc.c                 | 279 ++
>  arch/x86/rpal/service.c              | 776 ++++++
>  arch/x86/rpal/thread.c               | 313 +++

The changes are very x86-heavy.  Is that a necessary thing?  Would
another architecture need to implement a similar amount to enable RPAL?
IOW, how much of the above could be made arch-neutral?