Hello,

kernel test robot noticed a 3.6% improvement of will-it-scale.per_thread_ops on:


commit: 665575cff098b696995ddaddf4646a4099941f5e ("filemap: move prefaulting out of hot write path")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: will-it-scale
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

        nr_task: 100%
        mode: thread
        test: writeseek1
        cpufreq_governor: performance


In addition to that, the commit also has significant impact on the following tests:

+------------------+-------------------------------------------------------------------------------------------+
| testcase: change | unixbench: unixbench.throughput 4.6% improvement                                          |
| test machine     | 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory |
| test parameters  | cpufreq_governor=performance                                                              |
|                  | nr_task=100%                                                                              |
|                  | runtime=300s                                                                              |
|                  | test=fsbuffer-w                                                                           |
+------------------+-------------------------------------------------------------------------------------------+


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250331/202503311302.a2bb29e1-lkp@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/thread/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/writeseek1/will-it-scale

commit:
  654b33ada4 ("proc: fix UAF in proc_get_inode()")
  665575cff0 ("filemap: move prefaulting out of hot write path")

654b33ada4ab5e92 665575cff098b696995ddaddf46
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
 1.171e+08 ± 11%     +30.4%  1.526e+08 ± 18%  cpuidle..time
     96.67 ± 15%     -38.6%      59.33 ± 15%  perf-c2c.HITM.local
     91.33 ± 22%     -37.8%      56.83 ± 18%  perf-c2c.HITM.remote
  77338762            +3.6%   80146917        will-it-scale.64.threads
   1208417            +3.6%    1252295        will-it-scale.per_thread_ops
  77338762            +3.6%   80146917        will-it-scale.workload
      0.02 ±  3%      +0.0        0.03 ± 19%  perf-stat.i.branch-miss-rate%
   9721738 ±  4%      +8.9%   10586240 ±  5%  perf-stat.i.branch-misses
      0.02 ±  3%      +0.0        0.02 ±  5%  perf-stat.overall.branch-miss-rate%
    683007            -3.3%     660149        perf-stat.overall.path-length
   9685250 ±  4%      +8.9%   10545947 ±  5%  perf-stat.ps.branch-misses
     31.54            -2.4       29.18        perf-profile.calltrace.cycles-pp.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64
     40.31            -1.9       38.39        perf-profile.calltrace.cycles-pp.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     46.03            -1.9       44.17        perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     53.97            -1.7       52.30        perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     58.43            -1.4       56.98        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
     59.33            -1.4       57.92        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
     74.17            -0.8       73.38        perf-profile.calltrace.cycles-pp.write
      0.55            +0.0        0.57        perf-profile.calltrace.cycles-pp.x64_sys_call.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      0.97            +0.0        1.01        perf-profile.calltrace.cycles-pp.folio_unlock.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write
      0.57 ±  3%      +0.0        0.60        perf-profile.calltrace.cycles-pp.file_remove_privs_flags.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64
      1.01            +0.0        1.04        perf-profile.calltrace.cycles-pp.fput.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      1.08            +0.0        1.12        perf-profile.calltrace.cycles-pp.up_write.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64
      0.54            +0.0        0.59        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.llseek
      0.99            +0.0        1.04        perf-profile.calltrace.cycles-pp.mutex_unlock.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      1.60            +0.1        1.67        perf-profile.calltrace.cycles-pp.down_write.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64
      0.68 ±  2%      +0.1        0.75 ±  3%  perf-profile.calltrace.cycles-pp.ktime_get_coarse_real_ts64_mg.current_time.inode_needs_update_time.file_update_time.shmem_file_write_iter
      1.61            +0.1        1.69        perf-profile.calltrace.cycles-pp.mutex_lock.fdget_pos.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.94            +0.1        1.02        perf-profile.calltrace.cycles-pp.folio_mark_accessed.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
      3.19            +0.1        3.27        perf-profile.calltrace.cycles-pp.shmem_write_end.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      0.87            +0.1        0.96        perf-profile.calltrace.cycles-pp.rw_verify_area.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.23            +0.1        2.32        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.llseek
      2.21            +0.1        2.31        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      1.57            +0.2        1.72        perf-profile.calltrace.cycles-pp.current_time.inode_needs_update_time.file_update_time.shmem_file_write_iter.vfs_write
      4.90            +0.2        5.08        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.write
      0.60 ±  3%      +0.2        0.80 ±  5%  perf-profile.calltrace.cycles-pp.balance_dirty_pages_ratelimited_flags.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
      6.44            +0.2        6.66        perf-profile.calltrace.cycles-pp.clear_bhb_loop.llseek
      2.30            +0.2        2.52        perf-profile.calltrace.cycles-pp.inode_needs_update_time.file_update_time.shmem_file_write_iter.vfs_write.ksys_write
      2.24            +0.2        2.48        perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter
      6.06            +0.2        6.30        perf-profile.calltrace.cycles-pp.clear_bhb_loop.write
      2.82            +0.3        3.09        perf-profile.calltrace.cycles-pp.file_update_time.shmem_file_write_iter.vfs_write.ksys_write.do_syscall_64
      4.78            +0.3        5.06        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.llseek
      5.77            +0.5        6.26        perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write.shmem_file_write_iter.vfs_write
      0.00            +0.5        0.52        perf-profile.calltrace.cycles-pp.__cond_resched.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.00            +0.5        0.52        perf-profile.calltrace.cycles-pp.shmem_file_llseek.ksys_lseek.do_syscall_64.entry_SYSCALL_64_after_hwframe.llseek
      0.00            +0.5        0.54 ±  2%  perf-profile.calltrace.cycles-pp.xas_load.filemap_get_entry.shmem_get_folio_gfp.shmem_write_begin.generic_perform_write
      6.65            +0.5        7.20        perf-profile.calltrace.cycles-pp.shmem_write_begin.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
     14.05            +0.8       14.81        perf-profile.calltrace.cycles-pp.copy_page_from_iter_atomic.generic_perform_write.shmem_file_write_iter.vfs_write.ksys_write
     28.53            +0.9       29.47        perf-profile.calltrace.cycles-pp.llseek
     32.10            -2.4       29.69        perf-profile.children.cycles-pp.generic_perform_write
     40.86            -1.9       38.96        perf-profile.children.cycles-pp.shmem_file_write_iter
     46.47            -1.8       44.64        perf-profile.children.cycles-pp.vfs_write
     54.38            -1.6       52.74        perf-profile.children.cycles-pp.ksys_write
     72.02            -1.1       70.92        perf-profile.children.cycles-pp.do_syscall_64
     73.76            -1.1       72.66        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     74.63            -0.7       73.89        perf-profile.children.cycles-pp.write
      0.59 ±  2%      -0.0        0.54        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      0.20            +0.0        0.21        perf-profile.children.cycles-pp.file_remove_privs
      0.33            +0.0        0.35        perf-profile.children.cycles-pp.__f_unlock_pos
      0.53            +0.0        0.54        perf-profile.children.cycles-pp.generic_file_llseek_size
      0.89            +0.0        0.92        perf-profile.children.cycles-pp.testcase
      2.29            +0.0        2.32        perf-profile.children.cycles-pp.fput
      0.68 ±  2%      +0.0        0.71        perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      0.38            +0.0        0.42 ±  2%  perf-profile.children.cycles-pp.security_file_permission
      0.42            +0.0        0.46 ±  2%  perf-profile.children.cycles-pp.write@plt
      1.17            +0.0        1.21        perf-profile.children.cycles-pp.x64_sys_call
      1.04            +0.0        1.08        perf-profile.children.cycles-pp.folio_unlock
      1.14            +0.0        1.19        perf-profile.children.cycles-pp.up_write
      0.63 ±  2%      +0.0        0.67        perf-profile.children.cycles-pp.file_remove_privs_flags
      0.53 ±  2%      +0.0        0.58        perf-profile.children.cycles-pp.shmem_file_llseek
      1.59            +0.1        1.65        perf-profile.children.cycles-pp.rcu_all_qs
      2.19            +0.1        2.26        perf-profile.children.cycles-pp.mutex_unlock
      0.75            +0.1        0.82 ±  3%  perf-profile.children.cycles-pp.ktime_get_coarse_real_ts64_mg
      0.22 ±  3%      +0.1        0.29 ±  2%  perf-profile.children.cycles-pp.inode_to_bdi
      0.44 ±  2%      +0.1        0.52 ±  2%  perf-profile.children.cycles-pp.xas_start
      1.73            +0.1        1.80        perf-profile.children.cycles-pp.down_write
      3.40            +0.1        3.48        perf-profile.children.cycles-pp.shmem_write_end
      1.00            +0.1        1.08        perf-profile.children.cycles-pp.folio_mark_accessed
      3.53            +0.1        3.62        perf-profile.children.cycles-pp.mutex_lock
      0.65            +0.1        0.75        perf-profile.children.cycles-pp.xas_load
      1.00            +0.1        1.10        perf-profile.children.cycles-pp.rw_verify_area
      1.48 ±  3%      +0.1        1.59 ±  2%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      3.58            +0.2        3.74        perf-profile.children.cycles-pp.__cond_resched
      1.71            +0.2        1.86        perf-profile.children.cycles-pp.current_time
      4.70            +0.2        4.90        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      4.36            +0.2        4.57        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.74 ±  2%      +0.2        0.95 ±  4%  perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited_flags
      2.43            +0.2        2.66        perf-profile.children.cycles-pp.inode_needs_update_time
      2.38            +0.2        2.62        perf-profile.children.cycles-pp.filemap_get_entry
      5.53            +0.3        5.78        perf-profile.children.cycles-pp.entry_SYSCALL_64
      2.96            +0.3        3.23        perf-profile.children.cycles-pp.file_update_time
     12.62            +0.5       13.10        perf-profile.children.cycles-pp.clear_bhb_loop
      6.04            +0.5        6.54        perf-profile.children.cycles-pp.shmem_get_folio_gfp
      6.79            +0.6        7.35        perf-profile.children.cycles-pp.shmem_write_begin
     14.14            +0.8       14.90        perf-profile.children.cycles-pp.copy_page_from_iter_atomic
     28.74            +0.9       29.66        perf-profile.children.cycles-pp.llseek
      0.58 ±  3%      -0.0        0.54        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      0.13            +0.0        0.14        perf-profile.self.cycles-pp.__f_unlock_pos
      0.47            +0.0        0.48        perf-profile.self.cycles-pp.generic_file_llseek_size
      0.36            +0.0        0.38        perf-profile.self.cycles-pp.folio_mark_dirty
      0.27 ±  2%      +0.0        0.29        perf-profile.self.cycles-pp.xas_load
      1.56            +0.0        1.58        perf-profile.self.cycles-pp.shmem_write_end
      1.14            +0.0        1.17        perf-profile.self.cycles-pp.down_write
      0.31            +0.0        0.34 ±  2%  perf-profile.self.cycles-pp.security_file_permission
      0.54 ±  3%      +0.0        0.58        perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
      1.02            +0.0        1.06        perf-profile.self.cycles-pp.x64_sys_call
      0.56 ±  2%      +0.0        0.60        perf-profile.self.cycles-pp.file_remove_privs_flags
      0.52            +0.0        0.56 ±  2%  perf-profile.self.cycles-pp.file_update_time
      0.41 ±  2%      +0.0        0.45        perf-profile.self.cycles-pp.shmem_file_llseek
      0.96            +0.0        1.00        perf-profile.self.cycles-pp.folio_unlock
      1.07            +0.0        1.12        perf-profile.self.cycles-pp.up_write
      1.36            +0.0        1.41        perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.80            +0.0        0.85        perf-profile.self.cycles-pp.ksys_lseek
      0.86 ±  2%      +0.0        0.91        perf-profile.self.cycles-pp.generic_write_checks
      1.20            +0.0        1.24        perf-profile.self.cycles-pp.rcu_all_qs
      0.75            +0.1        0.80 ±  2%  perf-profile.self.cycles-pp.shmem_write_begin
      0.16 ±  4%      +0.1        0.21 ±  4%  perf-profile.self.cycles-pp.inode_to_bdi
      2.05            +0.1        2.11        perf-profile.self.cycles-pp.mutex_unlock
      0.62 ±  2%      +0.1        0.69        perf-profile.self.cycles-pp.rw_verify_area
      0.69 ±  2%      +0.1        0.75 ±  3%  perf-profile.self.cycles-pp.ktime_get_coarse_real_ts64_mg
      0.72            +0.1        0.79        perf-profile.self.cycles-pp.inode_needs_update_time
      0.31 ±  2%      +0.1        0.38 ±  3%  perf-profile.self.cycles-pp.xas_start
      0.93            +0.1        1.01        perf-profile.self.cycles-pp.folio_mark_accessed
      1.98            +0.1        2.07        perf-profile.self.cycles-pp.__cond_resched
      2.34            +0.1        2.43        perf-profile.self.cycles-pp.llseek
      0.94            +0.1        1.04        perf-profile.self.cycles-pp.current_time
      1.48 ±  3%      +0.1        1.59 ±  2%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      2.50            +0.1        2.62        perf-profile.self.cycles-pp.do_syscall_64
      0.52 ±  4%      +0.1        0.66 ±  7%  perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited_flags
      1.72            +0.1        1.87        perf-profile.self.cycles-pp.filemap_get_entry
      4.03            +0.2        4.19        perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      2.97            +0.2        3.15        perf-profile.self.cycles-pp.write
      2.14            +0.2        2.33        perf-profile.self.cycles-pp.shmem_get_folio_gfp
      4.22            +0.2        4.42        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
     12.49            +0.5       12.96        perf-profile.self.cycles-pp.clear_bhb_loop
     13.95            +0.7       14.70        perf-profile.self.cycles-pp.copy_page_from_iter_atomic


***************************************************************************************************
lkp-icl-2sp9: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/300s/lkp-icl-2sp9/fsbuffer-w/unixbench

commit:
  654b33ada4 ("proc: fix UAF in proc_get_inode()")
  665575cff0 ("filemap: move prefaulting out of hot write path")

654b33ada4ab5e92 665575cff098b696995ddaddf46
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
  32471117            +4.6%   33974569        unixbench.throughput
      1819            +4.0%       1892        unixbench.time.user_time
 1.201e+10            +4.8%  1.259e+10        unixbench.workload
      0.33 ±  2%      +3.1%       0.34        perf-stat.i.MPKI
 4.577e+10            +1.4%   4.64e+10        perf-stat.i.branch-instructions
      0.02            -0.0        0.02        perf-stat.overall.branch-miss-rate%
      6053            -4.0%       5808        perf-stat.overall.path-length
 4.566e+10            +1.4%  4.629e+10        perf-stat.ps.branch-instructions


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki