Hello,

kernel test robot noticed a 5.3% improvement of stress-ng.loop.ops_per_sec on:


commit: e7bc0010ceb403d025100698586c8e760921d471 ("loop: properly send KOBJ_CHANGED uevent for disk device")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: loop
	cpufreq_governor: performance
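For context on the mechanism the patch under test touches: it concerns when the loop driver emits a KOBJ_CHANGE uevent on its disk device (which udev listens to for changes on /dev/loopN). Below is a minimal, purely illustrative sketch of emitting such a uevent via the generic kobject API; the helper name is hypothetical and this is not the code of the commit itself:

    /*
     * Illustrative sketch only (not the patch under test): emit a
     * KOBJ_CHANGE uevent on a gendisk's device kobject so that udev
     * sees a change event for the corresponding block device node.
     */
    #include <linux/blkdev.h>
    #include <linux/kobject.h>

    static void loop_emit_change_uevent(struct gendisk *disk)
    {
            /* disk_to_dev() returns the struct device embedded in the gendisk */
            kobject_uevent(&disk_to_dev(disk)->kobj, KOBJ_CHANGE);
    }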
Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250604/202506042201.1d6ccec5-lkp@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/loop/stress-ng/60s

commit:
  1fdb8188c3 ("loop: aio inherit the ioprio of original request")
  e7bc0010ce ("loop: properly send KOBJ_CHANGED uevent for disk device")

1fdb8188c3d50545 e7bc0010ceb403d025100698586
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
  0.13 ± 6%   +0.1   0.19 ± 4%   mpstat.cpu.all.iowait%
  2.94 ± 2%   +0.8   3.69 ± 3%   mpstat.cpu.all.sys%
  32546   +30.4%   42443   vmstat.system.cs
  74083   +4.6%   77510   vmstat.system.in
  7889 ±118%   +373.6%   37367 ± 45%   numa-meminfo.node0.AnonHugePages
  21065 ±128%   +220.9%   67589 ± 32%   numa-meminfo.node0.Shmem
  11373 ± 3%   +21.7%   13841 ± 3%   numa-meminfo.node1.Active(file)
  22923 ± 5%   +16.8%   26769 ± 4%   meminfo.Active(file)
  448.71 ± 8%   +106.3%   925.77 ± 9%   meminfo.Inactive
  448.71 ± 8%   +106.3%   925.77 ± 9%   meminfo.Inactive(file)
  165282 ± 5%   +23.3%   203822 ± 2%   meminfo.Shmem
  549441 ± 5%   +18.0%   648129   numa-numastat.node0.local_node
  591390 ± 2%   +16.7%   690232 ± 2%   numa-numastat.node0.numa_hit
  614834 ± 5%   +14.1%   701232 ± 2%   numa-numastat.node1.local_node
  642698 ± 3%   +13.4%   729095 ± 2%   numa-numastat.node1.numa_hit
  6173 ± 64%   +72.8%   10667 ± 8%   sched_debug.cfs_rq:/.avg_vruntime.min
  6173 ± 64%   +72.8%   10667 ± 8%   sched_debug.cfs_rq:/.min_vruntime.min
  12139 ± 61%   +83.0%   22217   sched_debug.cpu.nr_switches.avg
  22304 ± 45%   +55.3%   34647 ± 6%   sched_debug.cpu.nr_switches.max
  7497 ± 68%   +100.9%   15063 ± 9%   sched_debug.cpu.nr_switches.min
  -77.83   +69.3%   -131.75   sched_debug.cpu.nr_uninterruptible.min
  15.60 ± 44%   +58.1%   24.65 ± 7%   sched_debug.cpu.nr_uninterruptible.stddev
  5562   +4.7%   5825   stress-ng.loop.ops
  91.71   +5.3%   96.60   stress-ng.loop.ops_per_sec
  6139 ± 2%   +8.7%   6671 ± 2%   stress-ng.time.involuntary_context_switches
  210787   +2.4%   215791   stress-ng.time.minor_page_faults
  167.83 ± 2%   +20.1%   201.50 ± 4%   stress-ng.time.percent_of_cpu_this_job_got
  102.43 ± 2%   +19.1%   122.04 ± 4%   stress-ng.time.system_time
  349026 ± 2%   +50.0%   523587   stress-ng.time.voluntary_context_switches
  2815 ± 6%   +20.4%   3390 ± 7%   numa-vmstat.node0.nr_active_file
  5302 ±129%   +219.5%   16943 ± 32%   numa-vmstat.node0.nr_shmem
  2814 ± 6%   +20.5%   3391 ± 7%   numa-vmstat.node0.nr_zone_active_file
  591173 ± 2%   +16.7%   689866 ± 2%   numa-vmstat.node0.numa_hit
  549224 ± 5%   +17.9%   647763   numa-vmstat.node0.numa_local
  2816 ± 2%   +26.8%   3569 ± 4%   numa-vmstat.node1.nr_active_file
  38.54 ± 63%   +213.0%   120.63 ± 23%   numa-vmstat.node1.nr_inactive_file
  2815 ± 2%   +26.9%   3571 ± 4%   numa-vmstat.node1.nr_zone_active_file
  38.54 ± 63%   +213.1%   120.67 ± 23%   numa-vmstat.node1.nr_zone_inactive_file
  641841 ± 3%   +13.6%   729026 ± 2%   numa-vmstat.node1.numa_hit
  613978 ± 5%   +14.2%   701163 ± 2%   numa-vmstat.node1.numa_local
  222483 ± 2%   +3.8%   230998   proc-vmstat.nr_active_anon
  5703 ± 4%   +22.3%   6976 ± 6%   proc-vmstat.nr_active_file
  966262   +1.1%   976967   proc-vmstat.nr_file_pages
  110.67 ± 6%   +105.8%   227.81 ± 9%   proc-vmstat.nr_inactive_file
  41661   +2.2%   42580   proc-vmstat.nr_mapped
  41337 ± 5%   +23.4%   50991 ± 2%   proc-vmstat.nr_shmem
  30961   +2.2%   31647   proc-vmstat.nr_slab_reclaimable
  46186   +2.4%   47285   proc-vmstat.nr_slab_unreclaimable
  222483 ± 2%   +3.8%   230998   proc-vmstat.nr_zone_active_anon
  5703 ± 4%   +22.3%   6976 ± 6%   proc-vmstat.nr_zone_active_file
  110.67 ± 6%   +105.8%   227.81 ± 9%   proc-vmstat.nr_zone_inactive_file
  1234795   +15.0%   1420254   proc-vmstat.numa_hit
  1164983   +15.9%   1350288   proc-vmstat.numa_local
  3262158   +4.5%   3407812   proc-vmstat.pgalloc_normal
  3097627   +3.9%   3219201   proc-vmstat.pgfree
  3787   +8.7%   4118   proc-vmstat.pgmajfault
  75121   +17.7%   88386   proc-vmstat.unevictable_pgs_culled
  2.333e+09 ± 2%   +8.1%   2.522e+09   perf-stat.i.branch-instructions
  1.88 ± 2%   -0.1   1.76   perf-stat.i.branch-miss-rate%
  47022551   +6.9%   50273881   perf-stat.i.cache-references
  35100   +29.6%   45490   perf-stat.i.context-switches
  1.63   +2.6%   1.67   perf-stat.i.cpi
  1.764e+10 ± 2%   +10.2%   1.944e+10   perf-stat.i.cpu-cycles
  1372 ± 2%   +22.2%   1676 ± 2%   perf-stat.i.cpu-migrations
  1.093e+10 ± 2%   +7.1%   1.171e+10   perf-stat.i.instructions
  0.01 ±142%   +888.9%   0.08 ± 33%   perf-stat.i.metric.K/sec
  1.94 ± 2%   -0.1   1.81   perf-stat.overall.branch-miss-rate%
  1.62   +2.7%   1.66   perf-stat.overall.cpi
  2.186e+09   +9.8%   2.401e+09   perf-stat.ps.branch-instructions
  43719263   +8.4%   47392135   perf-stat.ps.cache-references
  33090   +30.8%   43266   perf-stat.ps.context-switches
  1.651e+10   +11.8%   1.846e+10   perf-stat.ps.cpu-cycles
  1292 ± 2%   +23.7%   1598 ± 3%   perf-stat.ps.cpu-migrations
  1.022e+10   +8.9%   1.113e+10   perf-stat.ps.instructions
  6.273e+11   +8.8%   6.824e+11   perf-stat.total.instructions
  5.40 ± 28%   -48.0%   2.81 ± 31%   perf-sched.sch_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
  0.01 ± 11%   +37.9%   0.02 ± 10%   perf-sched.sch_delay.avg.ms.__cond_resched.mutex_lock.blk_mq_unfreeze_queue_nomemrestore.loop_set_block_size.lo_simple_ioctl
  0.09 ± 48%   -73.0%   0.02 ± 69%   perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
  0.49 ± 12%   -25.7%   0.37 ± 9%   perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
  0.05 ± 42%   +1.2e+05%   60.77 ±207%   perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.__synchronize_srcu.part.0
  998.14 ± 47%   -54.0%   458.71 ± 44%   perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
  0.04 ± 71%   +1978.2%   0.84 ±179%   perf-sched.sch_delay.max.ms.__cond_resched.dput.simple_recursive_removal.debugfs_remove.blk_unregister_queue
  0.02 ± 69%   +138.1%   0.04 ± 44%   perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.sync_bdevs.ksys_sync.__x64_sys_sync
  0.05 ± 27%   +72.4%   0.09 ± 49%   perf-sched.sch_delay.max.ms.blk_mq_freeze_queue_wait.loop_set_status.lo_ioctl.blkdev_ioctl
  0.00 ±107%   +2770.0%   0.10 ±148%   perf-sched.sch_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.open_last_lookups
  18.75 ± 12%   -29.8%   13.16 ± 3%   perf-sched.total_wait_and_delay.average.ms
  95102 ± 15%   +53.4%   145892 ± 4%   perf-sched.total_wait_and_delay.count.ms
  18.59 ± 12%   -29.8%   13.06 ± 3%   perf-sched.total_wait_time.average.ms
  9.20 ± 11%   -100.0%   0.00   perf-sched.wait_and_delay.avg.ms.devtmpfs_work_loop.devtmpfsd.kthread.ret_from_fork
  3.44 ± 20%   -60.5%   1.36 ± 23%   perf-sched.wait_and_delay.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
  18.29 ± 85%   +145.1%   44.83 ± 40%   perf-sched.wait_and_delay.avg.ms.exp_funnel_lock.synchronize_rcu_expedited.bdi_unregister.del_gendisk
  1.83 ± 30%   +62.1%   2.97 ± 19%   perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.filemap_update_page.filemap_get_pages
  8.92 ± 21%   -42.3%   5.15 ± 5%   perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
  467.37 ± 24%   -41.3%   274.21 ± 14%   perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
  97.15 ± 37%   -66.7%   32.36 ± 23%   perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.__lru_add_drain_all
  2.79 ± 17%   +92.6%   5.37 ± 23%   perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.kernfs_find_and_get_ns
  60.00 ± 10%   -46.0%   32.38 ± 28%   perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.__flush_work.__lru_add_drain_all
  54.09 ± 11%   -20.3%   43.09 ± 2%   perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
  6658 ± 12%   +25.6%   8362 ± 4%   perf-sched.wait_and_delay.count.__cond_resched.loop_process_work.process_one_work.worker_thread.kthread
  539.83 ± 13%   -100.0%   0.00   perf-sched.wait_and_delay.count.devtmpfs_work_loop.devtmpfsd.kthread.ret_from_fork
  15267 ± 29%   +141.0%   36792 ± 18%   perf-sched.wait_and_delay.count.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
  4141 ± 12%   +67.7%   6944 ± 5%   perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.filemap_update_page.filemap_get_pages
  10673 ± 14%   +70.0%   18140 ± 3%   perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
  29.17 ± 17%   +88.6%   55.00 ± 9%   perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
  104.50 ± 16%   +119.0%   228.83 ± 12%   perf-sched.wait_and_delay.count.schedule_preempt_disabled.__mutex_lock.constprop.0.__lru_add_drain_all
  29.33 ± 18%   -73.3%   7.83 ± 34%   perf-sched.wait_and_delay.count.schedule_preempt_disabled.__mutex_lock.constprop.0.bdev_release
  607.50 ± 48%   +49.9%   910.67 ± 5%   perf-sched.wait_and_delay.count.schedule_preempt_disabled.__mutex_lock.constprop.0.lo_open
  125.33 ± 42%   +373.4%   593.33 ± 17%   perf-sched.wait_and_delay.count.schedule_preempt_disabled.__mutex_lock.constprop.0.sync_bdevs
  7635 ± 14%   +68.4%   12861 ± 12%   perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.kernfs_dop_revalidate
  775.50 ± 19%   +40.1%   1086 ± 8%   perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.kernfs_find_and_get_ns
  866.50 ± 15%   +48.2%   1284 ± 12%   perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.kernfs_iop_lookup
  1961 ± 17%   +29.7%   2544 ± 6%   perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.kernfs_remove
  2905 ± 22%   +52.3%   4425 ± 6%   perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.kernfs_remove_by_name_ns
  229.00 ± 11%   -43.4%   129.50 ± 19%   perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.__flush_work.__lru_add_drain_all
  877.50 ± 14%   +24.1%   1089 ± 4%   perf-sched.wait_and_delay.count.schedule_timeout.__wait_for_common.__synchronize_srcu.part.0
  10423 ± 13%   +29.2%   13464 ± 4%   perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
  10975 ± 12%   +20.2%   13195 ± 4%   perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
  1238 ± 24%   -29.2%   876.43 ± 13%   perf-sched.wait_and_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
  1350 ± 30%   -39.9%   811.58 ± 18%   perf-sched.wait_and_delay.max.ms.blk_mq_freeze_queue_wait.loop_set_status.loop_set_status_old.blkdev_ioctl
  1281 ± 34%   -100.0%   0.00   perf-sched.wait_and_delay.max.ms.devtmpfs_work_loop.devtmpfsd.kthread.ret_from_fork
  1349 ± 30%   -40.6%   801.82 ± 20%   perf-sched.wait_and_delay.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
  896.11 ± 53%   -61.3%   347.06 ± 43%   perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.__lru_add_drain_all
  1324 ± 31%   -60.0%   529.55 ± 32%   perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.__flush_work.__lru_add_drain_all
  1345 ± 30%   -42.5%   773.93 ± 23%   perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.devtmpfs_submit_req.devtmpfs_create_node
  1329 ± 31%   -52.1%   637.20 ± 29%   perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.__wait_rcu_gp
  0.00 ±147%   +4468.8%   0.12 ±130%   perf-sched.wait_time.avg.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kvasprintf.kobject_set_name_vargs.kobject_add
  0.07 ± 2%   -33.3%   0.05 ± 21%   perf-sched.wait_time.avg.ms.__cond_resched.__wait_for_common.submit_bio_wait.blkdev_issue_flush.blkdev_fsync
  0.05 ±101%   +78602.4%   37.91 ±184%   perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc_noprof.__kernfs_new_node.kernfs_new_node.__kernfs_create_file
  11.08 ±102%   -99.7%   0.03 ±108%   perf-sched.wait_time.avg.ms.__cond_resched.lo_read_simple.loop_process_work.process_one_work.worker_thread
  0.02 ± 33%   -73.9%   0.01 ±142%   perf-sched.wait_time.avg.ms.__cond_resched.mutex_lock.bd_abort_claiming.loop_configure.lo_ioctl
  3.41 ± 21%   -60.4%   1.35 ± 23%   perf-sched.wait_time.avg.ms.do_nanosleep.hrtimer_nanosleep.common_nsleep.__x64_sys_clock_nanosleep
  1.83 ± 30%   +61.4%   2.95 ± 20%   perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.filemap_update_page.filemap_get_pages
  8.86 ± 21%   -42.3%   5.12 ± 5%   perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.ep_poll.do_epoll_wait.__x64_sys_epoll_wait
  467.29 ± 24%   -41.3%   274.19 ± 14%   perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
  96.08 ± 38%   -66.6%   32.11 ± 24%   perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.__lru_add_drain_all
  0.07 ± 42%   +494.5%   0.41 ±104%   perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.dev_pm_qos_constraints_destroy
  2.38 ±103%   +1749.5%   44.04 ± 42%   perf-sched.wait_time.avg.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.lo_simple_ioctl
  2.73 ± 16%   +96.2%   5.36 ± 23%   perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.kernfs_find_and_get_ns
  3.18 ± 75%   +171.8%   8.65 ± 24%   perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.kernfs_iop_permission
  59.89 ± 10%   -46.1%   32.26 ± 29%   perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.__flush_work.__lru_add_drain_all
  0.04 ± 25%   +72.9%   0.07 ± 27%   perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.devtmpfs_submit_req.devtmpfs_delete_node
  3.13 ± 22%   +116.4%   6.77 ± 59%   perf-sched.wait_time.avg.ms.schedule_timeout.synchronize_rcu_expedited_wait_once.synchronize_rcu_expedited_wait.rcu_exp_wait_wake
  53.60 ± 11%   -20.3%   42.72 ± 2%   perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
  0.00 ±150%   +7429.4%   0.21 ±109%   perf-sched.wait_time.max.ms.__cond_resched.__kmalloc_node_track_caller_noprof.kvasprintf.kobject_set_name_vargs.kobject_add
  236.96 ±105%   -99.7%   0.59 ±182%   perf-sched.wait_time.max.ms.__cond_resched.lo_read_simple.loop_process_work.process_one_work.worker_thread
  0.16 ± 71%   -96.4%   0.01 ±142%   perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.bd_abort_claiming.loop_configure.lo_ioctl
  1238 ± 24%   -29.2%   876.42 ± 13%   perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
  1350 ± 30%   -39.9%   811.55 ± 18%   perf-sched.wait_time.max.ms.blk_mq_freeze_queue_wait.loop_set_status.loop_set_status_old.blkdev_ioctl
  872.85 ± 8%   -26.5%   641.54 ± 20%   perf-sched.wait_time.max.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.x64_sys_call
  1349 ± 30%   -40.6%   801.76 ± 20%   perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.filemap_fault.__do_fault
  894.57 ± 53%   -61.2%   347.04 ± 43%   perf-sched.wait_time.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.__lru_add_drain_all
  4.21 ±101%   +1975.4%   87.43 ±134%   perf-sched.wait_time.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.dev_pm_qos_constraints_destroy
  7.31 ±114%   +7928.3%   586.82 ± 40%   perf-sched.wait_time.max.ms.schedule_preempt_disabled.__mutex_lock.constprop.0.lo_simple_ioctl
  1324 ± 31%   -60.0%   529.52 ± 32%   perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.__flush_work.__lru_add_drain_all
  1345 ± 30%   -42.5%   773.91 ± 23%   perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.devtmpfs_submit_req.devtmpfs_create_node
  1329 ± 31%   -52.5%   630.99 ± 28%   perf-sched.wait_time.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.__wait_rcu_gp


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki