Hello,

kernel test robot noticed a 31.1% improvement of stress-ng.fsize.ops_per_sec on:


commit: ad0d50f30d3fe376a99fd0e392867c7ca9b619e3 ("[PATCH v2 03/16] ext4: remove unnecessary s_md_lock on update s_mb_last_group")
url: https://github.com/intel-lab-lkp/linux/commits/Baokun-Li/ext4-add-ext4_try_lock_group-to-skip-busy-groups/20250623-155451
base: https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git dev
patch link: https://lore.kernel.org/all/20250623073304.3275702-4-libaokun1@xxxxxxxxxx/
patch subject: [PATCH v2 03/16] ext4: remove unnecessary s_md_lock on update s_mb_last_group

testcase: stress-ng
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
parameters:

	nr_threads: 100%
	disk: 1HDD
	testtime: 60s
	fs: ext4
	test: fsize
	cpufreq_governor: performance
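For context, the snippet below is only an illustrative sketch of the kind of change the patch subject describes, assuming the s_mb_last_group goal hint is updated with a plain annotated store instead of taking s_md_lock; the helper names here are made up for this report, and the authoritative diff is at the patch link above:

	/*
	 * Illustrative sketch only -- not the actual patch. Assumption: the
	 * goal-group hint is a single word used as a best-effort starting
	 * point, so a racy read is harmless and the spinlock can be dropped
	 * in favour of WRITE_ONCE()/READ_ONCE().
	 */

	/* before: one-word hint update serialised behind the shared s_md_lock */
	static void mb_set_last_group_locked(struct ext4_sb_info *sbi,
					     ext4_group_t group)
	{
		spin_lock(&sbi->s_md_lock);
		sbi->s_mb_last_group = group;
		spin_unlock(&sbi->s_md_lock);
	}

	/* after: lockless hint; a stale value only costs a worse start group */
	static void mb_set_last_group(struct ext4_sb_info *sbi, ext4_group_t group)
	{
		WRITE_ONCE(sbi->s_mb_last_group, group);
	}

	static ext4_group_t mb_get_last_group(struct ext4_sb_info *sbi)
	{
		return READ_ONCE(sbi->s_mb_last_group);
	}

If the update really does stop bouncing s_md_lock between sockets, that would be consistent with the lower perf-c2c remote HITM counts and the higher stress-ng.fsize throughput in the tables below.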
Details are as below:
-------------------------------------------------------------------------------------------------->

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250701/202507010457.3b3d3c33-lkp@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-12/performance/1HDD/ext4/x86_64-rhel-9.4/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp4/fsize/stress-ng/60s

commit:
  86f92bf2c0 ("ext4: remove unnecessary s_mb_last_start")
  ad0d50f30d ("ext4: remove unnecessary s_md_lock on update s_mb_last_group")

86f92bf2c059852a ad0d50f30d3fe376a99fd0e3928
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      5042 ±  4%     -10.1%       4532 ±  2%  meminfo.Dirty
    100194 ± 63%     +92.5%     192828 ± 32%  numa-meminfo.node0.Shmem
      5082 ±  3%     +28.1%       6510 ±  5%  vmstat.system.cs
     71089           -17.1%      58900 ±  2%  perf-c2c.DRAM.remote
     44206           -13.4%      38284 ±  2%  perf-c2c.HITM.remote
    131696            -4.1%     126359 ±  2%  perf-c2c.HITM.total
      0.15 ± 18%      +0.2        0.35 ± 14%  mpstat.cpu.all.iowait%
      0.32 ±  7%      -0.0        0.28 ±  4%  mpstat.cpu.all.irq%
      0.05 ±  4%      +0.0        0.07 ±  3%  mpstat.cpu.all.soft%
      0.50 ± 13%      +0.2        0.69 ± 16%  mpstat.cpu.all.usr%
  14478005 ±  2%     +32.7%   19217687 ±  4%  numa-numastat.node0.local_node
  14540770 ±  2%     +32.6%   19285137 ±  4%  numa-numastat.node0.numa_hit
  14722680           +28.8%   18967713        numa-numastat.node1.local_node
  14793059           +28.7%   19032805        numa-numastat.node1.numa_hit
    918392           -38.4%     565297 ± 18%  sched_debug.cpu.avg_idle.avg
    356474 ±  5%     -92.0%      28413 ± 90%  sched_debug.cpu.avg_idle.min
      2362 ±  2%     +18.8%       2806 ±  4%  sched_debug.cpu.nr_switches.avg
      1027           +35.5%       1391 ±  6%  sched_debug.cpu.nr_switches.min
     25263 ± 63%     +91.0%      48258 ± 31%  numa-vmstat.node0.nr_shmem
  14540796 ±  2%     +32.5%   19271949 ±  4%  numa-vmstat.node0.numa_hit
  14478031 ±  2%     +32.6%   19204499 ±  4%  numa-vmstat.node0.numa_local
  14792432           +28.6%   19020203        numa-vmstat.node1.numa_hit
  14722053           +28.8%   18955111        numa-vmstat.node1.numa_local
      3780           +30.9%       4950 ±  2%  stress-ng.fsize.SIGXFSZ_signals_per_sec
    643887           +31.0%     843807 ±  2%  stress-ng.fsize.ops
     10726           +31.1%      14059 ±  2%  stress-ng.fsize.ops_per_sec
    126167 ±  2%      +8.7%     137085 ±  2%  stress-ng.time.involuntary_context_switches
     21.82 ±  2%     +45.1%      31.66 ±  4%  stress-ng.time.user_time
      5144 ± 15%    +704.0%      41366 ± 20%  stress-ng.time.voluntary_context_switches
      1272 ±  4%     -10.8%       1135 ±  2%  proc-vmstat.nr_dirty
     59459            +8.1%      64288        proc-vmstat.nr_slab_reclaimable
      1272 ±  4%     -10.8%       1134 ±  2%  proc-vmstat.nr_zone_write_pending
  29335922           +30.6%   38319823        proc-vmstat.numa_hit
  29202778           +30.8%   38187281        proc-vmstat.numa_local
  35012787           +31.9%   46166245 ±  2%  proc-vmstat.pgalloc_normal
  34753289           +31.9%   45830460 ±  2%  proc-vmstat.pgfree
    120464            +2.3%     123212        proc-vmstat.pgpgout
      0.35 ±  3%      +0.1        0.41 ±  3%  perf-stat.i.branch-miss-rate%
  48059547           +21.7%   58484853        perf-stat.i.branch-misses
     33.69            -1.8       31.91        perf-stat.i.cache-miss-rate%
 1.227e+08           +13.5%  1.392e+08 ±  7%  perf-stat.i.cache-misses
 3.623e+08           +19.9%  4.342e+08 ±  7%  perf-stat.i.cache-references
      4958 ±  3%     +30.4%       6467 ±  4%  perf-stat.i.context-switches
      6.10            -5.2%       5.79 ±  4%  perf-stat.i.cpi
    208.43           +22.0%     254.30 ±  5%  perf-stat.i.cpu-migrations
      3333           -11.4%       2954 ±  7%  perf-stat.i.cycles-between-cache-misses
      0.33            +0.1        0.39 ±  2%  perf-stat.overall.branch-miss-rate%
     33.87            -1.8       32.04        perf-stat.overall.cache-miss-rate%
      6.16            -5.3%       5.83 ±  4%  perf-stat.overall.cpi
      3360           -11.5%       2973 ±  7%  perf-stat.overall.cycles-between-cache-misses
      0.16            +5.8%       0.17 ±  4%  perf-stat.overall.ipc
  47200442           +21.7%   57451126        perf-stat.ps.branch-misses
 1.206e+08           +13.5%  1.369e+08 ±  7%  perf-stat.ps.cache-misses
 3.563e+08           +19.9%  4.271e+08 ±  7%  perf-stat.ps.cache-references
      4873 ±  3%     +30.3%       6351 ±  4%  perf-stat.ps.context-switches
    204.75           +22.0%     249.75 ±  5%  perf-stat.ps.cpu-migrations
 6.583e+10            +5.7%  6.955e+10 ±  4%  perf-stat.ps.instructions
 4.046e+12            +5.5%  4.267e+12 ±  4%  perf-stat.total.instructions
      0.15 ± 24%     +97.6%       0.31 ± 21%  perf-sched.sch_delay.avg.ms.__cond_resched.ext4_free_blocks.ext4_remove_blocks.ext4_ext_rm_leaf.ext4_ext_remove_space
      0.69 ± 34%     -45.3%       0.38 ± 24%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.poll_schedule_timeout.constprop.0.do_poll
      0.04 ±  2%     -11.0%       0.03 ±  7%  perf-sched.sch_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.09 ± 18%    +104.1%       0.19 ± 38%  perf-sched.sch_delay.avg.ms.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.[unknown]
      0.32 ± 59%    +284.8%       1.24 ± 71%  perf-sched.sch_delay.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_noprof.__filemap_get_folio
     16.34 ± 81%     -81.7%       2.99 ± 34%  perf-sched.sch_delay.max.ms.__cond_resched.__ext4_handle_dirty_metadata.ext4_mb_mark_context.ext4_mb_mark_diskspace_used.ext4_mb_new_blocks
      3.51 ± 11%     +56.2%       5.48 ± 38%  perf-sched.sch_delay.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_dirty_inode.__mark_inode_dirty.ext4_setattr
      0.06 ±223%   +1443.8%       0.86 ± 97%  perf-sched.sch_delay.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_dirty_inode.__mark_inode_dirty.generic_update_time
      0.47 ± 33%    +337.5%       2.05 ± 67%  perf-sched.sch_delay.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_ext_insert_extent.ext4_ext_map_blocks.ext4_map_create_blocks
      0.47 ± 64%    +417.9%       2.43 ± 53%  perf-sched.sch_delay.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_truncate.ext4_setattr.notify_change
      7.30 ± 60%     -53.7%       3.38 ± 22%  perf-sched.sch_delay.max.ms.__cond_resched.__find_get_block_slow.find_get_block_common.bdev_getblk.ext4_read_block_bitmap_nowait
      2.72 ± 34%     +59.5%       4.33 ± 20%  perf-sched.sch_delay.max.ms.__cond_resched.down_read.ext4_map_blocks.ext4_alloc_file_blocks.isra
      0.08 ±138%    +382.6%       0.37 ± 24%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_truncate.do_ftruncate.do_sys_ftruncate
      1.33 ± 90%    +122.5%       2.96 ± 34%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.ext4_alloc_file_blocks.isra.0
      3.04           +93.7%       5.89 ± 82%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.ext4_setattr.notify_change.do_truncate
      3.66 ± 19%     +52.6%       5.59 ± 31%  perf-sched.sch_delay.max.ms.__cond_resched.down_write.ext4_truncate.ext4_setattr.notify_change
      0.41 ± 26%    +169.4%       1.11 ± 78%  perf-sched.sch_delay.max.ms.__cond_resched.ext4_free_blocks.ext4_remove_blocks.ext4_ext_rm_leaf.ext4_ext_remove_space
      6.93 ± 82%     -65.5%       2.39 ± 49%  perf-sched.sch_delay.max.ms.__cond_resched.ext4_mb_regular_allocator.ext4_mb_new_blocks.ext4_ext_map_blocks.ext4_map_create_blocks
      0.23 ± 68%    +357.9%       1.04 ± 82%  perf-sched.sch_delay.max.ms.__cond_resched.kmem_cache_alloc_noprof.ext4_mb_clear_bb.ext4_remove_blocks.ext4_ext_rm_leaf
      0.26 ± 39%    +205.8%       0.78 ± 73%  perf-sched.sch_delay.max.ms.__cond_resched.mutex_lock.ext4_mb_initialize_context.ext4_mb_new_blocks.ext4_ext_map_blocks
      0.11 ± 93%   +1390.4%       1.60 ± 62%  perf-sched.sch_delay.max.ms.io_schedule.bit_wait_io.__wait_on_bit_lock.out_of_line_wait_on_bit_lock
      0.30 ± 74%   +2467.2%       7.58 ± 60%  perf-sched.sch_delay.max.ms.io_schedule.folio_wait_bit_common.__find_get_block_slow.find_get_block_common
      2.66 ± 18%     +29.4%       3.44 ±  7%  perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      2.64 ± 21%    +197.3%       7.84 ± 53%  perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     87.11 ±  2%     -15.3%      73.79 ±  4%  perf-sched.total_wait_and_delay.average.ms
     21561 ±  2%     +18.5%      25553 ±  4%  perf-sched.total_wait_and_delay.count.ms
     86.95 ±  2%     -15.4%      73.60 ±  4%  perf-sched.total_wait_time.average.ms
      0.76 ± 54%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.bdev_getblk.ext4_read_block_bitmap_nowait.ext4_read_block_bitmap.ext4_mb_mark_context
      0.61 ± 47%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.ext4_mb_regular_allocator.ext4_mb_new_blocks.ext4_ext_map_blocks.ext4_map_create_blocks
    168.47 ±  2%     -10.4%     150.98 ±  4%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
    125.33 ± 10%     +72.2%     215.83 ±  8%  perf-sched.wait_and_delay.count.__cond_resched.__ext4_handle_dirty_metadata.ext4_do_update_inode.isra.0
    781.33 ±  3%     -74.6%     198.83 ± 15%  perf-sched.wait_and_delay.count.__cond_resched.__ext4_handle_dirty_metadata.ext4_mb_mark_context.ext4_mb_mark_diskspace_used.ext4_mb_new_blocks
    278.67 ± 13%    +310.9%       1145 ± 20%  perf-sched.wait_and_delay.count.__cond_resched.__ext4_mark_inode_dirty.ext4_dirty_inode.__mark_inode_dirty.ext4_setattr
      1116 ±  3%     -81.5%     206.33 ± 13%  perf-sched.wait_and_delay.count.__cond_resched.__find_get_block_slow.find_get_block_common.bdev_getblk.ext4_read_block_bitmap_nowait
    166.33 ±  8%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.bdev_getblk.ext4_read_block_bitmap_nowait.ext4_read_block_bitmap.ext4_mb_mark_context
    115.50 ± 46%    +298.7%     460.50 ± 16%  perf-sched.wait_and_delay.count.__cond_resched.down_read.ext4_map_blocks.ext4_alloc_file_blocks.isra
    138.33 ± 16%    +290.7%     540.50 ± 18%  perf-sched.wait_and_delay.count.__cond_resched.down_write.ext4_setattr.notify_change.do_truncate
    310.17 ± 14%    +263.9%       1128 ± 21%  perf-sched.wait_and_delay.count.__cond_resched.down_write.ext4_truncate.ext4_setattr.notify_change
      1274 ±  2%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.ext4_mb_regular_allocator.ext4_mb_new_blocks.ext4_ext_map_blocks.ext4_map_create_blocks
      7148 ±  2%     +11.9%       7998 ±  4%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     32.82 ± 80%     -81.8%       5.99 ± 34%  perf-sched.wait_and_delay.max.ms.__cond_resched.__ext4_handle_dirty_metadata.ext4_mb_mark_context.ext4_mb_mark_diskspace_used.ext4_mb_new_blocks
     12.06 ± 22%    +168.4%      32.36 ± 47%  perf-sched.wait_and_delay.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_dirty_inode.__mark_inode_dirty.ext4_setattr
     20.55 ± 82%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.bdev_getblk.ext4_read_block_bitmap_nowait.ext4_read_block_bitmap.ext4_mb_mark_context
     27.66 ± 20%     +78.9%      49.49 ± 60%  perf-sched.wait_and_delay.max.ms.__cond_resched.ext4_journal_check_start.__ext4_journal_start_sb.ext4_dirty_inode.__mark_inode_dirty
     16.75 ± 64%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.ext4_mb_regular_allocator.ext4_mb_new_blocks.ext4_ext_map_blocks.ext4_map_create_blocks
      0.19 ± 29%    +191.5%       0.55 ± 29%  perf-sched.wait_time.avg.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_truncate.ext4_setattr.notify_change
      0.15 ± 24%     +98.1%       0.31 ± 21%  perf-sched.wait_time.avg.ms.__cond_resched.ext4_free_blocks.ext4_remove_blocks.ext4_ext_rm_leaf.ext4_ext_remove_space
    168.44 ±  2%     -10.4%     150.94 ±  4%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.36 ± 40%    +392.9%       1.78 ± 71%  perf-sched.wait_time.max.ms.__cond_resched.__alloc_frozen_pages_noprof.alloc_pages_mpol.folio_alloc_noprof.__filemap_get_folio
     17.42 ± 70%     -82.4%       3.07 ± 34%  perf-sched.wait_time.max.ms.__cond_resched.__ext4_handle_dirty_metadata.ext4_mb_mark_context.ext4_mb_mark_diskspace_used.ext4_mb_new_blocks
     11.49 ± 26%    +180.6%      32.23 ± 48%  perf-sched.wait_time.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_dirty_inode.__mark_inode_dirty.ext4_setattr
      0.06 ±223%   +1443.8%       0.86 ± 97%  perf-sched.wait_time.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_dirty_inode.__mark_inode_dirty.generic_update_time
      0.47 ± 33%    +411.8%       2.40 ± 56%  perf-sched.wait_time.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_ext_insert_extent.ext4_ext_map_blocks.ext4_map_create_blocks
      0.64 ±161%    +244.6%       2.20 ± 61%  perf-sched.wait_time.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_setattr.notify_change.do_truncate
      0.47 ± 64%    +968.9%       5.01 ± 83%  perf-sched.wait_time.max.ms.__cond_resched.__ext4_mark_inode_dirty.ext4_truncate.ext4_setattr.notify_change
      0.08 ±138%    +382.6%       0.37 ± 24%  perf-sched.wait_time.max.ms.__cond_resched.down_write.do_truncate.do_ftruncate.do_sys_ftruncate
      0.41 ± 26%    +169.4%       1.11 ± 78%  perf-sched.wait_time.max.ms.__cond_resched.ext4_free_blocks.ext4_remove_blocks.ext4_ext_rm_leaf.ext4_ext_remove_space
     17.67 ± 25%    +110.8%      37.26 ± 35%  perf-sched.wait_time.max.ms.__cond_resched.ext4_journal_check_start.__ext4_journal_start_sb.ext4_dirty_inode.__mark_inode_dirty
      2.23 ± 51%    +360.3%      10.28 ± 71%  perf-sched.wait_time.max.ms.__cond_resched.ext4_journal_check_start.__ext4_journal_start_sb.ext4_ext_remove_space.ext4_ext_truncate
     84.33 ± 14%     -46.9%      44.77 ± 72%  perf-sched.wait_time.max.ms.__cond_resched.ext4_mb_load_buddy_gfp.ext4_process_freed_data.ext4_journal_commit_callback.jbd2_journal_commit_transaction
      0.23 ± 68%    +357.9%       1.04 ± 82%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc_noprof.ext4_mb_clear_bb.ext4_remove_blocks.ext4_ext_rm_leaf
      0.26 ± 39%    +205.8%       0.78 ± 73%  perf-sched.wait_time.max.ms.__cond_resched.mutex_lock.ext4_mb_initialize_context.ext4_mb_new_blocks.ext4_ext_map_blocks
    276.82 ± 13%     -22.2%     215.50 ± 13%  perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.30 ± 74%   +9637.4%      28.76 ± 48%  perf-sched.wait_time.max.ms.io_schedule.folio_wait_bit_common.__find_get_block_slow.find_get_block_common
      1.44 ± 79%  +11858.3%     172.80 ±219%  perf-sched.wait_time.max.ms.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown]


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided for
informational purposes only. Any difference in system hardware or software design
or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki