On Mon, Jul 14, 2025 at 09:03:11PM +0800, Baokun Li wrote:
> When ext4 allocates blocks, we used to just go through the block groups
> one by one to find a good one. But when there are tons of block groups
> (like hundreds of thousands or even millions) and not many have free space
> (meaning they're mostly full), it takes a really long time to check them
> all, and performance gets bad. So, we added the "mb_optimize_scan" mount
> option (which is on by default now). It keeps track of some group lists,
> so when we need a free block, we can just grab a likely group from the
> right list. This saves time and makes block allocation much faster.
>
> But when multiple processes or containers are doing similar things, like
> constantly allocating 8k blocks, they all try to use the same block group
> in the same list. Even just two processes doing this can cut the IOPS in
> half. For example, one container might do 300,000 IOPS, but if you run two
> at the same time, the total is only 150,000.
>
> Since we can already look at block groups in a non-linear way, the first
> and last groups in the same list are basically the same for finding a block
> right now. Therefore, add an ext4_try_lock_group() helper function to skip
> the current group when it is locked by another process, thereby avoiding
> contention with other processes. This helps ext4 make better use of having
> multiple block groups.
>
> Also, to make sure we don't skip all the groups that have free space
> when allocating blocks, we won't try to skip busy groups anymore when
> ac_criteria is CR_ANY_FREE.
>
> Performance test data follows:
>
> Test: Running will-it-scale/fallocate2 on CPU-bound containers.
> Observation: Average fallocate operations per container per second.
>
> |CPU: Kunpeng 920   |          P80            |
> |Memory: 512GB      |-------------------------|
> |960GB SSD (0.5GB/s)| base  |    patched      |
> |-------------------|-------|-----------------|
> |mb_optimize_scan=0 | 2667  | 4821  (+80.7%)  |
> |mb_optimize_scan=1 | 2643  | 4784  (+81.0%)  |
>
> |CPU: AMD 9654 * 2  |          P96            |
> |Memory: 1536GB     |-------------------------|
> |960GB SSD (1GB/s)  | base  |    patched      |
> |-------------------|-------|-----------------|
> |mb_optimize_scan=0 | 3450  | 15371 (+345%)   |
> |mb_optimize_scan=1 | 3209  | 6101  (+90.0%)  |
>
> Signed-off-by: Baokun Li <libaokun1@xxxxxxxxxx>
> Reviewed-by: Jan Kara <jack@xxxxxxx>

Hey Baokun,

I reviewed some of the patches in v2, but I think that was at the very last
moment, so I'll add the comments in this series. Don't mind the copy-paste :)

The patch itself looks good, thanks for the changes. Feel free to add:

Reviewed-by: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx>

Regards,
ojaswin
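
For readers following the thread, the skip-if-busy idea described in the
commit message can be sketched in userspace C roughly as below. This is only
an illustration under stated assumptions, not the code from the patch: it
uses POSIX mutexes in place of the kernel's group locks, and struct group,
try_lock_group(), alloc_from_groups() and the CR_OTHER value are hypothetical
names made up for the example. Only CR_ANY_FREE and the general behaviour
(skip a locked group, except when the criteria is CR_ANY_FREE) come from the
description above.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified criteria; only CR_ANY_FREE is named in the thread. */
enum criteria { CR_OTHER, CR_ANY_FREE };

/* Hypothetical stand-in for a block group: a lock plus a free-block count. */
struct group {
	pthread_mutex_t lock;
	unsigned int free_blocks;
};

/*
 * Returns true with the group lock held. Returns false, without the lock,
 * when the group is busy and the criteria allows skipping it.
 */
static bool try_lock_group(struct group *g, enum criteria cr)
{
	if (cr == CR_ANY_FREE) {
		/* Last resort: never skip, wait for the lock instead. */
		pthread_mutex_lock(&g->lock);
		return true;
	}
	return pthread_mutex_trylock(&g->lock) == 0;
}

/* Returns the index of the group that satisfied the request, or -1. */
static int alloc_from_groups(struct group *groups, size_t n,
			     unsigned int want, enum criteria cr)
{
	for (size_t i = 0; i < n; i++) {
		struct group *g = &groups[i];

		if (!try_lock_group(g, cr))
			continue;	/* busy: move on to the next group */

		if (g->free_blocks >= want) {
			g->free_blocks -= want;
			pthread_mutex_unlock(&g->lock);
			return (int)i;
		}
		pthread_mutex_unlock(&g->lock);
	}
	return -1;
}
```

With group 0 held by another allocator, a CR_OTHER request in this sketch
moves on and is served by the next group with enough free blocks, whereas a
CR_ANY_FREE request waits for group 0 so that no group with free space is
passed over.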