On Mon 14-07-25 21:03:25, Baokun Li wrote:
> While traversing the list, holding a spin_lock prevents load_buddy,
> making direct use of ext4_try_lock_group impossible. This can lead to
> a bouncing scenario where spin_is_locked(grp_A) succeeds, but
> ext4_try_lock_group() fails, forcing the list traversal to repeatedly
> restart from grp_A.
>
> In contrast, linear traversal directly uses ext4_try_lock_group(),
> avoiding this bouncing. Therefore, we need a lockless, ordered
> traversal to achieve linear-like efficiency.
>
> To that end, this commit converts both the average fragment size
> lists and the largest free order lists into ordered xarrays.
>
> In an xarray, the index represents the block group number and the
> value holds the block group information; a non-empty value indicates
> the block group's presence.
>
> While insertion and deletion remain O(1), lookup complexity changes
> from O(1) to O(log n), which may slightly reduce single-threaded
> performance.
>
> Additionally, xarray insertions may fail due to memory allocation
> failures. However, since linear traversal remains available as a
> fallback, this is not a major problem, so we only emit a warning
> message on insertion failure.
>
> A helper function, ext4_mb_find_good_group_xarray(), is added to find
> good groups in the specified xarray, starting at the specified
> position start; when it reaches ngroups - 1, it wraps around to 0 and
> continues up to start - 1. This ensures an ordered traversal within
> the xarray.
>
> Performance test results are as follows: single-process operations on
> an empty disk show negligible impact, while multi-process workloads
> demonstrate a noticeable performance gain.
>
> |CPU: Kunpeng 920   |          P80           |            P1           |
> |Memory: 512GB      |------------------------|-------------------------|
> |960GB SSD (0.5GB/s)| base  |    patched     |  base  |    patched     |
> |-------------------|-------|----------------|--------|----------------|
> |mb_optimize_scan=0 | 20097 | 19555 (-2.6%)  | 316141 | 315636 (-0.2%) |
> |mb_optimize_scan=1 | 13318 | 15496 (+16.3%) | 325273 | 323569 (-0.5%) |
>
> |CPU: AMD 9654 * 2  |          P96           |            P1           |
> |Memory: 1536GB     |------------------------|-------------------------|
> |960GB SSD (1GB/s)  | base  |    patched     |  base  |    patched     |
> |-------------------|-------|----------------|--------|----------------|
> |mb_optimize_scan=0 | 53603 | 53192 (-0.7%)  | 214243 | 212678 (-0.7%) |
> |mb_optimize_scan=1 | 20887 | 37636 (+80.1%) | 213632 | 214189 (+0.2%) |
>
> Signed-off-by: Baokun Li <libaokun1@xxxxxxxxxx>

The patch looks good and the results are nice. I've just noticed two
typos:

> +static inline void ext4_mb_avg_fragment_size_destory(struct ext4_sb_info *sbi)
                                                  ^^^ destroy
> +{
> +	for (int i = 0; i < MB_NUM_ORDERS(sbi->s_sb); i++)
> +		xa_destroy(&sbi->s_mb_avg_fragment_size[i]);
> +	kfree(sbi->s_mb_avg_fragment_size);
> +}
> +
> +static inline void ext4_mb_largest_free_orders_destory(struct ext4_sb_info *sbi)
                                                   ^^^ destroy
> +{
> +	for (int i = 0; i < MB_NUM_ORDERS(sbi->s_sb); i++)
> +		xa_destroy(&sbi->s_mb_largest_free_orders[i]);
> +	kfree(sbi->s_mb_largest_free_orders);
> +}

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
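
For illustration, here is a minimal sketch of the two-leg, wrap-around
walk that the quoted commit message describes for
ext4_mb_find_good_group_xarray(). This is a hypothetical simplification,
not the patch code: the function name and signature are invented, and
the real helper also applies group suitability and locking checks, which
are omitted here. Only the traversal shape via xa_find() is shown.

#include <linux/xarray.h>

struct ext4_group_info;	/* assumed declared, as in fs/ext4/ext4.h */

/*
 * Sketch: walk the xarray from @start to @ngroups - 1, then wrap
 * around and walk 0 to @start - 1, returning the first present
 * entry. xa_find() skips absent indices, so the walk stays ordered
 * by group number without holding any list lock.
 */
static struct ext4_group_info *
find_group_sketch(struct xarray *xa, unsigned long start,
		  unsigned long ngroups)
{
	unsigned long idx = start;
	struct ext4_group_info *grp;

	/* First leg: [start, ngroups - 1] */
	grp = xa_find(xa, &idx, ngroups - 1, XA_PRESENT);
	if (grp || start == 0)
		return grp;

	/* Wrap-around leg: [0, start - 1] */
	idx = 0;
	return xa_find(xa, &idx, start - 1, XA_PRESENT);
}

The wrap preserves the goal-group-relative ordering that the linear
scan provides, while letting absent groups be skipped cheaply instead
of repeatedly restarting the traversal as the old list walk could.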