On Mon 14-07-25 21:03:25, Baokun Li wrote:
> While traversing the list, holding a spin_lock prevents load_buddy,
> making direct use of ext4_try_lock_group impossible. This can lead to
> a bouncing scenario where spin_is_locked(grp_A) succeeds, but
> ext4_try_lock_group() fails, forcing the list traversal to repeatedly
> restart from grp_A.
>
> In contrast, linear traversal directly uses ext4_try_lock_group(),
> avoiding this bouncing. Therefore, we need a lockless, ordered
> traversal to achieve linear-like efficiency.
>
> To that end, this commit converts both the average fragment size
> lists and the largest free order lists into ordered xarrays.
>
> In an xarray, the index represents the block group number and the
> value holds the block group information; a non-empty value indicates
> the block group's presence.
>
> While insertion and deletion remain O(1), lookup complexity changes
> from O(1) to O(log n), which may slightly reduce single-threaded
> performance.
>
> Additionally, xarray insertions may fail due to memory allocation
> failures. However, since linear traversal remains available as a
> fallback, this is not a major problem, so we only emit a warning
> message on insertion failure.
>
> A helper function, ext4_mb_find_good_group_xarray(), is added to find
> good groups in the specified xarray, starting at the specified
> position start; when it reaches ngroups - 1, it wraps around to 0 and
> continues up to start - 1. This ensures an ordered traversal within
> the xarray.
>
> Performance test results are as follows: single-process operations on
> an empty disk show negligible impact, while multi-process workloads
> demonstrate a noticeable performance gain.
>
> |CPU: Kunpeng 920   |          P80           |            P1           |
> |Memory: 512GB      |------------------------|-------------------------|
> |960GB SSD (0.5GB/s)| base  |    patched     |  base  |    patched     |
> |-------------------|-------|----------------|--------|----------------|
> |mb_optimize_scan=0 | 20097 | 19555 (-2.6%)  | 316141 | 315636 (-0.2%) |
> |mb_optimize_scan=1 | 13318 | 15496 (+16.3%) | 325273 | 323569 (-0.5%) |
>
> |CPU: AMD 9654 * 2  |          P96           |            P1           |
> |Memory: 1536GB     |------------------------|-------------------------|
> |960GB SSD (1GB/s)  | base  |    patched     |  base  |    patched     |
> |-------------------|-------|----------------|--------|----------------|
> |mb_optimize_scan=0 | 53603 | 53192 (-0.7%)  | 214243 | 212678 (-0.7%) |
> |mb_optimize_scan=1 | 20887 | 37636 (+80.1%) | 213632 | 214189 (+0.2%) |
>
> Signed-off-by: Baokun Li <libaokun1@xxxxxxxxxx>

The patch looks good and the results are nice. I've just noticed two
typos:

> +static inline void ext4_mb_avg_fragment_size_destory(struct ext4_sb_info *sbi)
                                                  ^^^ destroy
> +{
> +	for (int i = 0; i < MB_NUM_ORDERS(sbi->s_sb); i++)
> +		xa_destroy(&sbi->s_mb_avg_fragment_size[i]);
> +	kfree(sbi->s_mb_avg_fragment_size);
> +}
> +
> +static inline void ext4_mb_largest_free_orders_destory(struct ext4_sb_info *sbi)
                                                   ^^^ destroy
> +{
> +	for (int i = 0; i < MB_NUM_ORDERS(sbi->s_sb); i++)
> +		xa_destroy(&sbi->s_mb_largest_free_orders[i]);
> +	kfree(sbi->s_mb_largest_free_orders);
> +}

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
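
For illustration, here is a minimal sketch of the two-leg, wrap-around
walk that the quoted commit message describes for
ext4_mb_find_good_group_xarray(). This is a hypothetical simplification,
not the patch code: the function name and signature are invented, and
the real helper also applies group suitability and locking checks, which
are omitted here. Only the traversal shape via xa_find() is shown.

#include <linux/xarray.h>

struct ext4_group_info;	/* assumed declared, as in fs/ext4/ext4.h */

/*
 * Sketch: walk the xarray from @start to @ngroups - 1, then wrap
 * around and walk 0 to @start - 1, returning the first present
 * entry. xa_find() skips absent indices, so the walk stays ordered
 * by group number without holding any list lock.
 */
static struct ext4_group_info *
find_group_sketch(struct xarray *xa, unsigned long start,
		  unsigned long ngroups)
{
	unsigned long idx = start;
	struct ext4_group_info *grp;

	/* First leg: [start, ngroups - 1] */
	grp = xa_find(xa, &idx, ngroups - 1, XA_PRESENT);
	if (grp || start == 0)
		return grp;

	/* Wrap-around leg: [0, start - 1] */
	idx = 0;
	return xa_find(xa, &idx, start - 1, XA_PRESENT);
}

The wrap preserves the goal-group-relative ordering that the linear
scan provides, while letting absent groups be skipped cheaply instead
of repeatedly restarting the traversal as the old list walk could.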