On Mon 23-06-25 15:32:51, Baokun Li wrote:
> After we optimized the block group lock, we found another lock
> contention issue when running will-it-scale/fallocate2 with multiple
> processes. The fallocate's block allocation and the truncate's block
> release were fighting over the s_md_lock. The problem is, this lock
> protects totally different things in those two processes: the list of
> freed data blocks (s_freed_data_list) when releasing, and where to start
> looking for new blocks (mb_last_group) when allocating.
>
> Now we only need to track s_mb_last_group and no longer need to track
> s_mb_last_start, so we don't need s_md_lock to ensure that the two are
> consistent, and we can ensure that the s_mb_last_group read is up to
> date by using smp_store_release/smp_load_acquire.
>
> Besides, s_mb_last_group only requires ext4_group_t (i.e., unsigned
> int), so unsigned long is superfluous.
>
> Performance test data follows:
>
> Test: Running will-it-scale/fallocate2 on CPU-bound containers.
> Observation: Average fallocate operations per container per second.
>
>                    | Kunpeng 920 / 512GB -P80| AMD 9654 / 1536GB -P96  |
>  Disk: 960GB SSD   |-------------------------|-------------------------|
>                    | base  |    patched      | base  |    patched      |
> -------------------|-------|-----------------|-------|-----------------|
> mb_optimize_scan=0 | 4821  | 7612 (+57.8%)   | 15371 | 21647 (+40.8%)  |
> mb_optimize_scan=1 | 4784  | 7568 (+58.1%)   | 6101  | 9117 (+49.4%)   |
>
> Signed-off-by: Baokun Li <libaokun1@xxxxxxxxxx>
...
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 5cdae3bda072..3f103919868b 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2168,11 +2168,9 @@ static void ext4_mb_use_best_found(struct ext4_allocation_context *ac,
>  	ac->ac_buddy_folio = e4b->bd_buddy_folio;
>  	folio_get(ac->ac_buddy_folio);
>  	/* store last allocated for subsequent stream allocation */
> -	if (ac->ac_flags & EXT4_MB_STREAM_ALLOC) {
> -		spin_lock(&sbi->s_md_lock);
> -		sbi->s_mb_last_group = ac->ac_f_ex.fe_group;
> -		spin_unlock(&sbi->s_md_lock);
> -	}
> +	if (ac->ac_flags & EXT4_MB_STREAM_ALLOC)
> +		/* pairs with smp_load_acquire in ext4_mb_regular_allocator() */
> +		smp_store_release(&sbi->s_mb_last_group, ac->ac_f_ex.fe_group);

Do you really need any kind of barrier (implied by smp_store_release())
here? I mean the store to s_mb_last_group is perfectly fine to be
reordered with other accesses from the thread, isn't it? As such it
should be enough to have WRITE_ONCE() here...

>  	/*
>  	 * As we've just preallocated more space than
>  	 * user requested originally, we store allocated
> @@ -2844,12 +2842,9 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
>  	}
>
>  	/* if stream allocation is enabled, use global goal */
> -	if (ac->ac_flags & EXT4_MB_STREAM_ALLOC) {
> -		/* TBD: may be hot point */
> -		spin_lock(&sbi->s_md_lock);
> -		ac->ac_g_ex.fe_group = sbi->s_mb_last_group;
> -		spin_unlock(&sbi->s_md_lock);
> -	}
> +	if (ac->ac_flags & EXT4_MB_STREAM_ALLOC)
> +		/* pairs with smp_store_release in ext4_mb_use_best_found() */
> +		ac->ac_g_ex.fe_group = smp_load_acquire(&sbi->s_mb_last_group);

... and READ_ONCE() here.

								Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
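
[Editor's note: below is a minimal sketch of the change Jan is suggesting,
i.e. plain WRITE_ONCE()/READ_ONCE() in place of the release/acquire pair.
The idea is that these accesses only need protection against load/store
tearing, not any ordering guarantee, since a slightly stale goal group is
harmless for stream allocation. Variable and function names follow the
quoted patch; this illustrates the suggestion, not the final committed
code.]

	/* In ext4_mb_use_best_found(): publish the stream allocation
	 * goal. No barrier is needed, as nothing else must be ordered
	 * against this store; WRITE_ONCE() only prevents store tearing.
	 */
	if (ac->ac_flags & EXT4_MB_STREAM_ALLOC)
		/* pairs with READ_ONCE() in ext4_mb_regular_allocator() */
		WRITE_ONCE(sbi->s_mb_last_group, ac->ac_f_ex.fe_group);

	/* In ext4_mb_regular_allocator(): read the global goal.
	 * READ_ONCE() only prevents load tearing; reading a slightly
	 * stale group is fine, it merely shifts where the stream
	 * allocator starts searching.
	 */
	if (ac->ac_flags & EXT4_MB_STREAM_ALLOC)
		/* pairs with WRITE_ONCE() in ext4_mb_use_best_found() */
		ac->ac_g_ex.fe_group = READ_ONCE(sbi->s_mb_last_group);

[The design point: WRITE_ONCE()/READ_ONCE() are relaxed accesses that
guarantee the value is read and written in one piece but emit no memory
barrier, which matches what the goal-group hint needs once s_mb_last_start
is no longer tracked alongside it.]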