[PATCH 0/3] mm: move migration work around to buffer-heads

Luis Chamberlain <mcgrof@xxxxxxxxxx> · Sat, 29 Mar 2025 23:47:29 -0700

We have an eyesore of a spin lock held during page migration, which
was added for an ext4 jbd corruption fix for which we have no clear
public corruption data. We want to remove the spin lock in mm/migrate
to help buffer-head filesystems embrace large folios, since
folio_mc_copy() can cond_resched() on large folios and that is not
allowed while holding a spin lock. I've managed to reproduce a
corruption by just removing the spin lock and stressing ext4 with
generic/750; the corruption happens after 3 hours.

The spin lock was added to help ext4 jbd and the other users of
buffer_migrate_folio_norefs(), that is, the block device cache and
nilfs2. This series moves the heuristic needed to avoid races with page
migration back into the buffer-head code, in __find_get_block_slow(),
and applies it only to users of buffer_migrate_folio_norefs(). I have
run generic/750 for over 20 hours and don't see the corruption issue.
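
To illustrate the general idea only, here is a minimal userspace sketch
of a lookup that backs off when a folio looks like it is being migrated,
instead of relying on the migration side holding a spin lock across the
copy. Everything in it is hypothetical (struct model_folio, the
maybe_migrating flag, and the model_*() helpers are made up for this
example), it is single-threaded, and it is not the code in these
patches; the actual check in __find_get_block_slow() may look quite
different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in; this is not the kernel's struct folio. */
struct model_folio {
	bool maybe_migrating;	/* hypothetical: folio looks like it is under migration */
	int buffer;		/* stand-in for the buffer-head payload */
};

/*
 * Hypothetical lookup: return the cached buffer, or NULL when the folio
 * looks like it is being migrated, so the caller takes its fallback path.
 */
static int *model_find_block(struct model_folio *folio)
{
	if (folio->maybe_migrating)
		return NULL;
	return &folio->buffer;
}

/* Hypothetical migration step: mark the folio, copy, then unmark. */
static void model_migrate(struct model_folio *src, struct model_folio *dst)
{
	src->maybe_migrating = true;
	dst->buffer = src->buffer;	/* in the kernel, this copy may sleep */
	src->maybe_migrating = false;
}

/* Walk through the three states of the model; returns 0 on success. */
int model_demo(void)
{
	struct model_folio src = { .maybe_migrating = false, .buffer = 42 };
	struct model_folio dst = { .maybe_migrating = false, .buffer = 0 };

	assert(model_find_block(&src) == &src.buffer);	/* idle: lookup hits */

	src.maybe_migrating = true;
	assert(model_find_block(&src) == NULL);		/* in flight: lookup backs off */
	src.maybe_migrating = false;

	model_migrate(&src, &dst);
	assert(dst.buffer == 42);			/* payload was copied */
	return 0;
}
```

The point of the model is only that the decision to skip a folio lives
on the lookup side, so the migration side no longer needs to stay in
atomic context while it copies.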

I've also run this patchset against all of the following ext4 profiles
on all fstests tests and have found no regressions. I've published the
baseline, based on linux-next tag next-20250328, to kdevops [0]. For
further sanity I've also tested this patchset against blktests and
found no regressions.

ext4-defaults
ext4-1k
ext4-2k
ext4-4k
ext4-bigalloc16k-4k
ext4-bigalloc32k-4k
ext4-bigalloc64k-4k
ext4-bigalloc1024k-4k
ext4-bigalloc2048k-4k
ext4-advanced-features

[0] https://github.com/linux-kdevops/kdevops/commit/3ecd638e67b14162b76b733a120e6e1b55698cc9

Luis Chamberlain (3):
  mm/migrate: add might_sleep() on __migrate_folio()
  fs/buffer: avoid races with folio migrations on
    __find_get_block_slow()
  mm/migrate: avoid atomic context on buffer_migrate_folio_norefs()
    migration

 fs/buffer.c  | 9 +++++++++
 mm/migrate.c | 6 +++---
 2 files changed, 12 insertions(+), 3 deletions(-)

-- 
2.47.2