On Sun, Mar 30, 2025 at 01:04:02PM +0100, Matthew Wilcox wrote:
> On Sat, Mar 29, 2025 at 11:47:30PM -0700, Luis Chamberlain wrote:
> > However tracing shows that folio_mc_copy() *isn't* being called
> > as often as we'd expect from buffer_migrate_folio_norefs() path
> > as we're likely bailing early now thanks to the check added by commit
> > 060913999d7a ("mm: migrate: support poisoned recover from migrate
> > folio").
>
> Umm. You're saying that most folios we try to migrate have extra refs?
> That seems unexpected; does it indicate a bug in 060913999d7a?

I've debugged this further: the migration does succeed, and I don't see
any failures due to the new refcount check added by 060913999d7a.

I've added stats in an out-of-tree patch [0] in case folks find this
useful; I could submit it. The point is that even if you use dd against
a large block device you won't always end up trying to migrate large
folios *right away*, even if you trigger folio migration through
compaction, especially if you use a large bs on dd such as bs=1M. Using
a size matching the logical block size more closely will trigger large
folio migration much faster.

Example of the stats:

  # cat /sys/kernel/debug/mm/migrate/bh/stats
  [buffer_migrate_folio]
  calls 9874
  success 9854
  fails 20
  [buffer_migrate_folio_norefs]
  calls 3694
  success 1651
  fails 2043
  no-head-success 532
  no-head-fails 0
  invalid 2040
  valid 1119
  valid-success 1119
  valid-fails 0

Success ratios:

  buffer_migrate_folio:        99% success (9854/9874)
  buffer_migrate_folio_norefs: 44% success (1651/3694)

> > +++ b/mm/migrate.c
> > @@ -751,6 +751,8 @@ static int __migrate_folio(struct address_space *mapping, struct folio *dst,
> >  {
> >  	int rc, expected_count = folio_expected_refs(mapping, src);
> >
> > +	might_sleep();
>
> We deliberately don't sleep when the folio is only a single page.
> So this needs to be:
>
> 	might_sleep_if(folio_test_large(folio));

That does reduce the scope of our test coverage, but sure.
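As a quick sanity check on the success ratios above, they can be recomputed
from the raw debugfs counters. This snippet is purely illustrative (the
numbers are hard-coded from the stats dump above; it is not part of the
patch):

```python
# Recompute the success ratios from the debugfs counters quoted above.
# (success, calls) pairs taken verbatim from the stats dump.
stats = {
    "buffer_migrate_folio": (9854, 9874),
    "buffer_migrate_folio_norefs": (1651, 3694),
}

for name, (success, calls) in stats.items():
    # Integer division truncates, matching the quoted percentages.
    pct = success * 100 // calls
    print(f"{name}: {pct}% success ({success}/{calls})")
```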
[0] https://lore.kernel.org/all/20250331061306.4073352-1-mcgrof@xxxxxxxxxx/

  Luis