On Sat, Mar 29, 2025 at 10:05:34PM -0400, Rik van Riel wrote:
> On Thu, 2025-03-27 at 14:42 -0700, Luis Chamberlain wrote:
> > On Thu, Mar 27, 2025 at 09:26:41AM -0700, syzbot wrote:
> > > Hello,
> >
> > Thanks, this is a known issue and we're having a hard time
> > reproducing [0].
> >
> > > C reproducer:
> > > https://syzkaller.appspot.com/x/repro.c?x=152d4de4580000
> >
> > Thanks! Sadly this has not yet been able to let me reproduce the
> > issue, and so we're trying to come up with other ways to test the
> > imminent spin lock + sleep on buffer_migrate_folio_norefs() path
> > different ways now, including a new fstests [1] but no luck yet.
>
> The backtrace in the report seems to make the cause
> of the bug fairly clear, though.
>
> The function folio_mc_copy() can sleep.
>
> The function __buffer_migrate_folio() calls
> filemap_migrate_folio() with a spinlock held.
>
> That function eventually calls folio_mc_copy():
>
>  __might_resched+0x5d4/0x780 kernel/sched/core.c:8764
>  folio_mc_copy+0x13c/0x1d0 mm/util.c:742
>  __migrate_folio mm/migrate.c:758 [inline]
>  filemap_migrate_folio+0xb4/0x4c0 mm/migrate.c:943
>  __buffer_migrate_folio+0x3ec/0x5d0 mm/migrate.c:874
>  move_to_new_folio+0x2ac/0xc20 mm/migrate.c:1050
>  migrate_folio_move mm/migrate.c:1358 [inline]
>  migrate_folios_move mm/migrate.c:1710 [inline]
>
> The big question is how to safely release the
> spinlock in __buffer_migrate_folio() before calling
> filemap_migrate_folio()

I suggested a way in the other 0-day bug report, since that was the
thread that started this investigation [0]. That has survived 20 hours
of ext4 with generic/750 and the newly proposed generic/764 [1], while
also using a block device with large folios and running dd against it
in a loop.

So now I'm going to establish an ext4 baseline with kdevops on all ext4
profiles on linux-next, and then check whether there are any
regressions with it. I've also localized the new check to only those
that need it.
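
For anyone following along, here is a rough userspace sketch of the bug
shape Rik describes: a copy that may sleep, called with a spinlock held.
It is not the patch and not kernel code. Every name in it (model_spin_lock(),
model_might_sleep(), model_copy(), and the two migrate_copy_*() helpers) is
made up for illustration, and a plain counter stands in for the kernel's
atomic-context tracking. The second helper only shows where the lock would
have to be dropped; it says nothing about how to keep the protected state
valid after the unlock, which is the actual hard part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: number of "spinlocks" currently held. */
static int atomic_depth;
/* Hypothetical model: sleeping calls made while a lock was held. */
static int sleep_violations;

static void model_spin_lock(void)
{
	atomic_depth++;
}

static void model_spin_unlock(void)
{
	assert(atomic_depth > 0);
	atomic_depth--;
}

/* Stand-in for might_sleep(): count a violation instead of warning. */
static void model_might_sleep(void)
{
	if (atomic_depth > 0)
		sleep_violations++;
}

/* Stand-in for a copy that may sleep, as folio_mc_copy() can. */
static void model_copy(char *dst, const char *src, size_t len)
{
	model_might_sleep();
	memcpy(dst, src, len);
}

/* Buggy shape: the copy runs while the lock is still held. */
static bool migrate_copy_under_lock(char *dst, const char *src, size_t len)
{
	int before = sleep_violations;

	model_spin_lock();
	/* ... checks that need the lock would go here ... */
	model_copy(dst, src, len);
	model_spin_unlock();

	return sleep_violations != before;	/* true: slept in atomic context */
}

/* Other shape: finish the locked checks, drop the lock, then copy. */
static bool migrate_copy_after_unlock(char *dst, const char *src, size_t len)
{
	int before = sleep_violations;

	model_spin_lock();
	/* ... checks that need the lock would go here ... */
	model_spin_unlock();
	model_copy(dst, src, len);

	return sleep_violations != before;	/* false: no lock held during copy */
}
```

Calling migrate_copy_under_lock() reports a violation and
migrate_copy_after_unlock() does not, while both leave the same bytes in
the destination buffer.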

[0] https://lkml.kernel.org/r/Z-dHqMtGneCVs3v5@xxxxxxxxxxxxxxxxxxxxxx
[1] https://lkml.kernel.org/r/20250326185101.2237319-1-mcgrof@xxxxxxxxxx

Anyway, below are the latest changes: