Re: [syzbot] [mm?] [fs?] BUG: sleeping function called from invalid context in folio_mc_copy

On Sat, Mar 29, 2025 at 10:05:34PM -0400, Rik van Riel wrote:
> On Thu, 2025-03-27 at 14:42 -0700, Luis Chamberlain wrote:
> > On Thu, Mar 27, 2025 at 09:26:41AM -0700, syzbot wrote:
> > > Hello,
> > 
> > Thanks, this is a known issue and we're having a hard time
> > reproducing [0].
> > 
> > > C reproducer:  
> > > https://syzkaller.appspot.com/x/repro.c?x=152d4de4580000
> > 
> > Thanks! Sadly this has not yet let me reproduce the issue, and so
> > we're now trying to come up with other ways to test the imminent
> > spin lock + sleep on the buffer_migrate_folio_norefs() path,
> > including a new fstests test [1], but no luck yet.
> 
> The backtrace in the report seems to make the cause
> of the bug fairly clear, though.
> 
> The function folio_mc_copy() can sleep.
> 
> The function __buffer_migrate_folio() calls
> filemap_migrate_folio() with a spinlock held.
> 
> That function eventually calls folio_mc_copy():
> 
>  __might_resched+0x5d4/0x780 kernel/sched/core.c:8764
>  folio_mc_copy+0x13c/0x1d0 mm/util.c:742
>  __migrate_folio mm/migrate.c:758 [inline]
>  filemap_migrate_folio+0xb4/0x4c0 mm/migrate.c:943
>  __buffer_migrate_folio+0x3ec/0x5d0 mm/migrate.c:874
>  move_to_new_folio+0x2ac/0xc20 mm/migrate.c:1050
>  migrate_folio_move mm/migrate.c:1358 [inline]
>  migrate_folios_move mm/migrate.c:1710 [inline]
> 
> The big question is how to safely release the
> spinlock in __buffer_migrate_folio() before calling
> filemap_migrate_folio()
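
For anyone skimming the thread, the pattern described above can be modelled
in plain userspace C. This is only a rough sketch under stated assumptions:
every toy_* name is hypothetical, none of this is kernel code, and the
checker is just a stand-in for what might_sleep() reports when a function
that can sleep (like folio_mc_copy()) runs with a spinlock held.

```c
#include <stdio.h>

/* Toy userspace model, NOT kernel code. All toy_* names are hypothetical. */
struct toy_ctx {
	int spin_depth;	/* number of toy spinlocks currently held */
	int violations;	/* times a sleeping function ran while "atomic" */
};

static void toy_spin_lock(struct toy_ctx *c)   { c->spin_depth++; }
static void toy_spin_unlock(struct toy_ctx *c) { c->spin_depth--; }

/* Stand-in for might_sleep(): complain if a lock is held when we may sleep. */
static void toy_might_sleep(struct toy_ctx *c, const char *who)
{
	if (c->spin_depth > 0) {
		c->violations++;
		fprintf(stderr, "BUG: sleeping function %s called with %d lock(s) held\n",
			who, c->spin_depth);
	}
}

/* Stand-in for a copy routine that may reschedule. */
static void toy_copy(struct toy_ctx *c)
{
	toy_might_sleep(c, "toy_copy");
}

/* Shape from the backtrace: the copy is reached with the lock still held. */
static int toy_migrate_locked(void)
{
	struct toy_ctx c = { 0, 0 };

	toy_spin_lock(&c);
	toy_copy(&c);		/* the checker fires here */
	toy_spin_unlock(&c);
	return c.violations;
}

/* Shape with the lock dropped first: the checker stays quiet. */
static int toy_migrate_unlocked(void)
{
	struct toy_ctx c = { 0, 0 };

	toy_spin_lock(&c);
	/* ... whatever has to be checked under the lock ... */
	toy_spin_unlock(&c);
	toy_copy(&c);		/* no lock held, no complaint */
	return c.violations;
}
```

The second function only shows what "no lock held" looks like to the
checker. It says nothing about whether dropping the lock at that point is
safe against races, which is exactly the open question.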

I suggested a way in the other 0-day bug report, as that was the thread
that started this investigation [0]. That has survived 20 hours of ext4
with generic/750 and the newly proposed generic/764 [1], while also using
a block device with large folios and running dd against it in a loop.

And so now I'm going to establish an ext4 baseline with kdevops on all
ext4 profiles on linux-next, and then check to see if there are any
regressions with it.

I've also localized the new check to only those that need it.

[0] https://lkml.kernel.org/r/Z-dHqMtGneCVs3v5@xxxxxxxxxxxxxxxxxxxxxx
[1] https://lkml.kernel.org/r/20250326185101.2237319-1-mcgrof@xxxxxxxxxx

Anyway, below are the latest changes:

