On Tue, Apr 15, 2025 at 06:23:54PM +0200, Jan Kara wrote: > On Tue 15-04-25 08:47:51, Luis Chamberlain wrote: > > On Tue, Apr 15, 2025 at 11:05:38AM +0200, Christian Brauner wrote: > > > On Mon, Apr 14, 2025 at 03:19:33PM -0700, Luis Chamberlain wrote: > > > > On Mon, Apr 14, 2025 at 02:09:46PM -0700, Luis Chamberlain wrote: > > > > > On Thu, Apr 10, 2025 at 02:05:38PM +0200, Jan Kara wrote: > > > > > > > @@ -859,12 +862,12 @@ static int __buffer_migrate_folio(struct address_space *mapping, > > > > > > > } > > > > > > > bh = bh->b_this_page; > > > > > > > } while (bh != head); > > > > > > > + spin_unlock(&mapping->i_private_lock); > > > > > > > > > > > > No, you've just broken all simple filesystems (like ext2) with this patch. > > > > > > You can reduce the spinlock critical section only after providing > > > > > > alternative way to protect them from migration. So this should probably > > > > > > happen at the end of the series. > > > > > > > > > > So you're OK with this spin lock move with the other series in place? > > > > > > > > > > And so we punt the hard-to-reproduce corruption issue as future work > > > > > to do? Becuase the other alternative for now is to just disable > > > > > migration for jbd2: > > > > > > > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > > > > > index 1dc09ed5d403..ef1c3ef68877 100644 > > > > > --- a/fs/ext4/inode.c > > > > > +++ b/fs/ext4/inode.c > > > > > @@ -3631,7 +3631,6 @@ static const struct address_space_operations ext4_journalled_aops = { > > > > > .bmap = ext4_bmap, > > > > > .invalidate_folio = ext4_journalled_invalidate_folio, > > > > > .release_folio = ext4_release_folio, > > > > > - .migrate_folio = buffer_migrate_folio_norefs, > > > > > .is_partially_uptodate = block_is_partially_uptodate, > > > > > .error_remove_folio = generic_error_remove_folio, > > > > > .swap_activate = ext4_iomap_swap_activate, > > > > > > > > BTW I ask because.. are your expectations that the next v3 series also > > > > be a target for Linus tree as part of a fix for this spinlock > > > > replacement? > > > > > > Since this is fixing potential filesystem corruption I will upstream > > > whatever we need to do to fix this. Ideally we have a minimal fix to > > > upstream now and a comprehensive fix and cleanup for v6.16. > > > > Despite our efforts we don't yet have an agreement on how to fix the > > ext4 corruption, becuase Jan noted the buffer_meta() check in this patch > > is too broad and would affect other filesystems (I have yet to > > understand how, but will review). > > > > And so while we have agreement we can remove the spin lock to fix the > > sleeping while atomic incurred by large folios for buffer heads by this > > patch series, the removal of the spin lock would happen at the end of > > this series. > > > > And so the ext4 corruption is an existing issue as-is today, its > > separate from the spin lock removal goal to fix the sleeping while > > atomic.. > > I agree. Ext4 corruption problems are separate from sleeping in atomic > issues. Glad that's clear. > > However this series might be quite big for an rc2 or rc3 fix for that spin > > lock removal issue. It should bring in substantial performance benefits > > though, so it might be worthy to consider. We can re-run tests with the > > adjustment to remove the spin lock until the last patch in this series. > > > > The alternative is to revert the spin lock addition commit for Linus' > > tree, ie commit ebdf4de5642fb6 ("mm: migrate: fix reference check race > > between __find_get_block() and migration") and note that it in fact does > > not fix the ext4 corruption as we've noted, and in fact causes an issue > > with sleeping while atomic with support for large folios for buffer > > heads. If we do that then we punt this series for the next development > > window, and it would just not have the spin lock removal on the last > > patch. > > Well, the commit ebdf4de5642fb6 is 6 years old. At that time there were no > large folios (in fact there were no folios at all ;)) in the page cache and > it does work quite well (I didn't see a corruption report from real users > since then). It is still a work around. > So I don't like removing that commit because it makes a > "reproducible with a heavy stress test" problem become a "reproduced by > real world workloads" problem. So how about just patch 2 and 8 in this series, with the spin lock removal happening on the last patch for Linus tree? Luis