On Mon, Sep 08, 2025 at 03:27:47AM -0700, Hugh Dickins wrote:
> On Mon, 1 Sep 2025, David Hildenbrand wrote:
> > On 01.09.25 09:52, David Hildenbrand wrote:
> > > On 01.09.25 03:17, Hugh Dickins wrote:
> > >> On Mon, 1 Sep 2025, Matthew Wilcox wrote:
> > >>> On Sun, Aug 31, 2025 at 02:01:16AM -0700, Hugh Dickins wrote:
> > >>>> 6.16's folio_expected_ref_count() is forgetting the PG_private_2 flag,
> > >>>> which (like PG_private, but not in addition to PG_private) counts for
> > >>>> 1 more reference: it needs to be using folio_has_private() in place of
> > >>>> folio_test_private().
> > >>>
> > >>> No, it doesn't. I know it used to, but no filesystem was actually doing
> > >>> that. So I changed mm to match how filesystems actually worked.
>
> I think Matthew may be remembering how he wanted it to behave (? but he
> wanted it to go away completely) rather than how it ended up behaving:
> we've both found that PG_private_2 always goes with refcount increment.

Let me explain that better. No filesystem followed the documented rule
that the refcount must be incremented by one if either PG_private or
PG_private_2 was set. That's no surprise; it's a complicated rule for
filesystems to follow, and many of them weren't even following the
simpler rule of incrementing the refcount by one when PG_private was set.

As a result, some incremented the refcount by one if PG_private was set,
but didn't bump it at all for PG_private_2 (I think this is how btrfs
worked, and you seem to believe the same thing). Others bumped the
refcount by two if both PG_private and PG_private_2 were set (I think
this is how netfs works today).

> > > Now, one problem would be if migration / splitting / ... code where we
> > > use folio_expected_ref_count() cannot deal with that additional
> > > reference properly, in which case this patch would indeed cause harm.
>
> Yes, that appears to be why Matthew said NAK and "dangerously wrong".
>
> So far as I could tell, there is no problem with nfs, it has, and has
> all along had, the appropriate release_folio and migrate_folio methods.
>
> ceph used to have what's needed, but 6.0's changes from page_has_private()
> to folio_test_private() (the change from "has" either bit to "test" just
> the one bit really should have been highlighted) broke the migration of
> ceph's PG_private_2 folios.
>
> (I think it may have got re-enabled in intervening releases: David
> Howells reinstated folio_has_private() inside fallback_migrate_folio()'s
> filemap_release_folio(), which may have been enough to get ceph's
> PG_private_2s migratable again; but then 6.15's ceph .migrate_folio =
> filemap_migrate_folio will have broken it again.)
>
> Folio migration does not and never has copied over PG_private_2 from
> src to dst; so my 1/7 patch would have permitted migration of a ceph
> PG_private_2 src folio to a dst folio left with refcount 1 more than
> it should be (plus whatever the consequences of migrating such a
> folio which should have waited for the flag to be cleared first).

But that's another problem. The current meaning of PG_fscache (and that
too has changed over the years!) is that the data in the folio is being
written to the fscache. So we _shouldn't_ migrate the folio, because
some piece of storage hardware may still be busy reading from the old
folio. And if somebody else starts writing to the old folio, we'll have
a corrupted fscache.

So the current behaviour, where we set PG_private_2 and bump the
refcount but folio_expected_ref_count() doesn't take PG_private_2 into
account, is the safe one: the elevated refcount means we'll skip the
PG_fscache folio.

Maybe it'd be better to wait for it to clear. But since Dave Howells is
busy killing it off, I'm inclined to just wait for that to happen.

> I'm just going to drop this 1/7, and add a (briefer than this!)
> paragraph to 2/7 == 1/6's commit message in v2 later today.

Thank you!