On 5/25/25 8:06 PM, Al Viro wrote: > On Sun, May 25, 2025 at 09:32:09AM +0100, Al Viro wrote: > >> Breakage is still present in the current mainline ;-/ > > With CONFIG_DEBUG_VM on top of pagealloc debugging: > > [ 1434.992817] run fstests generic/127 at 2025-05-25 11:46:11g > [ 1448.956242] BUG: Bad page state in process kworker/2:1 pfn:112cb0g > [ 1448.956846] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e pfn:0x112cb0g > [ 1448.957453] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)g It doesn't like the writeback flag. > [ 1448.957863] raw: 800000000000000e dead000000000100 dead000000000122 0000000000000000g > [ 1448.958303] raw: 000000000000003e 0000000000000000 00000000ffffffff 0000000000000000g > [ 1448.958833] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) setg > [ 1448.959320] Modules linked in: xfs autofs4 fuse nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc loop ecryptfs 9pnet_virtio 9pnet netfs evdev pcspkr sg button ext4 jbd2 btrfs blake2b_generic xor zlib_deflate raid6_pq zstd_compress sr_mod cdrom ata_generic ata_piix psmouse serio_raw i2c_piix4 i2c_smbus libata e1000g > [ 1448.960874] CPU: 2 UID: 0 PID: 2614 Comm: kworker/2:1 Not tainted 6.14.0-rc1+ #78g > [ 1448.960878] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014g > [ 1448.960879] Workqueue: xfs-conv/sdb1 xfs_end_io [xfs]g > [ 1448.960938] Call Trace:g > [ 1448.960939] <TASK>g > [ 1448.960940] dump_stack_lvl+0x4f/0x60g > [ 1448.960953] bad_page+0x6f/0x100g > [ 1448.960957] free_frozen_pages+0x471/0x640g > [ 1448.960958] iomap_finish_ioend+0x196/0x3c0g > [ 1448.960963] iomap_finish_ioends+0x83/0xc0g > [ 1448.960964] xfs_end_ioend+0x64/0x140 [xfs]g > [ 1448.961003] xfs_end_io+0x93/0xc0 [xfs]g > [ 1448.961036] process_one_work+0x153/0x390g > [ 1448.961044] worker_thread+0x2ab/0x3b0g > [ 1448.961045] ? rescuer_thread+0x470/0x470g > [ 1448.961047] kthread+0xf7/0x200g > [ 1448.961048] ? kthread_use_mm+0xa0/0xa0g > [ 1448.961049] ret_from_fork+0x2d/0x50g > [ 1448.961053] ? kthread_use_mm+0xa0/0xa0g > [ 1448.961054] ret_from_fork_asm+0x11/0x20g > [ 1448.961058] </TASK>g > [ 1448.961155] Disabling lock debugging due to kernel taintg > [ 1448.969569] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e pfn:0x112cb0g same pfn, same struct page > [ 1448.970023] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)g > [ 1448.970651] raw: 800000000000000e dead000000000100 dead000000000122 0000000000000000g > [ 1448.971222] raw: 000000000000003e 0000000000000000 00000000ffffffff 0000000000000000g > [ 1448.971812] page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))g > [ 1448.972490] ------------[ cut here ]------------g > [ 1448.972841] kernel BUG at ./include/linux/mm.h:1455!g this is folio_get() noticing refcount is 0, so a use-after free, because we already tried to free the page above. I'm not familiar with this code too much, but I suspect problem was introduced by commit fb7d3bc414939 ("mm/filemap: drop streaming/uncached pages when writeback completes") and only (more) exposed here. so in folio_end_writeback() we have if (__folio_end_writeback(folio)) folio_wake_bit(folio, PG_writeback); but calling the folio_end_dropbehind_write() doesn't depend on the result of __folio_end_writeback() this seems rather suspicious I think if __folio_end_writeback() was true then PG_writeback would be cleared and thus we'd not see the PAGE_FLAGS_CHECK_AT_FREE failure. Instead we do a premature folio_end_dropbehind_write() dropping a page ref and then the final folio_put() in folio_end_writeback() frees the page and splats on the PG_writeback. Then the folio is processed again in the following iteration of iomap_finish_ioend() and splats on the refcount-already-zero. So I think folio_end_dropbehind_write() should only be done when __folio_end_writeback() was true. Most likely even the folio_test_clear_dropbehind() should be tied to that, or we clear it too early and then never act upon it later?