On Wed, May 28, 2025 at 06:56:37PM -0700, Darrick J. Wong wrote:
> On Sun, May 25, 2025 at 09:32:09AM +0100, Al Viro wrote:
> > generic/127 with xfstests built on debian-testing (trixie) ends up with
> > assorted memory corruption; the trace below is with CONFIG_DEBUG_PAGEALLOC
> > and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT, and it looks like a double
> > free somewhere in iomap.  Unfortunately, the commit in question just
> > makes xfs use the infrastructure built in the earlier series - not that
> > useful for isolating the breakage.
> >
> > [ 22.001529] run fstests generic/127 at 2025-05-25 04:13:23
> > [ 35.498573] BUG: Bad page state in process kworker/2:1 pfn:112ce9
> > [ 35.499260] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e 9
> > [ 35.499764] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)
> > [ 35.500302] raw: 800000000000000e dead000000000100 dead000000000122 000000000
> > [ 35.500786] raw: 000000000000003e 0000000000000000 00000000ffffffff 000000000
> > [ 35.501248] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> > [ 35.501624] Modules linked in: xfs autofs4 fuse nfsd auth_rpcgss nfs_acl nfs0
> > [ 35.503209] CPU: 2 UID: 0 PID: 85 Comm: kworker/2:1 Not tainted 6.14.0-rc1+ 7
> > [ 35.503211] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.164
> > [ 35.503212] Workqueue: xfs-conv/sdb1 xfs_end_io [xfs]
> > [ 35.503279] Call Trace:
> > [ 35.503281]  <TASK>
> > [ 35.503282]  dump_stack_lvl+0x4f/0x60
> > [ 35.503296]  bad_page+0x6f/0x100
> > [ 35.503300]  free_frozen_pages+0x303/0x550
> > [ 35.503301]  iomap_finish_ioend+0xf6/0x380
> > [ 35.503304]  iomap_finish_ioends+0x83/0xc0
> > [ 35.503305]  xfs_end_ioend+0x64/0x140 [xfs]
> > [ 35.503342]  xfs_end_io+0x93/0xc0 [xfs]
> > [ 35.503378]  process_one_work+0x153/0x390
> > [ 35.503382]  worker_thread+0x2ab/0x3b0
> >
> > It's 4:30am here, so I'm going to leave attempts to actually debug that
> > thing until tomorrow; I do have a kvm where it's reliably reproduced
> > within a few minutes, so if anyone comes up with patches, I'll be able
> > to test them.
> >
> > Breakage is still present in the current mainline ;-/
>
> Hey Al,
>
> Well, this certainly looks like the same report I made a month ago.
> I'll go run 6.15 final (with the #define RWF_DONTCACHE 0) overnight to
> confirm whether that makes my problem go away.  If these are one and the
> same bug, then thank you for finding a better reproducer! :)
>
> https://lore.kernel.org/linux-fsdevel/20250416180837.GN25675@frogsfrogsfrogs/

After a full QA run, 6.15 final passed fstests with flying colors, so I
guess we now know the culprit.  I'll test the new RWF_DONTCACHE fixes
whenever they appear upstream.

--D