On 5/30/25 7:10 PM, Darrick J. Wong wrote:
> On Wed, May 28, 2025 at 06:56:37PM -0700, Darrick J. Wong wrote:
>> On Sun, May 25, 2025 at 09:32:09AM +0100, Al Viro wrote:
>>> generic/127 with xfstests built on debian-testing (trixie) ends up with
>>> assorted memory corruption; the trace below is with CONFIG_DEBUG_PAGEALLOC
>>> and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT, and it looks like a double free
>>> somewhere in iomap. Unfortunately, the commit in question just makes
>>> xfs use the infrastructure built in earlier series - not that useful
>>> for isolating the breakage.
>>>
>>> [   22.001529] run fstests generic/127 at 2025-05-25 04:13:23
>>> [   35.498573] BUG: Bad page state in process kworker/2:1 pfn:112ce9
>>> [   35.499260] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e 9
>>> [   35.499764] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)
>>> [   35.500302] raw: 800000000000000e dead000000000100 dead000000000122 000000000
>>> [   35.500786] raw: 000000000000003e 0000000000000000 00000000ffffffff 000000000
>>> [   35.501248] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>>> [   35.501624] Modules linked in: xfs autofs4 fuse nfsd auth_rpcgss nfs_acl nfs0
>>> [   35.503209] CPU: 2 UID: 0 PID: 85 Comm: kworker/2:1 Not tainted 6.14.0-rc1+ 7
>>> [   35.503211] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.164
>>> [   35.503212] Workqueue: xfs-conv/sdb1 xfs_end_io [xfs]
>>> [   35.503279] Call Trace:
>>> [   35.503281]  <TASK>
>>> [   35.503282]  dump_stack_lvl+0x4f/0x60
>>> [   35.503296]  bad_page+0x6f/0x100
>>> [   35.503300]  free_frozen_pages+0x303/0x550
>>> [   35.503301]  iomap_finish_ioend+0xf6/0x380
>>> [   35.503304]  iomap_finish_ioends+0x83/0xc0
>>> [   35.503305]  xfs_end_ioend+0x64/0x140 [xfs]
>>> [   35.503342]  xfs_end_io+0x93/0xc0 [xfs]
>>> [   35.503378]  process_one_work+0x153/0x390
>>> [   35.503382]  worker_thread+0x2ab/0x3b0
>>>
>>> It's 4:30am here, so I'm going to leave attempts to actually debug that
>>> thing until tomorrow; I do have a kvm where it's reliably reproduced
>>> within a few minutes, so if anyone comes up with patches, I'll be able
>>> to test them.
>>>
>>> Breakage is still present in the current mainline ;-/
>>
>> Hey Al,
>>
>> Well, this certainly looks like the same report I made a month ago.
>> I'll go run 6.15 final (with the #define RWF_DONTCACHE 0) overnight to
>> confirm whether that makes my problem go away. If these are one and the
>> same bug, then thank you for finding a better reproducer! :)
>>
>> https://lore.kernel.org/linux-fsdevel/20250416180837.GN25675@frogsfrogsfrogs/
>
> After a full QA run, 6.15 final passes fstests with flying colors. So I
> guess we now know the culprit. Will test the new RWF_DONTCACHE fixes
> whenever they appear in upstream.

Please do! Unfortunately I never saw your original report, as I wasn't
CC'ed on it - which I can't really fault anyone for, since until now
there was no reason to suspect it.

--
Jens Axboe