On Wed, May 28, 2025 at 06:56:37PM -0700, Darrick J. Wong wrote:
> On Sun, May 25, 2025 at 09:32:09AM +0100, Al Viro wrote:
> > generic/127 with xfstests built on debian-testing (trixie) ends up with
> > assorted memory corruption; the trace below is with CONFIG_DEBUG_PAGEALLOC
> > and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT, and it looks like a double
> > free somewhere in iomap.  Unfortunately, the commit in question just
> > makes xfs use the infrastructure built in the earlier series - not that
> > useful for isolating the breakage.
> >
> > [ 22.001529] run fstests generic/127 at 2025-05-25 04:13:23
> > [ 35.498573] BUG: Bad page state in process kworker/2:1 pfn:112ce9
> > [ 35.499260] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e 9
> > [ 35.499764] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)
> > [ 35.500302] raw: 800000000000000e dead000000000100 dead000000000122 000000000
> > [ 35.500786] raw: 000000000000003e 0000000000000000 00000000ffffffff 000000000
> > [ 35.501248] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
> > [ 35.501624] Modules linked in: xfs autofs4 fuse nfsd auth_rpcgss nfs_acl nfs0
> > [ 35.503209] CPU: 2 UID: 0 PID: 85 Comm: kworker/2:1 Not tainted 6.14.0-rc1+ 7
> > [ 35.503211] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.164
> > [ 35.503212] Workqueue: xfs-conv/sdb1 xfs_end_io [xfs]
> > [ 35.503279] Call Trace:
> > [ 35.503281]  <TASK>
> > [ 35.503282]  dump_stack_lvl+0x4f/0x60
> > [ 35.503296]  bad_page+0x6f/0x100
> > [ 35.503300]  free_frozen_pages+0x303/0x550
> > [ 35.503301]  iomap_finish_ioend+0xf6/0x380
> > [ 35.503304]  iomap_finish_ioends+0x83/0xc0
> > [ 35.503305]  xfs_end_ioend+0x64/0x140 [xfs]
> > [ 35.503342]  xfs_end_io+0x93/0xc0 [xfs]
> > [ 35.503378]  process_one_work+0x153/0x390
> > [ 35.503382]  worker_thread+0x2ab/0x3b0
> >
> > It's 4:30am here, so I'm going to leave attempts to actually debug that
> > thing until tomorrow; I do have a kvm where it's reliably reproduced
> > within a few minutes, so if anyone comes up with patches, I'll be able
> > to test them.
> >
> > Breakage is still present in the current mainline ;-/
>
> Hey Al,
>
> Well, this certainly looks like the same report I made a month ago.
> I'll go run 6.15 final (with the #define RWF_DONTCACHE 0) overnight to
> confirm whether that makes my problem go away.  If these are one and the
> same bug, then thank you for finding a better reproducer! :)
>
> https://lore.kernel.org/linux-fsdevel/20250416180837.GN25675@frogsfrogsfrogs/

After a full QA run, 6.15 final passed fstests with flying colors, so I
guess we now know the culprit.  I'll test the new RWF_DONTCACHE fixes
whenever they appear upstream.

--D