On 5/30/25 7:10 PM, Darrick J. Wong wrote:
> On Wed, May 28, 2025 at 06:56:37PM -0700, Darrick J. Wong wrote:
>> On Sun, May 25, 2025 at 09:32:09AM +0100, Al Viro wrote:
>>> generic/127 with xfstests built on debian-testing (trixie) ends up with
>>> assorted memory corruption; the trace below is with CONFIG_DEBUG_PAGEALLOC
>>> and CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT, and it looks like a double free
>>> somewhere in iomap. Unfortunately, the commit in question just makes
>>> xfs use the infrastructure built in earlier series - not that useful
>>> for isolating the breakage.
>>>
>>> [   22.001529] run fstests generic/127 at 2025-05-25 04:13:23
>>> [   35.498573] BUG: Bad page state in process kworker/2:1 pfn:112ce9
>>> [   35.499260] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x3e 9
>>> [   35.499764] flags: 0x800000000000000e(referenced|uptodate|writeback|zone=2)
>>> [   35.500302] raw: 800000000000000e dead000000000100 dead000000000122 000000000
>>> [   35.500786] raw: 000000000000003e 0000000000000000 00000000ffffffff 000000000
>>> [   35.501248] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>>> [   35.501624] Modules linked in: xfs autofs4 fuse nfsd auth_rpcgss nfs_acl nfs0
>>> [   35.503209] CPU: 2 UID: 0 PID: 85 Comm: kworker/2:1 Not tainted 6.14.0-rc1+ 7
>>> [   35.503211] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.164
>>> [   35.503212] Workqueue: xfs-conv/sdb1 xfs_end_io [xfs]
>>> [   35.503279] Call Trace:
>>> [   35.503281]  <TASK>
>>> [   35.503282]  dump_stack_lvl+0x4f/0x60
>>> [   35.503296]  bad_page+0x6f/0x100
>>> [   35.503300]  free_frozen_pages+0x303/0x550
>>> [   35.503301]  iomap_finish_ioend+0xf6/0x380
>>> [   35.503304]  iomap_finish_ioends+0x83/0xc0
>>> [   35.503305]  xfs_end_ioend+0x64/0x140 [xfs]
>>> [   35.503342]  xfs_end_io+0x93/0xc0 [xfs]
>>> [   35.503378]  process_one_work+0x153/0x390
>>> [   35.503382]  worker_thread+0x2ab/0x3b0
>>>
>>> It's 4:30am here, so I'm going to leave attempts to actually debug that
>>> thing until tomorrow; I do have a kvm where it's reliably reproduced
>>> within a few minutes, so if anyone comes up with patches, I'll be able
>>> to test them.
>>>
>>> Breakage is still present in the current mainline ;-/
>>
>> Hey Al,
>>
>> Well, this certainly looks like the same report I made a month ago.
>> I'll go run 6.15 final (with the #define RWF_DONTCACHE 0) overnight to
>> confirm whether that makes my problem go away. If these are one and the
>> same bug, then thank you for finding a better reproducer! :)
>>
>> https://lore.kernel.org/linux-fsdevel/20250416180837.GN25675@frogsfrogsfrogs/
>
> After a full QA run, 6.15 final passes fstests with flying colors. So I
> guess we now know the culprit. Will test the new RWF_DONTCACHE fixes
> whenever they appear in upstream.

Please do! Unfortunately I never saw your original report, as I wasn't
CC'ed on it - which I can't really fault anyone for, since until now
there was no reason to suspect it.

--
Jens Axboe