Re: cleanup log item formatting

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 18 Jun 2025 19:24:42 +1000

On Tue, Jun 10, 2025 at 07:14:57AM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> I dug into a rabit hole about the log item formatting recently,
> and noticed that the handling of the opheaders is still pretty
> ugly because it leaks pre-delayed logging implementation
> details into the log item implementations.
> 
> The core of this series is to remove the to reserve space in the
> CIL buffers/shadow buffers for the opheaders that already were
> generated more or less on the fly by the lowlevel log write
> code anyway, but there's lots of other cleanups around it.

OK, so I found the xfs-log-format-cleanups branch in your
tree. I just had check-parallel fail with:

[ 5060.219054] XFS (loop210): xlog_recovery_process_trans: bad flag 0x1
[ 5060.225296] XFS: Assertion failed: 0, file: fs/xfs/xfs_log_recover.c, line: 2324
[ 5060.233223] ------------[ cut here ]------------
[ 5060.243515] kernel BUG at fs/xfs/xfs_message.c:102!
[ 5060.262119] Oops: invalid opcode: 0000 [#2] SMP NOPTI
[ 5060.295341] CPU: 9 UID: 0 PID: 2180403 Comm: mount Tainted: G      D             6.15.0-dgc+ #338 PREEMPT(full)
[ 5060.295347] Tainted: [D]=DIE
[ 5060.295348] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 5060.295350] RIP: 0010:assfail+0x3a/0x40
[ 5060.295360] Code: 89 f1 48 89 fe 48 c7 c7 8a f7 ed 82 48 c7 c2 f2 85 e8 82 e8 c8 fc ff ff 80 3d 19 b6 50 03 01 74 09 0f 0b 5d c3 cc cc cc cc cc <0f> 0b 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
[ 5060.295363] RSP: 0018:ffffc900198a79c8 EFLAGS: 00010246
[ 5060.317824] RAX: f72505c615164000 RBX: ffff8888c8204680 RCX: f72505c615164000
[ 5060.317826] RDX: ffffc900198a7898 RSI: 000000000000000a RDI: ffffffff82edf78a
[ 5060.317827] RBP: ffffc900198a79c8 R08: 0000000000000000 R09: 000000000000000a
[ 5060.317829] R10: 0000000000000000 R11: 0000000000000021 R12: ffff88821c475800
[ 5060.317830] R13: ffff8881d27bcc70 R14: ffff8881d27bcc7c R15: ffffc900198a7b58
[ 5060.317834] FS:  00007f6d916b8840(0000) GS:ffff88889a6be000(0000) knlGS:0000000000000000
[ 5060.317835] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5060.317836] CR2: 00007f4c2650d350 CR3: 0000000226620000 CR4: 0000000000350ef0
[ 5060.317841] Call Trace:
[ 5060.341040]  <TASK>
[ 5060.341043]  xlog_recovery_process_trans+0xd6/0x100
[ 5060.341050]  xlog_recover_process_ophdr+0xdd/0x140
[ 5060.341052]  xlog_recover_process_data+0x9b/0x160
[ 5060.341054]  xlog_recover_process+0xb2/0x110
[ 5060.341056]  xlog_do_recovery_pass+0x685/0x900
[ 5060.341059]  xlog_do_log_recovery+0x43/0xb0
[ 5060.341061]  xlog_do_recover+0x2c/0x190
[ 5060.341063]  xlog_recover+0x165/0x180
[ 5060.341065]  xfs_log_mount+0x14d/0x270
[ 5060.376607]  xfs_mountfs+0x3aa/0x990
[ 5060.376616]  xfs_fs_fill_super+0x701/0x870
[ 5060.376619]  ? __pfx_xfs_fs_fill_super+0x10/0x10
[ 5060.474403]  get_tree_bdev_flags+0x120/0x1a0
[ 5060.476710]  get_tree_bdev+0x10/0x20
[ 5060.478520]  xfs_fs_get_tree+0x15/0x20
[ 5060.480175]  vfs_get_tree+0x28/0xe0
[ 5060.481861]  vfs_cmd_create+0x5f/0xd0
[ 5060.483491]  vfs_fsconfig_locked+0x50/0x130
[ 5060.485358]  __se_sys_fsconfig+0x349/0x3d0
[ 5060.487273]  __x64_sys_fsconfig+0x25/0x30
[ 5060.489136]  x64_sys_call+0x3be/0x2f60
[ 5060.491533]  do_syscall_64+0x6c/0x140
[ 5060.493430]  ? exc_page_fault+0x62/0xc0
[ 5060.495220]  entry_SYSCALL_64_after_hwframe+0x76/0x7e

The test that triggered is was xfs/609 running with this config:

[xfs_1k]
FSTYP=xfs
MKFS_OPTIONS="-m rmapbt=1 -b size=1k -i exchange=1"
TEST_MKFS_OPTS="-m rmapbt=1 -b size=1k -i exchange=1"

I don't know if this is reproducable yet.

The assert failure is this case in xlog_recovery_process_trans():

        case XLOG_START_TRANS:
        default:
                xfs_warn(log->l_mp, "%s: bad flag 0x%x", __func__, flags);
                ASSERT(0);
                error = -EFSCORRUPTED;
                break;
        }

This implies that more that one ophdr in a transaction had the
XLOG_START_TRANS flag set, or that there are duplicate transaction
IDs in the journal. i.e. something likely went wrong in xlog_write()
at runtime, but it's not until the journal is recovered that the
issue is noticed....

This reminds me of the subtle, hard to hit xlog_write bugs that
took me months of recovery testing to shake out last time I rewrote
the checkpoint and xlog_write() code... :/

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx