Re: [PATCH] xfs_repair: handling a block with bad crc, bad uuid, and bad magic number needs fixing

Eric Sandeen <sandeen@xxxxxxxxxxx> · Thu, 20 Mar 2025 17:21:22 -0500

For anyone interested in evaluating this problem, here is a cleanroom'd
reproducer metadump image that demonstrates it.

For a single-block extent format directory ...

xfs_db> inode 131
xfs_db> p
...
0:[0,15,1,0]
...
xfs_db> fsblock 15
xfs_db> type dir3

we corrupt the magic, the uuid, and the crc of this dir3 block:

xfs_db> write -c bhdr.hdr.uuid 0xdeadbeef
Allowing write of corrupted data and bad CRC
bhdr.hdr.uuid = 20000000-0000-0000-0000-0000deadbeef
xfs_db> write -c bhdr.hdr.magic 0xfeedface
Allowing write of corrupted data and bad CRC
bhdr.hdr.magic = 0xfeedface
xfs_db> quit

and then xfs_repair fails to fix things up enough to pass the verifier
when it tries to write out the buffer after "fixing" it:

# xfs_repair foo.img
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
Metadata CRC error detected at 0x5556997d5ca0, xfs_dir3_block block 0x78/0x1000
bad directory block magic # 0xfeedface in block 0 for directory inode 131
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
bad directory block magic # 0xfeedface in block 0 for directory inode 131
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
bad directory block magic # 0xfeedface for directory inode 131 block 0: fixing magic # to 0x58444233
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Metadata corruption detected at 0x5556997d5990, xfs_dir3_block block 0x78/0x1000
libxfs_bwrite: write verifier failed on xfs_dir3_block bno 0x78/0x8
xfs_repair: Releasing dirty buffer to free list!
xfs_repair: Refusing to write a corrupt buffer to the data device!
xfs_repair: Lost a write to the data device!

fatal error -- File system metadata writeout failed, err=117.  Re-run xfs_repair.
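
As a side note for anyone matching the numbers in this output to the xfs_db session above, here is a small sketch of the arithmetic. It assumes 4096-byte filesystem blocks (blocklog 12, which the 0x1000 length suggests), 512-byte basic blocks, and that fsblock 15 sits in AG 0 so no per-AG offset applies. The helper names are made up for illustration and are not libxfs functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: disk addresses are counted in 512-byte basic blocks. */
#define BBSHIFT 9

/* Convert a filesystem block in AG 0 to a basic-block disk address. */
static uint64_t ag0_fsb_to_daddr(uint64_t fsbno, unsigned int blocklog)
{
	assert(blocklog >= BBSHIFT);
	return fsbno << (blocklog - BBSHIFT);
}

/* Convert a byte length to a count of basic blocks. */
static uint64_t bytes_to_bb(uint64_t bytes)
{
	return bytes >> BBSHIFT;
}

/* Render a 32-bit magic number as its four bytes, most significant first. */
static void magic_to_ascii(uint32_t magic, char out[5])
{
	out[0] = (char)(magic >> 24);
	out[1] = (char)(magic >> 16);
	out[2] = (char)(magic >> 8);
	out[3] = (char)magic;
	out[4] = '\0';
}

/* Return nonzero if the magic spells the given four-character tag. */
static int magic_matches(uint32_t magic, const char *tag)
{
	char buf[5];

	magic_to_ascii(magic, buf);
	return strcmp(buf, tag) == 0;
}
```

Under those assumptions, ag0_fsb_to_daddr(15, 12) gives 0x78, bytes_to_bb(0x1000) gives 0x8, and 0x58444233 spells "XDB3". That is consistent with "block 0x78/0x1000" and "bno 0x78/0x8" both referring to the directory block at fsblock 15, reported once with a byte length and once with a basic-block length.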

This is because this sequence in longform_dir2_entry_check():

                /* check v5 metadata */
                d = bp->b_addr;
                if (be32_to_cpu(d->magic) == XFS_DIR3_BLOCK_MAGIC ||
                    be32_to_cpu(d->magic) == XFS_DIR3_DATA_MAGIC) {
                        error = check_dir3_header(mp, bp, ino);
                        if (error) {
                                fixit++;
                                if (fmt == XFS_DIR2_FMT_BLOCK)
                                        goto out_fix;

                                libxfs_buf_relse(bp);
                                bp = NULL;
                                continue;
                        }
                }

                longform_dir2_entry_check_data(mp, ip, num_illegal, need_dot,
                                irec, ino_offset, bp, hashtab,
                                &freetab, da_bno, fmt == XFS_DIR2_FMT_BLOCK);

would have fixed the UUID had check_dir3_header() found the error, but the
magic was wrong, so that check never ran and fixit was never set.

longform_dir2_entry_check_data() then fixes the magic and the CRC, but does
not fix the UUID, so the verifier check fails on writeout.
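
To make that control flow easier to follow, here is a toy model of it. This is only a sketch, not xfs_repair code: struct toy_hdr, toy_write_verify() and toy_entry_check() are hypothetical names, the header is reduced to three fields, and the out_fix path is assumed to leave a fully valid header behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Magic value taken from the "fixing magic # to" message in the output. */
#define TOY_DIR3_BLOCK_MAGIC 0x58444233u

/* Hypothetical stand-in for a dir3 block header; not the on-disk layout. */
struct toy_hdr {
	uint32_t magic;
	bool crc_ok;
	bool uuid_ok;
};

/* Stand-in for the write verifier: every field must be good. */
static bool toy_write_verify(const struct toy_hdr *h)
{
	return h->magic == TOY_DIR3_BLOCK_MAGIC && h->crc_ok && h->uuid_ok;
}

/*
 * Model of the quoted sequence for a block-format directory.
 * Returns true if the block would pass the write verifier afterwards.
 */
static bool toy_entry_check(struct toy_hdr *h)
{
	bool fixit = false;

	assert(h != NULL);

	/* The v5 header check is only reached when the magic is valid. */
	if (h->magic == TOY_DIR3_BLOCK_MAGIC && !h->uuid_ok)
		fixit = true;

	if (fixit) {
		/* Assumption: the out_fix path ends with a fully valid header. */
		h->magic = TOY_DIR3_BLOCK_MAGIC;
		h->crc_ok = true;
		h->uuid_ok = true;
		return toy_write_verify(h);
	}

	/* Data check path: magic and CRC get repaired, UUID is untouched. */
	h->magic = TOY_DIR3_BLOCK_MAGIC;
	h->crc_ok = true;
	return toy_write_verify(h);
}
```

In this model, a header of { 0xfeedface, false, false } comes out with a good magic and CRC but still fails toy_write_verify() because uuid_ok stays false, while a header with a good magic and only a bad UUID takes the fixit branch and passes. That is the asymmetry described above.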

When all three items are bad, I'm not exactly sure what we should do to get
the UUID fixed up here (or whether the block should simply have been junked
at that point).

-Eric
Attachment: repro.meta.bz2
Description: BZip2 compressed data