On Tue, 2025-07-15 at 16:14 +0800, Yangtao Li wrote:
> Hi Slava,
>
> On 2025/7/1 03:36, Viacheslav Dubeyko wrote:
> > Hi Yangtao,
> >
> > On Thu, 2025-06-26 at 15:42 +0800, Yangtao Li wrote:
> > > Hi Slava,
> > >
> > > On 2025/6/11 07:16, Viacheslav Dubeyko wrote:
> > > > The generic/736 xfstest fails for the HFS case:
> > > >
> > > > BEGIN TEST default (1 test): hfs Mon May 5 03:18:32 UTC 2025
> > > > DEVICE: /dev/vdb
> > > > HFS_MKFS_OPTIONS:
> > > > MOUNT_OPTIONS: MOUNT_OPTIONS
> > > > FSTYP -- hfs
> > > > PLATFORM -- Linux/x86_64 kvm-xfstests 6.15.0-rc4-xfstests-g00b827f0cffa #1 SMP PREEMPT_DYNAMIC Fri May 25
> > > > MKFS_OPTIONS -- /dev/vdc
> > > > MOUNT_OPTIONS -- /dev/vdc /vdc
> > > >
> > > > generic/736 [03:18:33][ 3.510255] run fstests generic/736 at 2025-05-05 03:18:33
> > > > _check_generic_filesystem: filesystem on /dev/vdb is inconsistent
> > > > (see /results/hfs/results-default/generic/736.full for details)
> > > > Ran: generic/736
> > > > Failures: generic/736
> > > > Failed 1 of 1 tests
> > > >
> > > > The HFS volume becomes corrupted after the test run:
> > > >
> > > > sudo fsck.hfs -d /dev/loop50
> > > > ** /dev/loop50
> > > > Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K.
> > > > Executing fsck_hfs (version 540.1-Linux).
> > > > ** Checking HFS volume.
> > > > The volume name is untitled
> > > > ** Checking extents overflow file.
> > > > ** Checking catalog file.
> > > > ** Checking catalog hierarchy.
> > > > ** Checking volume bitmap.
> > > > ** Checking volume information.
> > > > invalid MDB drNxtCNID
> > > > Master Directory Block needs minor repair
> > > > (1, 0)
> > > > Verify Status: VIStat = 0x8000, ABTStat = 0x0000 EBTStat = 0x0000
> > > > CBTStat = 0x0000 CatStat = 0x00000000
> > > > ** Repairing volume.
> > > > ** Rechecking volume.
> > > > ** Checking HFS volume.
> > > > The volume name is untitled
> > > > ** Checking extents overflow file.
> > > > ** Checking catalog file.
> > > > ** Checking catalog hierarchy.
> > > > ** Checking volume bitmap.
> > > > ** Checking volume information.
> > > > ** The volume untitled was repaired successfully.
> > > >
> > > > The main reason for the issue is the absence of logic that
> > > > corrects mdb->drNxtCNID/HFS_SB(sb)->next_id (the next unused
> > > > CNID) after deleting a record in the Catalog File. This patch
> > > > introduces a hfs_correct_next_unused_CNID() method that
> > > > implements the necessary logic. For the Catalog File's record
> > > > delete operation, the function checks whether
> > > > (deleted_CNID + 1) == next_unused_CNID and, if so, finds and
> > > > sets the new value of next_unused_CNID.
> > >
> > > Sorry for the late reply.
> > >
> > > I understand you now, and I did some research. It's a problem of
> > > CNID usage. The catalog tree identification number is of type u32.
> > >
> > > And there are some ways to reuse CNIDs.
> > > If the CNID reaches U32_MAX, kHFSCatalogNodeIDsReusedMask (Apple
> > > open source code) is set so that unused CNIDs get reused.
> > > And we can use the HFSIOC_CHANGE_NEXTCNID ioctl to make use of
> > > unused CNIDs.
> > >
> > > What confused me is that fsck for hfsplus ignores those unused
> > > CNIDs [1], but fsck for hfs only ignores them if mdbP->drNxtCNID <=
> > > (vcb->vcbNextCatalogID + 4096) (which means over 4096 unused
> > > CNIDs) [2]?
> > >
> > > And I didn't find any logic in the Apple source code that changes
> > > the CNID when removing a file.
> > >
> > > So I think your idea is good, but it looks like that's not what the
> > > original code did? If I'm wrong, please correct me.
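
For readers following the thread: the delete-path idea described in the patch
above can be sketched roughly as below. This is an illustration only, not the
actual patch code; hfs_find_highest_used_cnid() is a hypothetical helper that
stands in for the real Catalog File lookup of the largest CNID still in use.

/*
 * Sketch only, not the actual patch.
 */
#include "hfs_fs.h"	/* HFS_SB(), struct hfs_sb_info, HFS_FLG_MDB_DIRTY */

/*
 * Hypothetical helper: would scan the Catalog File for the largest CNID
 * still in use, never returning less than 15 so that an empty volume
 * falls back to next_id == 16.
 */
static u32 hfs_find_highest_used_cnid(struct super_block *sb);

static void hfs_correct_next_unused_CNID(struct super_block *sb, u32 deleted_cnid)
{
	struct hfs_sb_info *sbi = HFS_SB(sb);

	/* Only a delete of the topmost CNID can invalidate next_id. */
	if (deleted_cnid + 1 != sbi->next_id)
		return;

	/* Rewind next_id to (largest CNID still in use) + 1. */
	sbi->next_id = hfs_find_highest_used_cnid(sb) + 1;

	/* The on-disk mdb->drNxtCNID is refreshed from next_id at MDB commit. */
	set_bit(HFS_FLG_MDB_DIRTY, &sbi->flags);
}

The corrected in-core value then reaches the on-disk mdb->drNxtCNID the next
time the MDB is written back.
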
> >
> > I think you missed what the problem is here. It's not about reaching the
> > U32_MAX threshold. The generic/736 test simply creates some number of
> > files and then deletes them. We increment mdb->drNxtCNID/HFS_SB(sb)->next_id
> > on every creation of a file or folder because we assign the next unused
> > CNID to the created file or folder. But when we delete the file or folder,
> > we never correct the mdb->drNxtCNID/HFS_SB(sb)->next_id. And the fsck tool
> > expects that the next unused CNID should be equal to the last
> > allocated/used CNID + 1. Let's imagine that we create four files: file1
> > has CNID 16, file2 has CNID 17, file3 has CNID 18, file4 has CNID 19, and
> > the next unused CNID should be 20. If we delete file1, then the next
> > unused CNID should still be 20 because file4 still exists. And if we
> > delete all the files, then the next unused CNID should be 16 again. This
> > is what the fsck tool expects to see.
>
> I got it. If we deleted all files, then the next unused CNID should be 16,
> which sounds reasonable. In fact, the next unused CNID will stay at 20
> for both hfs and hfsplus.
>
> It confused me whether changing the CNID after a remove operation is the
> best way for hfs, because I didn't find such logic in Apple's hfs code.
>

I don't quite follow what you mean here. What do you mean by changing the
CNID after a remove operation? There is no such logic in the suggested patch.

> And only hfs fails generic/736, which is related to fsck for hfsplus
> ignoring unused CNIDs. Could we ignore unused CNIDs for hfs too?
> Those unused CNIDs might be reused after setting the
> kHFSCatalogNodeIDsReusedMask flag.
>

This patch is not about ignoring or not ignoring the unused CNIDs. This patch
is about keeping mdb->drNxtCNID/HFS_SB(sb)->next_id up to date. Up to date
means that this value should always be last_used_CNID + 1; otherwise, the
fsck tool will treat the volume as corrupted.

Thanks,
Slava.
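
P.S. To make the file1..file4 arithmetic above concrete, here is a tiny
stand-alone C model of how next_id should behave once the delete path
corrects it. Illustration only; the names (alloc_cnid(), delete_cnid(),
FIRST_CNID) are mine and nothing below is actual fs/hfs code.

/*
 * Stand-alone model of the invariant that fsck.hfs expects:
 * drNxtCNID == (highest CNID still in use) + 1.
 */
#include <stdio.h>
#include <stdbool.h>

#define FIRST_CNID	16	/* first CNID handed out in this example */
#define NFILES		4

static bool in_use[FIRST_CNID + NFILES];	/* CNIDs still allocated */
static unsigned int next_id = FIRST_CNID;	/* models HFS_SB(sb)->next_id */

static unsigned int alloc_cnid(void)
{
	unsigned int cnid = next_id++;	/* what hfs does on create */

	in_use[cnid] = true;
	return cnid;
}

static void delete_cnid(unsigned int cnid)
{
	in_use[cnid] = false;

	/* Correction step: only needed when the topmost CNID goes away. */
	if (cnid + 1 != next_id)
		return;
	while (next_id > FIRST_CNID && !in_use[next_id - 1])
		next_id--;
}

int main(void)
{
	unsigned int cnid[NFILES];
	int i;

	for (i = 0; i < NFILES; i++)
		cnid[i] = alloc_cnid();	/* 16, 17, 18, 19 */
	printf("after create: next_id = %u\n", next_id);	/* 20 */

	delete_cnid(cnid[0]);	/* delete file1 (CNID 16) */
	printf("after deleting file1: next_id = %u\n", next_id);	/* 20 */

	for (i = 1; i < NFILES; i++)
		delete_cnid(cnid[i]);	/* delete the remaining files */
	printf("after deleting all: next_id = %u\n", next_id);	/* 16 */
	return 0;
}

Compiled and run, it prints 20, 20, and 16 for the three stages, which is
exactly the sequence of values fsck.hfs expects to find in drNxtCNID.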