Hi All: As discussed in [1], we need to use mmap mechanism to open large "packed_refs" file to save the memory usage. This patch mainly does the following things: 1: Fix an issue that we would report an error when the "packed-refs" file is empty, which does not align with the runtime behavior. 2-4: Extract some logic from the existing code and then use these created helper functions to let fsck code to use mmap necessarily [1] https://lore.kernel.org/git/20250503133158.GA4450@xxxxxxxxxxxxxxxxxxxxxxx Really thank Peff and Patrick to suggest me to do above change. --- Change since v1: 1. Update the commit message of [PATCH 1/4]. And use redirection to create an empty file instead of using `touch`. 2. Don't use if for the refactored function in [PATCH 3/4] and then update the commit message to align with the new function name. 3. Enhance the commit message of [PATCH 4/4]. Thanks, Jialuo shejialuo (4): packed-backend: fsck should allow an empty "packed-refs" file packed-backend: extract snapshot allocation in `load_contents` packed-backend: extract munmap operation for `MMAP_TEMPORARY` packed-backend: mmap large "packed-refs" file during fsck refs/packed-backend.c | 113 ++++++++++++++++++++++++--------------- t/t0602-reffiles-fsck.sh | 13 +++++ 2 files changed, 82 insertions(+), 44 deletions(-) Range-diff against v1: 1: aa9037ebfa ! 1: 26c3fd55a8 packed-backend: skip checking consistency of empty packed-refs file @@ Metadata Author: shejialuo <shejialuo@xxxxxxxxx> ## Commit message ## - packed-backend: skip checking consistency of empty packed-refs file + packed-backend: fsck should allow an empty "packed-refs" file - In "load_contents", when the "packed-refs" is empty, we will just return - the snapshot. However, we would report an error to the user when - checking the consistency of the empty "packed-refs". - - We should align with the runtime behavior. As what "load_contents" does, - let's check whether the file size is zero and if so, we will skip - checking the consistency and simply return. + During fsck, an empty "packed-refs" gives an error; this is unwarranted. + We should just skip checking the content of "packed-refs" just like the + runtime code paths such as "create_snapshot" which simply returns the + "snapshot" without checking the content of "packed-refs". Signed-off-by: shejialuo <shejialuo@xxxxxxxxx> @@ t/t0602-reffiles-fsck.sh: test_expect_success SYMLINKS 'the filetype of packed-r + cd repo && + test_commit default && + -+ touch .git/packed-refs && ++ >.git/packed-refs && + git refs verify 2>err && + test_must_be_empty err + ) 2: 852e8a606b = 2: 4604be8b51 packed-backend: extract snapshot allocation in `load_contents` 3: de146155f6 ! 3: c0609afac9 packed-backend: extract munmap operation for `MMAP_TEMPORARY` @@ Commit message is "MMAP_TEMPORARY". We also need to do this operation when checking the consistency of the "packed-refs" file. - Create a new function "munmap_snapshot_if_temporary" to do above and + Create a new function "munmap_temporary_snapshot" to do above and change "create_snapshot" to align with the behavior. Suggested-by: Jeff King <peff@xxxxxxxx> @@ refs/packed-backend.c: static int allocate_snapshot_buffer(struct snapshot *snap return 1; } -+static void munmap_snapshot_if_temporary(struct snapshot *snapshot) ++static void munmap_temporary_snapshot(struct snapshot *snapshot) +{ -+ if (mmap_strategy != MMAP_OK && snapshot->mmapped) { -+ /* -+ * We don't want to leave the file mmapped, so we are -+ * forced to make a copy now: -+ */ -+ size_t size = snapshot->eof - snapshot->start; -+ char *buf_copy = xmalloc(size); ++ char *buf_copy; ++ size_t size; + -+ memcpy(buf_copy, snapshot->start, size); -+ clear_snapshot_buffer(snapshot); -+ snapshot->buf = snapshot->start = buf_copy; -+ snapshot->eof = buf_copy + size; -+ } ++ if (!snapshot) ++ return; ++ ++ /* ++ * We don't want to leave the file mmapped, so we are ++ * forced to make a copy now: ++ */ ++ size = snapshot->eof - snapshot->start; ++ buf_copy = xmalloc(size); ++ ++ memcpy(buf_copy, snapshot->start, size); ++ clear_snapshot_buffer(snapshot); ++ snapshot->buf = snapshot->start = buf_copy; ++ snapshot->eof = buf_copy + size; +} + /* @@ refs/packed-backend.c: static struct snapshot *create_snapshot(struct packed_ref - snapshot->buf = snapshot->start = buf_copy; - snapshot->eof = buf_copy + size; - } -+ munmap_snapshot_if_temporary(snapshot); ++ if (mmap_strategy == MMAP_TEMPORARY && snapshot->mmapped) ++ munmap_temporary_snapshot(snapshot); return snapshot; } 4: be1e9e2540 ! 4: c868e3dd16 packed-backend: use mmap when opening large "packed-refs" file @@ Metadata Author: shejialuo <shejialuo@xxxxxxxxx> ## Commit message ## - packed-backend: use mmap when opening large "packed-refs" file + packed-backend: mmap large "packed-refs" file during fsck - We use "strbuf_read" to read the content of "packed-refs". However, this - is a bad practice which would consume a lot of memory usage if there are - multiple processes reading large "packed-refs". + During fsck, we use "strbuf_read" to read the content of "packed-refs" + without using mmap mechanism. This is a bad practice which would consume + more memory than using mmap mechanism. Besides, as all code paths in + "packed-backend.c" use this way, we should make "fsck" align with the + current codebase. As we have introduced two helper functions "allocate_snapshot_buffer" and "munmap_snapshot_if_temporary", we could simply call these functions @@ refs/packed-backend.c: static int packed_fsck(struct ref_store *ref_store, + if (!allocate_snapshot_buffer(snapshot, fd, &st)) goto cleanup; - } -+ munmap_snapshot_if_temporary(snapshot); - ret = packed_fsck_ref_content(o, ref_store, &sorted, packed_ref_content.buf, - packed_ref_content.buf + packed_ref_content.len); ++ if (mmap_strategy == MMAP_TEMPORARY && snapshot->mmapped) ++ munmap_temporary_snapshot(snapshot); ++ + ret = packed_fsck_ref_content(o, ref_store, &sorted, snapshot->start, + snapshot->eof); if (!ret && sorted) -- 2.49.0