2025年5月22日 08:08,Taylor Blau <me@xxxxxxxxxxxx> 写道: > > On Tue, May 20, 2025 at 09:23:10AM +0000, Lidong Yan via GitGitGadget wrote: >> From: Lidong Yan <502024330056@xxxxxxxxxxxxxxxx> >> >> This patch add "load corrupt bitmap" test case in t5310-pack-bitmaps.sh. >> >> This test case intentionally corrupt the "xor_offset" field of the first >> entry. To find position of first entry in *.bitmap, we need to skip 4 >> ewah_bitmaps before entries. And I add a function `skip_ewah_bitmap()` >> to do this. > > I'm going to avoid commenting on the message itself, since I think we > may be able to drop this patch entirely, see below. > >> Signed-off-by: Lidong Yan <502024330056@xxxxxxxxxxxxxxxx> >> --- >> t/t5310-pack-bitmaps.sh | 50 +++++++++++++++++++++++++++++++++++++++++ >> 1 file changed, 50 insertions(+) >> >> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh >> index a62b463eaf09..537a507957bb 100755 >> --- a/t/t5310-pack-bitmaps.sh >> +++ b/t/t5310-pack-bitmaps.sh >> @@ -26,6 +26,18 @@ has_any () { >> grep -Ff "$1" "$2" >> } >> >> +skip_ewah_bitmap() { >> + local bitmap="$1" && >> + local offset="$2" && >> + local size= && >> + >> + offset=$(($offset + 4)) && >> + size=0x$(od -An -v -t x1 -j $offset -N 4 $bitmap | tr -d ' \n') && >> + size=$(($size * 8)) && >> + offset=$(($offset + 4 + $size + 4)) && >> + echo $offset >> +} >> + >> # Since name-hash values are stored in the .bitmap files, add a test >> # that checks that the name-hash calculations are stable across versions. >> # Not exhaustive, but these hashing algorithms would be hard to change >> @@ -486,6 +498,44 @@ test_bitmap_cases () { >> grep "ignoring extra bitmap" trace2.txt >> ) >> ' >> + >> + # A `.bitmap` file has the following structure: >> + # | Header | Commits | Trees | Blobs | Tags | Entries... | >> + # >> + # - The header is 32 bytes long when using SHA-1. >> + # - Commits, Trees, Blobs, and Tags are all stored as EWAH bitmaps. >> + # >> + # This test intentionally corrupts the `xor_offset` field of the first entry >> + # to verify robustness against malformed bitmap data. >> + test_expect_success 'load corrupt bitmap' ' > > I am not totally following what this case is supposed to be testing. > Let me think aloud for a moment... > >> + rm -fr repo && >> + git init repo && >> + test_when_finished "rm -fr repo" && >> + ( >> + cd repo && >> + git config pack.writeBitmapLookupTable '"$writeLookupTable"' && > > First we set up a temporary repository, change into it, and enable > bitmap lookup tables. Makes sense. > >> + test_commit base && >> + >> + git repack -adb && >> + bitmap="$(ls .git/objects/pack/pack-*.bitmap)" && >> + chmod +w "$bitmap" && > > Then we make a commit, and write a bitmap containing the objects from > the commit we just made. Good. > >> + hdr_sz=$((12 + $(test_oid rawsz))) && >> + offset=$(skip_ewah_bitmap $bitmap $hdr_sz) && >> + offset=$(skip_ewah_bitmap $bitmap $offset) && >> + offset=$(skip_ewah_bitmap $bitmap $offset) && >> + offset=$(skip_ewah_bitmap $bitmap $offset) && > > Then we read past the header and four type bitmaps. Makes sense. > >> + offset=$((offset + 4)) && > > Now we land at the bitmap for the commit we just wrote. > > (As an aside unrelated to this part of the test, this skip_ewah_bitmap() > function seems awfully fragile. I wonder if it would make more sense to > implement this as a test helper that can dump the offsets of EWAH > bitmaps in a *.bitmap file by object ID rather than trying to parse the > file ourselves? > I am actually replaying the pack-bitmap.c:prepare_bitmap() here. Also I have had write a test helper version once. And since I want to use prepare_bitmap() I have to put the code in pack-bitmap.c. It looks like this diff --git a/pack-bitmap.c b/pack-bitmap.c index b9f1d866046..9642a06b3fe 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -3022,6 +3022,71 @@ cleanup: return ret; } +typedef void(corrupt_fn)(struct bitmap_index *); + +static int bitmap_corrupt_then_load(struct repository *r, corrupt_fn *do_corrupt) +{ + struct bitmap_index *bitmap_git; + unsigned char *map; + + if (!(bitmap_git = prepare_bitmap_git(r))) + die(_("failed to prepare bitmap indexes")); + /* + * If the table lookup extension is not used, + * prepare_bitmap_git has already called load_bitmap_entries_v1(), + * making it impossible to corrupt the bitmap. + */ + if (!bitmap_git->table_lookup) + return 0; + + /* + * bitmap_git->map is read-only; + * to corrupt it, we need a writable memory block. + */ + map = bitmap_git->map; + bitmap_git->map = xmalloc(bitmap_git->map_size); + if (!bitmap_git->map) + return 0; + memcpy(bitmap_git->map, map, bitmap_git->map_size); + + do_corrupt(bitmap_git); + if (!load_bitmap_entries_v1(bitmap_git)) + die(_("load corrupt bitmap successfully")); + + free(bitmap_git->map); + bitmap_git->map = map; + free_bitmap_index(bitmap_git); + + return 0; +} + +static void do_corrupt_commit_pos(struct bitmap_index *bitmap_git) +{ + uint32_t *commit_pos_ptr; + + commit_pos_ptr = (uint32_t *)(bitmap_git->map + bitmap_git->map_pos); + *commit_pos_ptr = (uint32_t)-1; +} + +static void do_corrupt_xor_offset(struct bitmap_index *bitmap_git) +{ + uint8_t *xor_offset_ptr; + + xor_offset_ptr = (uint8_t *)(bitmap_git->map + bitmap_git->map_pos + + sizeof(uint32_t)); + *xor_offset_ptr = MAX_XOR_OFFSET + 1; +} + +int test_bitmap_load_corrupt(struct repository *r) +{ + int res = 0; + if ((res = bitmap_corrupt_then_load(r, do_corrupt_commit_pos))) + return res; + if ((res = bitmap_corrupt_then_load(r, do_corrupt_xor_offset))) + return res; + return res; +} + int rebuild_bitmap(const uint32_t *reposition, struct ewah_bitmap *source, struct bitmap *dest) > We don't currently store an offset for each stored_bitmap that we > maintain, but doing so would be pretty straightforward (add it as a > field to the structure, and store the value of bitmap_git->map_pos from > immediately before reading the actual bitmap).) > >> + printf '\161' | >> + dd of=$bitmap count=1 bs=1 conv=notrunc seek=$offset && > > OK. Now we break the XOR offset field of this bitmap by writing garbage > into it. > >> + git rev-list --count HEAD > expect && >> + git rev-list --use-bitmap-index --count HEAD > actual && >> + test_cmp expect actual > > ...and then we make sure that we still get the correct result. > > Hmmph. I don't think this is quite testing what we want, since this test > passes with or without your first patch. And that makes sense, we have > tests elsewhere in this script that verify we can still fall back to > classic traversal when the bitmap index can't be read. (For some > examples, see: "truncated bitmap fails gracefully (ewah)" and "truncated > bitmap fails gracefully (cache)".) I want to *test* for a memory leak here, not whether git can load a corrupt bitmap. Since git ci linux-leak test runs each test script with ASAN_OPTIONS=detect_leaks=1, I’m including this test case specifically to check whether it triggers a crash when `SANITIZE_LEAK` is enabled. And I do find if without the first patch, leak sanitizer running this test script would output error message. > I think what we're really testing here is the absence of a memory leak, > which we are as of 1fc7ddf35b (test-lib: unconditionally enable leak > checking, 2024-11-20). I wonder whether or not we need this test at all? > > Thanks, > Taylor I am not truly following what are you talking here. But If you think it’s unnecessary to check for potential leaks in load_bitmap() or load_bitmap_entries_v1(). Or this test script shouldn’t be put in this way. I’m happy to drop the final patch. Thanks Lidong Yan