Re: [PATCH v2 3/3] pack-bitmap: add loading corrupt bitmap_index test

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



2025年5月22日 08:08,Taylor Blau <me@xxxxxxxxxxxx> 写道:
> 
> On Tue, May 20, 2025 at 09:23:10AM +0000, Lidong Yan via GitGitGadget wrote:
>> From: Lidong Yan <502024330056@xxxxxxxxxxxxxxxx>
>> 
>> This patch add "load corrupt bitmap" test case in t5310-pack-bitmaps.sh.
>> 
>> This test case intentionally corrupt the "xor_offset" field of the first
>> entry. To find position of first entry in *.bitmap, we need to skip 4
>> ewah_bitmaps before entries. And I add a function `skip_ewah_bitmap()`
>> to do this.
> 
> I'm going to avoid commenting on the message itself, since I think we
> may be able to drop this patch entirely, see below.
> 
>> Signed-off-by: Lidong Yan <502024330056@xxxxxxxxxxxxxxxx>
>> ---
>> t/t5310-pack-bitmaps.sh | 50 +++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 50 insertions(+)
>> 
>> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
>> index a62b463eaf09..537a507957bb 100755
>> --- a/t/t5310-pack-bitmaps.sh
>> +++ b/t/t5310-pack-bitmaps.sh
>> @@ -26,6 +26,18 @@ has_any () {
>> grep -Ff "$1" "$2"
>> }
>> 
>> +skip_ewah_bitmap() {
>> + local bitmap="$1" &&
>> + local offset="$2" &&
>> + local size= &&
>> +
>> + offset=$(($offset + 4)) &&
>> + size=0x$(od -An -v -t x1 -j $offset -N 4 $bitmap | tr -d ' \n') &&
>> + size=$(($size * 8)) &&
>> + offset=$(($offset + 4 + $size + 4)) &&
>> + echo $offset
>> +}
>> +
>> # Since name-hash values are stored in the .bitmap files, add a test
>> # that checks that the name-hash calculations are stable across versions.
>> # Not exhaustive, but these hashing algorithms would be hard to change
>> @@ -486,6 +498,44 @@ test_bitmap_cases () {
>> grep "ignoring extra bitmap" trace2.txt
>> )
>> '
>> +
>> + # A `.bitmap` file has the following structure:
>> + # | Header | Commits | Trees | Blobs | Tags | Entries... |
>> + #
>> + # - The header is 32 bytes long when using SHA-1.
>> + # - Commits, Trees, Blobs, and Tags are all stored as EWAH bitmaps.
>> + #
>> + # This test intentionally corrupts the `xor_offset` field of the first entry
>> + # to verify robustness against malformed bitmap data.
>> + test_expect_success 'load corrupt bitmap' '
> 
> I am not totally following what this case is supposed to be testing.
> Let me think aloud for a moment...
> 
>> + rm -fr repo &&
>> + git init repo &&
>> + test_when_finished "rm -fr repo" &&
>> + (
>> + cd repo &&
>> + git config pack.writeBitmapLookupTable '"$writeLookupTable"' &&
> 
> First we set up a temporary repository, change into it, and enable
> bitmap lookup tables. Makes sense.
> 
>> + test_commit base &&
>> +
>> + git repack -adb &&
>> + bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
>> + chmod +w "$bitmap" &&
> 
> Then we make a commit, and write a bitmap containing the objects from
> the commit we just made. Good.
> 
>> + hdr_sz=$((12 + $(test_oid rawsz))) &&
>> + offset=$(skip_ewah_bitmap $bitmap $hdr_sz) &&
>> + offset=$(skip_ewah_bitmap $bitmap $offset) &&
>> + offset=$(skip_ewah_bitmap $bitmap $offset) &&
>> + offset=$(skip_ewah_bitmap $bitmap $offset) &&
> 
> Then we read past the header and four type bitmaps. Makes sense.
> 
>> + offset=$((offset + 4)) &&
> 
> Now we land at the bitmap for the commit we just wrote.
> 
> (As an aside unrelated to this part of the test, this skip_ewah_bitmap()
> function seems awfully fragile. I wonder if it would make more sense to
> implement this as a test helper that can dump the offsets of EWAH
> bitmaps in a *.bitmap file by object ID rather than trying to parse the
> file ourselves?
> 

I am actually replaying the pack-bitmap.c:prepare_bitmap() here. Also I have had
write a test helper version once. And since I want to use prepare_bitmap()
I have to put the code in pack-bitmap.c. It looks like this

diff --git a/pack-bitmap.c b/pack-bitmap.c
index b9f1d866046..9642a06b3fe 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -3022,6 +3022,71 @@ cleanup:
return ret;
}

+typedef void(corrupt_fn)(struct bitmap_index *);
+
+static int bitmap_corrupt_then_load(struct repository *r, corrupt_fn *do_corrupt)
+{
+ struct bitmap_index *bitmap_git;
+ unsigned char *map;
+
+ if (!(bitmap_git = prepare_bitmap_git(r)))
+     die(_("failed to prepare bitmap indexes"));
+ /*
+  * If the table lookup extension is not used,
+  * prepare_bitmap_git has already called load_bitmap_entries_v1(),
+  * making it impossible to corrupt the bitmap.
+  */
+ if (!bitmap_git->table_lookup)
+     return 0;
+
+ /*
+  * bitmap_git->map is read-only;
+  * to corrupt it, we need a writable memory block.
+  */
+ map = bitmap_git->map;
+ bitmap_git->map = xmalloc(bitmap_git->map_size);
+ if (!bitmap_git->map)
+     return 0;
+ memcpy(bitmap_git->map, map, bitmap_git->map_size);
+
+ do_corrupt(bitmap_git);
+ if (!load_bitmap_entries_v1(bitmap_git))
+     die(_("load corrupt bitmap successfully"));
+
+ free(bitmap_git->map);
+ bitmap_git->map = map;
+ free_bitmap_index(bitmap_git);
+
+ return 0;
+}
+
+static void do_corrupt_commit_pos(struct bitmap_index *bitmap_git)
+{
+ uint32_t *commit_pos_ptr;
+
+ commit_pos_ptr = (uint32_t *)(bitmap_git->map + bitmap_git->map_pos);
+ *commit_pos_ptr = (uint32_t)-1;
+}
+
+static void do_corrupt_xor_offset(struct bitmap_index *bitmap_git)
+{
+ uint8_t *xor_offset_ptr;
+
+ xor_offset_ptr = (uint8_t *)(bitmap_git->map + bitmap_git->map_pos +
+      sizeof(uint32_t));
+ *xor_offset_ptr = MAX_XOR_OFFSET + 1;
+}
+
+int test_bitmap_load_corrupt(struct repository *r)
+{
+ int res = 0;
+ if ((res = bitmap_corrupt_then_load(r, do_corrupt_commit_pos)))
+     return res;
+ if ((res = bitmap_corrupt_then_load(r, do_corrupt_xor_offset)))
+     return res;
+ return res;
+}
+
int rebuild_bitmap(const uint32_t *reposition,
   struct ewah_bitmap *source,
   struct bitmap *dest)

> We don't currently store an offset for each stored_bitmap that we
> maintain, but doing so would be pretty straightforward (add it as a
> field to the structure, and store the value of bitmap_git->map_pos from
> immediately before reading the actual bitmap).)
> 
>> + printf '\161' |
>> + dd of=$bitmap count=1 bs=1 conv=notrunc seek=$offset &&
> 
> OK. Now we break the XOR offset field of this bitmap by writing garbage
> into it.
> 
>> + git rev-list --count HEAD > expect &&
>> + git rev-list --use-bitmap-index --count HEAD > actual &&
>> + test_cmp expect actual
> 
> ...and then we make sure that we still get the correct result.
> 
> Hmmph. I don't think this is quite testing what we want, since this test
> passes with or without your first patch. And that makes sense, we have
> tests elsewhere in this script that verify we can still fall back to
> classic traversal when the bitmap index can't be read. (For some
> examples, see: "truncated bitmap fails gracefully (ewah)" and "truncated
> bitmap fails gracefully (cache)".)

I want to *test* for a memory leak here, not whether git can load a corrupt bitmap.
Since git ci linux-leak test runs each test script with ASAN_OPTIONS=detect_leaks=1, I’m 
including this test case specifically to check whether it triggers a crash when 
`SANITIZE_LEAK` is enabled. And I do find if without the first patch, leak sanitizer
running this test script would output error message.

> I think what we're really testing here is the absence of a memory leak,
> which we are as of 1fc7ddf35b (test-lib: unconditionally enable leak
> checking, 2024-11-20). I wonder whether or not we need this test at all?
> 
> Thanks,
> Taylor

I am not truly following what are you talking here. But If you think it’s unnecessary to
check for potential leaks in load_bitmap() or load_bitmap_entries_v1(). Or this test
script shouldn’t be put in this way. I’m happy to drop the final patch.

Thanks
Lidong Yan




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux