[PATCH 0/4] Fix archiving some corner-case files into zip

Toon Claes <toon@xxxxxxxxx> · Mon, 04 Aug 2025 18:56:31 +0200

At $DAYJOB, we've add a customer report an issue where they failed to
download a zip archive from a repository. The error they saw come from
git-archive(1) is:

        fatal: deflate error (0)

My friendly colleague Justin Tobler was able to reproduce this issue[1].
We've diagnosed this error happens on some files that exceed
core.bigFileThreshold. To reproduce the issue, you can run:

        git clone --depth=1 https://github.com/chromium/chromium.git
        cd chromium
        git -c core.bigFileThreshold=1 archive -o foo.zip --format=zip HEAD -- \
                chrome/test/data/third_party/kraken/tests/kraken-1.1/imaging-darkroom-data.js

(originally he mentioned another file, but that didn't trigger the bug
for me)

And a patch to fix the issue was presented that message.

I have tested the fix, and I can confirm this fixes the issue. But I'm
concerned this doesn't fix all issues.

Another way one could trigger the issue, is by initializing
`unsigned char compressed` with length `STREAM_BUFFER_SIZE / 2` (so half
the length of the input buffer, instead of double).

With Justin's fix, you see the error doesn't happen no more. But it
seems, the resulting zip archive isn't valid. When I try to unzip it, I
see:

    inflating: chrome/test/data/third_party/kraken/tests/kraken-1.1/imaging-darkroom-data.js   bad CRC 3ba68a86  (should be b09a04a2)

And when the length is set to `STREAM_BUFFER_SIZE` (so equal length to
input buffer), the decompress goes well, but the data seems to be
mangled.

This is because only the final call of git_deflate() is being wrapped in
a loop for the current chunk of input data. We can see in various other
callsites in the Git codebase, git_deflate() is usually called in a
`while` loop (even when the `flush` parameter is set to `0` =
Z_NO_FLUSH).

For the record, I want to give all the credit to Justin for diagnosing
this bug and to determine a solution. Where he aims to provide a fix
that is minimal, I wanted to present an alternative solution that
implements zlib usuage according to the official usage example[2], but
the changes are more substantial.

I'm on the fence which of two is the better approach. Because the ZIP
format has a End Of Central Directory record (EOCD) at the end, it's far
more likely *only* the final git_deflate() call suffers from unprocessed
input data, so the final Justin provides probably Just Works. I'm gonna
leave it up to the community to decide what is "better"?

[1]: https://lore.kernel.org/git/20250802220803.95137-1-jltobler@xxxxxxxxx/
[2]: https://zlib.net/zlib_how.html
[3]: https://en.wikipedia.org/wiki/ZIP_(file_format)#End_of_central_directory_record_(EOCD)

--
Cheers,
Toon

---
Toon Claes (4):
      archive-zip: deduplicate code setting output buffer in write_zip_entry()
      archive-zip: remove unneccesarry condition in write_zip_entry()
      archive-zip: in write_zip_entry() call git_deflate() in a loop
      archive-zip: move git_deflate() with Z_FINISH into the loop

 archive-zip.c | 40 +++++++++++++++-------------------------
 1 file changed, 15 insertions(+), 25 deletions(-)
---

---

base-commit: e813a0200a7121b97fec535f0d0b460b0a33356c
change-id: 20250801-toon-archive-zip-fix-2deac42d5aa3

Thanks
--
Toon