At $DAYJOB, we had a customer report an issue where they failed to download a zip archive of a repository. The error they saw coming from git-archive(1) is:

    fatal: deflate error (0)

My colleague Justin Tobler was able to reproduce this issue[1]. We've diagnosed that this error happens on some files that exceed core.bigFileThreshold. To reproduce the issue, you can run:

    git clone --depth=1 https://github.com/chromium/chromium.git
    cd chromium
    git -c core.bigFileThreshold=1 archive -o foo.zip --format=zip HEAD -- \
        chrome/test/data/third_party/kraken/tests/kraken-1.1/imaging-darkroom-data.js

(He originally mentioned another file, but that one didn't trigger the bug for me.)

In that same message he presented a patch to fix the issue. I have tested the fix, and I can confirm it resolves the issue. For the record, all the credit goes to Justin for diagnosing this bug and finding a solution.

But I'm concerned it doesn't fix all cases. Another way to trigger the issue is to declare the `unsigned char compressed` buffer with length `STREAM_BUFFER_SIZE / 2` (half the length of the input buffer, instead of double). With Justin's fix applied, the error no longer occurs, but the resulting zip archive isn't valid. When I try to unzip it, I see:

    inflating: chrome/test/data/third_party/kraken/tests/kraken-1.1/imaging-darkroom-data.js
       bad CRC 3ba68a86 (should be b09a04a2)

And when the length is set to `STREAM_BUFFER_SIZE` (equal to the length of the input buffer), decompression succeeds, but the data seems to be mangled.

This is because only the final git_deflate() call is wrapped in a loop; the calls that process each intermediate chunk of input data are not. Elsewhere in the Git codebase, git_deflate() is usually called in a `while` loop, even when the `flush` parameter is `0` (Z_NO_FLUSH).
While Justin aims to provide a minimal fix, I wanted to present an alternative solution that uses zlib according to the official usage example[2], although the changes are more substantial. I'm on the fence about which of the two is the better approach. Because the ZIP format has an End Of Central Directory record (EOCD)[3] at the end, it's far more likely that *only* the final git_deflate() call suffers from unprocessed input data, so the fix Justin provides probably Just Works. I'll leave it up to the community to decide which is better.

[1]: https://lore.kernel.org/git/20250802220803.95137-1-jltobler@xxxxxxxxx/
[2]: https://zlib.net/zlib_how.html
[3]: https://en.wikipedia.org/wiki/ZIP_(file_format)#End_of_central_directory_record_(EOCD)

--
Cheers,
Toon

---
Toon Claes (4):
      archive-zip: deduplicate code setting output buffer in write_zip_entry()
      archive-zip: remove unnecessary condition in write_zip_entry()
      archive-zip: in write_zip_entry() call git_deflate() in a loop
      archive-zip: move git_deflate() with Z_FINISH into the loop

 archive-zip.c | 40 +++++++++++++++-------------------------
 1 file changed, 15 insertions(+), 25 deletions(-)
---
base-commit: e813a0200a7121b97fec535f0d0b460b0a33356c
change-id: 20250801-toon-archive-zip-fix-2deac42d5aa3