Hi,

at GitLab, we sometimes have the need to list all objects regardless of
their reachability. We use git-cat-file(1) with `--batch-all-objects` to
do this, and typically this is quite a good fit. In some cases though,
we only want to list objects of a specific type, where we then basically
have the following pipeline:

    git cat-file --batch-all-objects --batch-check='%(objecttype) %(objectname)' |
    grep '^commit ' |
    cut -d' ' -f2 |
    git cat-file --batch

This works okayish in medium-sized repositories, but once you reach a
certain size this isn't really an option anymore. In the Chromium
repository for example [1] simply listing all objects in the first
invocation of git-cat-file(1) takes around 80 to 100 seconds. The
workload is completely I/O-bottlenecked: my machine reads at ~500MB/s,
and the packfile is 50GB in size, which matches the 100 seconds that I
observe.

This series addresses the issue by introducing object filters into
git-cat-file(1). These object filters use the exact same syntax as the
filters we have in git-rev-list(1), but only a subset of them is
supported because not all filters can be computed by git-cat-file(1).
Supported are "blob:none", "blob:limit=" as well as "object:type=".

The filters alone don't really help though: we still have to scan
through the whole packfile in order to compute the filters. While we
are able to shed a bit of CPU time because we can stop emitting some of
the objects, we're still I/O-bottlenecked.

The second part of the series thus expands the filters so that they can
make use of bitmap indices for some of the filters, if available.
This allows us to efficiently answer the question where to find all
objects of a specific type, and thus we can avoid scanning through the
packfile and instead directly look up relevant objects, leading to a
significant speedup:

    Benchmark 1: git cat-file --batch-check --batch-all-objects --unordered --buffer --no-objects-filter
      Time (mean ± σ):     82.806 s ±  6.363 s    [User: 30.956 s, System: 8.264 s]
      Range (min … max):   73.936 s … 89.690 s    10 runs

    Benchmark 2: git cat-file --batch-check --batch-all-objects --unordered --buffer --objects-filter=object:type=tag
      Time (mean ± σ):      20.8 ms ±   1.3 ms    [User: 6.1 ms, System: 14.5 ms]
      Range (min … max):    18.2 ms …  23.6 ms    127 runs

    Benchmark 3: git cat-file --batch-check --batch-all-objects --unordered --buffer --objects-filter=object:type=commit
      Time (mean ± σ):      1.551 s ±  0.008 s    [User: 1.401 s, System: 0.147 s]
      Range (min … max):    1.541 s …  1.566 s    10 runs

    Benchmark 4: git cat-file --batch-check --batch-all-objects --unordered --buffer --objects-filter=object:type=tree
      Time (mean ± σ):     11.169 s ±  0.046 s    [User: 10.076 s, System: 1.063 s]
      Range (min … max):   11.114 s … 11.245 s    10 runs

    Benchmark 5: git cat-file --batch-check --batch-all-objects --unordered --buffer --objects-filter=object:type=blob
      Time (mean ± σ):     67.342 s ±  3.368 s    [User: 20.318 s, System: 7.787 s]
      Range (min … max):   62.836 s … 73.618 s    10 runs

    Benchmark 6: git cat-file --batch-check --batch-all-objects --unordered --buffer --objects-filter=blob:none
      Time (mean ± σ):     13.032 s ±  0.072 s    [User: 11.638 s, System: 1.368 s]
      Range (min … max):   12.960 s … 13.199 s    10 runs

    Summary
      git cat-file --batch-check --batch-all-objects --unordered --buffer --objects-filter=object:type=tag
        74.75 ± 4.61 times faster
          than git cat-file --batch-check --batch-all-objects --unordered --buffer --objects-filter=object:type=commit
        538.17 ± 33.17 times faster
          than git cat-file --batch-check --batch-all-objects --unordered --buffer --objects-filter=object:type=tree
        627.98 ± 38.77 times faster
          than git cat-file --batch-check --batch-all-objects --unordered --buffer --objects-filter=blob:none
        3244.93 ± 257.23 times faster
          than git cat-file --batch-check --batch-all-objects --unordered --buffer --objects-filter=object:type=blob
        3990.07 ± 392.72 times faster
          than git cat-file --batch-check --batch-all-objects --unordered --buffer --no-objects-filter

We now directly scale with the number of objects of a specific type
contained in the packfile instead of scaling with the overall number of
objects. It's quite fun to see how the math plays out: if you sum up the
times for each of the types you arrive at the time for the unfiltered
case.

Thanks!

Patrick

[1]: https://github.com/chromium/chromium.git

---
Patrick Steinhardt (9):
      builtin/cat-file: rename variable that tracks usage
      builtin/cat-file: wire up an option to filter objects
      builtin/cat-file: support "blob:none" objects filter
      builtin/cat-file: support "blob:limit=" objects filter
      builtin/cat-file: support "object:type=" objects filter
      pack-bitmap: expose function to iterate over bitmapped objects
      pack-bitmap: introduce function to check whether a pack is bitmapped
      builtin/cat-file: deduplicate logic to iterate over all objects
      builtin/cat-file: use bitmaps to efficiently filter by object type

 Documentation/git-cat-file.adoc |  16 +++
 builtin/cat-file.c              | 225 +++++++++++++++++++++++++++++----------
 builtin/pack-objects.c          |   3 +-
 builtin/rev-list.c              |   3 +-
 pack-bitmap.c                   |  80 +++++++++-----
 pack-bitmap.h                   |  19 +++-
 reachable.c                     |   3 +-
 t/t1006-cat-file.sh             |  77 ++++++++++++++
 8 files changed, 339 insertions(+), 87 deletions(-)

---
base-commit: a554262210b4a2ee6fa2d594e1f09f5830888c56
change-id: 20250220-pks-cat-file-object-type-filter-9140c0ed5ee1
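
As a sanity check on the arithmetic mentioned in the cover letter above
(per-type times summing up to the unfiltered time, and the 100 seconds
implied by the disk throughput), here is a minimal sketch. The numbers
are the mean timings copied from the benchmark output; the variable
names are made up for illustration, the decimal interpretation of GB/MB
is an assumption, and Python is only used as a calculator here, it is
not part of the series.

```python
# Mean wall-clock times in seconds, copied from the benchmark output.
unfiltered_mean = 82.806
unfiltered_sigma = 6.363
per_type_mean = {
    "tag": 0.0208,
    "commit": 1.551,
    "tree": 11.169,
    "blob": 67.342,
}
blob_none_mean = 13.032

# Sum of the four per-type runs, compared against the unfiltered run.
per_type_total = sum(per_type_mean.values())
gap_to_unfiltered = unfiltered_mean - per_type_total

# "blob:none" omits all blobs, so it should roughly match the combined
# time for tags, commits and trees.
non_blob_total = per_type_total - per_type_mean["blob"]

# Back-of-the-envelope I/O bound: a 50GB packfile read at ~500MB/s,
# assuming decimal units (1GB = 1000MB).
io_bound_seconds = 50 * 1000 / 500

print(f"sum of per-type means: {per_type_total:.3f} s")
print(f"unfiltered mean:       {unfiltered_mean:.3f} s (sigma {unfiltered_sigma:.3f} s)")
print(f"gap within one sigma:  {abs(gap_to_unfiltered) <= unfiltered_sigma}")
print(f"non-blob types summed: {non_blob_total:.3f} s vs blob:none {blob_none_mean:.3f} s")
print(f"I/O bound:             {io_bound_seconds:.0f} s")
```

The per-type means add up to about 80.08 seconds, which is within one
standard deviation of the unfiltered mean, and the three non-blob types
add up to about 12.74 seconds, close to the measured "blob:none" run.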