On Thu, Aug 21, 2025 at 09:39:12AM +0200, Patrick Steinhardt wrote: > We have two different functions to retrieve packfiles for a packfile > store: > > - `get_packed_git()` returns the list of packfiles after having called > `prepare_packed_git()`. > > - `get_all_packs()` calls `prepare_packed_git()`, as well, but also > calls `prepare_midx_pack()` for each pack. Yeah, having two of these functions that are named so similarly as to suggest they do the same thing (even though they don't) is unfortunate, and I am glad that we are looking at it here. > This means that the latter function also properly loads the info of > whether or not a packfile is part of a multi-pack index. Preparing this > extra information also shouldn't be significantly more expensive: Right; get_packed_git() only loads the non-MIDX'd packs, and get_all_packs() loads everything (regardless whether or not a pack is part of the MIDX or not). Are all of the get_packed_git() callers prepared to handle packs that are in the MIDX? Looking through them: - builtin/gc.c::incremental_repack_auto_condition() skips over 'p->multi_pack_index', so this one is fine to convert. - builtin/grep.c::cmd_grep() calls get_packed_git() but doesn't actually use the result, so this should be fine to convert, though I think there is some subtlty here. - builtin/pack-objects.c::want_object_in_pack_mtime() takes a separate pass over the MIDX'd packs before calling get_packed_git_mru() (which itself calls prepare_packed_git()). I think in practice this is OK, since we will have already handled the MIDX'd packs, but this function is now iterating over packs in the MIDX twice, so it may be worth adding a "if (p->multi_pack_index) continue;" in there. - object-name.c::find_short_packed_object() handles MIDX'd packs separately, and unique_in_pack() is a noop for MIDX'd packs, so this one is fine. - object-name.c::find_abbrev_len_packed() is OK for the same reasons. So I think that want_object_in_pack_mtime() may need a small tweak, and I am not 100% certain that cmd_grep() is OK to convert. > - We have already loaded all packfiles via `prepare_packed_git_one()`. > So given that multi-pack indices may only refer to packfiles in the > same object directory we know that we already loaded each packfile. > > - The multi-pack index was prepared via `packfile_store_prepare()` > already, which calls `prepare_multi_pack_index_one()`. > > - So all that remains to be done is to look up the index of the pack > in its multi-pack index so that we can store that info in both the > pack itself and the MIDX. This clearly shows that get_all_packs() is not meaningfully more expensive than get_packed_git(), but I think that may be obscuring some of the details above on why it is OK (or not) to transition these calls over. > So it is somewhat confusing to readers that one of these two functions > claims to load "all" packfiles while the other one doesn't, even though > the ultimate difference is way more nuanced. > > Convert all of these sites to use `get_all_packs()` instead and remove > `get_packed_git()`. There doesn't seem to be a good reason to discern > these two functions. The last sentence here threw me off a bit. I think the patch message would benefit from some of the reasoning above about why it is OK to transition callers over from one function to the other. As an aside, I am not convinced that having one caller (in pack-objects) that cares about whether or not it sees a MIDX'd pack is that great of a reason to have separate functions. But I am also not convinced that it isn't either. If we kept both, I think they would benefit from having more distinct names, like get_all_packs() and get_non_midx_packs() or something. Thanks, Taylor