On Mon, Apr 14, 2025 at 08:11:22PM -0700, Elijah Newren wrote: > On Mon, Apr 14, 2025 at 1:06 PM Taylor Blau <me@xxxxxxxxxxxx> wrote: > > > > In ddee3703b3 (builtin/repack.c: add cruft packs to MIDX during > > geometric repack, 2022-05-20), repack began adding cruft pack(s) to the > > MIDX with '--write-midx' to ensure that the resulting MIDX was always > > closed under reachability in order to generate reachability bitmaps. > > > > Suppose you have a once-unreachable object packed in a cruft pack, which > > later on becomes reachable from one or more objects in a geometrically > > repacked pack. That once-unreachable object *won't* appear in the new > > pack, since the cruft pack was specified as neither included nor > > excluded to 'pack-objects --stdin-packs'. > > I believe you are talking about the state before your series (i.e., > this is carrying on from the previous paragraph), but it reads as > though you are talking about the state after the first seven patches > of this series. Some kind of connection wording to clarify would > really help here. Sure. > > If the bitmap selection > > process picks one or more commits which reach the once-unreachable > > objects, commit ddee3703b3 ensures that the MIDX will be closed under > > reachability. Without it, we would fail to generate a MIDX bitmap. > > After reading this part, I had to go back and re-read and figure out > what point in time everything was referring to. Yeah, this is confusing to me too after reading it back. I made some tweaks that I think clarify things. > > ddee3703b3 alludes to the fact that this is sub-optimal by saying > > > > [...] it's desirable to avoid including cruft packs in the MIDX > > because it causes the MIDX to store a bunch of objects which are > > likely to get thrown away. > > > > , which is true, but hides an even larger problem. If repositories > > rarely prune their unreachable objects and/or have many of them, the > > MIDX must keep track of a large number of objects which bloats the MIDX > > and slows down object lookup. > > > > This is doubly unfortunate because the vast majority of objects in cruft > > pack(s) are unlikely to be read, but object reads that go through the > > MIDX have to search through them anyway. > > "have to search through them"? That could be read to suggest those > individual objects are read, rather than just traversed over. Maybe > "...unlikely to be read, so the enlarged MIDX is for mostly tracking > known-to-likely-be-irrelevant objects", or something like that? Thanks for pointing out... I clarified this one as well. > > This patch causes geometrically-repacked packs to contain a copy of any > > once-unreachable object(s) with 'git pack-objects --stdin-packs=follow', > > allowing us to avoid including any cruft packs in the MIDX. This is > > because a sequence of geometrically-repacked packs that were all > > generated with '--stdin-packs=follow' are guaranteed to have their union > > be closed under reachability. > > > > Note that you cannot guarantee that a collection of packs is closed > > under reachability if not all of them were generated with following as > > maybe: ...with "follow" as above. "follow" or "following" feels like > it needs quotes so the reader understands its meant as the name of a > mode, rather than a verb in the sentence. > > > above. One tell-tale sign that not all geometrically-repacked packs in > > the MIDX were generated with following is to see if there is a pack in > > same here with "following"...and below. Great calls on both, thanks. > > @@ -808,26 +886,55 @@ static void midx_included_packs(struct string_list *include, > > } > > } > > > > - for_each_string_list_item(item, &existing->cruft_packs) { > > + if (midx_must_contain_cruft || > > + midx_has_unknown_packs(midx_pack_names, midx_pack_names_nr, > > + include, geometry, existing)) { > > /* > > - * When doing a --geometric repack, there is no need to check > > - * for deleted packs, since we're by definition not doing an > > - * ALL_INTO_ONE repack (hence no packs will be deleted). > > - * Otherwise we must check for and exclude any packs which are > > - * enqueued for deletion. > > + * If there are one or more unknown pack(s) present (see > > + * midx_has_unknown_packs() for what makes a pack > > + * "unknown") in the MIDX before the repack, keep them > > + * as they may be required to form a reachability > > + * closure if the MIDX is bitmapped. > > * > > - * So we could omit the conditional below in the --geometric > > - * case, but doing so is unnecessary since no packs are marked > > - * as pending deletion (since we only call > > - * `mark_packs_for_deletion()` when doing an all-into-one > > - * repack). > > + * For example, a cruft pack can be required to form a > > + * reachability closure if the MIDX is bitmapped and one > > + * or more of its selected commits reaches a once-cruft > > + * object that was later made reachable. > > The antecedent of "its" is unclear here; just spell it out to reduce > how much thinking the reader needs to do? Eek, good suggestion again. Thanks, I fixed it up and made the antecedent explicit. > > */ > > - if (pack_is_marked_for_deletion(item)) > > - continue; > > + for_each_string_list_item(item, &existing->cruft_packs) { > > + /* > > + * When doing a --geometric repack, there is no > > + * need to check for deleted packs, since we're > > + * by definition not doing an ALL_INTO_ONE > > + * repack (hence no packs will be deleted). > > + * Otherwise we must check for and exclude any > > + * packs which are enqueued for deletion. > > + * > > + * So we could omit the conditional below in the > > + * --geometric case, but doing so is unnecessary > > + * since no packs are marked as pending > > + * deletion (since we only call > > + * `mark_packs_for_deletion()` when doing an > > + * all-into-one repack). > > + */ > > + if (pack_is_marked_for_deletion(item)) > > + continue; > > > > - strbuf_reset(&buf); > > - strbuf_addf(&buf, "%s.idx", item->string); > > - string_list_insert(include, buf.buf); > > + strbuf_reset(&buf); > > + strbuf_addf(&buf, "%s.idx", item->string); > > + string_list_insert(include, buf.buf); > > + } > > + } else { > > + /* > > + * Modern versions of Git will write new copies of > > + * once-cruft objects when doing a --geometric repack. > > "Modern versions of Git" -> "Modern versions of Git with the > appropriate config setting" ? Heh. Great catch. Can you tell this part was written before I added the configuration option? ;-) Thanks, Taylor