On Thu, May 22, 2025 at 10:08:20PM -0400, Jeff King wrote: > > (As an aside, I think I recall you suggesting a while ago that it might > > be interesting to define "global" things with a different type than > > "local" ones to prevent this sort of confusion. That would allow us to > > keep both "pack_int_id" and the return value of "midx_for_pack()" in > > scope at the same time, without the possibility of using one when you > > meant to use the other.) > > Yeah. The trouble is that it becomes awkward in C, since the language > will happily intermix two integer typedefs. So you have to wrap them in > a struct and access the struct fields everywhere. Yup :-<. > > > - There's a similar case in midx-write.c:want_included_pack(). That > > > one seems to have the same local/global confusion, but I do not > > > obviously see anything preventing it from being fed a non-base midx. > > > So it might possibly be buggy? > > > > Yeah, this spot is definitely broken. At minimum it would need something > > like: > > > > --- 8< --- > > diff --git a/midx-write.c b/midx-write.c > > index 0897cbd829..54a04f7b75 100644 > > --- a/midx-write.c > > +++ b/midx-write.c > > @@ -1634,9 +1634,10 @@ static int want_included_pack(struct repository *r, > > uint32_t pack_int_id) > > { > > struct packed_git *p; > > + midx_for_pack(&m, pack_int_id); > > if (prepare_midx_pack(r, m, pack_int_id)) > > return 0; > > - p = m->packs[pack_int_id]; > > + p = m->packs[pack_int_id - m->num_packs_in_base]; > > if (!pack_kept_objects && p->pack_keep) > > return 0; > > if (p->is_cruft) > > --- >8 --- > > Yep, that's exactly what I was envisioning. I guess it's probably > possible to trigger in practice by writing a new midx based on an > existing incremental state. I'll let you figure that part out. :) Hmm. I thought that this spot was broken last night, but looking at it again today I think that it is actually OK. I started writing an analysis of why in response to your email, and then decided to throw it away since those details really should live in the patch message. Here's what I came up with: --- 8< --- Subject: [PATCH] midx-write.c: guard against incremental MIDXs in want_included_pack() The function want_included_pack() is used to determine whether or not a the packfile corresponding to some given pack_int_id should be included in a 'git multi-pack-index repack' operation. This spot looks like it would be broken, particularly in: struct packed_git *p; if (prepare_midx_pack(r, m, pack_int_id)) return 0; p = m->packs[pack_int_id]; , when pack_int_id is greater than m->num_pack_in_base (supposing that m->num_packs_in_base is non-zero, or equivalently that m->base_midx is non-NULL). Suppose we have two MIDXs in an incremental MIDX chain, each having two packs: - M0 = {packs: [P0, P1], base_midx: NULL} - M1 = {packs: [P2, P3], base_midx: M0} noting that each pack is identified by its global pack_int_id within the chain. So if you do something like: want_included_pack(the_repository, M1, pack_kept_objects, 2); that function will call prepare_midx_pack(), which is smart enough to realize that the pack of interest is in the current layer (M1), and knows how to adjust its global pack_int_id into an index into the current layer's 'packs' array. But the following line: p = m->packs[pack_int_id]; /* m is M1, pack_int_id is 2 */ looks broken, since each layer of the MIDX only maintains an array of the packs stored within that layer, and 'm' wasn't adjusted back to point at M1->base_midx (M0). The right thing to do would be: struct packed_git *p; if (prepare_midx_pack(r, m, pack_int_id)) return 0; /* open-code midx.c::midx_for_pack(), which is private */ while (m && pack_int_id < m->num_packs_in_base) m = m->base_midx; if (!m) BUG("broken midx?"); if (pack_int_id >= m->num_packs + m->num_packs_in_base) BUG("broken pack_int_id?"); p = m->packs[pack_int_id - m->num_packs_in_base]; But that would be overkill, since this function never deals with incremental MIDXs having more than one layer! To see why, observe that want_included_pack() has two callers: - midx-write.c::fill_included_packs_all() - midx-write.c::fill_included_packs_batch() and those functions' sole caller is in midx-write.c::midx_repack(), which dispatches a call to one or the other depending on whether or not the batch_size is non-zero. And at the very top of midx_repack(), we have a guard against non-trivial incremental MIDX chains: if (m->base_midx) die(_("cannot repack an incremental multi-pack-index")); So want_included_pack() is OK becuase it will never encounter a situation where it has to chase backwards through the '->base_midx' pointer. But that is not immediately clear from reading the code, and is too fragile for my comfort. Make this more clear by adding an ASSERT() to the above effect. Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx> --- midx-write.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/midx-write.c b/midx-write.c index 0897cbd829..991c42d1dc 100644 --- a/midx-write.c +++ b/midx-write.c @@ -1634,6 +1634,9 @@ static int want_included_pack(struct repository *r, uint32_t pack_int_id) { struct packed_git *p; + + ASSERT(m && !m->base_midx); + if (prepare_midx_pack(r, m, pack_int_id)) return 0; p = m->packs[pack_int_id]; @@ -1653,6 +1656,8 @@ static void fill_included_packs_all(struct repository *r, uint32_t i; int pack_kept_objects = 0; + ASSERT(m && !m->base_midx); + repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects); for (i = 0; i < m->num_packs; i++) { @@ -1673,6 +1678,8 @@ static void fill_included_packs_batch(struct repository *r, struct repack_info *pack_info; int pack_kept_objects = 0; + ASSERT(m && !m->base_midx); + CALLOC_ARRAY(pack_info, m->num_packs); repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects); -- 2.49.0.221.g485f5f8636.dirty --- >8 --- > > > Likewise fill_included_packs_batch() in the same file. > > > > I think this one is actually OK for the same reason as the > > expire_midx_packs() case. Its sole caller in midx_repack() has: > > > > if (m->base_midx) > > die(_("cannot repack an incremental multi-pack-index")); > > > > , so we are OK there. We might want to add an ASSERT() in > > fill_included_packs_batch() to make it clearer, though. > > Ah, good. Yes, I agree that an assertion would be a nice bit of > documentation/safety. Though for these cases I think they only care > about the pack (not the containing midx), so changing the return type of > prepare_midx_pack() might be an even easier way to get that safety. Ironically, the same argument applies for this function (and the _all() variant) as well ;-). Thanks, Taylor