Re: [PATCH v2 4/4] midx: return a `packed_git` pointer from `prepare_midx_pack()`

Taylor Blau <me@xxxxxxxxxxxx> · Tue, 3 Jun 2025 18:27:14 -0400

On Fri, May 30, 2025 at 02:50:34AM -0400, Jeff King wrote:
> On Wed, May 28, 2025 at 06:59:09PM -0400, Taylor Blau wrote:
>
> > Let's instead have prepare_midx_pack() return a pointer to the
> > packed_git structure itself, hiding the above as an implementation
> > detail of prepare_midx_pack(). This patch turns the above snippet into:
> >
> >     struct packed_git *p = prepare_midx_pack(the_repository, some_midx,
> >                                              some_pack_int_id);
> >     if (!p)
> >         die("could not load pack xyz");
> >
> > making it far easier and less error-prone to access packs by their
> > pack_int_id in a MIDX chain.
>
> So obviously I like this direction, but a few small comments:
>
> > (In the future, we may want to consider similar treatment for, e.g., the
> > pack_names array. Likewise, it might make sense to rename the "packs"
> > member of the MIDX structure to suggest that it shouldn't be accessed
> > directly outside of midx.c.)
>
> Is this note still valid for v2? It looks like patch 1 adds
> nth_midxed_pack_name() and tries to use it everywhere.

Yeah, we should get rid of this. I had written it before I wrote what is
now the first patch in this series, and neglected to remove it before
sending out the latest round.

> > @@ -1649,9 +1646,9 @@ static int want_included_pack(struct repository *r,
> >
> >  	ASSERT(m && !m->base_midx);
> >
> > -	if (prepare_midx_pack(r, m, pack_int_id))
> > +	p = prepare_midx_pack(r, m, pack_int_id);
> > +	if (!p)
> >  		return 0;
> > -	p = m->packs[pack_int_id];
> >  	if (!pack_kept_objects && p->pack_keep)
> >  		return 0;
> >  	if (p->is_cruft)
>
> The ASSERT() in the context is from earlier in the series. But do we
> need it once we have this patch? We no longer look at pack_int_id except
> to pass it to prepare_midx_pack(), which handles non-base midx slices
> just fine.
>
> So we could loosen the assertion now. Or we could wait for later when
> somebody wants/needs to do so, but I'm not sure how easy they would find
> it to dig in the history. They would find the commit that added the
> ASSERT(), but may not realize that this later commit made it OK to
> loosen.
>
> I didn't check the other ASSERT() spots from that earlier patch (IIRC,
> some of them may actually look use the pack_int_id for other things, and
> wouldn't be ready for non-base slices).

We could loosen the assertion here, but part of me likes keeping it as a
self-documenting note that this function is only intended to be used for
non-incremental MIDXs.

> > -int prepare_midx_pack(struct repository *r, struct multi_pack_index *m,
> > -		      uint32_t pack_int_id)
> > +struct packed_git *prepare_midx_pack(struct repository *r,
> > +				     struct multi_pack_index *m,
> > +				     uint32_t pack_int_id)
>
> We used to return "1" for failure and "0" for success. Now we're
> reversed: we return NULL for failure and non-zero for success.
>
> So code like:
>
>   if (prepare_midx_pack(...))
> 	return error("yikes");
>
> needs to be updated, but the compiler won't help us because it is happy
> to convert both an int and a pointer into a boolean check.
>
> Should we rename the function to make sure we catch any callers for
> topics in flight?
>
> I'd have thought we could call it nth_midxed_pack(), but that seems to
> exist already, with the caveat that it never prepares the pack, but only
> serves what's in the cache. I wonder if we could simply replace that
> with what prepare_midx_pack() does, but it may be more conservative to
> leave the two separate. So I guess nth_midxed_pack_load() or something.

In general there aren't a ton of in-flight changes in the MIDX code at
any given time, so I think we could get away without renaming it. But I
don't mind erring on the side of caution here, either.

Thanks,
Taylor