Re: [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 22, 2025 at 09:22:23PM -0400, Taylor Blau wrote:

> > I think there's a subtlety here with incremental midx's, in that a pack
> > id can be "global" within the whole midx chain, or a local index into a
> > specific chain element's list of packs.
> 
> If you'll indulge me in a bit of pedantry for a moment, I would add that
> the pack_int_id is *always* global within the whole MIDX chain. It only
> happens to additionally be the correct local index when
> m->num_packs_in_base is zero.

Yeah, I agree with that...

> The thing here:
> 
>     m->packs[pack_int_id - m->num_packs_in_base];
>              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> is indexing into the current layer's pack array, which only contains the
> packs at that layer, so the index there is no longer really a
> pack_int_id at all.

...and this...

> > In fill_midx_entry(), I think we get such a global id back from
> > nth_midxed_pack_int_id(). And then when we hand that to
> > prepare_midx_pack(), it is converted into a local midx/id pair with:
> >
> >   pack_int_id = midx_for_pack(&m, pack_int_id);
> 
> ...this confusion is really my fault, since the thing midx_for_pack() is
> returning isn't really a pack_int_id in the classic sense at all. I
> think I chose this name to avoid making the patch noisy, and to avoid
> the possibility of accidentally using the global identifier when you
> meant to use the local index.

...but yeah, this is the part that got me. ;) I think elsewhere you
called it local_pack_int_id, in nth_midxed_pack(). Which made things
more clear (though still calls it a pack_int_id ;) ).

> (As an aside, I think I recall you suggesting a while ago that it might
> be interesting to define "global" things with a different type than
> "local" ones to prevent this sort of confusion. That would allow us to
> keep both "pack_int_id" and the return value of "midx_for_pack()" in
> scope at the same time, without the possibility of using one when you
> meant to use the other.)

Yeah. The trouble is that it becomes awkward in C, since the language
will happily intermix two integer typedefs. So you have to wrap them in
a struct and access the struct fields everywhere.

> >   - There's a similar case in midx-write.c:want_included_pack(). That
> >     one seems to have the same local/global confusion, but I do not
> >     obviously see anything preventing it from being fed a non-base midx.
> >     So it might possibly be buggy?
> 
> Yeah, this spot is definitely broken. At minimum it would need something
> like:
> 
> --- 8< ---
> diff --git a/midx-write.c b/midx-write.c
> index 0897cbd829..54a04f7b75 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -1634,9 +1634,10 @@ static int want_included_pack(struct repository *r,
>  			      uint32_t pack_int_id)
>  {
>  	struct packed_git *p;
> +	midx_for_pack(&m, pack_int_id);
>  	if (prepare_midx_pack(r, m, pack_int_id))
>  		return 0;
> -	p = m->packs[pack_int_id];
> +	p = m->packs[pack_int_id - m->num_packs_in_base];
>  	if (!pack_kept_objects && p->pack_keep)
>  		return 0;
>  	if (p->is_cruft)
> --- >8 ---

Yep, that's exactly what I was envisioning. I guess it's probably
possible to trigger in practice by writing a new midx based on an
existing incremental state. I'll let you figure that part out. :)

> >     Likewise fill_included_packs_batch() in the same file.
> 
> I think this one is actually OK for the same reason as the
> expire_midx_packs() case. Its sole caller in midx_repack() has:
> 
>     if (m->base_midx)
>         die(_("cannot repack an incremental multi-pack-index"));
> 
> , so we are OK there. We might want to add an ASSERT() in
> fill_included_packs_batch() to make it clearer, though.

Ah, good. Yes, I agree that an assertion would be a nice bit of
documentation/safety. Though for these cases I think they only care
about the pack (not the containing midx), so changing the return type of
prepare_midx_pack() might be an even easier way to get that safety.

-Peff




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux