Re: [PATCH v2 2/2] midx: stop repeatedly looking up nonexistent packfiles

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 22, 2025 at 10:08:20PM -0400, Jeff King wrote:
> > (As an aside, I think I recall you suggesting a while ago that it might
> > be interesting to define "global" things with a different type than
> > "local" ones to prevent this sort of confusion. That would allow us to
> > keep both "pack_int_id" and the return value of "midx_for_pack()" in
> > scope at the same time, without the possibility of using one when you
> > meant to use the other.)
>
> Yeah. The trouble is that it becomes awkward in C, since the language
> will happily intermix two integer typedefs. So you have to wrap them in
> a struct and access the struct fields everywhere.

Yup :-<.

> > >   - There's a similar case in midx-write.c:want_included_pack(). That
> > >     one seems to have the same local/global confusion, but I do not
> > >     obviously see anything preventing it from being fed a non-base midx.
> > >     So it might possibly be buggy?
> >
> > Yeah, this spot is definitely broken. At minimum it would need something
> > like:
> >
> > --- 8< ---
> > diff --git a/midx-write.c b/midx-write.c
> > index 0897cbd829..54a04f7b75 100644
> > --- a/midx-write.c
> > +++ b/midx-write.c
> > @@ -1634,9 +1634,10 @@ static int want_included_pack(struct repository *r,
> >  			      uint32_t pack_int_id)
> >  {
> >  	struct packed_git *p;
> > +	midx_for_pack(&m, pack_int_id);
> >  	if (prepare_midx_pack(r, m, pack_int_id))
> >  		return 0;
> > -	p = m->packs[pack_int_id];
> > +	p = m->packs[pack_int_id - m->num_packs_in_base];
> >  	if (!pack_kept_objects && p->pack_keep)
> >  		return 0;
> >  	if (p->is_cruft)
> > --- >8 ---
>
> Yep, that's exactly what I was envisioning. I guess it's probably
> possible to trigger in practice by writing a new midx based on an
> existing incremental state. I'll let you figure that part out. :)

Hmm. I thought that this spot was broken last night, but looking at it
again today I think that it is actually OK. I started writing an
analysis of why in response to your email, and then decided to throw it
away since those details really should live in the patch message. Here's
what I came up with:

--- 8< ---
Subject: [PATCH] midx-write.c: guard against incremental MIDXs in
 want_included_pack()

The function want_included_pack() is used to determine whether or not a
the packfile corresponding to some given pack_int_id should be included
in a 'git multi-pack-index repack' operation.

This spot looks like it would be broken, particularly in:

    struct packed_git *p;
    if (prepare_midx_pack(r, m, pack_int_id))
        return 0;
    p = m->packs[pack_int_id];

, when pack_int_id is greater than m->num_pack_in_base (supposing that
m->num_packs_in_base is non-zero, or equivalently that m->base_midx is
non-NULL).

Suppose we have two MIDXs in an incremental MIDX chain, each having two
packs:

  - M0 = {packs: [P0, P1], base_midx: NULL}
  - M1 = {packs: [P2, P3], base_midx: M0}

noting that each pack is identified by its global pack_int_id within the
chain.

So if you do something like:

    want_included_pack(the_repository, M1, pack_kept_objects, 2);

that function will call prepare_midx_pack(), which is smart enough to
realize that the pack of interest is in the current layer (M1), and
knows how to adjust its global pack_int_id into an index into the
current layer's 'packs' array.

But the following line:

    p = m->packs[pack_int_id]; /* m is M1, pack_int_id is 2 */

looks broken, since each layer of the MIDX only maintains an array of
the packs stored within that layer, and 'm' wasn't adjusted back to
point at M1->base_midx (M0).

The right thing to do would be:

    struct packed_git *p;
    if (prepare_midx_pack(r, m, pack_int_id))
        return 0;

    /* open-code midx.c::midx_for_pack(), which is private */
    while (m && pack_int_id < m->num_packs_in_base)
        m = m->base_midx;

    if (!m)
        BUG("broken midx?");
    if (pack_int_id >= m->num_packs + m->num_packs_in_base)
        BUG("broken pack_int_id?");

    p = m->packs[pack_int_id - m->num_packs_in_base];

But that would be overkill, since this function never deals with
incremental MIDXs having more than one layer! To see why, observe that
want_included_pack() has two callers:

  - midx-write.c::fill_included_packs_all()
  - midx-write.c::fill_included_packs_batch()

and those functions' sole caller is in midx-write.c::midx_repack(),
which dispatches a call to one or the other depending on whether or not
the batch_size is non-zero.

And at the very top of midx_repack(), we have a guard against
non-trivial incremental MIDX chains:

    if (m->base_midx)
        die(_("cannot repack an incremental multi-pack-index"));

So want_included_pack() is OK becuase it will never encounter a
situation where it has to chase backwards through the '->base_midx'
pointer. But that is not immediately clear from reading the code, and is
too fragile for my comfort. Make this more clear by adding an ASSERT()
to the above effect.

Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
---
 midx-write.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/midx-write.c b/midx-write.c
index 0897cbd829..991c42d1dc 100644
--- a/midx-write.c
+++ b/midx-write.c
@@ -1634,6 +1634,9 @@ static int want_included_pack(struct repository *r,
 			      uint32_t pack_int_id)
 {
 	struct packed_git *p;
+
+	ASSERT(m && !m->base_midx);
+
 	if (prepare_midx_pack(r, m, pack_int_id))
 		return 0;
 	p = m->packs[pack_int_id];
@@ -1653,6 +1656,8 @@ static void fill_included_packs_all(struct repository *r,
 	uint32_t i;
 	int pack_kept_objects = 0;

+	ASSERT(m && !m->base_midx);
+
 	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);

 	for (i = 0; i < m->num_packs; i++) {
@@ -1673,6 +1678,8 @@ static void fill_included_packs_batch(struct repository *r,
 	struct repack_info *pack_info;
 	int pack_kept_objects = 0;

+	ASSERT(m && !m->base_midx);
+
 	CALLOC_ARRAY(pack_info, m->num_packs);

 	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
--
2.49.0.221.g485f5f8636.dirty
--- >8 ---

> > >     Likewise fill_included_packs_batch() in the same file.
> >
> > I think this one is actually OK for the same reason as the
> > expire_midx_packs() case. Its sole caller in midx_repack() has:
> >
> >     if (m->base_midx)
> >         die(_("cannot repack an incremental multi-pack-index"));
> >
> > , so we are OK there. We might want to add an ASSERT() in
> > fill_included_packs_batch() to make it clearer, though.
>
> Ah, good. Yes, I agree that an assertion would be a nice bit of
> documentation/safety. Though for these cases I think they only care
> about the pack (not the containing midx), so changing the return type of
> prepare_midx_pack() might be an even easier way to get that safety.

Ironically, the same argument applies for this function (and the _all()
variant) as well ;-).

Thanks,
Taylor




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux