Re: [PATCH 5/8] packfile: refactor `get_multi_pack_index()` to work on sources

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 10, 2025 at 07:56:28PM -0400, Taylor Blau wrote:
> On Wed, Jul 09, 2025 at 09:54:53AM +0200, Patrick Steinhardt wrote:
> > diff --git a/builtin/repack.c b/builtin/repack.c
> > index 9bbf032b6dd..5956df5d927 100644
> > --- a/builtin/repack.c
> > +++ b/builtin/repack.c
> > @@ -218,9 +218,9 @@ static void mark_packs_for_deletion(struct existing_packs *existing,
> >  static void remove_redundant_pack(const char *dir_name, const char *base_name)
> >  {
> >  	struct strbuf buf = STRBUF_INIT;
> > -	struct multi_pack_index *m = get_local_multi_pack_index(the_repository);
> > +	struct multi_pack_index *m = get_multi_pack_index(the_repository->objects->sources);
> 
> Is the first source always guaranteed to be the local one? I assume that the
> answer here is "yes", and there are certainly other places in the code
> where we make a similar assumption. But just wanted to make sure as
> this popped into my head while reading.
> 
> >  	strbuf_addf(&buf, "%s.pack", base_name);
> > -	if (m && midx_contains_pack(m, buf.buf))
> > +	if (m && m->local && midx_contains_pack(m, buf.buf))
> 
> ...hmm, maybe not?

The answer is "almost always". The only exception is in case there is a
temporary object source added via `odb_set_temporary_primary_source()`,
for example for quarantine directories.

Generally speaking this patch here doesn't really change anything, and
we could just as well drop the `m->local` part above. But I think that
overall we're not doing a good job to track the local object source, so
it felt safer to me to add the above guard.

> > diff --git a/midx-write.c b/midx-write.c
> > index f2cfb85476e..c1ae62d3549 100644
> > --- a/midx-write.c
> > +++ b/midx-write.c
> > @@ -916,26 +916,8 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
> >  static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
> >  							const char *object_dir)
> >  {
> > -	struct multi_pack_index *result = NULL;
> > -	struct multi_pack_index *cur;
> > -	char *obj_dir_real = real_pathdup(object_dir, 1);
> > -	struct strbuf cur_path_real = STRBUF_INIT;
> > -
> > -	/* Ensure the given object_dir is local, or a known alternate. */
> > -	odb_find_source(r->objects, obj_dir_real);
> > -
> > -	for (cur = get_multi_pack_index(r); cur; cur = cur->next) {
> > -		strbuf_realpath(&cur_path_real, cur->object_dir, 1);
> > -		if (!strcmp(obj_dir_real, cur_path_real.buf)) {
> > -			result = cur;
> > -			goto cleanup;
> > -		}
> > -	}
> > -
> > -cleanup:
> > -	free(obj_dir_real);
> > -	strbuf_release(&cur_path_real);
> > -	return result;
> > +	struct odb_source *source = odb_find_source(r->objects, object_dir);
> > +	return get_multi_pack_index(source);
> 
> When I first read this I wondered what would happen if we passed in an
> unknown object_dir such that odb_find_source() returned NULL. But that
> function will never return NULL, and instead will die() if given a bogus
> object_dir.
> 
> So this is fine, though I would have imagined that we'd return NULL
> within odb_find_source() and let the caller die() (or not).

Fully agreed. I've got a follow-up patch series that does this
refactoring.

> > diff --git a/object-name.c b/object-name.c
> > index ddafe7f9b13..1e7fdcb90a8 100644
> > --- a/object-name.c
> > +++ b/object-name.c
> > @@ -198,16 +198,19 @@ static void unique_in_pack(struct packed_git *p,
> >
> >  static void find_short_packed_object(struct disambiguate_state *ds)
> >  {
> > -	struct multi_pack_index *m;
> >  	struct packed_git *p;
> >
> >  	/* Skip, unless oids from the storage hash algorithm are wanted */
> >  	if (ds->bin_pfx.algo && (&hash_algos[ds->bin_pfx.algo] != ds->repo->hash_algo))
> >  		return;
> >
> > -	for (m = get_multi_pack_index(ds->repo); m && !ds->ambiguous;
> > -	     m = m->next)
> > -		unique_in_midx(m, ds);
> > +	odb_prepare_alternates(ds->repo->objects);
> > +	for (struct odb_source *source = ds->repo->objects->sources; source && !ds->ambiguous; source = source->next) {
> > +		struct multi_pack_index *m = get_multi_pack_index(source);
> > +		if (m)
> > +			unique_in_midx(m, ds);
> > +	}
> > +
> 
> Makes sense, though now having seen this pattern a few times, I am
> wondering if it would be worth it to add a utility function that takes a
> callback and iterates over the various MIDXs. But perhaps that is taking
> DRY-ing things up a little too far ;-).
> 
> For what it's worth, I do think that what you wrote here makes more
> logical sense: MIDXs are tied to individual alternate object DBs, which
> here means that there is one MIDX per "struct odb_source". It just is a
> little more verbose to type out.

I was wondering whether it would make sense to add a looping macro, but
I ultimately decided against it as it doesn't make too much of a
difference to really matter. I wouldn't mind adding it though if you or
others feel strongly about it.

Patrick




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux