Re: [PATCH 5/8] packfile: refactor `get_multi_pack_index()` to work on sources

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 09, 2025 at 09:54:53AM +0200, Patrick Steinhardt wrote:
> The function `get_multi_pack_index()` loads multi-pack indices via
> `prepare_packed_git()` and then returns the linked list of multi-pack
> indices that is stored in `struct object_database`. That list is in the
> process of being removed though in favor of storing the MIDX as part of
> the object database source it belongs to.
>
> Refactor `get_multi_pack_index()` so that it returns the multi-pack
> index for a single object source. Callers are now expected to call this
> function for each source they are interested in. This requires them to
> iterate through alternates, so we have to prepare alternate object
> sources before doing so.
>
> Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> ---
>  builtin/pack-objects.c |  9 ++++++---
>  builtin/repack.c       |  4 ++--
>  midx-write.c           | 22 ++--------------------
>  object-name.c          | 21 ++++++++++++++-------
>  pack-bitmap.c          | 20 ++++++++++++++------
>  packfile.c             | 30 +++++++++++-------------------
>  packfile.h             |  3 +--
>  7 files changed, 50 insertions(+), 59 deletions(-)
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 5781dec9808..f889e69e07d 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1701,7 +1701,6 @@ static int want_object_in_pack_mtime(const struct object_id *oid,
>  {
>  	int want;
>  	struct list_head *pos;
> -	struct multi_pack_index *m;
>
>  	if (!exclude && local && has_loose_object_nonlocal(oid))
>  		return 0;
> @@ -1721,9 +1720,13 @@ static int want_object_in_pack_mtime(const struct object_id *oid,
>  		*found_offset = 0;
>  	}
>
> -	for (m = get_multi_pack_index(the_repository); m; m = m->next) {
> +	odb_prepare_alternates(the_repository->objects);
> +
> +	for (struct odb_source *source = the_repository->objects->sources; source; source = source->next) {
> +		struct multi_pack_index *m = get_multi_pack_index(source);

All makes sense up to this point. I would suggest adding the following
to the top of this function:

    struct repository *r = the_repository;
    struct odb_source *source;

to shorten up these lines a bit, but I'm nit-picking so if you feel
strongly that the existing patch is clearer, I won't object.

>  		struct pack_entry e;
> -		if (fill_midx_entry(the_repository, oid, &e, m)) {
> +
> +		if (m && fill_midx_entry(the_repository, oid, &e, m)) {

Good.

>  			want = want_object_in_pack_one(e.p, oid, exclude, found_pack, found_offset, found_mtime);
>  			if (want != -1)
>  				return want;
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 9bbf032b6dd..5956df5d927 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -218,9 +218,9 @@ static void mark_packs_for_deletion(struct existing_packs *existing,
>  static void remove_redundant_pack(const char *dir_name, const char *base_name)
>  {
>  	struct strbuf buf = STRBUF_INIT;
> -	struct multi_pack_index *m = get_local_multi_pack_index(the_repository);
> +	struct multi_pack_index *m = get_multi_pack_index(the_repository->objects->sources);

Is the first source always guaranteed to be the local one? I assume that the
answer here is "yes", and there are certainly other places in the code
where we make a similar assumption. But just wanted to make sure as
this popped into my head while reading.

>  	strbuf_addf(&buf, "%s.pack", base_name);
> -	if (m && midx_contains_pack(m, buf.buf))
> +	if (m && m->local && midx_contains_pack(m, buf.buf))

...hmm, maybe not?

>  		clear_midx_file(the_repository);
>  	strbuf_insertf(&buf, 0, "%s/", dir_name);
>  	unlink_pack_path(buf.buf, 1);
> diff --git a/midx-write.c b/midx-write.c
> index f2cfb85476e..c1ae62d3549 100644
> --- a/midx-write.c
> +++ b/midx-write.c
> @@ -916,26 +916,8 @@ static int write_midx_bitmap(struct write_midx_context *ctx,
>  static struct multi_pack_index *lookup_multi_pack_index(struct repository *r,
>  							const char *object_dir)
>  {
> -	struct multi_pack_index *result = NULL;
> -	struct multi_pack_index *cur;
> -	char *obj_dir_real = real_pathdup(object_dir, 1);
> -	struct strbuf cur_path_real = STRBUF_INIT;
> -
> -	/* Ensure the given object_dir is local, or a known alternate. */
> -	odb_find_source(r->objects, obj_dir_real);
> -
> -	for (cur = get_multi_pack_index(r); cur; cur = cur->next) {
> -		strbuf_realpath(&cur_path_real, cur->object_dir, 1);
> -		if (!strcmp(obj_dir_real, cur_path_real.buf)) {
> -			result = cur;
> -			goto cleanup;
> -		}
> -	}
> -
> -cleanup:
> -	free(obj_dir_real);
> -	strbuf_release(&cur_path_real);
> -	return result;
> +	struct odb_source *source = odb_find_source(r->objects, object_dir);
> +	return get_multi_pack_index(source);

When I first read this I wondered what would happen if we passed in an
unknown object_dir such that odb_find_source() returned NULL. But that
function will never return NULL, and instead will die() if given a bogus
object_dir.

So this is fine, though I would have imagined that we'd return NULL
within odb_find_source() and let the caller die() (or not).

>  }
>
>  static int fill_packs_from_midx(struct write_midx_context *ctx,
> diff --git a/object-name.c b/object-name.c
> index ddafe7f9b13..1e7fdcb90a8 100644
> --- a/object-name.c
> +++ b/object-name.c
> @@ -198,16 +198,19 @@ static void unique_in_pack(struct packed_git *p,
>
>  static void find_short_packed_object(struct disambiguate_state *ds)
>  {
> -	struct multi_pack_index *m;
>  	struct packed_git *p;
>
>  	/* Skip, unless oids from the storage hash algorithm are wanted */
>  	if (ds->bin_pfx.algo && (&hash_algos[ds->bin_pfx.algo] != ds->repo->hash_algo))
>  		return;
>
> -	for (m = get_multi_pack_index(ds->repo); m && !ds->ambiguous;
> -	     m = m->next)
> -		unique_in_midx(m, ds);
> +	odb_prepare_alternates(ds->repo->objects);
> +	for (struct odb_source *source = ds->repo->objects->sources; source && !ds->ambiguous; source = source->next) {
> +		struct multi_pack_index *m = get_multi_pack_index(source);
> +		if (m)
> +			unique_in_midx(m, ds);
> +	}
> +

Makes sense, though now having seen this pattern a few times, I am
wondering if it would be worth it to add a utility function that takes a
callback and iterates over the various MIDXs. But perhaps that is taking
DRY-ing things up a little too far ;-).

For what it's worth, I do think that what you wrote here makes more
logical sense: MIDXs are tied to individual alternate object DBs, which
here means that there is one MIDX per "struct odb_source". It just is a
little more verbose to type out.

> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 0a4af199c05..7b51d381837 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -692,14 +692,16 @@ static int open_midx_bitmap(struct repository *r,
>  			    struct bitmap_index *bitmap_git)
>  {
>  	int ret = -1;
> -	struct multi_pack_index *midx;
>
>  	assert(!bitmap_git->map);
>
> -	for (midx = get_multi_pack_index(r); midx; midx = midx->next) {
> -		if (!open_midx_bitmap_1(bitmap_git, midx))
> +	odb_prepare_alternates(r->objects);
> +	for (struct odb_source *source = r->objects->sources; source; source = source->next) {
> +		struct multi_pack_index *midx = get_multi_pack_index(source);
> +		if (midx && !open_midx_bitmap_1(bitmap_git, midx))
>  			ret = 0;
>  	}

Makes sense.

> +

Stray diff?

> @@ -3307,9 +3309,15 @@ int verify_bitmap_files(struct repository *r)
>  {
>  	int res = 0;
>
> -	for (struct multi_pack_index *m = get_multi_pack_index(r);
> -	     m; m = m->next) {
> -		char *midx_bitmap_name = midx_bitmap_filename(m);
> +	odb_prepare_alternates(r->objects);
> +	for (struct odb_source *source = r->objects->sources; source; source = source->next) {
> +		struct multi_pack_index *m = get_multi_pack_index(source);
> +		char *midx_bitmap_name;
> +
> +		if (!m)
> +			continue;
> +
> +		midx_bitmap_name = midx_bitmap_filename(m);

This one looks good as well.

> diff --git a/packfile.c b/packfile.c
> index e5d9d7ac8bc..e1ced050451 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -963,14 +963,17 @@ static void prepare_packed_git(struct repository *r);
>  unsigned long repo_approximate_object_count(struct repository *r)
>  {
>  	if (!r->objects->approximate_object_count_valid) {
> -		unsigned long count;
> -		struct multi_pack_index *m;
> +		unsigned long count = 0;
>  		struct packed_git *p;
>
>  		prepare_packed_git(r);

I was wondering where the odb_prepare_alternates() call was in this one,
but it is a byproduct of calling prepare_packed_git(). So this spot
looks OK as well.

> -		count = 0;
> -		for (m = get_multi_pack_index(r); m; m = m->next)
> -			count += m->num_objects;
> +
> +		for (struct odb_source *source = r->objects->sources; source; source = source->next) {
> +			struct multi_pack_index *m = get_multi_pack_index(source);
> +			if (m)
> +				count += m->num_objects;

Unrelated to your patch, I wonder if it makes sense to check for
overflow here. I think so, though presumably if we are overflowing
"unsigned long" then likely we have much bigger problems to worry about
;-).

The rest looks good to me.

Thanks,
Taylor




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux