Re: [PATCH v2 02/16] odb: move list of packfiles into `struct packfile_store`

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Aug 21, 2025 at 09:39:00AM +0200, Patrick Steinhardt wrote:
> The object database tracks the list of packfiles it currently knows
> about. With the introduction of the `struct packfile_store` we have a
> better place to host this list though.
>
> Move the list accordingly. Extract the logic from `odb_clear()` that
> knows to close all such packfiles and move it into the new subsystem, as
> well.

Not a comment on this patch itself, but as a meta-comment, I really
appreciate you taking such an incremental approach here. The packfile
machinery is quite fragile in my experience, so breaking it up into (what
are so far) easily review-able chunks makes it much easier to build
confidence in the correctness of these changes.

> Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
> ---
>  odb.c      | 11 +----------
>  odb.h      |  1 -
>  packfile.c | 47 ++++++++++++++++++++++++++++++-----------------
>  packfile.h | 15 ++++++++++++++-
>  4 files changed, 45 insertions(+), 29 deletions(-)
>
> diff --git a/odb.c b/odb.c
> index 34b70d0074..17a9135cbd 100644
> --- a/odb.c
> +++ b/odb.c
> @@ -1038,16 +1038,7 @@ void odb_clear(struct object_database *o)
>
>  	INIT_LIST_HEAD(&o->packed_git_mru);
>  	close_object_store(o);
> -
> -	/*
> -	 * `close_object_store()` only closes the packfiles, but doesn't free
> -	 * them. We thus have to do this manually.
> -	 */
> -	for (struct packed_git *p = o->packed_git, *next; p; p = next) {
> -		next = p->next;
> -		free(p);
> -	}
> -	o->packed_git = NULL;
> +	packfile_store_free(o->packfiles);

Interesting. The movement of the for-loop here all looks correct to me.
But I think the new packfile_store is creating a new awkardness here
that we should consider.

In existing implementation, all of the ->next pointers here point to
heap locations that have already been free()'d. But that's OK, since
they aren't reachable at the moment that we do "o-packed_store = NULL".

Having a separate packfile_store changes that, since (from my reading of
the code) o->packfiles will still be non-NULL even after calling
odb_clear(), *and* those pointers will refer to free'd heap locations.

That seems like a potential footgun to me. I think that we could either:

 * Change packfile_store_free() to take in an object_database pointer,
   and NULL out the ->packs pointer after free'ing all of the packfiles.
   That would make it more similar to the existing behavior.

 * Leave packfile_store_free() as-is, document that it does NOT clear
   out the top-level pointer, and so callers are encouraged to NULL it
   out themselves after calling it. Likewise, we should change
   odb_clear() to do:

       packfile_store_free(o->packfiles);
       o->packfiles = NULL;

Let me know what you think.

>  	hashmap_clear(&o->pack_map);
>  	string_list_clear(&o->submodule_source_paths, 0);
> diff --git a/odb.h b/odb.h
> index 08c3a01f3b..6f901c5ac0 100644
> --- a/odb.h
> +++ b/odb.h
> @@ -130,7 +130,6 @@ struct object_database {
>  	 * should only be accessed directly by packfile.c
>  	 */
>  	struct packfile_store *packfiles;
> -	struct packed_git *packed_git;

Makes sense.

>  	/* A most-recently-used ordered version of the packed_git list. */
>  	struct list_head packed_git_mru;
>
> diff --git a/packfile.c b/packfile.c
> index 8fbf1cfc2d..6478e4cc30 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -278,7 +278,7 @@ static int unuse_one_window(struct packed_git *current)
>
>  	if (current)
>  		scan_windows(current, &lru_p, &lru_w, &lru_l);
> -	for (p = current->repo->objects->packed_git; p; p = p->next)
> +	for (p = current->repo->objects->packfiles->packs; p; p = p->next)

Not a huge deal, but I do find "current->repo->objects->packfiles->packs"
to be a bit unfortunate. I wonder if we should rename "packs" to "head"
or "list_head" or similar since it's clear from
"current->repo->objects->packfiles" that this is a list of packfiles.

>  		scan_windows(p, &lru_p, &lru_w, &lru_l);
>  	if (lru_p) {
>  		munmap(lru_w->base, lru_w->len);
> @@ -362,13 +362,8 @@ void close_pack(struct packed_git *p)
>  void close_object_store(struct object_database *o)
>  {
>  	struct odb_source *source;
> -	struct packed_git *p;
>
> -	for (p = o->packed_git; p; p = p->next)
> -		if (p->do_not_close)
> -			BUG("want to close pack marked 'do-not-close'");
> -		else
> -			close_pack(p);
> +	packfile_store_close(o->packfiles);

Looks good.

> @@ -468,7 +463,7 @@ static int close_one_pack(struct repository *r)
>  	struct pack_window *mru_w = NULL;
>  	int accept_windows_inuse = 1;
>
> -	for (p = r->objects->packed_git; p; p = p->next) {
> +	for (p = r->objects->packfiles->packs; p; p = p->next) {

Likewise.

> @@ -2344,5 +2339,23 @@ struct packfile_store *packfile_store_new(struct object_database *odb)
>
>  void packfile_store_free(struct packfile_store *store)
>  {
> +	packfile_store_close(store);

Seeing a call to packfile_store_close() here was a little surprising to
me. The code that you are moving has a comment that says:

   * `close_object_store()` only closes the packfiles, but doesn't free
   * them. We thus have to do this manually.

, so I would have expected to preserve that behavior. I *think* that
this happens to be OK, since close_pack() is a noop if it is called more
than once (though I had to double check through all of its leaf
functions that that was indeed the case).

I would probably strike this from the new function, since the sole
caller above already calls close_object_store() before calling
packfile_store_free().

> +
> +	for (struct packed_git *p = store->packs, *next; p; p = next) {
> +		next = p->next;
> +		free(p);
> +	}
> +
>  	free(store);
>  }

This part looks good.

> +void packfile_store_close(struct packfile_store *store)
> +{
> +	struct packed_git *p;
> +
> +	for (p = store->packs; p; p = p->next)
> +		if (p->do_not_close)
> +			BUG("want to close pack marked 'do-not-close'");
> +		else
> +			close_pack(p);
> +}

And likewise this looks good to me. I do find the braceless for-loop a
little hard to read, but it's (a) correct, and (b) consistent with the
original implementation, so I don't feel strongly about changing it.

As a side-note, you could inline the declaration of "p" here into the
for-loop, but I can understand not wanting to to make the diff more
readable with --color-moved.

> diff --git a/packfile.h b/packfile.h
> index 8d31fd619a..d7ac8d24b4 100644
> --- a/packfile.h
> +++ b/packfile.h

The rest looks good to me.

Thanks,
Taylor




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux