Re: [PATCH] promisor-remote: remove the promisor object check for failed fetch

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 29 May 2025 08:23:17 -0700

Han Young <hanyang.tony@xxxxxxxxxxxxx> writes:

> If the promisor objects fail to fetch, we check the remaining objects
> to see if they are indeed promisor objects. Then, we die on the first
> remaining promisor object. However, this promisor object check is 
> unnecessary because callers of promisor_remote_get_direct already filter
> out local objects. All objects passed to promisor_remote_get_direct are
> promisor objects.
>
> The is_promisor_object check essentially iterates through every object
> in the local packfiles and adds them to an oid set. This process is
> agonizingly slow for large repositories.
> Remove the check so that we fail immediately.

Thanks for CC'ing Christian and Jonathan Tan, who may indeed be good
reviewers for this change.

let me think aloud a bit, but because I haven't looked at this part
of the system for quite some time, what I mumble may not make much
sense.  Please bear with me.

So the incoming list remaining_oids[] is what was already filtered
by the caller and everything in it should be promisor?  If so ...

> -	for (i = 0; i < remaining_nr; i++) {
> -		if (is_promisor_object(repo, &remaining_oids[i]))
> -			die(_("could not fetch %s from promisor remote"),
> -			    oid_to_hex(&remaining_oids[i]));

... the first iteration of this loop, after is_promisor_object()
says the object is a promisor object (which is guaranteed if
everybody in that remaining_oids[] array is promisor object), will
die immediately.  So your "agonizingly slow" comes from just one
single call to is_promisor_object() function is ultra slow?

If that is the case, the patch, including the above explanation,
makes perfect sense (note that I didn't verify the "everybody in the
remaining_oids[] array is guaranteed to be a promisor object" claim
myself, though).

But at the same time, it sounds like is_promisor_object() seriously
is wrong.  Perhaps we need to tell pack-objects to pre-compute the
packfile.c:add_promisor_object() stuff and cache the result in an
on-disk file, just like reverse index is stored in an auxiliary
file?

What do the callers of the function use to "filter out local
objects" to ensure that "all objects passed ... are promisor
objects" do?  Have they already spent agonizingly large amount of
time to do so?  I sampled a few existing callers but they do not
explicitly restrict what they throw at the function to promisor
objects---they prepare an oid array, throw anything they do not see
locally in the array and call this function.  So objects in the
array are either promisor objects or missing due to repository
corruption---we simply cannot tell.  I suspect that the claim
"everything the caller calls the function with is a promisor object"
is not exactly correct.

The end-result when remaining_nr is not zero (i.e., some objects
that the caller wanted us to be fetched) would not exactly be the
same.  The function used to die only when the object we failed to
obtain was what a promisor remote promised to give us.  With this
change, we also die when an object the caller asked us to fetch is
not promised by any promisor.  I do not know what the implication
of this behaviour change would be.  Reviewers, comments?

> +	if (remaining_nr) {
> +		die(_("could not fetch %s from promisor remote"),
> +			oid_to_hex(&remaining_oids[0]));
>  	}