Re: [PATCH 07/13] builtin/index-pack: don't fetch promised objects for collision check

Patrick Steinhardt <ps@xxxxxx> writes:

> On Wed, Apr 23, 2025 at 10:08:05AM -0700, Karthik Nayak wrote:
>> Patrick Steinhardt <ps@xxxxxx> writes:
>>
>> > Any packed objects indexed via git-index-pack(1) are subject to a
>> > collision check. This collision check has the intent to determine
>> > whether we already have an object with the same object ID, but different
>> > contents in the repository.
>> >
>> > The check whether the collision check is really needed is performed via
>> > `repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK)`.
>> >
>>
>> Nit: this was a little confusing at first, until I saw the code. So what
>> this means is that the collision check is only performed, iff
>> `repo_has_object_file_with_flags(...)` returns true.
>>
>> I think the confusing part was 'is performed via', perhaps:
>>
>>   The collision check is only performed, if
>>   repo_has_object_file_with_flags(..., OBJECT_INFO_QUICK) returns a
>>   truthy value.
>>
>> But it is okay as is too!
>
> Will rephrase.
>
>> > But unless
>> > explicitly told otherwise via `OBJECT_INFO_SKIP_FETCH_OBJECT`, this
>> > function will also cause us to fetch the object ID in case it is part of
>> > a promisor pack. As such, we may end up fetching the object only to
>> > check whether the fetched object and the object that we're indexing have
>> > the same content.
>> >
>>
>> So us fetching the object is pointless, since we only care about the
>> 'does it exist' part and not really what it contains. In that case,
>> shouldn't this be s/same content/same oid/?
>
> No, it really checks for the same content. It basically verifies that
> any pair of objects that:
>
>   - Exist in the packfile that we're currently indexing.
>   - And preexists in the local repository.
>
> Actually have the same content.
>

Okay, this makes sense: if they do have the same content, this is not a
collision. It is simply a duplicate.

> The weird part is that we also do this for objects that don't yet exist
> in the repository, but which are promised to us. This causes us to fetch
> them first only to verify that the fetched promised object has the same
> content as the packfile. And given that git-index-pack(1) would usually
> run after a fetch, we end up verifying that the fetched object obtained
> from the promisor is the same as the fetched object obtained from the
> packfile. Which ultimately seems rather dubious to me.
>

To clarify, the flow currently (simplified) is:

1. We check whether a collision test is required by checking whether the
   new OID already exists in the repository.
2. If a collision test is required:
   a. Fetch and check the object type.
   b. Read the old object data.
   c. Compare the new object data and the old object data.
   d. Collision detected if there is a mismatch.

Currently, we fetch promisor objects in #1, which is unnecessary
because we simply want to know if the object exists in the repository.
The actual check in #2.b would still fetch the promisor object (if that
flow is taken).
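
For illustration only, here is a toy model of that flow. It is a sketch
under loud assumptions: every name in it (toy_object, toy_has_object,
toy_read, toy_check_collision, TOY_SKIP_FETCH) is made up, and none of it
is Git's actual code. TOY_SKIP_FETCH merely stands in for
`OBJECT_INFO_SKIP_FETCH_OBJECT`, and "fetching" is modelled as bumping a
counter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are not Git's real data structures. */
enum toy_where { TOY_LOCAL, TOY_PROMISED };

struct toy_object {
	const char *oid;
	const char *data;
	enum toy_where where;
};

/* A tiny "repository": one local object, one merely promised object. */
static struct toy_object toy_store[] = {
	{ "aaaa", "hello", TOY_LOCAL },
	{ "bbbb", "world", TOY_PROMISED },
};

/* Counts how often we had to "fetch" a promised object. */
static int fetch_count;

/* Assumption: models the effect of OBJECT_INFO_SKIP_FETCH_OBJECT. */
#define TOY_SKIP_FETCH 1u

static struct toy_object *toy_find(const char *oid)
{
	size_t i;

	for (i = 0; i < sizeof(toy_store) / sizeof(toy_store[0]); i++)
		if (!strcmp(toy_store[i].oid, oid))
			return &toy_store[i];
	return NULL;
}

/*
 * Step 1: existence check. Without TOY_SKIP_FETCH, a promised object is
 * fetched just to answer "do we have it?". With the flag, a promised
 * object is reported as absent and nothing is fetched.
 */
static int toy_has_object(const char *oid, unsigned flags)
{
	struct toy_object *obj = toy_find(oid);

	if (!obj)
		return 0;
	if (obj->where == TOY_PROMISED) {
		if (flags & TOY_SKIP_FETCH)
			return 0;
		fetch_count++;
		obj->where = TOY_LOCAL;
	}
	return 1;
}

/* Step 2.b: reading the old data fetches a promised object if needed. */
static const char *toy_read(const char *oid)
{
	struct toy_object *obj = toy_find(oid);

	assert(obj);
	if (obj->where == TOY_PROMISED) {
		fetch_count++;
		obj->where = TOY_LOCAL;
	}
	return obj->data;
}

/* Steps 1 and 2 combined: returns 1 on collision, 0 otherwise. */
static int toy_check_collision(const char *oid, const char *new_data,
			       unsigned flags)
{
	if (!toy_has_object(oid, flags))
		return 0; /* no collision test required */
	return strcmp(toy_read(oid), new_data) != 0;
}
```

In this model, checking "bbbb" with TOY_SKIP_FETCH leaves fetch_count
untouched, while checking it without the flag fetches the object once
only to find that the contents match.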

> Patrick


