Re: [PATCH v2 3/3] index-pack: allow revisiting REF_DELTA chains

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/6/2025 10:08 PM, Taylor Blau wrote:
> On Mon, Apr 28, 2025 at 08:24:45PM +0000, Derrick Stolee via GitGitGadget wrote:
>> The crux of the matter is how the algorithm works when the REF_DELTAs
>> point to base objects that exist in the local repository.
> 
> Hmm. I'm having trouble squaring this with these next two paragraphs:
> 
>> Consider the case where the packfile has two REF_DELTA objects, A and B,
>> and the delta chain looks like "A depends on B" and "B depends on C" for
>> some third object C, where C is already in the current repository. The
>> algorithm _should_ start with all objects that depend on C, finding B,
>> and then moving on to all objects depending on B, finding A.
>>
>> However, if the repository also already has object B, then the delta
>> chain can be analyzed in a different order. The deltas with base B can
>> be analyzed first, finding A, and then the deltas with base C are
>> analyzed, finding B. The algorithm currently continues to look for
>> objects that depend on B, finding A again. This fails due to A's
>> 'real_type' member already being overwritten from OBJ_REF_DELTA to the
>> correct object type.
> 
> ISTM that this A->B->C chain is a problem because (in the above example)
> the server sent B as a REF_DELTA base for A but also had its own
> pre-existing copy of B.
> 
> But the first quoted sentence suggests that the issue is with REF_DELTAs
> that point to base objects that exist in the local repository. Does
> "point to" mean that the REF_DELTA's base is the local object, or that
> the local object itself was sent as a REF_DELTA against some other base?

The issue is that based on the negotiation, the server knows the client
wants A and has C but doesn't think the client has B despite the client
having B.

If the server sends these two objects as a delta chain A->B->C using
REF_DELTAs (and not the typical OFS_DELTA from A to B) then the client
may inflate A first and B second (and then try to inflate A again).

> I haven't fully wrapped my head around the implications of this all yet,
> but I think that it's the former, though admittedly even typing this I
> am not quite sure of myself. I *think* that doing this is OK if the only
> path from base objects to their corresponding deltas is unique, and/or
> there were no such paths at all.
> 
> I'm trying to think through the implications of this against my
> series[1] from a while ago that converts OFS_DELTAs that weren't usable
> as part of verbatim pack-reuse into REF_DELTAs. There are two cases
> there that I was considering:
> 
>   - For some (delta, base) pair in a pack, there was an additional copy
>     of 'base' in some other pack, and the MIDX chose the copy from that
>     pack. That forms what I'm calling a "cross-pack" delta. This isn't
>     currently reusable as an OFS_DELTA for a variety of reasons, but is
>     OK as a REF_DELTA provided we know that the client either already
>     has the base object or we are sending it as part of the pack anyway.

These are the kinds of deltas that could hit this problem.

>   - The other case is that we the client wants the delta-half of a
>     delta/base-pair, but not the base object. In this case, we can't
>     currently reuse the OFS_DELTA verbatim, but could if we converted it
>     to a REF_DELTA based on the knowledge that the client already has
>     the base object.

If we know the client has the base object, then not including the base in
the pack and referencing it as a REF_DELTA will always be OK.

> The latter is doable based on the information in the wants/haves bitmap.
> The process there looks like: determine that the client doesn't want the
> base object, realize that its bit is set in the 'haves' bitmap, and then
> convert the delta object from a OFS_DELTA to a REF_DELTA.
> 
> But I think that all breaks for older clients that don't meet the unique
> paths condition above. Does that sound right to you?
> 
> I think the cross-pack case is fine, provided we know ahead of time that
> the client doesn't have the (converted-to-REF_DELTA) delta object in its
> local copy.
> 
> Unfortunately, I think this means that [1] is a bit of a dead-end for
> serves that have older clients (running a version that does not include
> this patch). At least, I think that's true if we can construct a
> situation where the server sends a REF_DELTA that it thinks the client
> doesn't have but actually does. I'm not immediately sure what such a
> situation would look like beyond cases like: "the client verbatim asked
> for an object it already has, but isn't reachable from the set of
> provided 'haves'".

I agree that [1] is going to present more opportunities for this bug to
be hit by older clients, though I will admit that the frequency of the
problem seems to have odd frequencies. It relies upon the fact that the
fetch negotiation fails to identify "B" as a common object. This sort of
thing can happen more frequently when using the commit boundary between
the haves and wants (instead of walking from all haves like a bitmap walk
would do). If the client fails to advertise all of its haves, then that
could also lead to this problem even if the server is walking all objects
reachable from the haves.

The other thing that you'll have going for you is that cross-pack deltas
should be rare. The case that helped me discover this issue was where the
server did not send any OFS_DELTAs at all but used REF_DELTAs for all
delta relationships. (This is required if the client doesn't advertise
the "ofs-delta" capability, but we'd need to go _way_ back for a client
to not include that capability.)

Thanks,
-Stolee





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux