On Mon, Apr 28, 2025 at 08:24:45PM +0000, Derrick Stolee via GitGitGadget wrote: > The crux of the matter is how the algorithm works when the REF_DELTAs > point to base objects that exist in the local repository. Hmm. I'm having trouble squaring this with these next two paragraphs: > Consider the case where the packfile has two REF_DELTA objects, A and B, > and the delta chain looks like "A depends on B" and "B depends on C" for > some third object C, where C is already in the current repository. The > algorithm _should_ start with all objects that depend on C, finding B, > and then moving on to all objects depending on B, finding A. > > However, if the repository also already has object B, then the delta > chain can be analyzed in a different order. The deltas with base B can > be analyzed first, finding A, and then the deltas with base C are > analyzed, finding B. The algorithm currently continues to look for > objects that depend on B, finding A again. This fails due to A's > 'real_type' member already being overwritten from OBJ_REF_DELTA to the > correct object type. ISTM that this A->B->C chain is a problem because (in the above example) the server sent B as a REF_DELTA base for A but also had its own pre-existing copy of B. But the first quoted sentence suggests that the issue is with REF_DELTAs that point to base objects that exist in the local repository. Does "point to" mean that the REF_DELTA's base is the local object, or that the local object itself was sent as a REF_DELTA against some other base? I haven't fully wrapped my head around the implications of this all yet, but I think that it's the former, though admittedly even typing this I am not quite sure of myself. I *think* that doing this is OK if the only path from base objects to their corresponding deltas is unique, and/or there were no such paths at all. I'm trying to think through the implications of this against my series[1] from a while ago that converts OFS_DELTAs that weren't usable as part of verbatim pack-reuse into REF_DELTAs. There are two cases there that I was considering: - For some (delta, base) pair in a pack, there was an additional copy of 'base' in some other pack, and the MIDX chose the copy from that pack. That forms what I'm calling a "cross-pack" delta. This isn't currently reusable as an OFS_DELTA for a variety of reasons, but is OK as a REF_DELTA provided we know that the client either already has the base object or we are sending it as part of the pack anyway. - The other case is that we the client wants the delta-half of a delta/base-pair, but not the base object. In this case, we can't currently reuse the OFS_DELTA verbatim, but could if we converted it to a REF_DELTA based on the knowledge that the client already has the base object. The latter is doable based on the information in the wants/haves bitmap. The process there looks like: determine that the client doesn't want the base object, realize that its bit is set in the 'haves' bitmap, and then convert the delta object from a OFS_DELTA to a REF_DELTA. But I think that all breaks for older clients that don't meet the unique paths condition above. Does that sound right to you? I think the cross-pack case is fine, provided we know ahead of time that the client doesn't have the (converted-to-REF_DELTA) delta object in its local copy. Unfortunately, I think this means that [1] is a bit of a dead-end for serves that have older clients (running a version that does not include this patch). At least, I think that's true if we can construct a situation where the server sends a REF_DELTA that it thinks the client doesn't have but actually does. I'm not immediately sure what such a situation would look like beyond cases like: "the client verbatim asked for an object it already has, but isn't reachable from the set of provided 'haves'". Thanks, Taylor [1]: https://lore.kernel.org/git/cover.1728505840.git.me@xxxxxxxxxxxx/