[no subject]

**Date** **Thread**

== Handling renames ==

In the merge machinery when we have a rename of a path A -> B,
processing that rename needs to remove path A, and make sure that path B
has the relevant information.  Note that if the content was also
modified on both sides, this may mean that we have 3 different stages
that need to be stored at path B instead of having some stored at path
A.

Having all stages stored at path B makes it much easier for users to
investigate and resolve the content conflict associated with a renamed
path.  For example:
  * "git status" doesn't have to figure out how to list paths A & B and
    attempt to connect them for users; it can just list path B.
  * Users can use "git ls-files -u B" (instead of trying to find the
    previous name of the file so they can list both, i.e. "git ls-files
    -u A B")
  * Users can resolve via "git add B" (without needing to "git rm A")

== What break detection would mean ==

If break detection were supported, we might have cases where A -> B
*and* C -> A, meaning that both rename pairs might believe they need to
update A.  In particular, the processing of A -> B would need to be
careful to not clear out all stages of A and mark it resolved, while
both renames would need to figure out which stages of A belong with A
and which belong with B, so that both paths have the right stages
associated with them.

merge-ort (like merge-recursive before it) makes no attempt to handle
break detection; it runs with break detection turned off.  It would
need to be retrofitted to handle such cases.

== Directory rename detection ==

If one side of history renames directory D/ -> E/, and the other side of
history adds new files to E/, then directory rename detection notices
and suggests moving those new files to E/.  A similar thing is done for
paths renamed into D/, causing them to be transitively renamed into E/.

The default in the merge machinery is to report a conflict whenever a
directory rename might modify the location of a path, so that users can
decide whether they wanted the original path or the
directory-rename-induced location.  However, that means the default
codepath still runs through all the directory rename detection logic, it
just supplements it with providing conflict notices when it is done.

== Building up increasingly complex testcases ==

I'll start with a really simple directory rename example, and then
slowly add twists that explain new pieces until we get to the
problematic cases:

=== Testcase 1 ===

Let's start with a concrete example, where particular files/directories of
interest that exist or are changed on each side are called out:

  Original:   <nothing of note>
  our side:   rename B/file -> C/file
  their side: rename C/     -> A/

For this case, we'd expect to see the original B/file appear not at
C/file but at A/file.

(We would also expect a conflict notice that the user will want to
choose between C/file and A/file, but I'm going to ignore conflict
notices from here on by assuming merge.directoryRenames is set to
`true` rather than `conflict`; the only difference that assumption
makes is whether that makes the merge be considered to be conflicted
and whether it prints a conflict notice; what is written to the index
or working directory is unchanged.)

=== Testcase 2 ===

Modify testcase 1 by having A/file exist from the start:

  Original:   A/file exists
  our side:   rename B/file -> C/file
  their side: rename C/     -> A/

In such a case, to avoid user confusion at what looks kind of like an
add/add conflict (even though the original path at A/file was not added
by either side of the merge), we turn off directory rename detection for
this path and print a "in the way" warning to the user:
    CONFLICT (implicit dir rename): Existing file/dir ... in the way ...
The testcases in section 5 of t6423 explore these in more detail.

=== Testcase 3 ===

Let's modify testcase 1 in a slightly different way: have A/file be
added by their side rather than it already existing.

  Original:   <nothing of note>
  our side:   rename B/file -> C/file
  their side: rename C/     -> A/
              add A/file

In this case, the directory rename detection basically transforms our
side's original B/file -> C/file into a B/file -> A/file, and so we
get a rename/add conflict, with one version of A/file coming from the
renamed file, and another coming from the new A/file, each stored as
stages 2 and 3 in conflicts.  This kind of add/add conflict is perhaps
slightly more complex than a regular add/add conflict, but with the
printed messages it makes sense where it came from and we have
different stages of the file to work with to resolve the conflict.

=== Testcase 4 ===

Let's do something similar to testcase 3, but have the opposite side of
history add A/file:

  Original:   <nothing of note>
  our side:   rename B/file -> C/file
              add    A/file
  their side: rename C/     -> A/

Now if we allow directory rename detection to modify C/file to A/file,
then we also get a rename/add conflict, but in this case we'd need both
higher order stages being recorded on side 2, which makes no sense.  The
index can't store multiple stage 2 entries, and even if we could, it
would probably be confusing for users to work with.  So, similar to what
we do when there was an A/file in the original version, we simply turn
off directory rename detection for cases like this and provide the "in
the way" CONFLICT notice to the user.

=== Testcase 5 ===

We're slowly getting closer.  Let's mix it up by having A/file exist at
the beginning but not exist on their side:

  original:   A/file exists
  our side:   rename B/file -> C/file
  their side: rename C/     -> A/
              rename A/file -> D/file

For this case, you could say that since A/file -> D/file, it's no longer
in the way of C/file being moved by directory rename detection to
A/file.  But that would give us a case where A/file is both the source
and the target of a rename, similar to break detection, which the code
isn't currently equipped to handle.

This is not yet the case that causes current failures; to the current
code, this kind of looks like testcase 4 in that A/file is in the way
on our side (since A/file was in the original and was umodified by our
side).  So, it results in a "in the way" notification with directory
rename detection being turned off for A/file so that B/file ends up at
C/file.

Perhaps the resolution could be improved in the future, but our "in
the way" checks prevented such problems by noticing that A/file exists
on our side and thus turns off directory rename detection from
affecting C/file's location.  So, while the merge result could be
perhaps improved, the fact that this is currently handled by giving
the user an "in the way" message gives the user a chance to resolve
and prevents the code from tripping itself up.

=== Testcase 6 ===

Let's modify testcase 5 a bit more, to also delete A/file on our side:

  original:   A/file exists
  our side:   rename B/file -> C/file
              delete A/file
  their side: rename C/     -> A/
              rename A/file -> D/file

Now the "in the way" logic doesn't detect that there's an A/file in
the way (neither side has an A/file anymore), so it's fine to
transitively rename C/file further to A/file...except that we end up
with A/file being both the source of one rename, and the target of a
different rename.  Each rename pair tries to handle the resolution of
the source and target paths of its own rename.  But when we go to
process the second rename pair in process_renames(), we do not expect
either the source or the destination to be marked as handled already;
so, when we hit the sanity checks that these are not handled:

    VERIFY_CI(oldinfo);
    VERIFY_CI(newinfo);

then one of these is going to throw an assertion failure since the
previous rename pair already marked both of its paths as handled.
This will give us an error of the form:

    git: merge-ort.c:3007: process_renames: Assertion `newinfo && !newinfo->merged.clean' failed.

This is the failure we're currently triggering, and it fundamentally
depends on:
  * a path existing in the original
  * that original path being removed or renamed on *both* sides
  * some kind of directory rename moving some *other* path into that
    original path

This was added as testcase 12q in t6423.

=== Testcase 7 ===

Bonus bug found while investigating!

Let's go back to the comparison between testcases 2 & 3, and set up a
file present on their side that we need to consider:

  Original:   A/file exists
  our side:   rename B/file -> C/file
              rename A/file -> D/file
  their side: rename C/     -> A/

Here, there is no A/file in the way on our side like testcase 4.
There is an A/file present on their side like testcase 3, which was an
add/add conflict, but that's associated with the file be renamed to
D/file.  So, that really shouldn't be an add/add conflict because we
instead want all modes of the original A/file to be transported to
D/file.

Unfortunately, the current code kind of treats it like an add/add
conflict instead...but even worse.  There is also a valid mode for
A/file in the original, which normally goes to stage 1.  However, an
add/add conflict should be represented in the index with no mode at
stage 1 (for the original side), only modes at stages 2 and 3 (for our
and their side), so for an add/add we'd expect that mode for A/file in
the original version to be cleared out (or be transported to D/file).

Unfortunately, the code currently leaves not only the stage 3 entry
for A/file intact, it also leaves the stage 1 entry for A/file.  This
results in `git ls-files -u A/file` output of the form:

    100644 d00491fd7e5bb6fa28c517a0bb32b8b506539d4d 1	A/file
    100644 0cfbf08886fca9a91cb753ec8734c84fcbe52c9f 2	A/file
    100644 d00491fd7e5bb6fa28c517a0bb32b8b506539d4d 3	A/file

This would likely cause users to believe this isn't an add/add
conflict; rather, this would lead them to believe that A/file was only
modified on our side and that therefore it should not have been a
conflict in the first place.  And while resolving the conflict in
favor of our side is the correct resolution (because stages 1 and 3
should have been cleared out in the first place), this is certainly
likely to cause confusion for anyone attempting to investigate why
this path was marked as conflicted.

This was added as testcase 12p in t6423.

== Attempted solutions that I discarded ==

1) For each side of history, create a strset of the sources of each
   rename on the other side of history.  Then when using directory
   renames to modify existing renames, verify that we aren't renaming
   to a source of another rename.

   Unfortunately, the "relevant renames" optimization in merge-ort
   means we often don't detect renames -- we just see a delete and an
   add -- which is easy to forget and makes debugging testcases harder,
   but it also turns out that this solution in insufficient to solve
   the related problems in the area (more on that below).

2) Modify the code to be aware of the possibility of renaming to
   the source of another side's rename, and make all the conflict
   resolution logic for each case (including existing
   rename/rename(2to1) and rename/rename(1to2) cases) handle the
   additional complexity.  It turns out there was much more code to
   audit than I wanted, for a really niche case.  I didn't like how
   many changes were needed, and aborted.

== Solution ==

We do not want the stages of unrelated files appearing at the same path
in the index except when dealing with an add/add conflict.  While we
previously handled this for stages 2 & 3, we also need to worry about
stage 1.  So check for a stage 1 index entry being in the way of a
directory rename.

However, if we can detect that the stage 1 index entry is actually from
a related file due to a directory-rename-causes-rename-to-self
situation, then we can allow the stage 1 entry to remain.