== Handling renames == In the merge machinery when we have a rename of a path A -> B, processing that rename needs to remove path A, and make sure that path B has the relevant information. Note that if the content was also modified on both sides, this may mean that we have 3 different stages that need to be stored at path B instead of having some stored at path A. Having all stages stored at path B makes it much easier for users to investigate and resolve the content conflict associated with a renamed path. For example: * "git status" doesn't have to figure out how to list paths A & B and attempt to connect them for users; it can just list path B. * Users can use "git ls-files -u B" (instead of trying to find the previous name of the file so they can list both, i.e. "git ls-files -u A B") * Users can resolve via "git add B" (without needing to "git rm A") == What break detection would mean == If break detection were supported, we might have cases where A -> B *and* C -> A, meaning that both rename pairs might believe they need to update A. In particular, the processing of A -> B would need to be careful to not clear out all stages of A and mark it resolved, while both renames would need to figure out which stages of A belong with A and which belong with B, so that both paths have the right stages associated with them. merge-ort (like merge-recursive before it) makes no attempt to handle break detection; it runs with break detection turned off. It would need to be retrofitted to handle such cases. == Directory rename detection == If one side of history renames directory D/ -> E/, and the other side of history adds new files to E/, then directory rename detection notices and suggests moving those new files to E/. A similar thing is done for paths renamed into D/, causing them to be transitively renamed into E/. The default in the merge machinery is to report a conflict whenever a directory rename might modify the location of a path, so that users can decide whether they wanted the original path or the directory-rename-induced location. However, that means the default codepath still runs through all the directory rename detection logic, it just supplements it with providing conflict notices when it is done. == Building up increasingly complex testcases == I'll start with a really simple directory rename example, and then slowly add twists that explain new pieces until we get to the problematic cases: === Testcase 1 === Let's start with a concrete example, where particular files/directories of interest that exist or are changed on each side are called out: Original: <nothing of note> our side: rename B/file -> C/file their side: rename C/ -> A/ For this case, we'd expect to see the original B/file appear not at C/file but at A/file. (We would also expect a conflict notice that the user will want to choose between C/file and A/file, but I'm going to ignore conflict notices from here on by assuming merge.directoryRenames is set to `true` rather than `conflict`; the only difference that assumption makes is whether that makes the merge be considered to be conflicted and whether it prints a conflict notice; what is written to the index or working directory is unchanged.) === Testcase 2 === Modify testcase 1 by having A/file exist from the start: Original: A/file exists our side: rename B/file -> C/file their side: rename C/ -> A/ In such a case, to avoid user confusion at what looks kind of like an add/add conflict (even though the original path at A/file was not added by either side of the merge), we turn off directory rename detection for this path and print a "in the way" warning to the user: CONFLICT (implicit dir rename): Existing file/dir ... in the way ... The testcases in section 5 of t6423 explore these in more detail. === Testcase 3 === Let's modify testcase 1 in a slightly different way: have A/file be added by their side rather than it already existing. Original: <nothing of note> our side: rename B/file -> C/file their side: rename C/ -> A/ add A/file In this case, the directory rename detection basically transforms our side's original B/file -> C/file into a B/file -> A/file, and so we get a rename/add conflict, with one version of A/file coming from the renamed file, and another coming from the new A/file, each stored as stages 2 and 3 in conflicts. This kind of add/add conflict is perhaps slightly more complex than a regular add/add conflict, but with the printed messages it makes sense where it came from and we have different stages of the file to work with to resolve the conflict. === Testcase 4 === Let's do something similar to testcase 3, but have the opposite side of history add A/file: Original: <nothing of note> our side: rename B/file -> C/file add A/file their side: rename C/ -> A/ Now if we allow directory rename detection to modify C/file to A/file, then we also get a rename/add conflict, but in this case we'd need both higher order stages being recorded on side 2, which makes no sense. The index can't store multiple stage 2 entries, and even if we could, it would probably be confusing for users to work with. So, similar to what we do when there was an A/file in the original version, we simply turn off directory rename detection for cases like this and provide the "in the way" CONFLICT notice to the user. === Testcase 5 === We're slowly getting closer. Let's mix it up by having A/file exist at the beginning but not exist on their side: original: A/file exists our side: rename B/file -> C/file their side: rename C/ -> A/ rename A/file -> D/file For this case, you could say that since A/file -> D/file, it's no longer in the way of C/file being moved by directory rename detection to A/file. But that would give us a case where A/file is both the source and the target of a rename, similar to break detection, which the code isn't currently equipped to handle. This is not yet the case that causes current failures; to the current code, this kind of looks like testcase 4 in that A/file is in the way on our side (since A/file was in the original and was umodified by our side). So, it results in a "in the way" notification with directory rename detection being turned off for A/file so that B/file ends up at C/file. Perhaps the resolution could be improved in the future, but our "in the way" checks prevented such problems by noticing that A/file exists on our side and thus turns off directory rename detection from affecting C/file's location. So, while the merge result could be perhaps improved, the fact that this is currently handled by giving the user an "in the way" message gives the user a chance to resolve and prevents the code from tripping itself up. === Testcase 6 === Let's modify testcase 5 a bit more, to also delete A/file on our side: original: A/file exists our side: rename B/file -> C/file delete A/file their side: rename C/ -> A/ rename A/file -> D/file Now the "in the way" logic doesn't detect that there's an A/file in the way (neither side has an A/file anymore), so it's fine to transitively rename C/file further to A/file...except that we end up with A/file being both the source of one rename, and the target of a different rename. Each rename pair tries to handle the resolution of the source and target paths of its own rename. But when we go to process the second rename pair in process_renames(), we do not expect either the source or the destination to be marked as handled already; so, when we hit the sanity checks that these are not handled: VERIFY_CI(oldinfo); VERIFY_CI(newinfo); then one of these is going to throw an assertion failure since the previous rename pair already marked both of its paths as handled. This will give us an error of the form: git: merge-ort.c:3007: process_renames: Assertion `newinfo && !newinfo->merged.clean' failed. This is the failure we're currently triggering, and it fundamentally depends on: * a path existing in the original * that original path being removed or renamed on *both* sides * some kind of directory rename moving some *other* path into that original path This was added as testcase 12q in t6423. === Testcase 7 === Bonus bug found while investigating! Let's go back to the comparison between testcases 2 & 3, and set up a file present on their side that we need to consider: Original: A/file exists our side: rename B/file -> C/file rename A/file -> D/file their side: rename C/ -> A/ Here, there is no A/file in the way on our side like testcase 4. There is an A/file present on their side like testcase 3, which was an add/add conflict, but that's associated with the file be renamed to D/file. So, that really shouldn't be an add/add conflict because we instead want all modes of the original A/file to be transported to D/file. Unfortunately, the current code kind of treats it like an add/add conflict instead...but even worse. There is also a valid mode for A/file in the original, which normally goes to stage 1. However, an add/add conflict should be represented in the index with no mode at stage 1 (for the original side), only modes at stages 2 and 3 (for our and their side), so for an add/add we'd expect that mode for A/file in the original version to be cleared out (or be transported to D/file). Unfortunately, the code currently leaves not only the stage 3 entry for A/file intact, it also leaves the stage 1 entry for A/file. This results in `git ls-files -u A/file` output of the form: 100644 d00491fd7e5bb6fa28c517a0bb32b8b506539d4d 1 A/file 100644 0cfbf08886fca9a91cb753ec8734c84fcbe52c9f 2 A/file 100644 d00491fd7e5bb6fa28c517a0bb32b8b506539d4d 3 A/file This would likely cause users to believe this isn't an add/add conflict; rather, this would lead them to believe that A/file was only modified on our side and that therefore it should not have been a conflict in the first place. And while resolving the conflict in favor of our side is the correct resolution (because stages 1 and 3 should have been cleared out in the first place), this is certainly likely to cause confusion for anyone attempting to investigate why this path was marked as conflicted. This was added as testcase 12p in t6423. == Attempted solutions that I discarded == 1) For each side of history, create a strset of the sources of each rename on the other side of history. Then when using directory renames to modify existing renames, verify that we aren't renaming to a source of another rename. Unfortunately, the "relevant renames" optimization in merge-ort means we often don't detect renames -- we just see a delete and an add -- which is easy to forget and makes debugging testcases harder, but it also turns out that this solution in insufficient to solve the related problems in the area (more on that below). 2) Modify the code to be aware of the possibility of renaming to the source of another side's rename, and make all the conflict resolution logic for each case (including existing rename/rename(2to1) and rename/rename(1to2) cases) handle the additional complexity. It turns out there was much more code to audit than I wanted, for a really niche case. I didn't like how many changes were needed, and aborted. == Solution == We do not want the stages of unrelated files appearing at the same path in the index except when dealing with an add/add conflict. While we previously handled this for stages 2 & 3, we also need to worry about stage 1. So check for a stage 1 index entry being in the way of a directory rename. However, if we can detect that the stage 1 index entry is actually from a related file due to a directory-rename-causes-rename-to-self situation, then we can allow the stage 1 entry to remain.