Re: [BUG] git apply misplaces patch when similar code fragments exist in the same file

Hi Junio,

> This hunk shown here may be a fabrication (it only has two
> pre-context but three post-context lines, which is unusual unless
> the shorter one is at the end of the file),

Yes, you are right. The example I showed earlier was constructed by hand to illustrate the issue more clearly. The real case is longer, and I show it at the end of this mail.

The scenario where I observed this bug is as follows:
I pick two adjacent commits A (earlier) and B (later) from Git history. Using A as the base, I extract the changes introduced by B into a patch. Then I split this patch into two smaller patches by assigning its hunks to one or the other at random, and apply each of them separately onto A to create two branches. Finally, I merge these two branches and compare the merged result against commit B to verify that the merge is correct.

In this process, each branch’s patch is cut directly from B’s patch, so the line numbers in each hunk header still describe positions relative to B vs. A. Since each branch carries only a subset of B’s hunks, those line numbers may no longer correspond to the right positions in that branch. As a result, Git falls back to fuzzy matching, and where similar code fragments exist, the patch may be applied to the wrong location.
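
To make the drift concrete, here is a minimal Python sketch of how the `+` start line of each hunk header could be recomputed when only a subset of hunks is kept. In a unified diff, the `+` start line of a hunk assumes that every earlier hunk of the same patch has been applied. The names `Hunk`, `parse_header` and `renumber_subset` are hypothetical helpers, not part of Git, and the first header in the example is invented to account for the 14-line shift seen in the real hunks.

```python
import re
from dataclasses import dataclass

# Matches "@@ -old_start[,old_len] +new_start[,new_len] @@ ..."
HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


@dataclass
class Hunk:
    old_start: int
    old_len: int
    new_start: int
    new_len: int


def parse_header(line: str) -> Hunk:
    """Parse a unified-diff hunk header; an omitted length means 1."""
    m = HEADER.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_len, new_start, new_len = m.groups()
    return Hunk(
        int(old_start),
        1 if old_len is None else int(old_len),
        int(new_start),
        1 if new_len is None else int(new_len),
    )


def renumber_subset(all_hunks: list[Hunk], keep: set[int]) -> list[Hunk]:
    """Return the kept hunks with new_start corrected for dropped hunks.

    all_hunks must be the hunks of ONE file, in file order. Every dropped
    hunk that precedes a kept hunk no longer shifts it, so its net line
    change is subtracted from the kept hunk's new_start.
    """
    dropped_delta = 0
    result = []
    for index, hunk in enumerate(all_hunks):
        if index in keep:
            result.append(Hunk(hunk.old_start, hunk.old_len,
                               hunk.new_start - dropped_delta, hunk.new_len))
        else:
            dropped_delta += hunk.new_len - hunk.old_len
    return result


hunks = [
    parse_header("@@ -10,3 +10,17 @@"),  # hypothetical earlier hunk adding 14 lines
    parse_header("@@ -131,7 +145,7 @@ public class ExcelWriter {"),
    parse_header("@@ -156,7 +170,7 @@ public class ExcelWriter {"),
]

# Keeping only the second hunk: its new_start moves from 145 back to 131.
print(renumber_subset(hunks, {1}))
```

A branch patch whose headers are renumbered this way describes positions that actually exist in that branch. It does not remove the ambiguity when the preimage itself occurs more than once.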

> After all, the receiving end can make independent
> changes that happen to match the preimage the patch is looking for,
> no matter how wide the context you pick when you generate your
> patch.

I fully agree with your point: no matter how much context we include, we cannot absolutely guarantee precise matching, because the branches may have independent changes. Therefore, I think your suggestion of emitting warnings in potentially ambiguous situations would be very helpful.

Below is a real example I observed:

Branch 1 change:
@@ -131,7 +145,7 @@ public class ExcelWriter {
     }
 
     /**
-     * Write data to a sheet
+     * Write data(List<List<String>>) to a sheet
      * @param data  Data to be written
      * @param sheet Write to this sheet
      * @param table Write to this table

Branch 2 change:
@@ -156,7 +170,7 @@ public class ExcelWriter {
     }
 
     /**
-     * Write data to a sheet
+     * Write data(List<List<Object>>) to a sheet
      * @param data  Data to be written
      * @param sheet Write to this sheet
      * @param table Write to this table

Merging these two branches and comparing the result with commit B using `git diff` gives:

diff --git a/src/main/java/com/alibaba/excel/ExcelWriter.java b/src/main/java/com/alibaba/excel/ExcelWriter.java
index e4061e0e..91c34a63 100644
--- a/src/main/java/com/alibaba/excel/ExcelWriter.java
+++ b/src/main/java/com/alibaba/excel/ExcelWriter.java
@@ -145,7 +145,7 @@ public class ExcelWriter {
     }
 
     /**
-     * Write data(List<List<String>>) to a sheet
+     * Write data(List<List<Object>>) to a sheet
      * @param data  Data to be written
      * @param sheet Write to this sheet
      * @param table Write to this table
@@ -170,7 +170,7 @@ public class ExcelWriter {
     }
 
     /**
-     * Write data(List<List<Object>>) to a sheet
+     * Write data(List<List<String>>) to a sheet
      * @param data  Data to be written
      * @param sheet Write to this sheet
      * @param table Write to this table

As shown, the comment intended for line 145 (`List<List<String>>`) was incorrectly applied at line 170, and the one intended for line 170 (`List<List<Object>>`) was misplaced at line 145. This demonstrates how fuzzy matching can misapply patches when similar fragments exist.

Thanks,
Cori
________________________________________
From: Junio C Hamano <gitster@xxxxxxxxx>
Sent: September 13, 2025 0:48
To: Guo Tingsheng <CoriCraft16@xxxxxxxxxxx>
Cc: git@xxxxxxxxxxxxxxx <git@xxxxxxxxxxxxxxx>
Subject: Re: [BUG] git apply misplaces patch when similar code fragments exist in the same file
 
Guo Tingsheng <CoriCraft16@xxxxxxxxxxx> writes:

> 2. In another branch, Commit_2 introduces additional import statements before HeaderComponent, shifting its return statement further down (around line 10). 
>    In Commit_2, the button text in HeaderComponent is modified as follows:
>
>    @@ -10,6 +10,6 @@
>         return `
>             <div class="layout-section">
>    -            <button>Click Me</button>
>    +            <button>点击</button>
>             </div>
>         `;
>     }

This hunk shown here may be a fabrication (it only has two
pre-context but three post-context lines, which is unusual unless
the shorter one is at the end of the file), but in any case, a patch
hunk above is applied to a location that has exactly these lines:

        return `
            <div class="layout-section">
                <button>Click Me</button>
            </div>
        `;
    }

that is the closest to line #10.
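
The "closest to the line number" rule can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the actual code in "git apply": it compares lines exactly, ignores whitespace options and context reduction, and breaks ties in favour of the earlier position. The function names and the sample file are hypothetical.

```python
from typing import Optional


def find_matches(target: list[str], preimage: list[str]) -> list[int]:
    """Return every 0-based index where preimage occurs in target."""
    n = len(preimage)
    if n == 0:
        return []
    return [i for i in range(len(target) - n + 1)
            if target[i:i + n] == preimage]


def pick_closest(target: list[str], preimage: list[str],
                 hint: int) -> Optional[int]:
    """Return the 1-based line of the match nearest to the 1-based hint.

    Returns None when the preimage does not occur at all. On a tie the
    earlier match wins, because min() keeps the first minimum it sees.
    """
    matches = find_matches(target, preimage)
    if not matches:
        return None
    return min(matches, key=lambda i: abs((i + 1) - hint)) + 1


block = [
    "    return `",
    '        <div class="layout-section">',
    "            <button>Click Me</button>",
    "        </div>",
    "    `;",
]
# The same fragment appears twice: at line 4 and at line 19.
target = ["// header"] * 3 + block + ["// filler"] * 10 + block

print(pick_closest(target, block, hint=10))  # nearest candidate is line 4
print(pick_closest(target, block, hint=15))  # nearest candidate is line 19
```

With two identical fragments in the file, the line number in the hunk header alone decides which one is picked.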

If there is more than one place in the target file that the
preimage (i.e. the context lines that are shown with " " at the
beginning, and the preimage lines that are shown with "-" at the
beginning) would match, the patch is ambiguous.
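
As an illustration of this definition, the following Python sketch splits a hunk body into its preimage and postimage and counts how many places in a target file the preimage matches. It is a minimal sketch under assumptions: matching is exact and line-based, and `split_hunk`, `count_candidates` and `is_ambiguous` are hypothetical names, not functions of Git.

```python
def split_hunk(body: list[str]) -> tuple[list[str], list[str]]:
    """Split hunk body lines (without the @@ header) into (preimage, postimage).

    " " lines belong to both sides, "-" lines only to the preimage,
    "+" lines only to the postimage.
    """
    preimage, postimage = [], []
    for line in body:
        tag, text = line[:1], line[1:]
        if tag in (" ", ""):      # context line (some tools strip the lone space)
            preimage.append(text)
            postimage.append(text)
        elif tag == "-":
            preimage.append(text)
        elif tag == "+":
            postimage.append(text)
        elif tag == "\\":         # "\ No newline at end of file" marker
            continue
        else:
            raise ValueError(f"unexpected hunk line: {line!r}")
    return preimage, postimage


def count_candidates(target: list[str], preimage: list[str]) -> int:
    """Count the positions in target where the preimage matches exactly."""
    n = len(preimage)
    if n == 0:
        return 0
    return sum(1 for i in range(len(target) - n + 1)
               if target[i:i + n] == preimage)


def is_ambiguous(target: list[str], body: list[str]) -> bool:
    """A hunk is ambiguous when its preimage matches more than one place."""
    preimage, _ = split_hunk(body)
    return count_candidates(target, preimage) > 1


hunk_body = [
    "     return `",
    '         <div class="layout-section">',
    "-            <button>Click Me</button>",
    "+            <button>点击</button>",
    "         </div>",
    "     `;",
    " }",
]
preimage, postimage = split_hunk(hunk_body)

one_copy = ["function Header() {"] + preimage
two_copies = one_copy + ["", "function Footer() {"] + preimage

print(is_ambiguous(one_copy, hunk_body))
print(is_ambiguous(two_copies, hunk_body))
```

A check of this kind only reports that several candidates exist; it cannot say which one the patch author meant.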

It is very much expected that, depending on what other changes have
happened to the target file since they diverged to make the matching
places move from the original place, it would be applied to a
"wrong" place by chance, as the preimage does not uniquely identify
where the patch hunk should be applied in such a situation.

You can generate a patch with wider context if you can _anticipate_
the issue (for example, you may _know_ that commit-1 already had
multiple lines that match the preimage in the hunk before running
"git diff" or "git format-patch") to give it a better chance to be
unambiguous, e.g. "git diff -U8".  But in general it is impossible
to guarantee that your preimage in the hunk will be and stay
unambiguous.  After all, the receiving end can make independent
changes that happen to match the preimage the patch is looking for,
no matter how wide the context you pick when you generate your
patch.
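
If the duplication is already known on the sending side, one way to choose the number for "-U" is to grow the context until the window around the change occurs only once in the current file. The Python sketch below does that under simplifying assumptions (exact line comparison, symmetric context, a single changed region); the function names and the `limit` parameter are hypothetical. It says nothing about changes the receiving end may make later, which is exactly the limitation described above.

```python
from typing import Optional


def window(lines: list[str], start: int, end: int, n: int) -> list[str]:
    """Return lines[start:end] plus up to n context lines on each side."""
    return lines[max(0, start - n):min(len(lines), end + n)]


def occurrences(lines: list[str], block: list[str]) -> int:
    """Count the positions where block occurs in lines."""
    size = len(block)
    if size == 0:
        return 0
    return sum(1 for i in range(len(lines) - size + 1)
               if lines[i:i + size] == block)


def min_unique_context(lines: list[str], start: int, end: int,
                       limit: int = 50) -> Optional[int]:
    """Smallest n for which the changed region plus n context lines is unique.

    start and end are 0-based, end exclusive. Returns None when no width
    up to limit makes the window unique.
    """
    for n in range(limit + 1):
        if occurrences(lines, window(lines, start, end, n)) == 1:
            return n
    return None


source = [
    "def a():",
    "    x = 1",
    "    return x",
    "",
    "def b():",
    "    x = 1",
    "    return x",
]

# The line to change is "    x = 1" inside b(), at index 5.
print(min_unique_context(source, 5, 6))
```

The returned width could then be passed as "git diff -U<n>", keeping in mind that it is only valid for the file as it exists when the patch is generated.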

It might be a good starter project for aspiring Git developers to
teach "git apply" to notice this situation and warn about it.  The
tool cannot by definition always pick the right place to patch,
but the tool should be able to recognise a situation where a patch
hunk is ambiguous and can apply to multiple places in the target and
let you know about it.



