"Paulo Casaretto via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > From: Paulo Casaretto <pcasaretto@xxxxxxxxx> > > When comparing large commit ranges (e.g., 250,000+ commits), range-diff > attempts to allocate an n×n cost matrix that can exhaust available > memory. For example, with 256,784 commits (n = 513,568), the matrix > would require approximately 256GB of memory (513,568² × 4 bytes), > causing either immediate segmentation faults due to integer overflow or > system hangs. > > Add a memory limit check in get_correspondences() before allocating the > cost matrix. This check uses the total size in bytes (n² × sizeof(int)) > and compares it against a configurable maximum, preventing both > excessive memory usage and integer overflow issues. > > The limit is configurable via a new --max-memory option that accepts > human-readable sizes (e.g., "1G", "500M"). The default is 4GB for 64 bit > systems and 2GB for 32 bit systems. This allows comparing ranges of > approximately 32,000 (16,000) commits - generous for real-world use cases > while preventing impractical operations. > > When the limit is exceeded, range-diff now displays a clear error > message showing both the requested memory size and the maximum allowed, > formatted in human-readable units for better user experience. > > Example usage: > git range-diff --max-memory=1G branch1...branch2 > git range-diff --max-memory=500M base..topic1 base..topic2 > > This approach was chosen over alternatives: > - Pre-counting commits: Would require spawning additional git processes > and reading all commits twice > - Limiting by commit count: Less precise than actual memory usage > - Streaming approach: Would require significant refactoring of the > current algorithm > > This issue was previously discussed in: > https://lore.kernel.org/git/RFC-cover-v2-0.5-00000000000-20211210T122901Z-avarab@xxxxxxxxx/ > > Acked-by: Johannes Schindelin <johannes.schindelin@xxxxxx> > Signed-off-by: Paulo Casaretto <pcasaretto@xxxxxxxxx> > --- Looks good, especially without the reordering existing entries in the options list. The authorship information above looks much better, too. > @@ -40,6 +57,10 @@ int cmd_range_diff(int argc, > PARSE_OPT_OPTARG), > OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg, > N_("style"), N_("passed to 'git log'"), 0), > + OPT_CALLBACK(0, "max-memory", &range_diff_opts.max_memory, > + N_("size"), > + N_("maximum memory for cost matrix (default 4G)"), > + parse_max_memory), > OPT_PASSTHRU_ARGV(0, "remerge-diff", &diff_merges_arg, NULL, > N_("passed to 'git log'"), PARSE_OPT_NOARG), > OPT_BOOL(0, "left-only", &left_only, Among existing options (an excerpt from "git range-diff h") --[no-]creation-factor <n> percentage by which creation is weighted This controls how correspondence between commits on old and new branches are computed. --no-dual-color use simple diff colors --dual-color opposite of --no-dual-color These control how the findings are shown, by painting the lines in distinct colors. --[no-]notes[=<notes>] passed to 'git log' --[no-]diff-merges <style> passed to 'git log' --[no-]remerge-diff passed to 'git log' These control what text are used to represent each commit and participate in comparison and display. --[no-]left-only only emit output related to the first range --[no-]right-only only emit output related to the second range These again control how the findings are shown, by omitting some commits from the output. So there is no perfectly logical place to place the new option, but between diff-merges and remerge-diff somewhat feels a bit odder choice than other possible places. Will queue as is. If some users find the location in the "-h" output too odd and disturbing, they can later send in a reordering patch on top, but I would think the chosen location is good enough. As #leftoverbits we might want to * Group range-diff specific options with OPT_GROUP() * Instead of having to match the full NxN matrix, perhaps reduce the matrix by keeping the most promising M (which is much smaller than N) for each N, or something? but that (especially the latter) is totally outside the scope of this patch. Thanks.