2025年6月5日 15:23,Jeff King <peff@xxxxxxxx> 写道: > > On Wed, Jun 04, 2025 at 09:56:39AM +0200, Patrick Steinhardt wrote: > >> On Wed, Jun 04, 2025 at 03:10:07AM +0000, Lidong Yan via GitGitGadget wrote: >>> diff --git a/builtin/replay.c b/builtin/replay.c >>> index 225cef08807..6172c8aacc9 100644 >>> --- a/builtin/replay.c >>> +++ b/builtin/replay.c >>> @@ -84,6 +84,7 @@ static struct commit *create_commit(struct repository *repo, >>> obj = parse_object(repo, &ret); >>> >>> out: >>> + repo_unuse_commit_buffer(the_repository, based_on, message); >>> free_commit_extra_headers(extra); >>> free_commit_list(parents); >>> strbuf_release(&msg); >> >> Makes sense. This one _looks_ like a leak that I'd expect to hit in our >> test suite as it's not part of an error path. > > We'll usually never flag a leak for commit buffers, because they are > stored in (and owned by) a commit-slab. So the memory is not leaked > exactly, but we may hold on to it longer than we need to. This mostly > only becomes obvious when we do it for every commit in a code path that > touches a lot of commits (e.g., "git log" or something). I understand. The static analysis tool which I used to test git find repo_logmsg_reencode() might allocates memory through xstrdup() or reencode_string(), then it report a leak. And I find that xstrdup() is actually dead code. So only reencode_string() may cause leaks. > The exception is if we actually had re-encode, which requires a mismatch > between the commit and output encodings (which both default to UTF-8). > And then it really is a leak. > Agreed. > If we add a hack like this: > > diff --git a/utf8.c b/utf8.c > index 35a0251939..d7b7d372c5 100644 > --- a/utf8.c > +++ b/utf8.c > @@ -3,6 +3,7 @@ > #include "git-compat-util.h" > #include "strbuf.h" > #include "utf8.h" > +#include "parse.h" > > /* This code is originally from https://www.cl.cam.ac.uk/~mgk25/ucs/ */ > > @@ -442,6 +443,12 @@ int is_encoding_utf8(const char *name) > int same_encoding(const char *src, const char *dst) > { > static const char utf8[] = "UTF-8"; > + static int always_reencode = -1; > + > + if (always_reencode < 0) > + always_reencode = git_env_bool("GIT_TEST_ALWAYS_REENCODE", 0); > + if (always_reencode) > + return 0; > > if (!src) > src = utf8; > > then running: > > GIT_TEST_ALWAYS_REENCODE=1 make SANITIZE=leak test > > turns up this leak via t3650-replay-basics.sh (as well as in t6429). > > It's probably a bit too specialized to carry around as a permanent test > mode, though. I thought it might find other cases, but it doesn't. The > other one in this patch only triggers when the commit message has no > header separator, which is not very likely. > >>> - if (!body) >>> + if (!body) { >>> + repo_unuse_commit_buffer(the_repository, commit, commit_buffer); >>> return; >>> + } >>> >>> trailer_iterator_init(&iter, body); >>> while (trailer_iterator_advance(&iter)) { >> >> Should this one maybe be converted into a `goto out` so that we can >> release resources in a single location, only? Something like the below >> patch. > > Yeah, I think that is nicer, though... > >> diff --git a/builtin/shortlog.c b/builtin/shortlog.c >> index 30075b67be8..dd08bc40161 100644 >> --- a/builtin/shortlog.c >> +++ b/builtin/shortlog.c >> @@ -177,7 +177,7 @@ static void insert_records_from_trailers(struct shortlog *log, >> struct strbuf ident = STRBUF_INIT; >> >> if (!log->trailers.nr) >> - return; >> + goto out; > > If you convert this hunk, then we'd look at the uninitialized > commit_buffer variable after we jump to the out label. I think the v2 > just posted is OK, though (it touches only the one conditional that > needs the goto). > > -Peff I actually learn from your hack that GIT_TEST_ is something like GIT_TRACE, both aids to test and debug. Thanks, Lidong