Re: [PATCH] repo_logmsg_reencode: fix memory leak when use repo_logmsg_reencode()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 04, 2025 at 09:56:39AM +0200, Patrick Steinhardt wrote:

> On Wed, Jun 04, 2025 at 03:10:07AM +0000, Lidong Yan via GitGitGadget wrote:
> > diff --git a/builtin/replay.c b/builtin/replay.c
> > index 225cef08807..6172c8aacc9 100644
> > --- a/builtin/replay.c
> > +++ b/builtin/replay.c
> > @@ -84,6 +84,7 @@ static struct commit *create_commit(struct repository *repo,
> >  	obj = parse_object(repo, &ret);
> >  
> >  out:
> > +	repo_unuse_commit_buffer(the_repository, based_on, message);
> >  	free_commit_extra_headers(extra);
> >  	free_commit_list(parents);
> >  	strbuf_release(&msg);
> 
> Makes sense. This one _looks_ like a leak that I'd expect to hit in our
> test suite as it's not part of an error path.

We'll usually never flag a leak for commit buffers, because they are
stored in (and owned by) a commit-slab. So the memory is not leaked
exactly, but we may hold on to it longer than we need to. This mostly
only becomes obvious when we do it for every commit in a code path that
touches a lot of commits (e.g., "git log" or something).

The exception is if we actually had re-encode, which requires a mismatch
between the commit and output encodings (which both default to UTF-8).
And then it really is a leak.

If we add a hack like this:

diff --git a/utf8.c b/utf8.c
index 35a0251939..d7b7d372c5 100644
--- a/utf8.c
+++ b/utf8.c
@@ -3,6 +3,7 @@
 #include "git-compat-util.h"
 #include "strbuf.h"
 #include "utf8.h"
+#include "parse.h"
 
 /* This code is originally from https://www.cl.cam.ac.uk/~mgk25/ucs/ */
 
@@ -442,6 +443,12 @@ int is_encoding_utf8(const char *name)
 int same_encoding(const char *src, const char *dst)
 {
 	static const char utf8[] = "UTF-8";
+	static int always_reencode = -1;
+
+	if (always_reencode < 0)
+		always_reencode = git_env_bool("GIT_TEST_ALWAYS_REENCODE", 0);
+	if (always_reencode)
+		return 0;
 
 	if (!src)
 		src = utf8;

then running:

  GIT_TEST_ALWAYS_REENCODE=1 make SANITIZE=leak test

turns up this leak via t3650-replay-basics.sh (as well as in t6429).

It's probably a bit too specialized to carry around as a permanent test
mode, though. I thought it might find other cases, but it doesn't. The
other one in this patch only triggers when the commit message has no
header separator, which is not very likely.

> > -	if (!body)
> > +	if (!body) {
> > +		repo_unuse_commit_buffer(the_repository, commit, commit_buffer);
> >  		return;
> > +	}
> >  
> >  	trailer_iterator_init(&iter, body);
> >  	while (trailer_iterator_advance(&iter)) {
> 
> Should this one maybe be converted into a `goto out` so that we can
> release resources in a single location, only? Something like the below
> patch.

Yeah, I think that is nicer, though...

> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
> index 30075b67be8..dd08bc40161 100644
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -177,7 +177,7 @@ static void insert_records_from_trailers(struct shortlog *log,
>  	struct strbuf ident = STRBUF_INIT;
>  
>  	if (!log->trailers.nr)
> -		return;
> +		goto out;

If you convert this hunk, then we'd look at the uninitialized
commit_buffer variable after we jump to the out label. I think the v2
just posted is OK, though (it touches only the one conditional that
needs the goto).

-Peff




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux