Re: [PATCH] repo_logmsg_reencode: fix memory leak when use repo_logmsg_reencode()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



2025年6月5日 15:23,Jeff King <peff@xxxxxxxx> 写道:
> 
> On Wed, Jun 04, 2025 at 09:56:39AM +0200, Patrick Steinhardt wrote:
> 
>> On Wed, Jun 04, 2025 at 03:10:07AM +0000, Lidong Yan via GitGitGadget wrote:
>>> diff --git a/builtin/replay.c b/builtin/replay.c
>>> index 225cef08807..6172c8aacc9 100644
>>> --- a/builtin/replay.c
>>> +++ b/builtin/replay.c
>>> @@ -84,6 +84,7 @@ static struct commit *create_commit(struct repository *repo,
>>> obj = parse_object(repo, &ret);
>>> 
>>> out:
>>> + repo_unuse_commit_buffer(the_repository, based_on, message);
>>> free_commit_extra_headers(extra);
>>> free_commit_list(parents);
>>> strbuf_release(&msg);
>> 
>> Makes sense. This one _looks_ like a leak that I'd expect to hit in our
>> test suite as it's not part of an error path.
> 
> We'll usually never flag a leak for commit buffers, because they are
> stored in (and owned by) a commit-slab. So the memory is not leaked
> exactly, but we may hold on to it longer than we need to. This mostly
> only becomes obvious when we do it for every commit in a code path that
> touches a lot of commits (e.g., "git log" or something).

I understand. The static analysis tool which I used to test git find
repo_logmsg_reencode() might allocates memory through xstrdup()
or reencode_string(), then it report a leak. And I find that xstrdup() is
actually dead code. So only reencode_string() may cause leaks. 
  
> The exception is if we actually had re-encode, which requires a mismatch
> between the commit and output encodings (which both default to UTF-8).
> And then it really is a leak.
> 

Agreed.

> If we add a hack like this:
> 
> diff --git a/utf8.c b/utf8.c
> index 35a0251939..d7b7d372c5 100644
> --- a/utf8.c
> +++ b/utf8.c
> @@ -3,6 +3,7 @@
> #include "git-compat-util.h"
> #include "strbuf.h"
> #include "utf8.h"
> +#include "parse.h"
> 
> /* This code is originally from https://www.cl.cam.ac.uk/~mgk25/ucs/ */
> 
> @@ -442,6 +443,12 @@ int is_encoding_utf8(const char *name)
> int same_encoding(const char *src, const char *dst)
> {
> static const char utf8[] = "UTF-8";
> + static int always_reencode = -1;
> +
> + if (always_reencode < 0)
> + always_reencode = git_env_bool("GIT_TEST_ALWAYS_REENCODE", 0);
> + if (always_reencode)
> + return 0;
> 
> if (!src)
> src = utf8;
> 
> then running:
> 
>  GIT_TEST_ALWAYS_REENCODE=1 make SANITIZE=leak test
> 
> turns up this leak via t3650-replay-basics.sh (as well as in t6429).
> 
> It's probably a bit too specialized to carry around as a permanent test
> mode, though. I thought it might find other cases, but it doesn't. The
> other one in this patch only triggers when the commit message has no
> header separator, which is not very likely.
> 
>>> - if (!body)
>>> + if (!body) {
>>> + repo_unuse_commit_buffer(the_repository, commit, commit_buffer);
>>> return;
>>> + }
>>> 
>>> trailer_iterator_init(&iter, body);
>>> while (trailer_iterator_advance(&iter)) {
>> 
>> Should this one maybe be converted into a `goto out` so that we can
>> release resources in a single location, only? Something like the below
>> patch.
> 
> Yeah, I think that is nicer, though...
> 
>> diff --git a/builtin/shortlog.c b/builtin/shortlog.c
>> index 30075b67be8..dd08bc40161 100644
>> --- a/builtin/shortlog.c
>> +++ b/builtin/shortlog.c
>> @@ -177,7 +177,7 @@ static void insert_records_from_trailers(struct shortlog *log,
>> struct strbuf ident = STRBUF_INIT;
>> 
>> if (!log->trailers.nr)
>> - return;
>> + goto out;
> 
> If you convert this hunk, then we'd look at the uninitialized
> commit_buffer variable after we jump to the out label. I think the v2
> just posted is OK, though (it touches only the one conditional that
> needs the goto).
> 
> -Peff

I actually learn from your hack that GIT_TEST_ is something like GIT_TRACE,
both aids to test and debug.

Thanks,
Lidong






[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux