Re: .clang-format: how useful, how often used, and how well maintained?

Christian Couder <christian.couder@xxxxxxxxx> · Thu, 19 Jun 2025 22:30:01 +0200

On Thu, Jun 19, 2025 at 6:39 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Cc-list chosen from "git shortlog --since=12.months --no-merges .clang-format".
>
> I am wondering how often our developers use "make style" aka
>
>     git clang-format --style file --diff --extensions c,h
>
> and also wondering if the suggested style fixes are really
> "improvements".

Karthik and I discussed this today in the context of the "check-style"
GitLab CI job which often fails even if it doesn't make the whole CI
job fail. We agreed that it might be a good idea to either improve or
just disable some dubious style checks.

> For example, taking randomly the latest patch I
> just injested into my tree, i.e.
>
>     $ git am a-single-patch-file.txt
>     $ git reset --soft HEAD^
>     $ make style
>
> I got the output attached at the end of the message.  The result is
> a mixed bag (I commented on the "patch" as if it were a patch
> submission).
>
> I have this suspicion that nobody complained these sub-par
> suggestions the tool makes based on what we have in .clang-format
> because not many folks run "make style", and "make style" is not
> very easy to use after you record your changes into a commit.  IOW,
> there is nothing packaged to help "I have four commits on top of the
> upstream, I want to run style checks before running format-patch",
> i.e.
>
>     git clang-format --diff HEAD~4

Maybe a format-patch option (perhaps called '--format-check') could be
added to run such a command before format-patch actually outputs the
patches?

> Even the output from the tool is of mixed quality, there are good
> pieces that can be used to improve your patches.  So we may prefer
> to see the tool used more often, but not in a way to suggest its
> output is always better than what the human developer has written.
>
> For that, there are a few things we'd probably need to do:
>
>  - Improve our tooling so that the develper can check a range of
>    commits they made before running format-patch, and other
>    situations.
>
>  - Improve .clang-format rules to reduce false positives.

I agree that both of these would be nice.

> > git clang-format --style file --diff --extensions c,h diff --git
> > a/builtin/fast-export.c b/builtin/fast-export.c index
> > 332c036ee4..d89e5ba6d5 100644 --- a/builtin/fast-export.c +++
> > b/builtin/fast-export.c @@ -658,17 +658,16 @@ static void
> > print_signature(const char *signature, const char *object_hash) if
> > (!signature) return;
> >
> > -     printf("gpgsig %s %s\ndata %u\n%s",
> > -            object_hash,
> > -            get_signature_format(signature),
> > -            (unsigned)strlen(signature),
> > +     printf("gpgsig %s %s\ndata %u\n%s", object_hash,
> > +            get_signature_format(signature), (unsigned)strlen(signature),
> >              signature);
> >  }
>
> I do not mind the original but the updated one is not worse.  IOW, I
> would reject if a human sent this patch to fix the original that is
> already in-tree with "once the code is written in an acceptable way,
> it is not worth the patch noise to replace it with the updated one
> that is not significantly better".
>
> I'll call this kind "once the code is written" in the rest of the
> message.

Yeah, I think those should be considered false positives. They are not
worth failing the "check-style" CI job or even having a human look at
them.

> >  static void warn_on_extra_sig(const char **pos, struct commit *commit, int is_sha1)
> >  {
> >       const char *header = is_sha1 ? "gpgsig" : "gpgsig-sha256";
> > -     const char *extra_sig = find_commit_multiline_header(*pos + 1, header, pos);
> > +     const char *extra_sig =
> > +             find_commit_multiline_header(*pos + 1, header, pos);
>
> OK.

I don't think those are OK. If the existing code already has longer
lines, like perhaps here the `static void warn_on_extra_sig(...)` line
above, then it's not worth suggesting wrapping lines like this. It
could result in a code with both long lines and wrapped short ones
which could be puzzling and harder to read than if the code had only
long lines.

Ideally our tools should be able to:

  - provide full patch (including the commit message) which correctly
wraps all the long lines in a file, so that such a patch can easily be
created and added as a preparatory patch to a patch series,

  - suggest wrapping lines only when they are longer than
"max(existing lines around the hunk, our maximum line length
default)".

> > @@ -735,19 +734,20 @@ static void handle_commit(struct commit *commit, struct rev_info *rev,
> >                * The searches must start from the same position.
> >                */
> >               sig_sha1 = find_commit_multiline_header(sig_cursor + 1,
> > -                                                     "gpgsig",
> > -                                                     &after_sha1);
> > +                                                     "gpgsig", &after_sha1);
> >               sig_sha256 = find_commit_multiline_header(sig_cursor + 1,
> >                                                         "gpgsig-sha256",
> >                                                         &after_sha256);
>
> This is a suggestion that is clearly worse than the original.  These
> two statements should look similar as they are doing similar things.

I agree.

> Line wrapping the former only because it uses tokens slightly
> shorter than the ones used by the latter inevitably makes them look
> more different.
>
> This is why I am dubious of any automated tools that have to make
> their decision mechanically.

When such a tool suggests wrapping too long lines, it's more likely to
be useful than when it suggests unwrapping short lines. So I think
that if we could configure the tool so that it stops suggesting
unwrapping lines, we could likely reduce the number of false positives
without much drawback.

> Is there a way to express:
>
>     We want lines that are longer than the 80-column limit to be
>     wrapped at 80-column, but do not coalesce shorter lines only
>     to make them into a smaller number of longer lines.
>
> If we can say "wrap overly long lines, whose definition is longer
> than 100-column, at 80-column" in the earlier half of the sentence,
> it would be even better.

Yeah, if it could look at lines around the current hunk that could be
even better.

> > -             /* Warn on any additional signatures, as they will be ignored. */
> > +             /* Warn on any additional signatures, as they will be ignored.
> > +              */
>
> Looks significantly worse.

I agree.

> Is there a way to express:
>
>     Our multi-line comments begin and end with slash-asterisk and
>     asterisk-slash on their own line without anything else.

If there is no way to express this, then it might be better to disable
wrapping any comment if the tool has a knob for that.

[...]

> >       char *space = strchr(args, ' ');
> >
> >       if (!space)
> >               die("Expected gpgsig format: 'gpgsig <hash-algo> <signature-format>', "
> > -                 "got 'gpgsig %s'", args);
> > +                 "got 'gpgsig %s'",
> > +                 args);
>
> What was the tool thinking when it made this suggestion?  IOW, is
> there a stupid rule in .clang-format kicking in?

Yeah, this is kind of strange because it looks like the opposite of
what the tool suggested in some cases above.

> > @@ -2744,8 +2746,7 @@ static void parse_one_signature(struct signature_data *sig, const char *v)
> >               *space = '\0';
> >
> >       /* Validate hash algorithm */
> > -     if (strcmp(sig->hash_algo, "sha1") &&
> > -         strcmp(sig->hash_algo, "sha256"))
> > +     if (strcmp(sig->hash_algo, "sha1") && strcmp(sig->hash_algo, "sha256"))
> >               die("Unknown git hash algorithm in gpgsig: '%s'", sig->hash_algo);
>
> This is probably slightly worse from extensibility's pov, which a
> mechanical tool cannot make a good judgement, but the author of the
> original did ;-)

Thanks ;-)

That makes me wonder if we could use an AI tool with a fixed prompt
(that we could improve over time) and provide it our CodingGuideline
for these kinds of things rather than a mechanical tool.

[...]