.clang-format: how useful, how often used, and how well maintained?

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 19 Jun 2025 09:38:35 -0700

Cc-list chosen from "git shortlog --since=12.months --no-merges .clang-format".

I am wondering how often our developers use "make style" aka

    git clang-format --style file --diff --extensions c,h

and also wondering if the suggested style fixes are really
"improvements".  For example, taking randomly the latest patch I
just injested into my tree, i.e.

    $ git am a-single-patch-file.txt
    $ git reset --soft HEAD^
    $ make style

I got the output attached at the end of the message.  The result is
a mixed bag (I commented on the "patch" as if it were a patch
submission).

I have this suspicion that nobody complained these sub-par
suggestions the tool makes based on what we have in .clang-format
because not many folks run "make style", and "make style" is not
very easy to use after you record your changes into a commit.  IOW,
there is nothing packaged to help "I have four commits on top of the
upstream, I want to run style checks before running format-patch",
i.e.

    git clang-format --diff HEAD~4

Even the output from the tool is of mixed quality, there are good
pieces that can be used to improve your patches.  So we may prefer
to see the tool used more often, but not in a way to suggest its
output is always better than what the human developer has written.

For that, there are a few things we'd probably need to do:

 - Improve our tooling so that the develper can check a range of
   commits they made before running format-patch, and other
   situations.

 - Improve .clang-format rules to reduce false positives.

> git clang-format --style file --diff --extensions c,h diff --git
> a/builtin/fast-export.c b/builtin/fast-export.c index
> 332c036ee4..d89e5ba6d5 100644 --- a/builtin/fast-export.c +++
> b/builtin/fast-export.c @@ -658,17 +658,16 @@ static void
> print_signature(const char *signature, const char *object_hash) if
> (!signature) return;
>  
> -	printf("gpgsig %s %s\ndata %u\n%s",
> -	       object_hash,
> -	       get_signature_format(signature),
> -	       (unsigned)strlen(signature),
> +	printf("gpgsig %s %s\ndata %u\n%s", object_hash,
> +	       get_signature_format(signature), (unsigned)strlen(signature),
>  	       signature);
>  }

I do not mind the original but the updated one is not worse.  IOW, I
would reject if a human sent this patch to fix the original that is
already in-tree with "once the code is written in an acceptable way,
it is not worth the patch noise to replace it with the updated one
that is not significantly better".

I'll call this kind "once the code is written" in the rest of the
message.

>  static void warn_on_extra_sig(const char **pos, struct commit *commit, int is_sha1)
>  {
>  	const char *header = is_sha1 ? "gpgsig" : "gpgsig-sha256";
> -	const char *extra_sig = find_commit_multiline_header(*pos + 1, header, pos);
> +	const char *extra_sig =
> +		find_commit_multiline_header(*pos + 1, header, pos);

OK.

> @@ -735,19 +734,20 @@ static void handle_commit(struct commit *commit, struct rev_info *rev,
>  		 * The searches must start from the same position.
>  		 */
>  		sig_sha1 = find_commit_multiline_header(sig_cursor + 1,
> -							"gpgsig",
> -							&after_sha1);
> +							"gpgsig", &after_sha1);
>  		sig_sha256 = find_commit_multiline_header(sig_cursor + 1,
>  							  "gpgsig-sha256",
>  							  &after_sha256);

This is a suggestion that is clearly worse than the original.  These
two statements should look similar as they are doing similar things.
Line wrapping the former only because it uses tokens slightly
shorter than the ones used by the latter inevitably makes them look
more different.

This is why I am dubious of any automated tools that have to make
their decision mechanically.

Is there a way to express:

    We want lines that are longer than the 80-column limit to be
    wrapped at 80-column, but do not coalesce shorter lines only
    to make them into a smaller number of longer lines.

If we can say "wrap overly long lines, whose definition is longer
than 100-column, at 80-column" in the earlier half of the sentence,
it would be even better.

> -		/* Warn on any additional signatures, as they will be ignored. */
> +		/* Warn on any additional signatures, as they will be ignored.
> +		 */

Looks significantly worse.

Is there a way to express:

    Our multi-line comments begin and end with slash-asterisk and
    asterisk-slash on their own line without anything else.

>  		if (sig_sha1)
>  			warn_on_extra_sig(&after_sha1, commit, 1);
>  		if (sig_sha256)
>  			warn_on_extra_sig(&after_sha256, commit, 0);
>  
> -		commit_buffer_cursor = (after_sha1 > after_sha256) ? after_sha1 : after_sha256;
> +		commit_buffer_cursor =
> +			(after_sha1 > after_sha256) ? after_sha1 : after_sha256;

Good.

> diff --git a/builtin/fast-import.c b/builtin/fast-import.c
> index 48ce8ebb77..5da80e69f3 100644
> --- a/builtin/fast-import.c
> +++ b/builtin/fast-import.c
> @@ -2720,19 +2720,21 @@ static struct hash_list *parse_merge(unsigned int *count)
>  }
>  
>  struct signature_data {
> -	char *hash_algo;      /* "sha1" or "sha256" */
> -	char *sig_format;     /* "openpgp", "x509", "ssh", "unknown" */
> -	struct strbuf data;   /* The actual signature data */
> +	char *hash_algo; /* "sha1" or "sha256" */
> +	char *sig_format; /* "openpgp", "x509", "ssh", "unknown" */
> +	struct strbuf data; /* The actual signature data */
>  };

This is not better or worse, where "once the code is written"
comment would apply.

>  static void parse_one_signature(struct signature_data *sig, const char *v)
>  {
> -	char *args = xstrdup(v); /* Will be freed when sig->hash_algo is freed */
> +	char *args = xstrdup(v); /* Will be freed when sig->hash_algo is freed
> +				  */

Looks significantly worse.

>  	char *space = strchr(args, ' ');
>  
>  	if (!space)
>  		die("Expected gpgsig format: 'gpgsig <hash-algo> <signature-format>', "
> -		    "got 'gpgsig %s'", args);
> +		    "got 'gpgsig %s'",
> +		    args);

What was the tool thinking when it made this suggestion?  IOW, is
there a stupid rule in .clang-format kicking in?

> @@ -2744,8 +2746,7 @@ static void parse_one_signature(struct signature_data *sig, const char *v)
>  		*space = '\0';
>  
>  	/* Validate hash algorithm */
> -	if (strcmp(sig->hash_algo, "sha1") &&
> -	    strcmp(sig->hash_algo, "sha256"))
> +	if (strcmp(sig->hash_algo, "sha1") && strcmp(sig->hash_algo, "sha256"))
>  		die("Unknown git hash algorithm in gpgsig: '%s'", sig->hash_algo);

This is probably slightly worse from extensibility's pov, which a
mechanical tool cannot make a good judgement, but the author of the
original did ;-)

> @@ -2759,8 +2760,7 @@ static void parse_one_signature(struct signature_data *sig, const char *v)
>  	parse_data(&sig->data, 0, NULL);
>  }
>  
> -static void add_gpgsig_to_commit(struct strbuf *commit_data,
> -				 const char *header,
> +static void add_gpgsig_to_commit(struct strbuf *commit_data, const char *header,
>  				 struct signature_data *sig)

"once the code is written".

> @@ make: *** [Makefile:3346: style] Error 1
-2778,8 +2778,7 @@ static void add_gpgsig_to_commit(struct strbuf *commit_data,
>  }
>  
>  static void store_signature(struct signature_data *stored_sig,
> -			    struct signature_data *new_sig,
> -			    const char *hash_type)
> +			    struct signature_data *new_sig, const char *hash_type)

"once the code is written".

> diff --git a/gpg-interface.c b/gpg-interface.c
> index 6f2d87475f..3e17f69cdc 100644
> --- a/gpg-interface.c
> +++ b/gpg-interface.c
> @@ -152,8 +152,7 @@ const char *get_signature_format(const char *buf)
>  
>  int valid_signature_format(const char *format)
>  {
> -       return (!!get_format_by_name(format) ||
> -	       !strcmp(format, "unknown"));
> +	return (!!get_format_by_name(format) || !strcmp(format, "unknown"));
>  }

"once the code is written".