Re: [PATCH v5 2/5] promisor-remote: allow a server to advertise more fields

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 25 Jun 2025 15:29:43 -0700

Christian Couder <christian.couder@xxxxxxxxx> writes:

> For now the "promisor-remote" protocol capability can only pass "name"
> and "url" information from a server to a client in the form
> "name=<remote_name>,url=<remote_url>".
>
> Let's make it possible to pass more information by introducing

Motivation, like "in order to allow the client more efficient
transfer", "in order to give more control on what can be fetched
from the server", etc., needs to be given before we say "let's do
X".  We want to do X not because we can do X; we want to do X as it
currently seems to us the best way to achieve Y.  What is our Y in
this case for the proposed feature?

> a new
> "promisor.sendFields" configuration variable. This variable should
> contain a comma or space separated list

By making it easier for casual humans who manually write the
configuration variable (presumably while testing) and allowing both
comma and space as separator, this design decision is forcing one
more rule to worry about for those who are writing the parser for
the value.  There may be some existing configuration variables with
such a "leninent" syntax, but I'd rather see us not make the mess
even worse.

> of field names that will be
> looked up in the configuration of the remote on the server to find the
> values that will be passed to the client.

I know the name "field" was discussed in earlier iterations, but
these three lines together with "For example" in a later paragraph,
it hints to me that this mechanism is to choose variable-value pairs
for which among remotes.<name>.* variables to send after name=<name>
and url=<url>; is my understanding correct?  If so, can we clarify
the paragraphs around here even more so that I do not even have to
ask this question?

What do we call the third-level of a variable name in the
configuration file?  The description on the "--regexp" option in
"git config --help" hints one:

    With `get`, interpret the name as a regular expression. Regular
    expression matching is currently case-sensitive and done against
    a canonicalized version of the key in which section and variable
    names are lowercased, but subsection names are not.

So a for remote.origin.partialCloneFilter, "remote" is the section
name, "origin" is the subsection name, and "partialCloneFilter" is
the variable name.

Armed with that knowledge, perhaps something like:

    <<the motivation comes here, and then ...>> In order to give
    additional information to the client, the server side can set a
    new 'promisor.sendAdditionalVariables' configuration variable,
    whose value is a comma separated list of variable names.  

    The values of the configuration variables specified by this
    variable for the given remote [*] are sent as comma separated
    list of variable=<value>, after name=<name>,url=<url> in the
    promisor-remote capability.

    [*] Concatenating each of these variable names listed as the
    value of promisor.sendAdditionalVariables after remote.<name>.
    results in the configuration variables exposed to the client.

to replace all of the above, and then it ...

> Only a set of predefined fields are allowed. The only fields in this
> set are "partialCloneFilter" and "token". The "partialCloneFilter"
> field specifies the filter definition used by the promisor remote,
> and the "token" field can provide an authentication credential for
> accessing it.

... is a good thing to describe that we have these two supported.

And this one ...

> For example, if "promisor.sendFields" is set to "partialCloneFilter",
> and the server has the "remote.<name>.partialCloneFilter" config
> variable set to a value for a remote, then that value will be passed
> in the form "partialCloneFilter=<value>" after the "name" and "url"
> fields.

... has already been covered.  My main point is that we should
clarify what a 'field' is and how it relates to the variables in the
configuration file of which side (the server, not the client) a lot
earlier than we say "these two are the only ones that are
supported".  Before understanding what a field is, "only these two
are valid" does not give readers much useful information.

> A following commit will allow the client to use the information to
> decide if it accepts the remote or not. For now the client doesn't do
> anything with the additional information it receives.
>
> Signed-off-by: Christian Couder <chriscool@xxxxxxxxxxxxx>
> ---
>  Documentation/config/promisor.adoc    |  22 +++++
>  Documentation/gitprotocol-v2.adoc     |  59 +++++++++---
>  promisor-remote.c                     | 134 ++++++++++++++++++++++++--
>  t/t5710-promisor-remote-capability.sh |  31 ++++++
>  4 files changed, 221 insertions(+), 25 deletions(-)
>
> diff --git a/Documentation/config/promisor.adoc b/Documentation/config/promisor.adoc
> index 2638b01f83..beb8f65518 100644
> --- a/Documentation/config/promisor.adoc
> +++ b/Documentation/config/promisor.adoc
> @@ -9,6 +9,28 @@ promisor.advertise::
>  	"false", which means the "promisor-remote" capability is not
>  	advertised.
>  
> +promisor.sendFields::
> +	A comma or space separated list of additional remote related
> +	field names. A server will send these field names and the
> +	associated field values from its configuration when
> +	advertising its promisor remotes using the "promisor-remote"
> +	capability, see linkgit:gitprotocol-v2[5]. Currently, only the
> +	"partialCloneFilter" and "token" field names are supported.
> ++
> +* "partialCloneFilter": contains the partial clone filter
> +  used for the remote.
> ++
> +* "token": contains an authentication token for the remote.
> ++
> +When a field name is part of this list and a corresponding
> +"remote.foo.<field name>" config variable is set on the server to a
> +non-empty value, then the field name and value will be sent when
> +advertising the promisor remote "foo".
> ++
> +This list has no effect unless the "promisor.advertise" config
> +variable is set to "true", and the "name" and "url" fields are always
> +advertised regardless of this setting.
> +
>  promisor.acceptFromServer::
>  	If set to "all", a client will accept all the promisor remotes
>  	a server might advertise using the "promisor-remote"
> diff --git a/Documentation/gitprotocol-v2.adoc b/Documentation/gitprotocol-v2.adoc
> index 9a57005d77..0583fafa09 100644
> --- a/Documentation/gitprotocol-v2.adoc
> +++ b/Documentation/gitprotocol-v2.adoc
> @@ -785,33 +785,59 @@ retrieving the header from a bundle at the indicated URI, and thus
>  save themselves and the server(s) the request(s) needed to inspect the
>  headers of that bundle or bundles.
>  
> -promisor-remote=<pr-infos>
> +promisor-remote=<pr-info>
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~
>  
>  The server may advertise some promisor remotes it is using or knows
>  about to a client which may want to use them as its promisor remotes,
> +instead of this repository. In this case <pr-info> should be of the
>  form:
>  
> +	pr-info = pr-fields | pr-info ";" pr-info
>  
> +	pr-fields = field-name "=" field-value | pr-fields "," pr-fields

Typically a name is at most a few words (concatenated with some
punctuation), and url would probably be a few hundred bytes at most?
can we afford to cram even more var=value on the same line?

The syntax allows you to send more than one name/url pair on the
same promisor-remote.  Length aside, does this pr-fields mechanism
give the values of these fields for multiple remotes?

Suppose you want to advertise promisor-remote "A" and "B", and your
sendFields says "token".  What does your promisor-remote=<pr-info> look
like in such a case?

    name=A,url=https://git.git/A,token=foo;name=B,url=https://git.git/B,token=bar

Do we explicitly say that the set of var=val on promisor-remote are
grouped into distinct sets with ';' and each set talks about only
one remote?  I.e.

    name=A,name=B,url=https://git.git/A,url=https://git.git/B,token=foo,token=bar

may be syntactically allowed in the above BNF, but it needs _some_
rule to match which URL and which token go with which name.  Do we
also need to say that within a single ';' separated group, a variable
can only appear once?  IOW, this is invalid?

    name=A,url=https://git.git/A,token=foo,token=bar

> +where all the `field-name` and `field-value` in a given `pr-fields`
> +are field names and values related to a single promisor remote.
>  
> +The server MUST advertise at least the "name" and "url" field names
> +along with the associated field values, which are the name of a valid
> +remote and its URL, in each `pr-fields`. The "name" and "url" fields
> +MUST appear first in each pr-fields, in that order.

With

    pr-fields = pr-fields "," pr-fields

the rule in the last sentence does not make sense.  You'd need a
distinct BNF non-terminal to name the one you want the rule to apply
to.  Perhaps something like

	pr-info = pr-one-remote-info | pr-info ';' pr-one-remote-info 
	pr-one-remote-info = pr-fields
        pr-fields = pr-field | pr-fields ',' pr-field
	pr-field = variable '=' value

Then you can safely say 'name' and 'url', must be the first two that
appear in each pr-one-remote-info.

> +After these mandatory fields, the server MAY advertise the following
> +optional fields in any order:
> +
> +- "partialCloneFilter": The filter specification used by the remote.
> +Clients can use this to determine if the remote's filtering strategy
> +is compatible with their needs (e.g., checking if both use "blob:none").
> +It corresponds to the "remote.<name>.partialCloneFilter" config setting.
> +
> +- "token": An authentication token that clients can use when
> +connecting to the remote. It corresponds to the "remote.<name>.token"
> +config setting.
> +
> +No other fields are defined by the protocol at this time. Clients MUST
> +ignore fields they don't recognize to allow for future protocol
> +extensions.

This forces us to never add support for a field that MUST be
understood for the protocol to operate correctly.  Is it sensible?

Just like the index file format defines optional extensions that can
be ignored (implication of which is that you MUST die if you do not
understand any non-optional ones), shouldn't we have some mechanism
to tell "if you do not understand this, do not use the remote, or
your repository will be broken in a horrible way"?

I dunno.