Re: [PATCH 3/4] promisor-remote: allow a server to advertise extra fields

Patrick Steinhardt <ps@xxxxxx> · Tue, 22 Apr 2025 12:13:45 +0200

On Mon, Apr 14, 2025 at 03:04:07PM -0700, Junio C Hamano wrote:
> Christian Couder <christian.couder@xxxxxxxxx> writes:
> 
> >  
> > +promisor.sendExtraFields::
> > +	A comma or space separated list of additional remote related
> > +	fields that a server will send while advertising its promisor
> > +	remotes using the "promisor-remote" capability, see
> > +	linkgit:gitprotocol-v2[5]. When a field named "bar" is part of
> > +	this list and a corresponding "remote.foo.bar" config variable
> > +	is set on the server to a non empty value, for example "baz",
> > +	then the field and its value, so "bar=baz", will be sent when
> > +	advertising the promisor remote "foo". This list has no effect
> > +	unless the "promisor.advertise" config variable, see above, is
> > +	set to "true", and if that's the case, then whatever this list
> > +	contains, the "name" and "url" fields are advertised anyway
> > +	and contain the remote name and URL respectively, so there is
> > +	no need to add "name" or "url" to this list.
> 
> As a description of overall syntax of the protocol, this and ...
> 
> 
> >  promisor.acceptFromServer::
> >  	If set to "all", a client will accept all the promisor remotes
> >  	a server might advertise using the "promisor-remote"
> > diff --git a/Documentation/gitprotocol-v2.adoc b/Documentation/gitprotocol-v2.adoc
> > index 5598c93e67..f649745837 100644
> > --- a/Documentation/gitprotocol-v2.adoc
> > +++ b/Documentation/gitprotocol-v2.adoc
> > @@ -785,33 +785,39 @@ retrieving the header from a bundle at the indicated URI, and thus
> >  save themselves and the server(s) the request(s) needed to inspect the
> >  headers of that bundle or bundles.
> >  
> > -promisor-remote=<pr-infos>
> > +promisor-remote=<pr-info>
> >  ~~~~~~~~~~~~~~~~~~~~~~~~~~
> >  
> >  The server may advertise some promisor remotes it is using or knows
> >  about to a client which may want to use them as its promisor remotes,
> > -instead of this repository. In this case <pr-infos> should be of the
> > +instead of this repository. In this case <pr-info> should be of the
> >  form:
> >  
> > -	pr-infos = pr-info | pr-infos ";" pr-info
> > +	pr-info = pr-fields | pr-info ";" pr-info
> >  
> > -	pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url
> > +	pr-fields = fld-key "=" fld-value | pr-fields "," pr-fields
> >  
> > -where `pr-name` is the urlencoded name of a promisor remote, and
> > -`pr-url` the urlencoded URL of that promisor remote.
> > +where all the `fld-key` and `fld-value` in a given `pr-fields` are
> > +field keys and values related to a single promisor remote.
> 
> ... this may work, but as we are defining protocol between two
> parties, in order to ensure interoperable reimplementations, we need
> to also specify semantics, what are the defined "fields", and what
> each of them mean.  Proposing nebulous "with this framework your
> imagination is the limit, you can invent anything" may work for
> other parts of the system, but not for the part that is about
> communication between two repositories.
> 
> IOW, we shouldn't be internally calling these "extra".  From the
> point of view of "core" they may be "extra", but to the developers
> and certainly to the end-users, they shouldn't be "extra" at all.
> They are all supported parts of the system with defined semantics,
> right?
> 
> Another reason why I hate seeing this nebulous "with this, we can
> send anything extra" is because such a thing will have a wrong
> security posture.  If we truly *need* to be able to carry
> *anything*, we need to make sure how values are quoted/escaped, and
> the code for dequoting/unescaping are robustly written to avoid
> passing malformed input and misinterpreting it as something else,
> which would give a new attack vector.  If we can enumerate supported
> fields, their syntax and their possible values, we can make the
> attack surface a lot smaller.

I agree that we should properly specify which fields are accepted and
what their respective format as well as semantics would be. Otherwise,
without such a definition, hosting sites may eventually end up with
slightly incompatible semantics. I also don't expect that there should
be all that many fields.

This raises a question though: what would happen if a field was
advertised that the client doesn't understand? Should the client simply
ignore such a field? Should they bail out? I think we need to also think
about this edge case and specify client-side behaviour. I think in the
end, both ways would be rather limiting:

  - If we simply ignored all unknown fields our hands might be bound if
    we ever had to introduce changes that aren't backwards compatible.

  - If we always bail out on an unknown field our hands would be bound
    equally, as we cannot ever introduce a new field.

Which raises the question whether we need to be able to dynamically
figure out fields. This could be in the form of capability negotiation
or protocol versions. But in any case, I think we need to have something
ready so that we can change behaviour depending on which features are
supported by a client.

Patrick