Re: [PATCH v2 2/3] promisor-remote: allow a server to advertise more fields

Christian Couder <christian.couder@xxxxxxxxx> · Mon, 19 May 2025 16:11:18 +0200

On Wed, May 7, 2025 at 10:25 AM Patrick Steinhardt <ps@xxxxxx> wrote:
>
> On Tue, Apr 29, 2025 at 04:52:42PM +0200, Christian Couder wrote:
> > diff --git a/Documentation/config/promisor.adoc b/Documentation/config/promisor.adoc
> > index 2638b01f83..71311b70c8 100644
> > --- a/Documentation/config/promisor.adoc
> > +++ b/Documentation/config/promisor.adoc
> > @@ -9,6 +9,24 @@ promisor.advertise::
> >       "false", which means the "promisor-remote" capability is not
> >       advertised.
> >
> > +promisor.sendFields::
> > +     A comma or space separated list of additional remote related
> > +     fields that a server will send while advertising its promisor
> > +     remotes using the "promisor-remote" capability, see
> > +     linkgit:gitprotocol-v2[5]. Currently, only the
> > +     "partialCloneFilter" and "token" fields are supported. The
> > +     "partialCloneFilter" field contains the partial clone filter
> > +     used for the remote, and the "token" field contains an
> > +     authentication token for the remote.
> > ++
>
> Should we maybe convert this into a list of accepted fields? Makes it
> easier to extend going forward.

I am not sure I understand what you mean. This promisor.sendFields
config variable is for the server side which advertises remotes. The
server advertises its remotes (if it wants to) before receiving
information from the client, so it cannot know what the client
accepts.

> Furthermore, should we maybe refactor this to match the restrictive
> design where valid fields are explicitly specified? In other words,
> should we have separate config keys for each of the accepted fields now?

Maybe I don't understand what you mean with "accepted fields".

> Also, shouldn't this setting be per promisor remote that we want to
> advertise? I expect that servers will want to send different partial
> clone filters for each of the advertised remotes, and they may also want
> to send different tokens. So it seems a bit too inflexible to only have
> a single, global "sendFields" configuration that covers all promisors.

First this setting already allows servers to send different partial
clone filters for each of the advertised remotes. For each remote it
advertises, a server would send the partial clone filter that is
already configured for this remote on the server. Same for tokens.

Also we can extend this setting to be per promisor remote later if
there is a need for it. I don't think it would be difficult to do. And
I don't think it's necessary right now, because it's likely that for
simplicity most servers will manage all their promisor remotes in the
same way (at least until usage of promisor remotes grows a lot).

> > diff --git a/Documentation/gitprotocol-v2.adoc b/Documentation/gitprotocol-v2.adoc
> > index 5598c93e67..b4648a7ce6 100644
> > --- a/Documentation/gitprotocol-v2.adoc
> > +++ b/Documentation/gitprotocol-v2.adoc
> > @@ -785,33 +785,52 @@ retrieving the header from a bundle at the indicated URI, and thus
> >  save themselves and the server(s) the request(s) needed to inspect the
> >  headers of that bundle or bundles.
> >
> > -promisor-remote=<pr-infos>
> > +promisor-remote=<pr-info>
> >  ~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> >  The server may advertise some promisor remotes it is using or knows
> >  about to a client which may want to use them as its promisor remotes,
> > -instead of this repository. In this case <pr-infos> should be of the
> > +instead of this repository. In this case <pr-info> should be of the
> >  form:
> >
> > -     pr-infos = pr-info | pr-infos ";" pr-info
> > +     pr-info = pr-fields | pr-info ";" pr-info
> >
> > -     pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url
> > +     pr-fields = fld-name "=" fld-value | pr-fields "," pr-fields
>
> Tiny nit, but can we maybe spell out "fld" fully? It doesn't buy us that
> much to abbreviate "field", and it did cause my reading to trip.

Fine, I have done this in v3.

> > -where `pr-name` is the urlencoded name of a promisor remote, and
> > -`pr-url` the urlencoded URL of that promisor remote.
> > +where all the `fld-name` and `fld-value` in a given `pr-fields` are
> > +field names and values related to a single promisor remote.
> >
> > -In this case, if the client decides to use one or more promisor
> > -remotes the server advertised, it can reply with
> > -"promisor-remote=<pr-names>" where <pr-names> should be of the form:
> > +The server MUST advertise at least the "name" and "url" field names
> > +along with the associated field values, which are the name of a valid
> > +remote and its URL, in each `pr-fields`.
> >
> > -     pr-names = pr-name | pr-names ";" pr-name
> > +The server MAY advertise the following optional fields:
> > +
> > +- "partialCloneFilter": Filter used for partial clone, corresponding
> > +  to the "remote.<name>.partialCloneFilter" config setting.
> > +- "token": Authentication token for the remote, corresponding
> > +  to the "remote.<name>.token" config setting.
>
> I think we should define semantics of these fields more closely. What
> exactly is the consequence of a partial clone filter being defined?

It's just that the client will know what filter the server uses to
access the promisor remote. In a later patch in this series the client
will be able to decide to accept the remote or not based on that
information (depending on whether the filter matches the filter
already configured on the client side). The client will still not use
the information for other things than deciding to accept or not the
remote after this patch series.

> Does
> it mean that this promisor remote should only be used in case we do have
> the exact same filter passed to git-clone(1)?

It's up to the client to decide, but yeah it will likely work better
if the same filter is used. It should still work if a different filter
is used though. In case the promisor remote doesn't have an object,
there should be a fallback to ask the main server for that object.

Also the filter mechanism already exists for a long time and this
series doesn't change how it works. It's already possible to have
different repos using the same promisor remote with different filters.
So documentation about what happens when they do that should not be
specific to this patch series.

> Does it mean that the
> remote only contains objects that would've been filtered _out_ by such a
> filter?

The filter specification in general is always about what is filtered
out from the repo accessing the promisor remote (not about what is
filtered out from the promisor remote). Again this is not specific to
this patch series. This is general partial clone information since
partial clone has been developed a long time ago.

> Furthermore, we should specify how the token is supposed to be passed to
> the remote.

First for now the token is not used for other things than deciding if
the client accepts the remote or not. (Same thing as for the filter,
url, name...) I think it can already be useful in some cases.

Now, if we want to talk about a future patch series when the token is
possibly used, fine, let's do it. Please consider the following:

  - If the token is passed to the remote through a remote helper, then
it's up to the remote helper to pass and use it, and it could do so in
many different ways that we can't anticipate and maybe we will not
necessarily know about them (for privately developed and used remote
helpers for example).

  - If we develop a standard way in Git to use such a token, then we
will know about that standard way's use of the token at that time (not
necessarily about how remote helpers in the wild might use it though)
. But to develop such a standard way, it's better to already have such
a token passed.

  - Suppose we require this standard way to be developed alongside a
patch series such as this one which passes a token. The main issue is
that this would prevent remote helpers from using such a token until
that standard way is developed. And for now we don't really know if
such a standard way is actually needed or if people will only or
mostly use remote helpers.

  - What if this standard way to use a token is actually not needed
because people use very different backends for large objects with very
different remote helpers developed by different people?

  - There are not many technical similarities between code that passes
a token and code that possibly creates it and then uses it. So I don't
think there is anything technically that requires us to do these
things in the same patch series.

  - There might be a catch 22 between a patch series such as this one
and a patch series to develop a standard way to use a token. We might
not accept patch series adding a token like this one because the token
is not used yet, while a patch series to develop a standard way to
access remote helpers using a token cannot work or even be developed
because no token is passed yet.

So I strongly suggest accepting that we pass a token right now, and
not wait for possible ways to use it (other than deciding if the
remote is accepted or not).

If 'token' is not put into the promisor.sendFields config variable by
servers, the only cost is that clients might check if 'token' is
passed when servers advertise their promisor remotes. It's a very
small cost to enable something likely very much needed and important
for security reasons.

> > +No other fields are defined by the protocol at this time. Clients SHOULD
> > +ignore fields they don't recognize to allow for future protocol extensions.
>
> Shouldn't we require clients to ignore unknown fields? Otherwise, if
> it's only optional to ignore them, we still can't introduce new fields
> in the future without breaking existing clients that chose to ignore
> this guidance.

In v3 this is now:

"No other fields are defined by the protocol at this time. Clients MUST
ignore fields they don't recognize to allow for future protocol
extensions."

With "MUST" instead of "SHOULD", we now require clients to ignore
unknown fields, so they shouldn't break if we introduce new fields.

> > diff --git a/promisor-remote.c b/promisor-remote.c
> > index 24d0e70132..70abec4c24 100644
> > --- a/promisor-remote.c
> > +++ b/promisor-remote.c
> > @@ -314,6 +314,84 @@ static int allow_unsanitized(char ch)
> >       return ch > 32 && ch < 127;
> >  }
> >
> > +/*
> > + * List of field names allowed to be used in the "promisor-remote"
> > + * protocol capability. Each field should correspond to a configurable
> > + * property of a remote that can be relevant for the client.
> > + */
> > +static const char *allowed_fields[] = {
> > +     "partialCloneFilter", /* Filter used for partial clone */
> > +     "token",              /* Authentication token for the remote */
> > +     NULL
> > +};
> > +
> > +/*
> > + * Check if 'field' is in the list of allowed field names for the
> > + * "promisor-remote" protocol capability.
> > + */
> > +static int is_allowed_field(const char *field)
> > +{
> > +     const char **p;
> > +
> > +     for (p = allowed_fields; *p; p++)
> > +             if (!strcasecmp(*p, field))
> > +                     return 1;
> > +     return 0;
> > +}
>
> Nit: it is a bit funny that we talk about allowed fields here, but
> the recommendation is to just ignore unknown fields. So maybe this
> should instead be called "known_fields".

Fine, it's now 'known_fields' in v3.

Thanks for the review!