On Wed, May 7, 2025 at 10:25 AM Patrick Steinhardt <ps@xxxxxx> wrote: > > On Tue, Apr 29, 2025 at 04:52:42PM +0200, Christian Couder wrote: > > diff --git a/Documentation/config/promisor.adoc b/Documentation/config/promisor.adoc > > index 2638b01f83..71311b70c8 100644 > > --- a/Documentation/config/promisor.adoc > > +++ b/Documentation/config/promisor.adoc > > @@ -9,6 +9,24 @@ promisor.advertise:: > > "false", which means the "promisor-remote" capability is not > > advertised. > > > > +promisor.sendFields:: > > + A comma or space separated list of additional remote related > > + fields that a server will send while advertising its promisor > > + remotes using the "promisor-remote" capability, see > > + linkgit:gitprotocol-v2[5]. Currently, only the > > + "partialCloneFilter" and "token" fields are supported. The > > + "partialCloneFilter" field contains the partial clone filter > > + used for the remote, and the "token" field contains an > > + authentication token for the remote. > > ++ > > Should we maybe convert this into a list of accepted fields? Makes it > easier to extend going forward. I am not sure I understand what you mean. This promisor.sendFields config variable is for the server side which advertises remotes. The server advertises its remotes (if it wants to) before receiving information from the client, so it cannot know what the client accepts. > Furthermore, should we maybe refactor this to match the restrictive > design where valid fields are explicitly specified? In other words, > should we have separate config keys for each of the accepted fields now? Maybe I don't understand what you mean with "accepted fields". > Also, shouldn't this setting be per promisor remote that we want to > advertise? I expect that servers will want to send different partial > clone filters for each of the advertised remotes, and they may also want > to send different tokens. So it seems a bit too inflexible to only have > a single, global "sendFields" configuration that covers all promisors. First this setting already allows servers to send different partial clone filters for each of the advertised remotes. For each remote it advertises, a server would send the partial clone filter that is already configured for this remote on the server. Same for tokens. Also we can extend this setting to be per promisor remote later if there is a need for it. I don't think it would be difficult to do. And I don't think it's necessary right now, because it's likely that for simplicity most servers will manage all their promisor remotes in the same way (at least until usage of promisor remotes grows a lot). > > diff --git a/Documentation/gitprotocol-v2.adoc b/Documentation/gitprotocol-v2.adoc > > index 5598c93e67..b4648a7ce6 100644 > > --- a/Documentation/gitprotocol-v2.adoc > > +++ b/Documentation/gitprotocol-v2.adoc > > @@ -785,33 +785,52 @@ retrieving the header from a bundle at the indicated URI, and thus > > save themselves and the server(s) the request(s) needed to inspect the > > headers of that bundle or bundles. > > > > -promisor-remote=<pr-infos> > > +promisor-remote=<pr-info> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~ > > > > The server may advertise some promisor remotes it is using or knows > > about to a client which may want to use them as its promisor remotes, > > -instead of this repository. In this case <pr-infos> should be of the > > +instead of this repository. In this case <pr-info> should be of the > > form: > > > > - pr-infos = pr-info | pr-infos ";" pr-info > > + pr-info = pr-fields | pr-info ";" pr-info > > > > - pr-info = "name=" pr-name | "name=" pr-name "," "url=" pr-url > > + pr-fields = fld-name "=" fld-value | pr-fields "," pr-fields > > Tiny nit, but can we maybe spell out "fld" fully? It doesn't buy us that > much to abbreviate "field", and it did cause my reading to trip. Fine, I have done this in v3. > > -where `pr-name` is the urlencoded name of a promisor remote, and > > -`pr-url` the urlencoded URL of that promisor remote. > > +where all the `fld-name` and `fld-value` in a given `pr-fields` are > > +field names and values related to a single promisor remote. > > > > -In this case, if the client decides to use one or more promisor > > -remotes the server advertised, it can reply with > > -"promisor-remote=<pr-names>" where <pr-names> should be of the form: > > +The server MUST advertise at least the "name" and "url" field names > > +along with the associated field values, which are the name of a valid > > +remote and its URL, in each `pr-fields`. > > > > - pr-names = pr-name | pr-names ";" pr-name > > +The server MAY advertise the following optional fields: > > + > > +- "partialCloneFilter": Filter used for partial clone, corresponding > > + to the "remote.<name>.partialCloneFilter" config setting. > > +- "token": Authentication token for the remote, corresponding > > + to the "remote.<name>.token" config setting. > > I think we should define semantics of these fields more closely. What > exactly is the consequence of a partial clone filter being defined? It's just that the client will know what filter the server uses to access the promisor remote. In a later patch in this series the client will be able to decide to accept the remote or not based on that information (depending on whether the filter matches the filter already configured on the client side). The client will still not use the information for other things than deciding to accept or not the remote after this patch series. > Does > it mean that this promisor remote should only be used in case we do have > the exact same filter passed to git-clone(1)? It's up to the client to decide, but yeah it will likely work better if the same filter is used. It should still work if a different filter is used though. In case the promisor remote doesn't have an object, there should be a fallback to ask the main server for that object. Also the filter mechanism already exists for a long time and this series doesn't change how it works. It's already possible to have different repos using the same promisor remote with different filters. So documentation about what happens when they do that should not be specific to this patch series. > Does it mean that the > remote only contains objects that would've been filtered _out_ by such a > filter? The filter specification in general is always about what is filtered out from the repo accessing the promisor remote (not about what is filtered out from the promisor remote). Again this is not specific to this patch series. This is general partial clone information since partial clone has been developed a long time ago. > Furthermore, we should specify how the token is supposed to be passed to > the remote. First for now the token is not used for other things than deciding if the client accepts the remote or not. (Same thing as for the filter, url, name...) I think it can already be useful in some cases. Now, if we want to talk about a future patch series when the token is possibly used, fine, let's do it. Please consider the following: - If the token is passed to the remote through a remote helper, then it's up to the remote helper to pass and use it, and it could do so in many different ways that we can't anticipate and maybe we will not necessarily know about them (for privately developed and used remote helpers for example). - If we develop a standard way in Git to use such a token, then we will know about that standard way's use of the token at that time (not necessarily about how remote helpers in the wild might use it though) . But to develop such a standard way, it's better to already have such a token passed. - Suppose we require this standard way to be developed alongside a patch series such as this one which passes a token. The main issue is that this would prevent remote helpers from using such a token until that standard way is developed. And for now we don't really know if such a standard way is actually needed or if people will only or mostly use remote helpers. - What if this standard way to use a token is actually not needed because people use very different backends for large objects with very different remote helpers developed by different people? - There are not many technical similarities between code that passes a token and code that possibly creates it and then uses it. So I don't think there is anything technically that requires us to do these things in the same patch series. - There might be a catch 22 between a patch series such as this one and a patch series to develop a standard way to use a token. We might not accept patch series adding a token like this one because the token is not used yet, while a patch series to develop a standard way to access remote helpers using a token cannot work or even be developed because no token is passed yet. So I strongly suggest accepting that we pass a token right now, and not wait for possible ways to use it (other than deciding if the remote is accepted or not). If 'token' is not put into the promisor.sendFields config variable by servers, the only cost is that clients might check if 'token' is passed when servers advertise their promisor remotes. It's a very small cost to enable something likely very much needed and important for security reasons. > > +No other fields are defined by the protocol at this time. Clients SHOULD > > +ignore fields they don't recognize to allow for future protocol extensions. > > Shouldn't we require clients to ignore unknown fields? Otherwise, if > it's only optional to ignore them, we still can't introduce new fields > in the future without breaking existing clients that chose to ignore > this guidance. In v3 this is now: "No other fields are defined by the protocol at this time. Clients MUST ignore fields they don't recognize to allow for future protocol extensions." With "MUST" instead of "SHOULD", we now require clients to ignore unknown fields, so they shouldn't break if we introduce new fields. > > diff --git a/promisor-remote.c b/promisor-remote.c > > index 24d0e70132..70abec4c24 100644 > > --- a/promisor-remote.c > > +++ b/promisor-remote.c > > @@ -314,6 +314,84 @@ static int allow_unsanitized(char ch) > > return ch > 32 && ch < 127; > > } > > > > +/* > > + * List of field names allowed to be used in the "promisor-remote" > > + * protocol capability. Each field should correspond to a configurable > > + * property of a remote that can be relevant for the client. > > + */ > > +static const char *allowed_fields[] = { > > + "partialCloneFilter", /* Filter used for partial clone */ > > + "token", /* Authentication token for the remote */ > > + NULL > > +}; > > + > > +/* > > + * Check if 'field' is in the list of allowed field names for the > > + * "promisor-remote" protocol capability. > > + */ > > +static int is_allowed_field(const char *field) > > +{ > > + const char **p; > > + > > + for (p = allowed_fields; *p; p++) > > + if (!strcasecmp(*p, field)) > > + return 1; > > + return 0; > > +} > > Nit: it is a bit funny that we talk about allowed fields here, but > the recommendation is to just ignore unknown fields. So maybe this > should instead be called "known_fields". Fine, it's now 'known_fields' in v3. Thanks for the review!