Re: [PATCH 4/4] bundle-uri: enable git-remote-https progress

Jeff King <peff@xxxxxxxx> · Fri, 21 Feb 2025 02:36:05 -0500

On Fri, Feb 14, 2025 at 12:26:03PM +0100, Toon Claes wrote:

> I've been playing around with things and I haven't found a good way
> forward with this. We could have the parent process ingest stderr of
> git-remote-https and swallow messages that match `/^fatal:/`, but that
> feels like a hack and not foolproof.

Yeah, agree that feels quite hacky (and you'd probably want to swallow
/^error:/, /^warning:/, etc, too).
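This rough sketch shows the parent-side filter being rejected here, to make the trade-off concrete. `should_swallow_stderr_line()` is a hypothetical helper, not anything in Git. It assumes the parent reads the child's stderr line by line. It also shows why the approach is fragile: any message without one of the listed prefixes (a `hint:` line, or output from a bare `fprintf(stderr, ...)`) slips through.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical parent-side filter: return true if a line read from the
 * child's stderr looks like a die/error/warning message and should be
 * swallowed. Anything else (e.g. progress output) is passed through.
 */
bool should_swallow_stderr_line(const char *line)
{
	static const char *const prefixes[] = {
		"fatal: ", "error: ", "warning: ",
	};
	size_t i;

	assert(line != NULL);
	for (i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++)
		if (!strncmp(line, prefixes[i], strlen(prefixes[i])))
			return true;
	return false;
}
```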

> I was wondering whether we could override `die()` in the child process
> so that it doesn't print anything, but because git-remote-http can call
> die() from basically anywhere in the codebase, I don't think we can
> ensure the silenced die() function is the one that gets called.
> 
> Or what do you mean by "squelch non-progress errors"?

I was thinking of having some kind of "very quiet" mode where
git-remote-http would not print any errors (except for progress). But I
agree that doing it is non-trivial. Our die/warning/error functions are
all pluggable, so remote-http could add its own implementations using
set_die_routine(), etc.

But that does feel pretty heavyweight, and you'd still have to pass
through the "please suppress all your die calls" option into
remote-http. Plus it wouldn't catch any spots in the code that happen to
call fprintf(stderr), etc.
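The pluggable-routine idea can be modelled outside Git in a few lines. This is a standalone sketch of the pattern, not Git's usage.c. The `model_` names and `quiet_routine()` are made up so that it compiles on its own. Git's real hooks are set_die_routine(), set_error_routine() and friends. A die routine would have the same shape but must not return.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Same shape as Git's report_fn: a format string plus a va_list. */
typedef void (*model_report_fn)(const char *fmt, va_list params);

/* Default routine: print "error: <message>" to stderr. */
static void model_error_builtin(const char *fmt, va_list params)
{
	fputs("error: ", stderr);
	vfprintf(stderr, fmt, params);
	fputc('\n', stderr);
}

static model_report_fn error_routine = model_error_builtin;

/* Swap in a different reporting routine (hypothetical name). */
void model_set_error_routine(model_report_fn routine)
{
	assert(routine != NULL);
	error_routine = routine;
}

/* Like Git's error(): report through the pluggable routine, return -1. */
int model_error(const char *fmt, ...)
{
	va_list params;

	va_start(params, fmt);
	error_routine(fmt, params);
	va_end(params);
	return -1;
}

/* Hypothetical "very quiet" routine: count the message, print nothing. */
int suppressed_messages;

void quiet_routine(const char *fmt, va_list params)
{
	(void)fmt;
	(void)params;
	suppressed_messages++;
}
```

Anything that writes to stderr directly, without going through model_error(), bypasses the routine entirely. That is the fprintf(stderr) gap described above.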

> And yes, sending progress logging over a separate fd seems like the
> ideal approach, but I haven't tried it yet, and I'm afraid it's not
> worth attempting.
> 
> So I think that leaves us with your suggestion to "ferry
> machine-readable output back to the parent". If I understand correctly,
> you mean the child process would not write progress logging to stderr
> but to stdout, with some kind of command prefix that tells the parent
> process what to do with it?

Exactly.
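On the child side, this amounts to printing prefixed lines to stdout instead of drawing a progress meter on stderr. Here is a minimal sketch. It assumes the `progress <size> <total> <throughput>` field order from Toon's example below. `format_progress_line()` is a hypothetical helper, and the format is only a proposal in this thread.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical child-side helper: format one machine-readable report as
 * "progress <size> <total> <throughput>\n". Returns the number of bytes
 * written (excluding the NUL), or -1 if buf was too small.
 */
int format_progress_line(char *buf, size_t len, uint64_t size,
			 uint64_t total, uint64_t throughput)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "progress %" PRIu64 " %" PRIu64 " %" PRIu64 "\n",
		     size, total, throughput);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

The helper would then write the buffer to stdout and call fflush(stdout), because stdout is normally fully buffered when it is a pipe to the parent.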

> I imagine communication between parent and child will then look
> something like this:
> 
> -> capabilities
> <- stateless-connect
> <- fetch
> <- get
> <- option
> <- push
> <- check-connectivity
> <- object-format
> -> option progress true
> <- ok
> -> get http://example.com git.bundle
> <- progress 123 345 40
> <- progress 234 345 50
> <- progress 345 345 40
> 
> ~fin~
> 
> But then we need to decide on the format the child sends back to the
> parent. In the above example it's something like `progress <size>
> <total> <throughput>`. An alternative proposal could be:
> 
> <- log Downloading via HTTP: 
> <- log Downloading via HTTP: 200.00 KiB | 100.00 KiB/s
> <- log Downloading via HTTP: 300.00 KiB | 100.00 KiB/s
> <- log Downloading via HTTP: 400.00 KiB | 100.00 KiB/s
> <- log Downloading via HTTP: 400.00 KiB | 100.00 KiB/s, done.
> 
> So the child sends the progress text with a `log` prefix, and the
> parent simply forwards that text wherever it wants it to go.
> 
> Or am I completely misunderstanding your proposal? Do you happen to
> have any examples of a similar solution?

No, I think you understand it perfectly.

Your "log" example with arbitrary text seems like the simplest approach,
and might be enough. Then we wouldn't have to define a schema for
progress numbers (and we have at least two types of progress: counts and
throughput, though I guess this would always be throughput?).
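A parent that accepts both proposed line types could dispatch on the prefix roughly as follows. This sketch assumes that lines arrive newline-stripped on the helper's stdout. `parse_helper_line()`, `struct progress_report` and the enum are hypothetical names, not existing Git APIs.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Kinds of line a (hypothetical) parent might read from the helper. */
enum helper_line_kind { LINE_OTHER, LINE_LOG, LINE_PROGRESS };

struct progress_report {
	uint64_t size, total, throughput;
};

/*
 * Classify one line from the helper's stdout (newline already stripped).
 * For "log <text>", *text points at the text so it can be forwarded
 * verbatim; for "progress <size> <total> <throughput>", the three
 * numbers are parsed into *report. Anything else is LINE_OTHER.
 */
enum helper_line_kind parse_helper_line(const char *line, const char **text,
					struct progress_report *report)
{
	static const char log_prefix[] = "log ";
	static const char progress_prefix[] = "progress ";

	assert(line && text && report);
	if (!strncmp(line, log_prefix, strlen(log_prefix))) {
		*text = line + strlen(log_prefix);
		return LINE_LOG;
	}
	if (!strncmp(line, progress_prefix, strlen(progress_prefix)) &&
	    sscanf(line + strlen(progress_prefix),
		   "%" SCNu64 " %" SCNu64 " %" SCNu64,
		   &report->size, &report->total, &report->throughput) == 3)
		return LINE_PROGRESS;
	return LINE_OTHER;
}
```

The `log` branch needs no schema at all. The `progress` branch is what would let the parent do more than just forward text.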

But I do wonder if we'd want the flexibility of the machine-readable
numbers. In particular, would we ever fetch multiple bundle-uri files in
parallel? If so, then we wouldn't want them stomping on each other's
progress. You'd probably want the caller to present a unified view based
on the progress reports from all of the child processes.
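If bundles were ever fetched in parallel, the parent could keep the latest numeric report from each child and merge them for display. This minimal sketch assumes byte-count size/total pairs like the `progress` lines above. `struct child_progress` and `combine_progress()` are hypothetical. Treating any unknown total as "unknown overall" is just one possible policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Latest report seen from one child (hypothetical structure). */
struct child_progress {
	uint64_t size;  /* bytes received so far */
	uint64_t total; /* expected bytes, 0 if unknown */
};

/*
 * Combine the latest report from every child into one view that could
 * feed a single progress meter in the parent. If any child does not
 * know its total yet, the combined total is reported as unknown (0).
 */
void combine_progress(const struct child_progress *children, size_t nr,
		      uint64_t *size, uint64_t *total)
{
	size_t i;
	int total_known = 1;

	assert(children || !nr);
	*size = 0;
	*total = 0;
	for (i = 0; i < nr; i++) {
		*size += children[i].size;
		if (!children[i].total)
			total_known = 0;
		*total += children[i].total;
	}
	if (!total_known)
		*total = 0;
}
```

Free-form `log` text would not allow this, because the parent cannot sum strings like "200.00 KiB | 100.00 KiB/s" without parsing them back into numbers.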

-Peff