Re: Perf bug: rev-list w/ 2+ paths relatively slow with commit-graph

I see, so this is more of a performance feature request than a bug.
I don't have much expertise here, but on the surface of it, I don't
see any reason the algorithm couldn't check each path's Bloom filter
in turn while searching, other than that this would be a large and
annoying change.
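
To illustrate the idea, here is a toy sketch in Python. It is not Git's code and does not use Git's filter format: the filter size, the hash scheme, and every function name below are assumptions made up for the example. It only shows that a commit can be skipped when none of the requested paths may be present in its filter, with the expensive exact check run on the remaining commits.

```python
import hashlib

NUM_BITS = 64    # toy filter size (assumption, not Git's sizing)
NUM_HASHES = 3   # toy number of hash functions (assumption)


def bit_positions(path):
    """Derive NUM_HASHES bit positions for a path from SHA-256 digests."""
    positions = []
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(f"{seed}:{path}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % NUM_BITS)
    return positions


def build_filter(changed_paths):
    """Build a toy Bloom filter (an int used as a bit set) for one commit."""
    bits = 0
    for path in changed_paths:
        for pos in bit_positions(path):
            bits |= 1 << pos
    return bits


def maybe_changed(bits, path):
    """False means 'definitely not changed'; True means 'maybe changed'."""
    return all((bits >> pos) & 1 for pos in bit_positions(path))


def walk(history, paths, limit):
    """Walk commits newest-first, checking each path's filter in turn.

    history: list of (commit_id, filter_bits, changed_paths) tuples.
    Returns (matching commit ids, number of exact checks performed).
    """
    matches = []
    exact_checks = 0
    for commit_id, bits, changed in history:
        # Skip the commit only if the filter rules out every requested path.
        if not any(maybe_changed(bits, p) for p in paths):
            continue
        exact_checks += 1  # stands in for the expensive tree diff
        if any(p in changed for p in paths):
            matches.append(commit_id)
            if len(matches) == limit:
                break  # early exit once the count limit is reached
    return matches, exact_checks


def make_history(commits):
    """Attach a toy filter to each (commit_id, changed_paths) pair."""
    return [(cid, build_filter(changed), set(changed)) for cid, changed in commits]
```

A Bloom filter can give false positives but never false negatives, so the exact check is still needed for commits that pass the filter; the saving comes from the commits that can be skipped outright.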

On Mon, Jun 23, 2025 at 3:36 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Kai Koponen <kaikoponen@xxxxxxxxxx> writes:
>
> > Reproduce steps:
> > ```
> > git clone https://github.com/golang/go.git
> > cd go
> > git config core.commitGraph true
> > git commit-graph write --split --reachable --changed-paths  # Without this, all calls equally slow (~1s)
> > time git rev-list -10 3730814f2f2bf24550920c39a16841583de2dac1 -- src/clean.bash > /dev/null  # ~90ms
> > time git rev-list -10 3730814f2f2bf24550920c39a16841583de2dac1 -- src/Make.dist > /dev/null  # ~100ms
> > time git rev-list -10 3730814f2f2bf24550920c39a16841583de2dac1 -- src/clean.bash src/Make.dist > /dev/null  # ~650ms
> > ```
> >
> > The rev-list call with multiple paths takes over 3x longer than the
> > sum of individual calls to it for the same files.
> >
> > Expectation: rev-list with multiple paths should take <= the sum of
> > the time it takes to call it with each path individually (ideally <,
> > since with the count limit it should be able to early-exit and search
> > less commits for either path).
> >
> > Also reproduces without the -10 arg, or with a lower count (double
> > instead of triple w/ -1), but these results are perhaps most
> > surprising with a count present.
>
> I asked
>
>     How does "git log -- path" use the changed-paths bloom filter
>     stored in the commit-graph file?
>
> to https://deepwiki.com/git/git (there is a text field in the bottom
> of the page), and an early part of its answer explains why in a
> fairly convincing way ;-)
>
>     When you run git log -- path, Git first prepares to use bloom
>     filters in the prepare_to_use_bloom_filter function. This function:
>
>      1. Validates the pathspec - It calls forbid_bloom_filters to check
>         if bloom filters can be used revision.c:674-686 . Bloom filters
>         are disabled for wildcards, multiple paths, or complex pathspec
>         magic.
>
>      ...
>
> In short, the changed-path filter is used only when following
> pathspec with a single element that is not a wildcard.  So the
> observed result is (unfortunately) quite expected.
>




