Re: [PATCH] name-hash: don't add sparse directories in threaded lazy init

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hey Derrick,

Appreciate your quick response! I addressed the nits, and will submit
the new patch shortly.

> This seems to be a performance-only fix

Indeed, and it's more impactful for the specific setup we have inside
our org. In short, we have a custom implementation of the sparse
checkout logic that makes sure to leave files outside of sparse cones
present on disk. Because of this, git with sparse index enabled
frequently resorted to a full index expansion despite
'sparse.expectFilesOutsideOfPatterns=true' config set which, I assume,
was only made working with the full index. This problem specifically
was tricky to catch since multithreaded lazy load logic applies only
after a certain limit of objects present in the index.

I hope I will be able to send more patches soon to fix-up bits and
pieces of the 'sparse.expectFilesOutsideOfPatterns' flag with respect
to other expansions. I'm planning to guard those under the same
configuration option, let me know if you think it'd be better to
introduce a new option.


On Wed, May 21, 2025 at 7:17 PM Derrick Stolee <stolee@xxxxxxxxx> wrote:
>
> On 5/21/2025 7:40 AM, Alex Mironov via GitGitGadget wrote:
> > From: Alex Mironov <alexandrfox@xxxxxxxxx>
> >
> > Similarly to 5f116695864788d1fe45ff06bfad7a71a8d98d0a
>
> nit: we typically use the "reference" style to refer to other
> commits, use 'git log -1 --pretty=reference <oid>' to get output
> like this:
>
>   5f116695864 (name-hash: don't add directories to name_hash, 2021-04-12)
>
> > make sure to avoid placing sparse directories into the name_hash
> > hashtable whenever multithreaded initialization is performed.
> >
> > Sparse directory entries represent a directory that is outside the
> > sparse-checkout definition. These are not paths to blobs, so should not
> > be added to the name_hash table as they must never be queried.
> >
> > Signed-off-by: Alex Mironov <alexandrfox@xxxxxxxxx>
> > ---
> >     name-hash: don't add sparse directories in threaded lazy init
> >
> > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1970%2Falexandrfox%2Ffix-threaded-hash-name-v1
> > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1970/alexandrfox/fix-threaded-hash-name-v1
> > Pull-Request: https://github.com/git/git/pull/1970
> >
> >  name-hash.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/name-hash.c b/name-hash.c
> > index d66de1cdfd5..03123a8779a 100644
> > --- a/name-hash.c
> > +++ b/name-hash.c
> > @@ -492,6 +492,9 @@ static void *lazy_name_thread_proc(void *_data)
> >       for (k = 0; k < d->istate->cache_nr; k++) {
> >               struct cache_entry *ce_k = d->istate->cache[k];
> >               ce_k->ce_flags |= CE_HASHED;
> > +             if (S_ISSPARSEDIR(ce_k->ce_mode)) {
> > +                     continue;
> > +             }
>
> nit: for one-line blocks, we usually skip the braces. But I think
> that it might be better to reverse the logic to get something like:
>
>         if (!S_ISSPARSEDIR(ce_k->ce_mode) {
>                 hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name);
>                 hashmap_add(&d->istate->name_hash, &ce_k->ent);
>         }
>
> This seems to be a performance-only fix, and it might be interesting
> to see if there is any impact on p2000-sparse-operations.sh. Those
> tests don't focus on many sparse-directory entries, so that may not
> demonstrate any meaningful difference.
>
> Thanks,
> -Stolee
>


-- 
Best,
Alex Mironov





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux