Hey Derrick, Appreciate your quick response! I addressed the nits, and will submit the new patch shortly. > This seems to be a performance-only fix Indeed, and it's more impactful for the specific setup we have inside our org. In short, we have a custom implementation of the sparse checkout logic that makes sure to leave files outside of sparse cones present on disk. Because of this, git with sparse index enabled frequently resorted to a full index expansion despite 'sparse.expectFilesOutsideOfPatterns=true' config set which, I assume, was only made working with the full index. This problem specifically was tricky to catch since multithreaded lazy load logic applies only after a certain limit of objects present in the index. I hope I will be able to send more patches soon to fix-up bits and pieces of the 'sparse.expectFilesOutsideOfPatterns' flag with respect to other expansions. I'm planning to guard those under the same configuration option, let me know if you think it'd be better to introduce a new option. On Wed, May 21, 2025 at 7:17 PM Derrick Stolee <stolee@xxxxxxxxx> wrote: > > On 5/21/2025 7:40 AM, Alex Mironov via GitGitGadget wrote: > > From: Alex Mironov <alexandrfox@xxxxxxxxx> > > > > Similarly to 5f116695864788d1fe45ff06bfad7a71a8d98d0a > > nit: we typically use the "reference" style to refer to other > commits, use 'git log -1 --pretty=reference <oid>' to get output > like this: > > 5f116695864 (name-hash: don't add directories to name_hash, 2021-04-12) > > > make sure to avoid placing sparse directories into the name_hash > > hashtable whenever multithreaded initialization is performed. > > > > Sparse directory entries represent a directory that is outside the > > sparse-checkout definition. These are not paths to blobs, so should not > > be added to the name_hash table as they must never be queried. > > > > Signed-off-by: Alex Mironov <alexandrfox@xxxxxxxxx> > > --- > > name-hash: don't add sparse directories in threaded lazy init > > > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1970%2Falexandrfox%2Ffix-threaded-hash-name-v1 > > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1970/alexandrfox/fix-threaded-hash-name-v1 > > Pull-Request: https://github.com/git/git/pull/1970 > > > > name-hash.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/name-hash.c b/name-hash.c > > index d66de1cdfd5..03123a8779a 100644 > > --- a/name-hash.c > > +++ b/name-hash.c > > @@ -492,6 +492,9 @@ static void *lazy_name_thread_proc(void *_data) > > for (k = 0; k < d->istate->cache_nr; k++) { > > struct cache_entry *ce_k = d->istate->cache[k]; > > ce_k->ce_flags |= CE_HASHED; > > + if (S_ISSPARSEDIR(ce_k->ce_mode)) { > > + continue; > > + } > > nit: for one-line blocks, we usually skip the braces. But I think > that it might be better to reverse the logic to get something like: > > if (!S_ISSPARSEDIR(ce_k->ce_mode) { > hashmap_entry_init(&ce_k->ent, d->lazy_entries[k].hash_name); > hashmap_add(&d->istate->name_hash, &ce_k->ent); > } > > This seems to be a performance-only fix, and it might be interesting > to see if there is any impact on p2000-sparse-operations.sh. Those > tests don't focus on many sparse-directory entries, so that may not > demonstrate any meaningful difference. > > Thanks, > -Stolee > -- Best, Alex Mironov