Re: [RFC PATCH 1/2] fetch-prune: optimize dangling-ref reporting

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 18, 2025 at 02:08:39PM -0700, Phil Hord wrote:

> From: Phil Hord <phil.hord@xxxxxxxxx>
> 
> When pruning during `git fetch` we check each pruned ref against the
> ref_store one at a time to decide whether to report it as dangling.
> This causes every local ref to be scanned for each ref being pruned.
> 
> If there are N refs in the repo and M refs being pruned, this code is
> O(M*N). However, `git remote prune` uses a very similar function that
> is only O(N*log(M)).
> 
> Remove the wasteful ref scanning for each pruned ref and use the faster
> version already available in refs_warn_dangling_symrefs.
> 
> In a repo with 126,000 refs, where I was pruning 28,000 refs, this
> code made about 3.6 billion calls to strcmp and consumed 410 seconds
> of CPU. (Invariably in that time, my remote would timeout and the
> fetch would fail anyway.)
> 
> After this change, the same operation completes in under 4 seconds.

Very nice. I left some thoughts on the ordering question elsewhere, but
I'd be OK with this approach, too.

> I considered further optimizing this function to be O(N), but this
> requires ref_store iterators to be sorted, too. I found some suggestions
> that this is always the case, but I'm not certain it is.
> 
> The current speedup is enough for our needs at the moment.

I think we do guarantee the output order, and for-each-ref at least
takes advantage of this since 2e7c6d2f41 (ref-filter: format iteratively
with lexicographic refname sorting, 2024-10-21).

That said, I wouldn't be surprised if there are other n-log-n bits of
the code, and we usually consider that "good enough". So stopping here
is probably fine.

> +	struct string_list refnames = STRING_LIST_INIT_NODUP;
>  	const char *dangling_msg = dry_run
>  		? _("   (%s will become dangling)")
>  		: _("   (%s has become dangling)");
>  
> +	for (ref = stale_refs; ref; ref = ref->next)
> +		string_list_append(&refnames, ref->name);

I was going to suggest using strset over string_list, since I think we
prefer that these days for a simple set-inclusion check. But...

> +		string_list_sort(&refnames);
> +		refs_warn_dangling_symrefs(get_main_ref_store(the_repository),
> +					   stderr, dangling_msg, &refnames);

...we are ultimately relying on refs_warn_dangling_symrefs(), so we'd
have to update its interface. And we also reuse the list (here, after
your patch, but already in remote.c) to pass to refs_delete_refs(). So
probably not worth it.

-Peff




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux