On Wed, Jul 16, 2025 at 11:39:49AM +0200, René Scharfe wrote: > On 7/16/25 7:05 AM, Jeff King wrote: > > On Tue, Jul 15, 2025 at 04:51:07PM +0200, René Scharfe wrote: > > > >> pop_most_recent_commit() calls commit_list_insert_by_date(), which and > >> is itself called in a loop, which can lead to quadratic complexity. > >> Replace the commit_list with a prio_queue to ensure logarithmic worst > >> case complexity and convert all three users. > > > > I guess I'm cc'd because of my frequent complains about the quadratic > > nature of our commit lists? :) > > And because you introduced prio_queue. I think that was Junio, but I think I can be counted as a cheerleader for the topic. :) > > Mostly I asked because I had to look at pop_most_recent_commit() to see > > what operation would be made faster here. Looks like it's mostly ":/", > > but maybe also fetch's mark_recent_complete_commits()? I guess we might > > hit that if you have a huge number of refs? > > The :/ handling was the easiest to test for me. fetch_pack() and > walker_fetch() require some server side to set up, which seems not worth > it just to demonstrate quadratic behavior. Having thousands of refs > would make the list long enough to notice, as would having lots of > merges. OK, that makes sense. Just making sure I understand the benefits. > My general idea is to get rid of commit_list_insert_by_date() eventually > to avoid quadratic complexity. Yeah, it's certainly at the root of many such problems we've seen over the years. > > I actually have a series turning rev_info.commits into a prio_queue > > which I need to polish up (mostly just writing commit messages; I've > > been running with it for almost 2 years without trouble). Ironically it > > does not touch this spot, as these commit lists are formed on their own. > > That is not a coincidence. I had a look at that series and tried to > reach its goals while keeping rev_info.commits a commit_list. Why? > Mostly being vaguely uncomfortable with prio_queue' memory overhead, > lack of type safety and dual use as a stack. I still used it, but only > as local variable, not in the central struct rev_info. Hmm, I would have thought prio_queue had less memory overhead. You're spending one pointer per entry in a packed array, versus list nodes. But it's true that it doesn't shrink as items are removed (though that is something we _could_ implement). The dual use as a stack actually came in handy for my series, IIRC. There are spots which use a commit_list but care about a specific order, and my list/prio_queue conversion helpers use that to create a non-heap prio_queue that just returns the items in the original order (it's actually FIFO, but we can get that by reversing). I dunno. That's kind of horrible when I say it out loud, but it did make things work. I'm surprised that your attempt ended up with a performance hit when mine did not. Mine tried not to be clever, and even leaves in place a few spots where we convert between the two representations to satisfy various interfaces (with the goal that we'd probably eventually switch to prio_queue everywhere). -Peff