As I mentioned in my response to your the cover letter, I would be more than happy to help you with an effort to introduce those optimizations on top. > diff --git a/builtin/last-modified.c b/builtin/last-modified.c > new file mode 100644 > index 0000000000..63993bc1c9 > --- /dev/null > +++ b/builtin/last-modified.c > @@ -0,0 +1,289 @@ > +#include "git-compat-util.h" > +#include "builtin.h" > +#include "commit.h" > +#include "config.h" > +#include "diff.h" > +#include "diffcore.h" > +#include "hashmap.h" > +#include "hex.h" > +#include "log-tree.h" > +#include "object-name.h" > +#include "object.h" > +#include "parse-options.h" > +#include "quote.h" > +#include "repository.h" > +#include "revision.h" > + > +struct last_modified_entry { > + struct hashmap_entry hashent; > + struct object_id oid; > + const char path[FLEX_ARRAY]; > +}; As a general comment on this patch, I am a little sad to see that many of the implementation details have been moved back into the builtin itself and not in their own last-modified.ch file(s). Apologies if this was already discussed earlier in the thread and I simply missed it, but can you comment on why the last-modified internals were moved into the builtin? Even in the earliest version of 'blame-tree' that I could find (from 26999d045b (add blame-tree command, 2012-10-20) in my fork) many of the internals were written in blame-tree.c instead of builtin/blame-tree.c. > +static int last_modified_entry_hashcmp(const void *unused UNUSED, > + const struct hashmap_entry *hent1, > + const struct hashmap_entry *hent2, > + const void *path) > +{ > + const struct last_modified_entry *ent1 = > + container_of(hent1, const struct last_modified_entry, hashent); > + const struct last_modified_entry *ent2 = > + container_of(hent2, const struct last_modified_entry, hashent); > + return strcmp(ent1->path, path ? path : ent2->path); > +} > + > +struct last_modified { > + struct hashmap paths; > + struct rev_info rev; > + int recursive, tree_in_recursive; Can we either make these two part of a bitfield, or at least declare them separately? > +}; > + > +static void last_modified_release(struct last_modified *lm) > +{ > + hashmap_clear_and_free(&lm->paths, struct last_modified_entry, hashent); > + release_revisions(&lm->rev); > +} > + > +typedef void (*last_modified_callback)(const char *path, > + const struct commit *commit, void *data); > + > +struct last_modified_callback_data { > + struct commit *commit; > + struct hashmap *paths; > + > + last_modified_callback callback; > + void *callback_data; > +}; I can't quite tell what the purpose of this struct is in conjunction with the last_modified_callback type above. The last_modified_callback type makes sense as a generic callback function that callers can pass to get <path, commit> pairs, along with an arbitrary "data" pointer. But then you define a last_modified_callback_data struct that, which made me think that it would be used as the data type passed to the callback. In other words, given the existence of this struct, I would have expected the function pointer above to be defined like: typedef void (*last_modified_callback)(const char *path, const struct commit *commit, struct last_modified_callback_data *data); But the fact that the _data struct contains a last_modified_callback function pointer gives us a hint at what's going on here. It seems like last_modified_callback_data is used to store some bookkeeping information and dispatch calls to the "callback" function pointer. I think that the fact the struct's name ends with "_data" is what is confusing to me. I think this would be a little clearer if you renamed this "struct last_modified_callback" and the function pointer to "last_modified_callback_fn" or similar. (The irony is not lost on me that these comments would be applicable to GitHub's version of this code, too :-s). > +static int populate_paths_from_revs(struct last_modified *lm) > +{ > + int num_interesting = 0; > + struct diff_options diffopt; > + > + memcpy(&diffopt, &lm->rev.diffopt, sizeof(diffopt)); > + copy_pathspec(&diffopt.pathspec, &lm->rev.diffopt.pathspec); > + /* > + * Use a callback to populate the paths from revs > + */ > + diffopt.output_format = DIFF_FORMAT_CALLBACK; > + diffopt.format_callback = add_path_from_diff; > + diffopt.format_callback_data = lm; > + > + for (size_t i = 0; i < lm->rev.pending.nr; i++) { > + struct object_array_entry *obj = lm->rev.pending.objects + i; > + > + if (obj->item->flags & UNINTERESTING) > + continue; > + > + if (num_interesting++) > + return error(_("can only get last-modified one tree at a time")); This error text is a little difficult to parse, but I'm not sure that I have a great suggestion for improving it. The equivalent from GitHub's fork is "can only blame one tree at a time", and I think the difficulty in parsing is that "last-modified" isn't a verb. > +static void mark_path(const char *path, const struct object_id *oid, > + struct last_modified_callback_data *data) > +{ > + struct last_modified_entry *ent; > + > + /* Is it even a path that we are interested in? */ > + ent = hashmap_get_entry_from_hash(data->paths, strhash(path), path, > + struct last_modified_entry, hashent); > + if (!ent) > + return; > + > + /* > + * Is it arriving at a version of interest, or is it from a side branch > + * which did not contribute to the final state? > + */ > + if (!oideq(oid, &ent->oid)) > + return; GitHub's fork writes this as "if (oid && !oideq(oid, &ent->oid))", but the commit that introduces the "oid &&" portion of that expression doesn't provide us with any clues as to why the change was necessary. Since you have spent more time with these patches than I have recently, perhaps you can help shed some light on what's going on here? The rest of the code roughly matches my memory of the early versions of this command. Thanks, Taylor