Taylor Blau <me@xxxxxxxxxxxx> writes: > On Wed, Jul 16, 2025 at 03:35:13PM +0200, Toon Claes wrote: >> 11 files changed, 549 insertions(+) >> create mode 100644 Documentation/git-last-modified.adoc >> create mode 100644 builtin/last-modified.c >> create mode 100755 t/t8020-last-modified.sh > > I'm admittedly not entirely sure what the best way to review this patch > is given its size and my previous exposure to (similar) code. Yeah, I wasn't sure how to approach this. I didn't want to come in with a big bang with the final version, but give the reviewers the change to see the improvements (and complexity) come in gradually. >> diff --git a/builtin/last-modified.c b/builtin/last-modified.c >> new file mode 100644 >> index 0000000000..63993bc1c9 >> --- /dev/null >> +++ b/builtin/last-modified.c >> @@ -0,0 +1,289 @@ >> +#include "git-compat-util.h" >> +#include "builtin.h" >> +#include "commit.h" >> +#include "config.h" >> +#include "diff.h" >> +#include "diffcore.h" >> +#include "hashmap.h" >> +#include "hex.h" >> +#include "log-tree.h" >> +#include "object-name.h" >> +#include "object.h" >> +#include "parse-options.h" >> +#include "quote.h" >> +#include "repository.h" >> +#include "revision.h" >> + >> +struct last_modified_entry { >> + struct hashmap_entry hashent; >> + struct object_id oid; >> + const char path[FLEX_ARRAY]; >> +}; > > As a general comment on this patch, I am a little sad to see that many > of the implementation details have been moved back into the builtin > itself and not in their own last-modified.ch file(s). > > Apologies if this was already discussed earlier in the thread and I > simply missed it, but can you comment on why the last-modified internals > were moved into the builtin? Wasn't discussed yet, and this only happened in this last version. Basically my idea was: there's no one else using this, why put it at the root level anyway? Also, it relies heavily on `setup_revisions()`. In my first iterations `argc` and `argv` from the builtin were passed on directly to the root-level `last-modified.[ch]` subsystem. This is a little awkward, putting so much raw user-input handling in the subsystem. > Even in the earliest version of 'blame-tree' that I could find (from > 26999d045b (add blame-tree command, 2012-10-20) in my fork) many of the > internals were written in blame-tree.c instead of builtin/blame-tree.c. > >> +static int last_modified_entry_hashcmp(const void *unused UNUSED, >> + const struct hashmap_entry *hent1, >> + const struct hashmap_entry *hent2, >> + const void *path) >> +{ >> + const struct last_modified_entry *ent1 = >> + container_of(hent1, const struct last_modified_entry, hashent); >> + const struct last_modified_entry *ent2 = >> + container_of(hent2, const struct last_modified_entry, hashent); >> + return strcmp(ent1->path, path ? path : ent2->path); >> +} >> + >> +struct last_modified { >> + struct hashmap paths; >> + struct rev_info rev; >> + int recursive, tree_in_recursive; > > Can we either make these two part of a bitfield, or at least declare > them separately? > >> +}; >> + >> +static void last_modified_release(struct last_modified *lm) >> +{ >> + hashmap_clear_and_free(&lm->paths, struct last_modified_entry, hashent); >> + release_revisions(&lm->rev); >> +} >> + >> +typedef void (*last_modified_callback)(const char *path, >> + const struct commit *commit, void *data); >> + >> +struct last_modified_callback_data { >> + struct commit *commit; >> + struct hashmap *paths; >> + >> + last_modified_callback callback; >> + void *callback_data; >> +}; > > I can't quite tell what the purpose of this struct is in conjunction > with the last_modified_callback type above. Yeah, this is kind of a remnant of when there was a last-modified subsystem. In current implementation, where all code lives in the builtin, there's no good reason to keep this callback struct. > The last_modified_callback type makes sense as a generic callback > function that callers can pass to get <path, commit> pairs, along with > an arbitrary "data" pointer. > > But then you define a last_modified_callback_data struct that, which > made me think that it would be used as the data type passed to the > callback. In other words, given the existence of this struct, I would > have expected the function pointer above to be defined like: > > typedef void (*last_modified_callback)(const char *path, > const struct commit *commit, > struct last_modified_callback_data *data); > > But the fact that the _data struct contains a last_modified_callback > function pointer gives us a hint at what's going on here. It seems like > last_modified_callback_data is used to store some bookkeeping > information and dispatch calls to the "callback" function pointer. > > I think that the fact the struct's name ends with "_data" is what is > confusing to me. I think this would be a little clearer if you renamed > this "struct last_modified_callback" and the function pointer to > "last_modified_callback_fn" or similar. > > (The irony is not lost on me that these comments would be applicable to > GitHub's version of this code, too :-s). Hey, that's no excuse to keep it like this. I think keeping the callback infrastructure depends on whether bring back the last-modified subsystem. In that case, I will address your comments. If not, I think we can get rid of it completely. >> +static int populate_paths_from_revs(struct last_modified *lm) >> +{ >> + int num_interesting = 0; >> + struct diff_options diffopt; >> + >> + memcpy(&diffopt, &lm->rev.diffopt, sizeof(diffopt)); >> + copy_pathspec(&diffopt.pathspec, &lm->rev.diffopt.pathspec); >> + /* >> + * Use a callback to populate the paths from revs >> + */ >> + diffopt.output_format = DIFF_FORMAT_CALLBACK; >> + diffopt.format_callback = add_path_from_diff; >> + diffopt.format_callback_data = lm; >> + >> + for (size_t i = 0; i < lm->rev.pending.nr; i++) { >> + struct object_array_entry *obj = lm->rev.pending.objects + i; >> + >> + if (obj->item->flags & UNINTERESTING) >> + continue; >> + >> + if (num_interesting++) >> + return error(_("can only get last-modified one tree at a time")); > > This error text is a little difficult to parse, but I'm not sure that I > have a great suggestion for improving it. The equivalent from GitHub's > fork is "can only blame one tree at a time", and I think the difficulty > in parsing is that "last-modified" isn't a verb. Oh yeah, I've been struggling with that myself as well. I'm open to a rename, if you've got a better name? -- Cheers, Toon