Re: [PATCH v5 1/6] last-modified: new subcommand to show when files were last modified

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Taylor Blau <me@xxxxxxxxxxxx> writes:

> On Wed, Jul 16, 2025 at 03:35:13PM +0200, Toon Claes wrote:
>>  11 files changed, 549 insertions(+)
>>  create mode 100644 Documentation/git-last-modified.adoc
>>  create mode 100644 builtin/last-modified.c
>>  create mode 100755 t/t8020-last-modified.sh
>
> I'm admittedly not entirely sure what the best way to review this patch
> is given its size and my previous exposure to (similar) code.

Yeah, I wasn't sure how to approach this. I didn't want to come in with
a big bang with the final version, but give the reviewers the change to
see the improvements (and complexity) come in gradually.

>> diff --git a/builtin/last-modified.c b/builtin/last-modified.c
>> new file mode 100644
>> index 0000000000..63993bc1c9
>> --- /dev/null
>> +++ b/builtin/last-modified.c
>> @@ -0,0 +1,289 @@
>> +#include "git-compat-util.h"
>> +#include "builtin.h"
>> +#include "commit.h"
>> +#include "config.h"
>> +#include "diff.h"
>> +#include "diffcore.h"
>> +#include "hashmap.h"
>> +#include "hex.h"
>> +#include "log-tree.h"
>> +#include "object-name.h"
>> +#include "object.h"
>> +#include "parse-options.h"
>> +#include "quote.h"
>> +#include "repository.h"
>> +#include "revision.h"
>> +
>> +struct last_modified_entry {
>> +	struct hashmap_entry hashent;
>> +	struct object_id oid;
>> +	const char path[FLEX_ARRAY];
>> +};
>
> As a general comment on this patch, I am a little sad to see that many
> of the implementation details have been moved back into the builtin
> itself and not in their own last-modified.ch file(s).
>
> Apologies if this was already discussed earlier in the thread and I
> simply missed it, but can you comment on why the last-modified internals
> were moved into the builtin?

Wasn't discussed yet, and this only happened in this last version.
Basically my idea was: there's no one else using this, why put it at the
root level anyway? Also, it relies heavily on `setup_revisions()`. In my
first iterations `argc` and `argv` from the builtin were passed on
directly to the root-level `last-modified.[ch]` subsystem. This is a
little awkward, putting so much raw user-input handling in the
subsystem.


> Even in the earliest version of 'blame-tree' that I could find (from
> 26999d045b (add blame-tree command, 2012-10-20) in my fork) many of the
> internals were written in blame-tree.c instead of builtin/blame-tree.c.
>
>> +static int last_modified_entry_hashcmp(const void *unused UNUSED,
>> +				       const struct hashmap_entry *hent1,
>> +				       const struct hashmap_entry *hent2,
>> +				       const void *path)
>> +{
>> +	const struct last_modified_entry *ent1 =
>> +		container_of(hent1, const struct last_modified_entry, hashent);
>> +	const struct last_modified_entry *ent2 =
>> +		container_of(hent2, const struct last_modified_entry, hashent);
>> +	return strcmp(ent1->path, path ? path : ent2->path);
>> +}
>> +
>> +struct last_modified {
>> +	struct hashmap paths;
>> +	struct rev_info rev;
>> +	int recursive, tree_in_recursive;
>
> Can we either make these two part of a bitfield, or at least declare
> them separately?
>
>> +};
>> +
>> +static void last_modified_release(struct last_modified *lm)
>> +{
>> +	hashmap_clear_and_free(&lm->paths, struct last_modified_entry, hashent);
>> +	release_revisions(&lm->rev);
>> +}
>> +
>> +typedef void (*last_modified_callback)(const char *path,
>> +				       const struct commit *commit, void *data);
>> +
>> +struct last_modified_callback_data {
>> +	struct commit *commit;
>> +	struct hashmap *paths;
>> +
>> +	last_modified_callback callback;
>> +	void *callback_data;
>> +};
>
> I can't quite tell what the purpose of this struct is in conjunction
> with the last_modified_callback type above.

Yeah, this is kind of a remnant of when there was a last-modified
subsystem. In current implementation, where all code lives in the
builtin, there's no good reason to keep this callback struct.

> The last_modified_callback type makes sense as a generic callback
> function that callers can pass to get <path, commit> pairs, along with
> an arbitrary "data" pointer.
>
> But then you define a last_modified_callback_data struct that, which
> made me think that it would be used as the data type passed to the
> callback. In other words, given the existence of this struct, I would
> have expected the function pointer above to be defined like:
>
>     typedef void (*last_modified_callback)(const char *path,
>                                            const struct commit *commit,
>                                            struct last_modified_callback_data *data);
>
> But the fact that the _data struct contains a last_modified_callback
> function pointer gives us a hint at what's going on here. It seems like
> last_modified_callback_data is used to store some bookkeeping
> information and dispatch calls to the "callback" function pointer.
>
> I think that the fact the struct's name ends with "_data" is what is
> confusing to me. I think this would be a little clearer if you renamed
> this "struct last_modified_callback" and the function pointer to
> "last_modified_callback_fn" or similar.
>
> (The irony is not lost on me that these comments would be applicable to
> GitHub's version of this code, too :-s).

Hey, that's no excuse to keep it like this. I think keeping the callback
infrastructure depends on whether bring back the last-modified
subsystem. In that case, I will address your comments. If not, I think
we can get rid of it completely.


>> +static int populate_paths_from_revs(struct last_modified *lm)
>> +{
>> +	int num_interesting = 0;
>> +	struct diff_options diffopt;
>> +
>> +	memcpy(&diffopt, &lm->rev.diffopt, sizeof(diffopt));
>> +	copy_pathspec(&diffopt.pathspec, &lm->rev.diffopt.pathspec);
>> +	/*
>> +	 * Use a callback to populate the paths from revs
>> +	 */
>> +	diffopt.output_format = DIFF_FORMAT_CALLBACK;
>> +	diffopt.format_callback = add_path_from_diff;
>> +	diffopt.format_callback_data = lm;
>> +
>> +	for (size_t i = 0; i < lm->rev.pending.nr; i++) {
>> +		struct object_array_entry *obj = lm->rev.pending.objects + i;
>> +
>> +		if (obj->item->flags & UNINTERESTING)
>> +			continue;
>> +
>> +		if (num_interesting++)
>> +			return error(_("can only get last-modified one tree at a time"));
>
> This error text is a little difficult to parse, but I'm not sure that I
> have a great suggestion for improving it. The equivalent from GitHub's
> fork is "can only blame one tree at a time", and I think the difficulty
> in parsing is that "last-modified" isn't a verb.

Oh yeah, I've been struggling with that myself as well. I'm open to a
rename, if you've got a better name?

-- 
Cheers,
Toon




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux