On Wed, Jun 04, 2025 at 05:12:54PM -0700, Victoria Dye wrote: > > OK. I read over the discussion from last year, which I think mostly > > centered around this patch. I do still think in the long run it would be > > nice for cat-file to produce what output it _can_ for a missing object > > (e.g., the oid and mode). > > One way to handle that could be changing the message to something like: > > submodule SP <mode> SP <oid> Hmm, yeah. That seemed weird to me at first because it doesn't necessarily match what the caller asked for via batch-check. But really, mode and oid are the only reasonable things we could report anyway[1]. And the mode is implicit in the word "submodule", so really there is only the oid to report. [1] For now, at least. If we ever finally unify all of the various formatting code, then one might in theory be able to feed a refname to cat-file and print information about the ref, or perhaps other meta-information. But let's not worry about that hypothetical for now. > ...I suspect that'd be even less compatible with existing automation around > 'cat-file' than just swapping out "submodule" for "missing", and users can > theoretically infer that the mode is 160000 (S_IFGITLINK). That said, if at > some point in the future we support submodules with a different mode, then > an explicit value would be fairly useful. > > Happy to change it or keep it the same, I have no strong preference either > way. Right, that makes sense. I do wonder if: <oid> missing submodule might be friendlier to readers who are matching on /^[0-9a-f]+ missing/, but now I am just guessing at a hypothetical program. So it may not be worth going down that rabbit hole, and we can just go with what you posted. We can always worry about extending it later with an option to say "turn placeholders for missing objects into empty strings" or similar. I did come across one other interesting case while thinking about this, though. When running: git cat-file --batch-check='%(objectname) %(objectmode)' we do not need to access the object at all! So why does a submodule entry cause us to complain? The answer is that cat-file will (mostly for historical reasons) confirm the existence of the object name that is fed to it by calling oid_object_info(). The only exception is when we are doing --batch-all-objects, since there we know we have the object, because we found it by iterating the odb. And we optimize out the extra call for that case (which makes a big difference if you're just printing the object names). But since we don't expect submodule entries to exist in the first place, it might be reasonable to loosen that check. Something like this, though I think it could benefit from some refactoring: diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 4b23fcecbd..bb52d9b673 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -304,8 +304,20 @@ struct expand_data { * This flag will be true if the requested batch format and options * don't require us to call oid_object_info, which can then be * optimized out. + * + * The "submodule" variant is true if the format doesn't require it, + * but other options mean we'd usually continue to do so to check + * object existence. We can still omit the call for submodules in that + * case. + * + * This might be less confusing if we break skip_object_info down into + * two parts: + * - does the format require oid_object_info? + * - do the other options require checking existence? */ unsigned skip_object_info : 1; + unsigned skip_submodule_info : 1; + }; #define EXPAND_DATA_INIT { .mode = S_IFINVALID } @@ -477,7 +489,8 @@ static void batch_object_write(const char *obj_name, struct packed_git *pack, off_t offset) { - if (!data->skip_object_info) { + if (!(data->skip_object_info || + (data->skip_submodule_info && data->mode == S_IFGITLINK))) { int ret; if (use_mailmap || @@ -939,6 +952,12 @@ static int batch_objects(struct batch_options *opt) strbuf_release(&output); return 0; + } else { + struct object_info empty = OBJECT_INFO_INIT; + + if (!memcmp(&data.info, &empty, sizeof(empty)) && + opt->objects_filter.choice == LOFC_DISABLED) + data.skip_submodule_info = 1; } /* I don't think that needs to be part of your series, though. We'd still potentially need to handle the missing-submodule case for format requests that actually look at the object, which would hit the "<oid> submodule" case you're adding. So it could come later (or not at all), and it's probably only worth pursuing if it would make life easier for your intended caller. -Peff