On Fri, May 16, 2025 at 12:49:53AM -0400, Jeff King wrote: > When fsck-ing a loose object, we use object_info's type_name strbuf to > record the parsed object type as a string. For most objects this is > redundant with the object_type enum, but it does let us report the > string when we encounter an object with an unknown type (for which there > is no matching enum value). > > There are a few downsides, though: > > 1. The code to report these cases is not actually robust. Since we did > not pass a strbuf to unpack_loose_header(), we only retrieved types > from headers up to 32 bytes. In longer cases, we'd simply say > "object corrupt or missing". > > 2. This is the last caller that uses object_info's type_name strbuf > support. It would be nice to refactor it so that we can simplify > that code. > > 3. Likewise, we'll check the hash of the object using its unknown type > (again, as long as that type is short enough). That depends on the > hash_object_file_literally() code, which we'd eventually like to > get rid of. Oh, I'd very much welcome if this code path went away completely. > So we can simplify things by bailing immediately in read_loose_object() > when we encounter an unknown type. This has a few user-visible effects: > > a. Instead of producing a single line of error output like this: > > error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object is of unknown type 'bogus': .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6 > > we'll now issue two lines (the first from read_loose_object() when > we see the unparsable header, and the second from the fsck code, > since we couldn't read the object): > > error: unable to parse type from header 'bogus 4' of .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6 > error: 26ed13ce3564fbbb44e35bde42c7da717ea004a6: object corrupt or missing: .git/objects/26/ed13ce3564fbbb44e35bde42c7da717ea004a6 > > This is a little more verbose, but this sort of error should be > rare (such objects are almost impossible to work with, and cannot > be transferred between repositories as they are not representable > in packfiles). And as a bonus, reporting the broken header in full > could help with debugging other cases (e.g., a header like "blob > xyzzy\0" would fail in parsing the size, but previously we'd not > have showed the offending bytes). Yup, I would claim this is an improvement, as well. > b. An object with an unknown type will be reported as corrupt, without > actually doing a hash check. Again, I think this is unlikely to > matter in practice since such objects are totally unusable. Agreed. > We'll update one fsck test to match the new error strings. And we can > remove another test that covered the case of an object with an unknown > type _and_ a hash corruption. Since we'll skip the hash check now in > this case, the test is no longer interesting. > > Signed-off-by: Jeff King <peff@xxxxxxxx> > --- > builtin/fsck.c | 13 ++----------- > object-file.c | 12 +++++++++--- > t/t1450-fsck.sh | 29 +++-------------------------- > 3 files changed, 14 insertions(+), 40 deletions(-) > > diff --git a/builtin/fsck.c b/builtin/fsck.c > index 6cac28356c..e7d96a9c8e 100644 > --- a/builtin/fsck.c > +++ b/builtin/fsck.c > @@ -614,12 +614,11 @@ static void get_default_heads(void) > struct for_each_loose_cb > { > struct progress *progress; > - struct strbuf obj_type; > }; > > -static int fsck_loose(const struct object_id *oid, const char *path, void *data) > +static int fsck_loose(const struct object_id *oid, const char *path, > + void *data UNUSED) > { > - struct for_each_loose_cb *cb_data = data; > struct object *obj; > enum object_type type = OBJ_NONE; > unsigned long size; > @@ -629,8 +628,6 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data) > struct object_id real_oid = *null_oid(the_hash_algo); > int err = 0; > > - strbuf_reset(&cb_data->obj_type); > - oi.type_name = &cb_data->obj_type; > oi.sizep = &size; > oi.typep = &type; > > @@ -642,10 +639,6 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data) > err = error(_("%s: object corrupt or missing: %s"), > oid_to_hex(oid), path); > } > - if (type != OBJ_NONE && type < 0) > - err = error(_("%s: object is of unknown type '%s': %s"), > - oid_to_hex(&real_oid), cb_data->obj_type.buf, > - path); This one is a bit curious. But because we know that we have reported this error in `read_loose_object()` already we don't need to print the error over here anymore. > diff --git a/object-file.c b/object-file.c > index 1127e154f6..7a35bde96e 100644 > --- a/object-file.c > +++ b/object-file.c > @@ -1662,6 +1662,12 @@ int read_loose_object(const char *path, > goto out_inflate; > } > > + if (*oi->typep < 0) { > + error(_("unable to parse type from header '%s' of %s"), > + hdr, path); > + goto out_inflate; > + } > + > if (*oi->typep == OBJ_BLOB && > *size > repo_settings_get_big_file_threshold(the_repository)) { > if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0) So this is where we report the new error now. Makes sense. Patrick