Re: [PATCH 2/2] builtin/pack-objects.c: freshen objects from existing cruft packs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 27, 2025 at 11:26:32AM -0800, Elijah Newren wrote:
> > However, this process breaks down when we attempt to freshen an object
> > packed in an earlier cruft pack that is larger than the threshold and
> > thus will survive the repack.
>
> ...packed in an earlier cruft pack, and that cruft pack is larger than
> the threshold...
>
> (Otherwise, it's unclear whether you are talking about the object or
> the cruft pack it is in being larger than the threshold.)

Good suggestion, thanks!

> > When this is the case, it is impossible to freshen objects in cruft
> > pack(s) which are larger than the threshold. This is because we avoid
> > writing them in the new cruft pack entirely, for a couple of reasons.
>
> ...freshen objects in cruft packs when those cruft packs are larger
> than the threshold...
>
> Again, just to clarify what thing is "larger".

Likewise, this makes sense as well, and I applied it in my local copy.

> Also, this paragraph while fine on its own is slightly unclear whether
> you are discussing pre-patch or post-patch state, which when reading
> the next two items causes some double takes.  Perhaps just spell it
> out slightly clearer here that for the next two enumerated items you
> are discussing the existing state previous to your changes?

I adjusted the paragraph before this one to make it a little clearer.
Instead of saying "However, [...]", I rewrote it as "Prior to this
patch, however, [...]".

> >  - exists in a non-cruft pack that we are retaining, regardless of that
> >    pack's mtime, or
> >
> >  - exists in a cruft pack with an mtime more recent than the copy we are
> >    debating whether or not to pack, in which case freshening would be
> >    redundant.
>
> s/more recent than/at least as recent as/ ?

Thanks for the careful read, and yes, the comparison here is a >= rather
than a strict >, and that difference is worth being precise about.

> >
> > To do this, keep track of whether or not we have any cruft packs in our
> > in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft'
> > flag. When we end up in this new special case, we replace a call to
> > 'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only
> > reject objects when we have a copy in an existing cruft pack with a more
> > recent mtime (in which case "freshening" would be redundant).
>
> Again, s/a more recent/at least as recent/ ?

I like this suggestion, but I think the wording ends up a little awkward
if applied as-is. I turned this sentence into:

  [...], and only reject objects when we have a copy in an existing
  cruft pack with at least as recent an mtime as our candidate (in which
  case "freshening" would be redundant).

Let me know what you think!
> > +test_expect_success '--max-cruft-size with freshened objects (previously cruft)' '
> > +       git init max-cruft-size-threshold &&
> > +       (
> > +               cd max-cruft-size-threshold &&
> > +
> > +               test_commit base &&
> > +               foo="$(generate_random_blob foo $((2*1024*1024)))" &&
> > +               bar="$(generate_random_blob bar $((2*1024*1024)))" &&
> > +               baz="$(generate_random_blob baz $((2*1024*1024)))" &&
> > +
> > +               test-tool chmtime --get -100000 \
> > +                       "$objdir/$(test_oid_to_path "$foo")" >foo.old &&
> > +               test-tool chmtime --get -100000 \
> > +                       "$objdir/$(test_oid_to_path "$bar")" >bar.old &&
> > +               test-tool chmtime --get -100000 \
> > +                       "$objdir/$(test_oid_to_path "$baz")" >baz.old &&
> > +
> > +               git repack --cruft -d &&
> > +
> > +               # Make a packed copy of object $foo with a more recent
> > +               # mtime.
>
> s/$foo/foo/ ?

Eh. $foo holds the OID of that blob, so "foo" on its own doesn't really
mean anything (even though the implicit meaning is clear from context).
I think changing it is fine (leaving it alone is equally fine in my
mind, but I don't feel strongly about it).

> > +               foo="$(generate_random_blob foo $((2*1024*1024)))" &&
>
> I thought this was creating a completely different foo, which would
> defeat the point of the test.  It might be worth adding a comment that
> because generate_random_blob uses a very simplistic and repeatable
> random character generator with the first argument as the seed, that
> this will regenerate the same loose object as above for foo.

I think the part of the comment which reads "packed copy of" makes it
clear-ish that we're creating an identical copy, but it doesn't hurt to
be more explicit here.

Thanks for the careful read!

Thanks,
Taylor




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux