Re: [PATCH] packfile: avoid access(3p) calls for missing packs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, May 19, 2025 at 02:52:21AM -0400, Jeff King wrote:
> On Fri, May 16, 2025 at 11:34:10AM -0700, Junio C Hamano wrote:
>
> > > The reason why we do this is likely because we reuse `p->pack_name` to
> > > derive the other files' names, as well, so that we only have to allocate
> > > this buffer once. As such, we have to compute the packfile name last so
> > > that it doesn't get overwritten by the other names.
> >
> > I vaguely recall that the order of checks between .idx and .pack
> > were deliberately chosen, as we do not want to add and attempt to
> > use .pack file before its associated .idx file is ready since
> > without the latter the former is unusable at runtime.
>
> It is definitely intentional and important that we discover packs by
> their .idx file first. The writing side uses the creation of the .idx
> file (via atomic rename) as the final step in making a pack accessible.
> Before that, a reader could see a .pack without its various accessory
> files (and so not realize it has a .keep file, for example).

Yup. Or seeing a pack before its .bitmap is written, in which case any
clones or fetches unfortunate enough to take place after the pack was
read but before the .bitmap was written  would be significantly slower
than they otherwise would have been with bitmaps.

> > > All of this likely doesn't matter in the general case: Git shouldn't end
> > > up looking too many nonexistent packfiles anyway.
> >
> > In any case, we shouldn't attempt to use .pack when its associated
> > files are missing.  It is not about nonexistent but about incomplete
> > (including "in the process of being written").
>
> This shouldn't be a race for files being written. The .idx is written
> last, so the .pack should always be there. And the midx is written after
> that, so again, the pack should always be there. At least for newly
> written packs.
>
> We might see the opposite: packs going away because they were repacked
> (and perhaps even a new midx generated). In that case you might see
> a race like:
>
>   - process R opens the existing midx, which mentions pack P (but does
>     not yet open P)
>
>   - process W repacks, deleting P and generating a new midx with P'
>
>   - process R wants object X; the midx claims it is in pack P, so it
>     tries to open that but fails
>
> What should happen here is that R will fail to find the object at all,
> should hit reprepare_packed_git(), and will then pick up the new pack.
>
> I don't recall offhand whether reprepare_packed_git() will open the new
> midx. If it doesn't, we'd still find the object via the actual pack
> .idx, but we may end up consulting the now-stale midx over and over (so
> we'll get the right answer, but there may be some room for
> optimization).

This all sounds right to me. And we do end up loading a new MIDX via
reprepare_packed_git(): that function calls prepare_packed_git() (which
doesn't immediately return, since we just zero'd out the
r->objects->packed_git_initialized flag). We then enumerate the ODBs,
calling prepare_multi_pack_index_one() and prepare_packed_git_one() for
each.


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux