Re: [PATCH] packfile: avoid access(3p) calls for missing packs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, May 20, 2025 at 08:45:06AM +0200, Patrick Steinhardt wrote:

> Taking a step back though: do we always ensure the order in which we
> those auxiliary files are created and deleted? If we know that those are
> always:
> 
>   - Moved into place before their ".idx" file.
> 
>   - Deleted after deleting their ".pack" file.
> 
> Then reordering may cause us to race indeed so that we see the packfile,
> but miss the auxiliary data structures. `unlink_pack_path()` does seem
> to ensure the latter property, as both ".idx" and ".pack" are deleted
> first. I'm not quite sure about the former, but it seems like we also do
> this.
> 
> So I think that the proposed patch is wrong. There should definitely be
> a comment in that function though to say that this order is intentional
> and not merely an optimization.

Hmm, interesting. Yes, I agree that deleting the .pack before any other
auxiliary files should catch the race, given the ordering on the read
side in add_packed_git().

But I don't think we ever discussed that ordering or its implications.
So I assumed all bets were off with respect to racing.

However, it seems we lucked into doing the correct thing by virtue of
adding to the end of the list in unlink_pack_path() repeatedly (though I
think if you dig all the way back into git-repack.sh, when there were
just .keep files, there were periods where we did it in the wrong
order).

But if we are doing it in the best order, whether correct or not, I
agree we should not disrupt that.

> > [1] There probably are optimization opportunities in add_packed_git(). I
> >     don't think re-ordering will help much in the common case that we
> >     actually do have the pack. But really, most callers do not care
> >     about these auxiliary files at all! We could simply skip them during
> >     the initial setup and lazy-load them via accessor functions.
> > 
> >     I _think_ that should be OK with respect to races. For a newly added
> >     pack, we know they will always be in place before the matching .idx
> >     file (per the logic I outlined above). For a pack that goes away, we
> >     might racily fail to see its auxiliary file. But that is mostly true
> >     now (we might see its .idx, and then the .promisor file is deleted
> >     before we call access()). It does increase the size of that window,
> >     though (and in particular lets it happen even if the pack has
> >     actually been opened).
> 
> I'm not sure it would be okay, as mentioned above. The current ordering
> ensures that we always see auxiliary data structures in case the ".pack"
> file exists. If we started to cache then that wouldn't be the case
> anymore.

Yeah, you're right. I thought it was already racy, but it luckily is
not. It does still feel unfortunate that we have to spend syscalls
loading information that won't be used by most commands, but it may not
be a measurable performance issue. I think if we did want to lazy-load
that information, we'd probably need to do an extra stat() of .pack
afterwards to check for the deletion race (so repack and pack-objects
would pay the penalty, but most other commands would not).

Anyway, that can be explored another day.

-Peff




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux