[PATCH] packfile: avoid access(3p) calls for missing packs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The function `add_packed_git()` adds a packfile to the known list of
packs in case it exists. In case the packfile doesn't look like a pack
though, or in case it doesn't exist, the function will simply return a
`NULL` pointer.

The ordering in which those checks are done is somewhat curious though:
we first perform a couple of access(3p) syscalls for auxiliary files
like ".keep" before we determine whether the ".pack" file itself exists.
And if we see that the ".pack" file does not exist, we bail out and thus
effectively discard all the information. This means that the access(3p)
calls were a complete waste of time.

The reason why we do this is likely because we reuse `p->pack_name` to
derive the other files' names, as well, so that we only have to allocate
this buffer once. As such, we have to compute the packfile name last so
that it doesn't get overwritten by the other names.

All of this likely doesn't matter in the general case: Git shouldn't end
up looking too many nonexistent packfiles anyway. But there are edge
cases where it _does_ matter. One such edge case that we have hit in our
production systems was a stale "multi-pack-index" file that contained
references to nonexistent packfiles. As there is no negative lookup
cache this causes us to repeatedly look up the same packfiles via the
multi-pack index, only to realize that they don't exist. This translated
into hundreds of thousands of syscalls to access(3p) and stat(3p).

While the issue was entirely self-made because the multi-pack index
should have been regenerated, we can still reduce the number of syscalls
by 75% in the case of nonexistent packfiles by reordering these calls.
This requires us to restore the final packfile name after the access(3p)
calls. But given that this is a mere memory copy it is very unlikely to
matter in comparison to the four syscalls we performed beforehand.

Another mitigation would be to introduce a negative lookup cache so that
we don't try to add missing packfiles repeatedly. But that refactoring
would be a bit more involved for dubious gains, so for now the author
has decided against it.

Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
---
Hi,

this patch addresses an issue we have recently seen in our production
systems due to a stale MIDX. The MIDX contained entries for packfiles
that didn't exist anymore, which caused Git to repeatedly look up those
packfiles. Each missing packfile resulted in four repeated syscalls:
three access(3p) calls to check for supporting data structures, and one
call to stat(3p) to check for the packfile itself. The first three calls
are essentially wasted though when the stat(3p) call itself fails, which
is being fixed by this patch.

I doubt that the patch matters in almost any repository, but given that
the refactoring is trivial I thought to submit the patch regardless of
that. Another step would be to introduce a negative lookup cache -- but
that would be a bit more involved, so I decided against it for now as I
don't want to introduce complexity for dubious gains.

Thanks!

Patrick
---
 packfile.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/packfile.c b/packfile.c
index d91016f1c7f..870d48bd949 100644
--- a/packfile.c
+++ b/packfile.c
@@ -737,6 +737,12 @@ struct packed_git *add_packed_git(struct repository *r, const char *path,
 	p = alloc_packed_git(r, alloc);
 	memcpy(p->pack_name, path, path_len);
 
+	xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack");
+	if (stat(p->pack_name, &st) || !S_ISREG(st.st_mode)) {
+		free(p);
+		return NULL;
+	}
+
 	xsnprintf(p->pack_name + path_len, alloc - path_len, ".keep");
 	if (!access(p->pack_name, F_OK))
 		p->pack_keep = 1;
@@ -749,11 +755,8 @@ struct packed_git *add_packed_git(struct repository *r, const char *path,
 	if (!access(p->pack_name, F_OK))
 		p->is_cruft = 1;
 
+	/* Restore the final packfile name. */
 	xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack");
-	if (stat(p->pack_name, &st) || !S_ISREG(st.st_mode)) {
-		free(p);
-		return NULL;
-	}
 
 	/* ok, it looks sane as far as we can check without
 	 * actually mapping the pack file.

---
base-commit: 1a8a4971cc6c179c4dd711f4a7f5d7178f4b3ab7
change-id: 20250516-pks-pack-avoid-stats-on-missing-8e3b75755cf0





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux