When writing objects via the bulk-checkin subsystem we first try to figure out whether an object already exists in the repository before we append it to the packfile. This check uses `repo_has_object_file()`, which knows to fetch promised objects by default. As such, if we were about to write an object that is promised, we'd fetch the object via the promisor and then skip writing it. This behaviour doesn't seem sensible: it should be significantly faster to take the locally-written object instead of faulting in objects from the promisor remote. There is one counter-argument here: it could be that the bulk-checkin mechanism will end up writing an object to disk whose content collides with the object in the promisor remote. The local repository and its promisor remote would now have two objects with different contents but the same name. But the resulting behaviour would be wrong both when we prefer the fetched object, and also when prefering the written object: - When we prefer the written object we will now see a different world compared to everyone else who has the promised object. - When we prefer the fetched object we will end up with an object that is different compared to what the user just asked us to write. This seems even worse compared to the first scenario. In an ideal world, we would protect against this by fetching the promised object and then performing a collision check. But this feels exceedingly expensive and ultimately rather pointless, as more common writing paths like `write_loose_object()` don't protect against this scenario either. And in any case we're talking about a local user that has write access to the repository anyway, so if they want to do any kind of mischieve they already can. Change the behaviour so that we don't fault in the object via the promisor remote. We shouldn't have to worry about hash collisions too much (yet) as the mechanism is only used during local writes anyway. And even if there was a collision, prefering local data that we were just asked to write over data controlled by a potentially untrusted remote feels like the better failure mode. Signed-off-by: Patrick Steinhardt <ps@xxxxxx> --- bulk-checkin.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/bulk-checkin.c b/bulk-checkin.c index c31c31b18d8..b182c456d69 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -130,7 +130,7 @@ static void flush_batch_fsync(void) static int already_written(struct bulk_checkin_packfile *state, struct object_id *oid) { /* The object may already exist in the repository */ - if (repo_has_object_file(the_repository, oid)) + if (has_object(the_repository, oid, HAS_OBJECT_RECHECK_PACKED)) return 1; /* Might want to keep the list sorted */ -- 2.49.0.901.g37484f566f.dirty