Patrick Steinhardt <ps@xxxxxx> writes: > This means that we repeatedly store the same object ID in our `have_obj` > array, with two consequences: > > - In protocol v2 we deduplicate ACKs for commits, but not for any > other objects as we send ACKs for every object ID in the `have_obj` > array. > > - The `have_obj` array can grow in size indefinitely with both > protocols. > > The potentially-more-serious issue is the second one, as we basically > have a way for an adversary to allocate arbitrarily large buffers now. > Ultimately, this doesn't seem to be all that serious though: on my > machine, the growth of that array is at around 4MB/s, and after roughly > five minutes I was only at 1GB RSS. So this is concerning, but only > mildly so. > > Fix this bug by storing the `THEY_HAVE` flag independent of the object > type so that we don't store duplicate object IDs in `have_obj` anymore. Makes sense. > diff --git a/upload-pack.c b/upload-pack.c > index 4f26f6afc7..fba3e339e2 100644 > --- a/upload-pack.c > +++ b/upload-pack.c > @@ -476,20 +476,21 @@ static void create_pack_file(struct upload_pack_data *pack_data, > > static int do_got_oid(struct upload_pack_data *data, const struct object_id *oid) > { > - int we_knew_they_have = 0; > struct object *o = parse_object_with_flags(the_repository, oid, > PARSE_OBJECT_SKIP_HASH_CHECK | > PARSE_OBJECT_DISCARD_TREE); > > if (!o) > die("oops (%s)", oid_to_hex(oid)); > + > + if (o->flags & THEY_HAVE) > + return 0; > + o->flags |= THEY_HAVE; > + > if (o->type == OBJ_COMMIT) { > struct commit_list *parents; > struct commit *commit = (struct commit *)o; > - if (o->flags & THEY_HAVE) > - we_knew_they_have = 1; > - else > - o->flags |= THEY_HAVE; > + OK, so regardless of the type, an object that we already know they have do not need to be processed twice, which makes sense? For commits, we used to ensure that even if we see a commit that we know they have for the second time, instead of returning, we marked its immediate parents with THEY_HAVE bit. We no longer do so. Instead, the new code only paints the parents of a commit that we haven't painted with THEY_HAVE. Otherwise we return without doing anything. I wonder if they are equivalent. In the original code, when our commit A has a parent B, where neither of them we haven't painted, we paint A with THEY_HAVE, we paint all parents of A with THEY_HAVE, and put A in the have_obj array and return 1. Then later when we feed B to this function, we already know they have B, so we do not add it to have_obj array, but otherwise we still paint B's parents with THEY_HAVE bit. In the new code with the same set-up, we paint A and B with THEY_HAVE and put A in the have_obj array and return 1. When the caller feeds B to this function, we pretty much do nothing and return, so B's parent will not be painted, will it? I must be misreading the patch, but I do not quite see how. > if (!data->oldest_have || (commit->date < data->oldest_have)) > data->oldest_have = commit->date; > for (parents = commit->parents; > @@ -497,11 +498,9 @@ static int do_got_oid(struct upload_pack_data *data, const struct object_id *oid > parents = parents->next) > parents->item->object.flags |= THEY_HAVE; > } > - if (!we_knew_they_have) { > - add_object_array(o, NULL, &data->have_obj); > - return 1; > - } > - return 0; > + > + add_object_array(o, NULL, &data->have_obj); > + return 1; > } > > static int got_oid(struct upload_pack_data *data,