Re: [PATCH v2 10/13] pack-objects: refactor path-walk delta phase

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/6/25 9:14 PM, Taylor Blau wrote:
On Mon, Mar 24, 2025 at 03:22:46PM +0000, Derrick Stolee via GitGitGadget wrote:
The current implementation is not integrated with threads, but could be
done in a future update.

I think that "in a future update" may be worth replacing with "the
following commit", as the former suggests (to me) that it may be perused
outside of this series.

Noted and fixed.


+static void find_deltas_for_region(struct object_entry *list,
+				   struct packing_region *region,
+				   unsigned int *processed)
+{
+	struct object_entry **delta_list;
+	uint32_t delta_list_nr = 0;
+
+	ALLOC_ARRAY(delta_list, region->nr);
+	for (uint32_t i = 0; i < region->nr; i++) {

I know that these values are all unsigned, and very unlikely to get near
the 32-bit maximum, but I do think we should use size_t here (and
likewise when dealing with values that deal with how many entries in a
list we've allocated and indexes into that list).

I've updated things to size_t where I could across this patch. Thanks.

+static void find_deltas_by_region(struct object_entry *list,
+				  struct packing_region *regions,
+				  uint32_t start, uint32_t nr)

I imagine that "start" here is setup for the following commit which will
parallelize this task across multiple threads, with each thread starting
at a different position.

I wonder if (as an alternative) we could get away with passing in a
(struct packing_region *, size_t) pair and drop "start". I think this
can work if you pass in the result of adding whatever the value of
"start" would be to to_pack.regions here.

That somewhat hides the fact that this code is meant to be run across
multiple threads, but in a way that I think is worth doing. It lets the
function avoid having to do things like:

   for (size_t i = start; i < start + nr; i++)

and instead do the simpler:

   for (size_t i = 0; i < nr; i++)

since the caller already adjusted the regions pointer for us. As a
side-effect, it also means that the call below in prepare_pack() can
avoid passing a literal zero.

I generally avoid this kind of pattern, because it makes it more
difficult to understand what is going on when debugging. The cost
of adding 'start' to the index seems low relative to having a
consistent indexing scheme when accessing from the "global" list
of items.

This only gets more complicated when the threaded version iterates
on a "consecutive sublist of regions".

+/**

This is an extreme nitpick, but I think that "/**" (as opposed to "/*")
is preferred, as the former is used for Doxygen-style comments. I feel
like I have seen Junio comment on this before, but searching 'f:Junio
"/**"' yields a measly 83,000+ results, so I am unlikely to find it ;-).

(Not worth rerolling on its own, but if you are rerolling anyway, which
is what I gathered from some of your earlier replies, it may be worth
picking up along the way.)

I'm confused. Are you saying I should change from "/**" to "/*" (which
is the opposite of your preferred statement) or did you misread that I
had used "/**"? I can't determine which statement is the intended one.
 Thanks,
-Stolee





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux