Re: [PATCH v2 00/17] RFC: Accelerate xdiff and begin its rustification

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Aug 18, 2025 at 3:31 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> "Ezekiel Newren via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>
> >  * Code style: Should we adopt a Rust code style of some sort? Perhaps have
> >    the code always be formatted by rustfmt in its default configuration?
>
> Sounds sensible.  I'll let folks with more Rust inclination to
> figure out what _the_ style should be, but having _a_ style we all
> stick to is good.
>
> >  * Rust version: We are not using the same Rust version on all platforms in
> >    CI; 32-bit builds and Windows builds require a newer Rust version to
> >    successfully build.
>
> As long as we do not have to bend backwards on the code with "if
> using version X or older, use this alternative codepath" all over
> the place, "pick a version that works on each platform" that results
> in "due to the quality of ports, some platform's older port is
> unusable and newer version is required" is not too bad, especially
> for a system that is still rapidly getting improved and a bit on the
> unstable side, I think.
>
> >  * Performance with whitepsace flags: I originally intended to leave out the
> >    whitespace handling because I knew it was slower,...
>
> If the Rust guinea pig were different from how each line is hashed
> in xdiff, which is targetted by am/xdiff-hash-tweak topic, then we
> can leave out the whitespace-ignoring hashing from this topic
> altogether.
>
> Quite honestly, I do not like throwing away the other optimization
> efforts that can be reviewed and integrated trivially, but it is
> practically impossible to do so while still have a "let's start
> playing with Rust" topic that targets exactly the same area.  Yes,
> this topic licked the same corner of the cake first, but still,
> I was hoping that the second iteration of this series would use a
> different code paths as a Rust guinea pig.

I agree it would be nice to merge those down first.  One possibility
here would be having Ezekiel rebase his work on top of
am/xdiff-hash-tweak...

> After all, the primary objective of our first Rust topic is to set
> the framework right (like the platform and version support policies,
> how foreign interfaces like type systems get impedance-matched, what
> the impact to our build infrature looks like, etc.).  It would be a
> huge plus if it can at the same time demonstrate how much safer code
> we can write with less effort if we switched writing some (and
> gradually larger, posibly) parts in the language.
>
> The result this cover letter has in its title, 'accelerate xdiff',
> is not primarily due to use of Rust, is it?  As the other topic
> demonstrates, it is to use an implementation of a faster hash
> function (we can consider it to be an impressive technology
> demonstration that a rust reimplementation of original C code can be
> done in a very performant way).  And nobody is expecting that we
> would be using Rust for speed anyway, no?

You are correct that it is not due to Rust.  My original objective
that I tried to trick/coax/nerd-snipe/whatever Ezekiel into looking
into was cleaning up xdiff, to allow various features in `git replay`
and rebasing merges.  Ezekiel was interested in Rust and in a
challenge.  xdiff is quite a knot to untangle, and Ezekiel's been at
work on it for quite some time.  But, he happened to notice this
speedup, and found a way to turn it into a short series without all
his other patches.

We could perhaps shift focus, but I'm curious if you're wanting the
xdiff work to be thrown away or shelved in favor of some completely
different area of the code, or if perhaps some other aspects of the
xdiff code would still be amenable.

One big challenge in finding another area, whether in xdiff or
elsewhere, is that Ezekiel really wants to showcase how nice Rust's
unittesting is, but that only works if we start at a low-level and
build up.  If we make Rust code call C code, that'd either not be
readily unit-testable, or would require us stubbing out the entire
implementation behind that C interface or doing something more
complex.

What if Ezekiel rebased his series on am/xdiff-hash-tweak, and then
instead of further modifying the hashing in the first series, he:
  - introduced brian's patch with the platform support
  - setup the CI builds to test building with Rust (including Johannes' patches)
  - started working on transitioning xdfile_t data structure to be FFI friendly

One issue here is that it probably wouldn't be too long before we'd
want to rip out the xdlclassifier struct (mostly a glorified
hashtable), which is kind of tied up in a knot with the hashing and
line equality, so it would probably only be a few more series down the
road before we'd want to start tweaking the code in
am/xdiff-hash-tweak to make use of the new data structures.  Would
that be agreeable?





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux