Re: [PATCH 6/7] xdiff: conditionally use Rust's implementation of xxhash

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jul 18, 2025 at 1:00 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> "Ezekiel Newren via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>
> > +extern u64 xxh3_64(u8 const* ptr, usize size);
> > +
> > +
> >  static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_t const *xpp,
> >                          xdlclassifier_t *cf, xdfile_t *xdf) {
> >       unsigned long *ha;
> > @@ -175,14 +178,26 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_
> >
> >       xdl_parse_lines(mf, narec, xdf);
> >
> > +     if ((xpp->flags & XDF_WHITESPACE_FLAGS) == 0) {
> > +             for (usize i = 0; i < (usize) xdf->nrec; i++) {
> > +                     xrecord_t *rec = xdf->recs[i];
> > +                     rec->ha = xxh3_64(rec->ptr, rec->size);
> > +             }
> > +     } else {
> > +             for (usize i = 0; i < (usize) xdf->nrec; i++) {
> > +                     xrecord_t *rec = xdf->recs[i];
> > +                     char const* dump = (char const*) rec->ptr;
> > +                     rec->ha = xdl_hash_record(&dump, (char const*) (rec->ptr + rec->size), xpp->flags);
> > +             }
> > +     }
>
> As a technology demonstration and proof of concept patch, this is
> very nice, but to be upstreamed for real, we'd want a variant of
> xxhash that can work with the contents with whitespace squashed to
> be usable with various whitespace ignoring modes of operation.  When
> that happens, and when the result turns out to be more performant,
> we can lose the xdl_hash_record() and require only the xxhash, which
> would be great.
>
> And that variant of xxhash that understands whitespace squashing can
> of course be written in Rust as a part of this effort when the
> series loses its RFC status.  At the same time, those who want to
> use our xdiff code in third-party software (like libgit2 and vim)
> may want to reimplement it in C in their copy.
>
> Thanks.

What is the git precedent for replacement code that is easier to read
and maintain while also being more secure, but is slower? I think
hashing with whitespace handling in Rust might fall in that category.

As far as I can tell the Rust code for dealing with whitespace is
going to be slower than the C code because xdiff used a hash algorithm
(DJB2a) that can operate 1 byte at a time and combined hashing with
determining the length. Xxhash requires that the length be known
beforehand and the memory to be contiguous or to hash it in chunks.
Hashing 1 byte at a time with Xxhash is VERY slow since it's just
copying to an internal buffer until a full block is ready.

On a broader note. How do I show the mailing list the changes that
I've made to this branch/patch series? I'm not sure what the proper
procedure is or even how to do it. What commands would I run, or web
browser steps would I take to show my newest commits?





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux