On Fri, Jul 18, 2025 at 1:00 PM Junio C Hamano <gitster@xxxxxxxxx> wrote: > > "Ezekiel Newren via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > > > +extern u64 xxh3_64(u8 const* ptr, usize size); > > + > > + > > static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_t const *xpp, > > xdlclassifier_t *cf, xdfile_t *xdf) { > > unsigned long *ha; > > @@ -175,14 +178,26 @@ static int xdl_prepare_ctx(unsigned int pass, mmfile_t *mf, long narec, xpparam_ > > > > xdl_parse_lines(mf, narec, xdf); > > > > + if ((xpp->flags & XDF_WHITESPACE_FLAGS) == 0) { > > + for (usize i = 0; i < (usize) xdf->nrec; i++) { > > + xrecord_t *rec = xdf->recs[i]; > > + rec->ha = xxh3_64(rec->ptr, rec->size); > > + } > > + } else { > > + for (usize i = 0; i < (usize) xdf->nrec; i++) { > > + xrecord_t *rec = xdf->recs[i]; > > + char const* dump = (char const*) rec->ptr; > > + rec->ha = xdl_hash_record(&dump, (char const*) (rec->ptr + rec->size), xpp->flags); > > + } > > + } > > As a technology demonstration and proof of concept patch, this is > very nice, but to be upstreamed for real, we'd want a variant of > xxhash that can work with the contents with whitespace squashed to > be usable with various whitespace ignoring modes of operation. When > that happens, and when the result turns out to be more performant, > we can lose the xdl_hash_record() and require only the xxhash, which > would be great. > > And that variant of xxhash that understands whitespace squashing can > of course be written in Rust as a part of this effort when the > series loses its RFC status. At the same time, those who want to > use our xdiff code in third-party software (like libgit2 and vim) > may want to reimplement it in C in their copy. > > Thanks. What is the git precedent for replacement code that is easier to read and maintain while also being more secure, but is slower? I think hashing with whitespace handling in Rust might fall in that category. As far as I can tell the Rust code for dealing with whitespace is going to be slower than the C code because xdiff used a hash algorithm (DJB2a) that can operate 1 byte at a time and combined hashing with determining the length. Xxhash requires that the length be known beforehand and the memory to be contiguous or to hash it in chunks. Hashing 1 byte at a time with Xxhash is VERY slow since it's just copying to an internal buffer until a full block is ready. On a broader note. How do I show the mailing list the changes that I've made to this branch/patch series? I'm not sure what the proper procedure is or even how to do it. What commands would I run, or web browser steps would I take to show my newest commits?