Re: [PATCH 4/7] xdiff: make fields of xrecord_t Rust friendly

Phillip Wood <phillip.wood123@xxxxxxxxx> · Fri, 1 Aug 2025 10:14:55 +0100

Hi Ezekiel

On 31/07/2025 21:58, Ezekiel Newren wrote:
On Thu, Jul 31, 2025 at 8:20 AM Phillip Wood <phillip.wood123@xxxxxxxxx> wrote:

On 28/07/2025 21:14, Ezekiel Newren wrote:
On Mon, Jul 28, 2025 at 1:52 PM Phillip Wood <phillip.wood123@xxxxxxxxx> wrote:

Ah, I misunderstood the scope of your question. I could not fit an
example of why this design pattern made sense into this patch series,
so I'll explain with an example here:

If C defines a struct like below then it's obvious how to translate
that into rust for ffi purposes. It also makes it clear that this C
struct is expressly for the purpose of C <-> Rust interoperability.
struct some_struct {
      u8* ptr;
      usize length;
      u64 counter;
};

This is how that C struct needs to be defined in Rust so that it can
interoperate with C, and making C use the Rust types reduces the
chance of copy paste, and primitive type definition mismatch errors.
#[repr(C)]
pub struct some_struct {
      ptr: *mut u8,
      length: usize,
      counter: u64,
};

How is the pointer, length pair used in rust? Normally one would use a
slice so do we have to construct a slice every time we want to use the
data in this struct, or do we copy the data in this struct into to a an
idiomatic struct with a slice member? If we end up copying there doesn't
seem much point in changing all the types in the C struct as we can
define a rust struct using *c_char, c_long etc. to interface with the C
code and covert them to an appropriate rust type when we copy the data
to the idiomatic version that is then used by the rust of the rust code.
I can see the value of the typedefs for documenting C<->rust interop if
the same struct is used by both but if we end up copying data on the
rust side I'm not so sure.

Thanks

Phillip

Passing pointer + length from c to Rust does not incur a memory copy
overhead. Take a look at rust/xdiff/src/lib.rs wich has the following
rust function defined:

#[no_mangle]
unsafe extern "C" fn xxh3_64(ptr: *const u8, size: usize) -> u64 {
     let slice = std::slice::from_raw_parts(ptr, size);
     xxhash_rust::xxh3::xxh3_64(slice)
}
I'm afraid I don't find this simple unsafe function example very 
illuminating. I'm trying to understand how we are going to use a struct 
containing a pointer, length pair in code that are more complex than 
this. For example if we implement an entire diff algorithm in rust are 
we going to call std::slice::from_raw_parts() every time we want to 
access a string passed from C? If we're doing that I assume we'd impl a 
safe method on the struct that wraps std::slice::from_raw_parts(). If 
that's the case the method can easily access a field that has type 
*c_char and we don't have to sprinkle casts throughout our C code.

For example (ignoring lifetimes)
#repr["C"]
pub struct SomeStruct {
    ptr *std::ffi::c_char,
    usize len,
    // more members
}

impl SomeStruct {
    get_line(&self) -> &[u8] {
        unsafe {
            std::slice::from_raw_parts(self.ptr as *u8, self.len);
        }
    }
}

On the other hand if at the interface between rust and C, we create a 
slice that we can pass to the rest of the rust code then we also don't 
need to change the C type as there is a single place in the rust code 
where we convert from c_char when we create the slice.

The casts on the C side are pretty invasive. At least casting from char 
to u8 is not going to break anything. The long -> usize and long -> u64 
changes and their associated casts are going to need some careful review 
but in the long run I think the C code also benefits to using those types

Creating a slice tells the compiler what assumptions it can make about
that memory. On the C side in xdiff/xprepare.c:

extern u64 xxh3_64(u8 const* ptr, usize size);

and then it's called like this in that same file:

rec->ha = xxh3_64(rec->ptr, rec->size);

I really wanted to show my ivec type that made passing an
interoperable vector type between C and Rust easy and fast, but this
patch series is already getting very long.
That sounds interesting

Thanks

Phillip