Hi Ezekiel
On 31/07/2025 21:58, Ezekiel Newren wrote:
On Thu, Jul 31, 2025 at 8:20 AM Phillip Wood <phillip.wood123@xxxxxxxxx> wrote:
On 28/07/2025 21:14, Ezekiel Newren wrote:
On Mon, Jul 28, 2025 at 1:52 PM Phillip Wood <phillip.wood123@xxxxxxxxx> wrote:
Ah, I misunderstood the scope of your question. I could not fit an
example of why this design pattern made sense into this patch series,
so I'll explain with an example here:
If C defines a struct like below then it's obvious how to translate
that into rust for ffi purposes. It also makes it clear that this C
struct is expressly for the purpose of C <-> Rust interoperability.
struct some_struct {
u8* ptr;
usize length;
u64 counter;
};
This is how that C struct needs to be defined in Rust so that it can
interoperate with C, and making C use the Rust types reduces the
chance of copy paste, and primitive type definition mismatch errors.
#[repr(C)]
pub struct some_struct {
ptr: *mut u8,
length: usize,
counter: u64,
};
How is the pointer, length pair used in rust? Normally one would use a
slice so do we have to construct a slice every time we want to use the
data in this struct, or do we copy the data in this struct into to a an
idiomatic struct with a slice member? If we end up copying there doesn't
seem much point in changing all the types in the C struct as we can
define a rust struct using *c_char, c_long etc. to interface with the C
code and covert them to an appropriate rust type when we copy the data
to the idiomatic version that is then used by the rust of the rust code.
I can see the value of the typedefs for documenting C<->rust interop if
the same struct is used by both but if we end up copying data on the
rust side I'm not so sure.
Thanks
Phillip
Passing pointer + length from c to Rust does not incur a memory copy
overhead. Take a look at rust/xdiff/src/lib.rs wich has the following
rust function defined:
#[no_mangle]
unsafe extern "C" fn xxh3_64(ptr: *const u8, size: usize) -> u64 {
let slice = std::slice::from_raw_parts(ptr, size);
xxhash_rust::xxh3::xxh3_64(slice)
}
I'm afraid I don't find this simple unsafe function example very
illuminating. I'm trying to understand how we are going to use a struct
containing a pointer, length pair in code that are more complex than
this. For example if we implement an entire diff algorithm in rust are
we going to call std::slice::from_raw_parts() every time we want to
access a string passed from C? If we're doing that I assume we'd impl a
safe method on the struct that wraps std::slice::from_raw_parts(). If
that's the case the method can easily access a field that has type
*c_char and we don't have to sprinkle casts throughout our C code.
For example (ignoring lifetimes)
#repr["C"]
pub struct SomeStruct {
ptr *std::ffi::c_char,
usize len,
// more members
}
impl SomeStruct {
get_line(&self) -> &[u8] {
unsafe {
std::slice::from_raw_parts(self.ptr as *u8, self.len);
}
}
}
On the other hand if at the interface between rust and C, we create a
slice that we can pass to the rest of the rust code then we also don't
need to change the C type as there is a single place in the rust code
where we convert from c_char when we create the slice.
The casts on the C side are pretty invasive. At least casting from char
to u8 is not going to break anything. The long -> usize and long -> u64
changes and their associated casts are going to need some careful review
but in the long run I think the C code also benefits to using those types
Creating a slice tells the compiler what assumptions it can make about
that memory. On the C side in xdiff/xprepare.c:
extern u64 xxh3_64(u8 const* ptr, usize size);
and then it's called like this in that same file:
rec->ha = xxh3_64(rec->ptr, rec->size);
I really wanted to show my ivec type that made passing an
interoperable vector type between C and Rust easy and fast, but this
patch series is already getting very long.
That sounds interesting
Thanks
Phillip