Re: [PATCH 4/7] xdiff: make fields of xrecord_t Rust friendly

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Ezekiel

On 31/07/2025 21:58, Ezekiel Newren wrote:
On Thu, Jul 31, 2025 at 8:20 AM Phillip Wood <phillip.wood123@xxxxxxxxx> wrote:

On 28/07/2025 21:14, Ezekiel Newren wrote:
On Mon, Jul 28, 2025 at 1:52 PM Phillip Wood <phillip.wood123@xxxxxxxxx> wrote:

Ah, I misunderstood the scope of your question. I could not fit an
example of why this design pattern made sense into this patch series,
so I'll explain with an example here:

If C defines a struct like below then it's obvious how to translate
that into rust for ffi purposes. It also makes it clear that this C
struct is expressly for the purpose of C <-> Rust interoperability.
struct some_struct {
      u8* ptr;
      usize length;
      u64 counter;
};

This is how that C struct needs to be defined in Rust so that it can
interoperate with C, and making C use the Rust types reduces the
chance of copy paste, and primitive type definition mismatch errors.
#[repr(C)]
pub struct some_struct {
      ptr: *mut u8,
      length: usize,
      counter: u64,
};

How is the pointer, length pair used in rust? Normally one would use a
slice so do we have to construct a slice every time we want to use the
data in this struct, or do we copy the data in this struct into to a an
idiomatic struct with a slice member? If we end up copying there doesn't
seem much point in changing all the types in the C struct as we can
define a rust struct using *c_char, c_long etc. to interface with the C
code and covert them to an appropriate rust type when we copy the data
to the idiomatic version that is then used by the rust of the rust code.
I can see the value of the typedefs for documenting C<->rust interop if
the same struct is used by both but if we end up copying data on the
rust side I'm not so sure.

Thanks

Phillip

Passing pointer + length from c to Rust does not incur a memory copy
overhead. Take a look at rust/xdiff/src/lib.rs wich has the following
rust function defined:

#[no_mangle]
unsafe extern "C" fn xxh3_64(ptr: *const u8, size: usize) -> u64 {
     let slice = std::slice::from_raw_parts(ptr, size);
     xxhash_rust::xxh3::xxh3_64(slice)
}
I'm afraid I don't find this simple unsafe function example very illuminating. I'm trying to understand how we are going to use a struct containing a pointer, length pair in code that are more complex than this. For example if we implement an entire diff algorithm in rust are we going to call std::slice::from_raw_parts() every time we want to access a string passed from C? If we're doing that I assume we'd impl a safe method on the struct that wraps std::slice::from_raw_parts(). If that's the case the method can easily access a field that has type *c_char and we don't have to sprinkle casts throughout our C code.

For example (ignoring lifetimes)
#repr["C"]
pub struct SomeStruct {
    ptr *std::ffi::c_char,
    usize len,
    // more members
}

impl SomeStruct {
    get_line(&self) -> &[u8] {
        unsafe {
            std::slice::from_raw_parts(self.ptr as *u8, self.len);
        }
    }
}

On the other hand if at the interface between rust and C, we create a slice that we can pass to the rest of the rust code then we also don't need to change the C type as there is a single place in the rust code where we convert from c_char when we create the slice.

The casts on the C side are pretty invasive. At least casting from char to u8 is not going to break anything. The long -> usize and long -> u64 changes and their associated casts are going to need some careful review but in the long run I think the C code also benefits to using those types

Creating a slice tells the compiler what assumptions it can make about
that memory. On the C side in xdiff/xprepare.c:

extern u64 xxh3_64(u8 const* ptr, usize size);

and then it's called like this in that same file:

rec->ha = xxh3_64(rec->ptr, rec->size);

I really wanted to show my ivec type that made passing an
interoperable vector type between C and Rust easy and fast, but this
patch series is already getting very long.
That sounds interesting

Thanks

Phillip





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux