-
Notifications
You must be signed in to change notification settings - Fork 873
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Align buffers of array data imported through the FFI if they aren't aligned #7137
base: main
Are you sure you want to change the base?
Conversation
I wonder if we could make this behaviour opt-in, whilst I accept systems may produce unaligned data, and people may want to accommodate this (although historically the consensus has been this is a bug), IMO I would not expect an FFI interface to be silently reallocating memory buffers without being explicitly asked to do so. Returning an error instead of panicking would be a reasonable parallel improvement |
I considered that, but couldn't think of any scenario where a Rust program importing data from another system would not want to make sure the buffers are aligned. And note that buffers are checked already, that's how the panic is triggered after all.
It's important to consider here that reallocation won't happen if the producer is a good citizen and produces aligned buffers already. Often by reallocation on their side as well. That's what I think is being currently implemented in the C++ implementation of Flight. The buffers coming from gRPC can't possibly be aligned according to Arrow types so no matter who performs the copy for alignment, it will have to be done at least once.
I can't see how that error could be handled at a higher level layer |
I think it boils down to a subjective judgement over which of the following is better.
Or
I personally think the former is better as it drives the ecosystem forwards, but appreciate some people are more in the camp of "I just want it to work".
Neither, but some people have an aversion to panics
Whilst this technically isn't true, I agree practically speaking it is unavoidable in this instance, and tbh given network IO is involved is likely irrelevant. My concern would more be for systems where this copy is unnecessary, e.g. simply not using aligned_alloc. |
Note that [1] https://github.com/apache/arrow-rs/blob/main/arrow-data/src/data.rs#L1591 |
Most arrow implementations actually provide 64 byte alignment
You can always obtain an aligned allocation, by over allocating by the alignment, and then slicing. This is in fact what many arrow implementations do, Go included https://github.com/apache/arrow-go/blob/main/arrow/memory/go_allocator.go#L23 |
Closes #7136.
This will also fix this issue in
arrow-adbc
once thearrow-array
crate version is bumped: apache/arrow-adbc#2526