Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Align buffers of array data imported through the FFI if they aren't aligned #7137

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

felipecrv
Copy link
Contributor

@felipecrv felipecrv commented Feb 15, 2025

Closes #7136.

This will also fix this issue in arrow-adbc once the arrow-array crate version is bumped: apache/arrow-adbc#2526

@github-actions github-actions bot added the arrow Changes to the arrow crate label Feb 15, 2025
@tustvold
Copy link
Contributor

tustvold commented Feb 15, 2025

I wonder if we could make this behaviour opt-in, whilst I accept systems may produce unaligned data, and people may want to accommodate this (although historically the consensus has been this is a bug), IMO I would not expect an FFI interface to be silently reallocating memory buffers without being explicitly asked to do so.

Returning an error instead of panicking would be a reasonable parallel improvement

@felipecrv
Copy link
Contributor Author

I wonder if we could make this behaviour opt-in

I considered that, but couldn't think of any scenario where a Rust program importing data from another system would not want to make sure the buffers are aligned. And note that buffers are checked already, that's how the panic is triggered after all.

I would not expect an FFI interface to be silently reallocating memory buffers without being explicitly asked to do so

It's important to consider here that reallocation won't happen if the producer is a good citizen and produces aligned buffers already. Often by reallocation on their side as well. That's what I think is being currently implemented in the C++ implementation of Flight. The buffers coming from gRPC can't possibly be aligned according to Arrow types so no matter who performs the copy for alignment, it will have to be done at least once.

Returning an error instead of panicking would be a reasonable parallel improvement

I can't see how that error could be handled at a higher level layer

@tustvold
Copy link
Contributor

tustvold commented Feb 15, 2025

I think it boils down to a subjective judgement over which of the following is better.

  • arrow-adbc produces unaligned buffers
  • error reported consuming data
  • bug filed on arrow-adbc
  • workaround of realigning enabled linking back to upstream bug

Or

  • arrow-adbc produces unaligned buffers
  • implementation silently performs copy
  • bug goes unnoticed

I personally think the former is better as it drives the ecosystem forwards, but appreciate some people are more in the camp of "I just want it to work".

I can't see how that error could be handled at a higher level layer

Neither, but some people have an aversion to panics

The buffers coming from gRPC can't possibly be aligned according to Arrow types so no matter who performs the copy for alignment, it will have to be done at least once.

Whilst this technically isn't true, I agree practically speaking it is unavoidable in this instance, and tbh given network IO is involved is likely irrelevant.

My concern would more be for systems where this copy is unnecessary, e.g. simply not using aligned_alloc.

@felipecrv
Copy link
Contributor Author

I personally think the former is better as it drives the ecosystem forwards, but appreciate some people are more in the camp of "I just want it to work".

Note that arrow-rs is effectively asking other Arrow implementations to provide 16-byte aligned buffers for some types [1][2]. That's a tough call -- malloc gives 8 by default (64 bits). Possible but hard in C, almost impossible on language runtimes that give less memory allocation controls.

[1] https://github.com/apache/arrow-rs/blob/main/arrow-data/src/data.rs#L1591
[2] apache/arrow-adbc#2526 (comment)

@tustvold
Copy link
Contributor

Note that arrow-rs is effectively asking other Arrow implementations to provide 16-byte aligned buffers for some types

Most arrow implementations actually provide 64 byte alignment

Possible but hard in C, almost impossible on language runtimes that give less memory allocation controls.

You can always obtain an aligned allocation, by over allocating by the alignment, and then slicing. This is in fact what many arrow implementations do, Go included

https://github.com/apache/arrow-go/blob/main/arrow/memory/go_allocator.go#L23

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Array data imported through the FFI might contain unaligned buffers
2 participants