
Add with_skip_validation flag to IPC readers/writers #7093

Closed
wants to merge 4 commits into from


@alamb (Contributor) commented Feb 6, 2025

Which issue does this PR close?

Draft as

TODO:

  • Tests
  • Add to FileReader
  • Benchmarks

Rationale for this change

Forcing array validation when reading trusted data is inefficient. Users should have the option to skip it.

What changes are included in this PR?

This PR builds on an earlier PR from @totoroyyb.

  1. Add an API to the IPC options to skip validation
  2. Pass this flag through StreamReader/FileReader/Decoder
  3. Add tests for the above
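The three changes above can be illustrated with a minimal, self-contained Rust sketch. It does not use arrow-rs: `ToyDecoder`, `ToyStreamReader`, `StringColumn`, `DecodeError`, `validate`, and the `with_skip_validation` setters are hypothetical names for this illustration, not this PR's actual API. A single string column (an offsets vector plus a byte buffer, similar to an Arrow `Utf8` array but with `usize` offsets) stands in for a whole RecordBatch. `validate` is a linear pass over the offsets and value bytes, which is the kind of per-batch work that skipping validation saves.

```rust
/// Hypothetical error type for this sketch (not arrow-rs's `ArrowError`).
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    OffsetsNotMonotonic { index: usize },
    OffsetOutOfBounds { offset: usize, len: usize },
    InvalidUtf8,
}

/// Hypothetical string column: value `i` is `values[offsets[i]..offsets[i + 1]]`,
/// like an Arrow `Utf8` array (which uses i32 offsets rather than `usize`).
pub struct StringColumn {
    offsets: Vec<usize>,
    values: Vec<u8>,
}

impl StringColumn {
    pub fn len(&self) -> usize {
        self.offsets.len().saturating_sub(1)
    }
    pub fn value(&self, i: usize) -> &str {
        // Slicing is still bounds-checked: bad offsets panic, they never read out of bounds.
        let bytes = &self.values[self.offsets[i]..self.offsets[i + 1]];
        // SAFETY: `validate` confirmed these bytes are UTF-8, or the caller of the
        // unsafe `with_skip_validation` promised they are.
        unsafe { std::str::from_utf8_unchecked(bytes) }
    }
}

/// Linear pass over offsets and value bytes: the work that skipping validation avoids.
fn validate(offsets: &[usize], values: &[u8]) -> Result<(), DecodeError> {
    for (i, w) in offsets.windows(2).enumerate() {
        if w[0] > w[1] {
            return Err(DecodeError::OffsetsNotMonotonic { index: i + 1 });
        }
        if w[1] > values.len() {
            return Err(DecodeError::OffsetOutOfBounds { offset: w[1], len: values.len() });
        }
        // Safe to slice here: w[0] <= w[1] <= values.len().
        if std::str::from_utf8(&values[w[0]..w[1]]).is_err() {
            return Err(DecodeError::InvalidUtf8);
        }
    }
    Ok(())
}

/// Hypothetical decoder; validation is on by default.
#[derive(Clone, Copy, Debug, Default)]
pub struct ToyDecoder {
    skip_validation: bool,
}

impl ToyDecoder {
    pub fn new() -> Self {
        Self::default()
    }
    /// # Safety
    /// With `skip = true`, every buffer later passed to `decode` must have
    /// monotonic, in-bounds offsets and valid UTF-8 values.
    pub unsafe fn with_skip_validation(mut self, skip: bool) -> Self {
        self.skip_validation = skip;
        self
    }
    pub fn decode(&self, offsets: Vec<usize>, values: Vec<u8>) -> Result<StringColumn, DecodeError> {
        if !self.skip_validation {
            validate(&offsets, &values)?;
        }
        Ok(StringColumn { offsets, values })
    }
}

/// Hypothetical stream reader that forwards the flag to its decoder.
pub struct ToyStreamReader<I> {
    batches: I,
    decoder: ToyDecoder,
}

impl<I: Iterator<Item = (Vec<usize>, Vec<u8>)>> ToyStreamReader<I> {
    pub fn new(batches: I) -> Self {
        Self { batches, decoder: ToyDecoder::new() }
    }
    /// # Safety
    /// Same contract as `ToyDecoder::with_skip_validation`.
    pub unsafe fn with_skip_validation(mut self, skip: bool) -> Self {
        self.decoder = unsafe { self.decoder.with_skip_validation(skip) };
        self
    }
}

impl<I: Iterator<Item = (Vec<usize>, Vec<u8>)>> Iterator for ToyStreamReader<I> {
    type Item = Result<StringColumn, DecodeError>;
    fn next(&mut self) -> Option<Self::Item> {
        let (offsets, values) = self.batches.next()?;
        Some(self.decoder.decode(offsets, values))
    }
}
```

Making the setters `unsafe fn` is this sketch's own design choice: once the UTF-8 check is skipped, `value` could hand out an invalid `&str`, so the caller must vouch that the data is trusted. A plain `bool` in an options struct would be simpler, but then safe code could trigger undefined behavior.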

Are there any user-facing changes?

@github-actions bot added the arrow (Changes to the arrow crate) label Feb 6, 2025
@alamb force-pushed the alamb/ipc_reader_unsafe branch from 7a2613a to 5b8743a on February 6, 2025 22:21
@alamb changed the title from "Alamb/ipc reader unsafe" to "Add with_skip_validation flag to IPC readers/writers" Feb 6, 2025
@alamb (Contributor, Author) commented Feb 7, 2025

Preliminary Benchmarks

My hacked-up local benchmarks show a 3-9x speedup on uncompressed reads when validation is disabled (about 1.3x for the zstd-compressed stream).

```
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/arrow-rs$ critcmp main no_validation
group                                                main                                   no_validation
-----                                                ----                                   -------------
arrow_ipc_stream_writer/FileReader/read_10           3.25    252.6±6.64µs        ? ?/sec    1.00     77.8±1.62µs        ? ?/sec
arrow_ipc_stream_writer/FileReader/read_10/mmap      8.98    243.6±6.58µs        ? ?/sec    1.00     27.1±0.54µs        ? ?/sec
arrow_ipc_stream_writer/StreamReader/read_10         3.17   250.2±10.02µs        ? ?/sec    1.00     78.9±3.09µs        ? ?/sec
arrow_ipc_stream_writer/StreamReader/read_10/zstd    1.29      4.3±0.37ms        ? ?/sec    1.00      3.3±0.38ms        ? ?/sec
```
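In critcmp's output, the first number in each column appears to be that run's time relative to the faster run (1.00), so the `main` column reads as the speedup from disabling validation. The small sketch below recomputes it from the displayed mean times copied from the table; `Row`, `ROWS`, `speedup`, and `summary` are names made up for this check. Because the displayed times are themselves rounded, the results match critcmp's ratios only to about two decimal places (4.3 ms / 3.3 ms gives roughly 1.30, versus the reported 1.29).

```rust
/// One row of the critcmp table: displayed mean times for the `main` and
/// `no_validation` runs, in the same unit (µs, or ms for zstd), so the unit cancels.
pub struct Row {
    pub name: &'static str,
    pub main: f64,
    pub no_validation: f64,
}

pub const ROWS: [Row; 4] = [
    Row { name: "arrow_ipc_stream_writer/FileReader/read_10", main: 252.6, no_validation: 77.8 },
    Row { name: "arrow_ipc_stream_writer/FileReader/read_10/mmap", main: 243.6, no_validation: 27.1 },
    Row { name: "arrow_ipc_stream_writer/StreamReader/read_10", main: 250.2, no_validation: 78.9 },
    Row { name: "arrow_ipc_stream_writer/StreamReader/read_10/zstd", main: 4.3, no_validation: 3.3 },
];

/// Speedup from skipping validation: time on `main` divided by time on `no_validation`.
pub fn speedup(r: &Row) -> f64 {
    r.main / r.no_validation
}

/// Formats a row as "<group>: <speedup>x", rounded to two decimals.
pub fn summary(r: &Row) -> String {
    format!("{}: {:.2}x", r.name, speedup(r))
}
```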

Branch here: https://github.com/alamb/arrow-rs/pull/new/alamb/mashup

@alamb (Contributor, Author) commented Feb 11, 2025

Superseded by #7120

@alamb closed this Feb 11, 2025
Labels
arrow Changes to the arrow crate
Development
Successfully merging this pull request may close these issues:
  • Improve Arrow-IPC performance by avoiding Unsafe Unchecked IPC Read RecordBatch