Configuration adjustment in Datafusion #4

Closed
wants to merge 152 commits into from
0f176a1
Initial but not completely work like proto
metesynnada Feb 15, 2024
cac943d
Before proto is handled
metesynnada Feb 26, 2024
3fba4a0
Merge remote-tracking branch 'upstream/main' into configuration
metesynnada Feb 26, 2024
c17d655
Update listing_table_factory.rs
metesynnada Feb 26, 2024
13e7e83
Before proto 2
metesynnada Feb 26, 2024
25d5685
Minor adjustments
metesynnada Feb 26, 2024
b728232
feat: support `FixedSizeList` Type Coercion (#9108)
Weijun-H Feb 26, 2024
ec86acb
feat: expand `unnest` to accept arbitrary single array expression (#…
jonahgao Feb 26, 2024
14ed29f
Merge branch 'main' into configuration
metesynnada Feb 26, 2024
c568407
fix: flight examples (#9335)
codonnell Feb 26, 2024
b8c6e0b
docs: update parquet_sql_multiple_files.rs with a relative path ex (#…
tshauck Feb 26, 2024
a26f583
tests: add tests for writing hive-partitioned parquet (#9316)
tshauck Feb 26, 2024
b55d0ed
feature: support nvl(ifnull) function (#9284)
guojidan Feb 26, 2024
85f7a8e
Move abs to datafusion_functions (#9313)
yyy1000 Feb 27, 2024
8f3d1ef
refactor: `SchemaProvider::table` can fail (#9307)
crepererum Feb 27, 2024
daca94e
Update headers
mustafasrepo Feb 27, 2024
0e1300d
Update copy.slt
metesynnada Feb 27, 2024
c2da778
Merge branch 'configuration' of https://github.com/synnada-ai/datafus…
mustafasrepo Feb 27, 2024
ea22682
Add new test,
mustafasrepo Feb 27, 2024
372204e
fix write_partitioned_parquet_results bug (#9360)
guojidan Feb 27, 2024
14264d2
fix: use `JoinSet` to make spawned tasks cancel-safe (#9318)
DDtKey Feb 27, 2024
acd09da
Update nix requirement from 0.27.1 to 0.28.0 (#9344)
dependabot[bot] Feb 27, 2024
c439bc7
Replace usages of internal_err with exec_err where appropriate (#9241)
Omega359 Feb 27, 2024
fb86d94
Passes SLT tests
metesynnada Feb 27, 2024
ea30b93
feat : Support for deregistering user defined functions (#9239)
mobley-trent Feb 27, 2024
544b3d9
fix return type (#9357)
guojidan Feb 27, 2024
935ebca
refactor: move acos() to function crate (#9297)
SteveLauC Feb 28, 2024
fa8508e
docs: put flatten in top fn list (#9376)
SteveLauC Feb 28, 2024
32d906f
Update list_to_string alias to point to array_to_string (#9374)
monkwire Feb 28, 2024
b89b138
Update csv.rs
metesynnada Feb 28, 2024
33eca8d
Before trying char
metesynnada Feb 28, 2024
e622409
feat: issue_9285: port builtin reg function into datafusion-function-…
Lordworms Feb 28, 2024
3caf33e
Fix u8 handling
metesynnada Feb 28, 2024
d896ebe
Add test to verify issue #9161 (#9265)
jonahgao Feb 28, 2024
a1ae158
refactor: fix error macros hygiene (#9366)
crepererum Feb 28, 2024
b220f03
feat: support for defining ARRAY columns in `CREATE TABLE` (#9381)
jonahgao Feb 28, 2024
19d892a
fix: panic in isnan() when no args are given (#9377)
SteveLauC Feb 28, 2024
ae4b3a0
feat: support `unnest` in FROM clause (#9355)
jonahgao Feb 28, 2024
5f90ead
feat: support nvl2 function (#9364)
guojidan Feb 28, 2024
96abac8
refactor: move asin() to function crate (#9379)
SteveLauC Feb 28, 2024
03e8323
fix: using test data sample for catalog example (#9372)
korowa Feb 28, 2024
d6717f8
delete tail space (#9386)
Tangruilin Feb 28, 2024
e4182d4
Run cargo-fmt on `datafusion-functions/core` (#9367)
alamb Feb 28, 2024
d5b6359
Cache common plan properties to eliminate recursive calls in physical…
mustafasrepo Feb 28, 2024
cf69d38
Run cargo-fmt on all of `datafusion-functions` (#9390)
alamb Feb 28, 2024
f68864b
feat: issue #9224 substitute tlide in table path (#9259)
Lordworms Feb 28, 2024
ca37ce3
port range function and change gen_series logic (#9352)
Lordworms Feb 29, 2024
e1ca74e
[MINOR]: Generate physical plan, instead of logical plan in the bench…
mustafasrepo Feb 29, 2024
d16bb65
Merge remote-tracking branch 'upstream/main' into configuration
metesynnada Feb 29, 2024
4f1acf1
Update according to review
metesynnada Feb 29, 2024
de902de
[task #8987]add_to_date_function (#9019)
Tangruilin Feb 29, 2024
a9e5247
Minor: clarify performance in docs for `ScalarUDF`, `ScalarUDAF` and …
alamb Feb 29, 2024
90eddf6
feat: replace std Instant with wasm-compatible wrapper (#9189)
waynexia Feb 29, 2024
f4c7797
Uplift keys/dependencies to use more workspace inheritance (#9293)
Jefffrey Feb 29, 2024
e1ed13d
Improve documentation for ExecutionPlanProperties, use consistent fie…
alamb Feb 29, 2024
a8a3c5d
Doc: Workaround for Running cargo test locally without signficant mem…
devinjdangelo Feb 29, 2024
10d5f2d
feat: support `unnest` with additional columns (#9400)
jonahgao Mar 1, 2024
ec67380
Minor: improve the display name of `unnest` expressions (#9412)
jonahgao Mar 1, 2024
9aa01f3
Minor: Move function signature check to planning stage (#9401)
2010YOUY01 Mar 1, 2024
f489525
chore(deps): update substrait requirement from 0.24.0 to 0.25.1 (#9406)
dependabot[bot] Mar 1, 2024
b2ff249
docs: update contributor guide (migration to sqllogictest is done) (#…
SteveLauC Mar 1, 2024
9e39afd
Move the to_timestamp* functions to datafusion-functions (#9388)
Omega359 Mar 1, 2024
2a490e4
Minor: Support LargeList List Range indexing and fix large list handl…
jayzhan211 Mar 1, 2024
1b924ff
NEW Logo (#9385)
pinarbayata Mar 2, 2024
b87cde1
Handle serde for ScalarUDF (#9395)
yyy1000 Mar 2, 2024
10fbf42
Minior: Add negative tests about sqrt function (#9426)
caicancai Mar 2, 2024
83d15e8
Move SpawnedTask from datafusion_physical_plan to new `datafusion_com…
mustafasrepo Mar 2, 2024
487c53b
Re-export datafusion-functions-array (#9433)
andygrove Mar 2, 2024
89aea0a
Support LargeList for ListIndex (#9424)
PsiACE Mar 3, 2024
f229dcc
move ArrayDims, ArrayNdims and Cardinality to datafusion-function-cra…
Weijun-H Mar 3, 2024
4ea536b
refactor: make instr() an alias of strpos() (#9396)
SteveLauC Mar 3, 2024
c43d9b3
Add test case for invalid tz in timestamp literal (#9429)
MohamedAbdeen21 Mar 3, 2024
1a4dc00
Minor: simplify call (#9434)
alamb Mar 4, 2024
dd1cf01
Support IGNORE NULLS for LEAD window function (#9419)
comphead Mar 4, 2024
767760b
fix sqllogicaltest result (#9444)
jackwener Mar 4, 2024
1e6115a
docs: rm duplicate words. (#9449)
my-vegetable-has-exploded Mar 4, 2024
37ca46a
minor: fix cargo clippy some warning (#9442)
jackwener Mar 4, 2024
5188a5d
port regexp_like function and port related tests (#9397)
Lordworms Mar 4, 2024
22255c2
fix: sort_batch function unsupported mixed types with list (#9410)
JasonLi-cn Mar 4, 2024
581fd98
refactor: add `join_unwind` to `SpawnedTask` (#9422)
DDtKey Mar 4, 2024
1e8fa2f
Ignore null LEAD support for small batch sizes. (#9445)
mustafasrepo Mar 4, 2024
492d7bd
fix: casting to ARRAY types failed (#9441)
jonahgao Mar 4, 2024
608b615
fix: reading from partitioned `json` & `arrow` tables (#9431)
korowa Mar 4, 2024
2651437
feat: Support `EscapedStringLiteral`, update sqlparser to `0.44.0` (#…
JasonLi-cn Mar 4, 2024
684b4fa
Minor: fix LEAD test description (#9451)
comphead Mar 4, 2024
a84e5f8
Consolidate `TreeNode` transform and rewrite APIs (#8891)
peter-toth Mar 4, 2024
ac27428
Support `Date32` arguments for `generate_series` (#9420)
Lordworms Mar 4, 2024
31c23dc
Minor: change doc for range (#9455)
Lordworms Mar 5, 2024
036f005
Passing tests
metesynnada Mar 5, 2024
ba938a9
passing tests with proto
metesynnada Mar 5, 2024
cff6bc9
Cargo fix
metesynnada Mar 5, 2024
37ea944
Add missing doc index (#9462)
Weijun-H Mar 5, 2024
ce5dd20
update bigdecimal version (#9471)
comphead Mar 5, 2024
64f998f
chore(deps): update base64 requirement from 0.21 to 0.22 (#9446)
dependabot[bot] Mar 5, 2024
f755626
Port regexp_replace functions and related tests (#9454)
Lordworms Mar 5, 2024
6041dea
Update contributor guide with updated scalar function howto (#9438)
Omega359 Mar 5, 2024
3854419
feat: add support for fixed list wildcard in type signature (#9312)
universalmind303 Mar 5, 2024
2873fd0
Add a `ScalarUDFImpl::simplfy()` API, move `SimplifyInfo` et al to da…
jayzhan211 Mar 5, 2024
3aba67e
Implement IGNORE NULLS for FIRST_VALUE (#9411)
huaxingao Mar 5, 2024
ea01e56
Add plugable handler for `CREATE FUNCTION` (#9333)
milenkovicm Mar 5, 2024
2fd3c4e
Testing and clippy refactors
metesynnada Mar 6, 2024
f141345
Merge remote-tracking branch 'upstream/main' into configuration
metesynnada Mar 6, 2024
cc8a41a
Enable configurable display of partition sizes in the explain stateme…
jayzhan211 Mar 6, 2024
55a223c
After merge corrections
metesynnada Mar 6, 2024
7dfa7ee
Merge remote-tracking branch 'upstream/main' into configuration
metesynnada Mar 6, 2024
4f9acdf
Parquet feature fix
metesynnada Mar 6, 2024
f3836a5
Reduce casts for LEAD/LAG (#9468)
comphead Mar 6, 2024
b0bb337
On datafusion-cli register COPY statements
metesynnada Mar 7, 2024
00525a8
Correcting a test
metesynnada Mar 7, 2024
e5404a1
[CI build] fix chrono suggestions (#9486)
comphead Mar 7, 2024
8d58b03
Make regex dependency optional in datafusion-functions, add CI checks…
alamb Mar 7, 2024
f2f1d96
Merge remote-tracking branch 'upstream/main' into configuration
metesynnada Mar 7, 2024
37b7375
fix: coalesce function should return correct data type (#9459)
viirya Mar 7, 2024
0c0fce3
LEAD/LAG calculate default value once (#9485)
comphead Mar 7, 2024
fc81bf1
chore: simplify the return type of `validate_data_types()` (#9491)
waynexia Mar 8, 2024
a68f1cb
minor: use arrow-rs casting (#9500)
comphead Mar 8, 2024
3c2d510
chore(deps): update substrait requirement from 0.25.1 to 0.27.0 (#9502)
dependabot[bot] Mar 8, 2024
7517430
fix: `generate_series` and `range` panic on edge cases (#9503)
jonahgao Mar 8, 2024
92a471e
Fix undeterministic behaviour of schema nullability of lag window que…
mustafasrepo Mar 8, 2024
32bb26d
Add `to_unixtime` function (#9077)
Tangruilin Mar 8, 2024
9cf44c2
Minor: fixed transformed state return (#9484)
alamb Mar 8, 2024
e3e64fe
test: port strpos test in physical_expr/src/functions to sqllogictest…
SteveLauC Mar 8, 2024
b7f4772
Port ArrayHas family to `functions-array` (#9496)
jayzhan211 Mar 9, 2024
b1d8082
port array_empty and array_length (#9510)
Weijun-H Mar 9, 2024
5537572
fix: `substr_index` not handling negative occurrence correctly (#9475)
jonahgao Mar 9, 2024
ccedcb8
[minor] extract collect file statistics method and add doc (#9490)
Ted-Jiang Mar 9, 2024
afd0f90
test: sqllogictests for multiple tables join (#9480)
korowa Mar 9, 2024
356a307
Add support for ignore nulls for LEAD, LAG in WindowAggExec (#9498)
Lordworms Mar 9, 2024
eebdbe8
Minior: Improve log expr description (#9516)
caicancai Mar 9, 2024
37b3ff3
port flatten (#9523)
Weijun-H Mar 10, 2024
afddb32
feat: Add projection to HashJoinExec. (#9236)
my-vegetable-has-exploded Mar 10, 2024
6710e6d
Add example for `FunctionFactory` (#9482)
milenkovicm Mar 10, 2024
acddecb
Move date_part, date_trunc, date_bin functions to datafusion-function…
Omega359 Mar 10, 2024
0b540ea
fix: support two argument TRIM (#9521)
tshauck Mar 10, 2024
96664ce
Remove physical expr of ListIndex and ListRange, convert to `array_el…
jayzhan211 Mar 10, 2024
f1f0965
feat: function name hints for UDFs (#9407)
SteveLauC Mar 10, 2024
f4107d4
Minor: Improve documentation for registering `AnalyzerRule` (#9520)
alamb Mar 10, 2024
31fcd72
Extend argument types for udf `return_type_from_exprs` (#9522)
jayzhan211 Mar 10, 2024
88187d4
move make_array array_append array_prepend array_concat function to d…
guojidan Mar 10, 2024
4cd3c43
Port `StringToArray` to `function-arrays` subcrate (#9543)
erenavsarogullari Mar 11, 2024
44936ef
Minor: remove `..` pattern matching in sql planner (#9531)
alamb Mar 11, 2024
d927882
fix doc (#9542)
yyy1000 Mar 11, 2024
75ad221
Port `struct` to datafusion-functions (#9546)
yyy1000 Mar 11, 2024
abb0c1f
UDAF and UDWF support aliases (#9489)
lewiszlw Mar 11, 2024
db0a4d2
docs: fix extraneous char (#9560)
tshauck Mar 11, 2024
0d3d274
Initial commit (#9559)
mustafasrepo Mar 11, 2024
6354df6
Port `arrow_typeof` to datafusion-function (#9524)
yyy1000 Mar 11, 2024
02f7e1f
feat: Introduce convert Expr to SQL string API and basic feature (#9517)
backkem Mar 11, 2024
129e682
Review
ozankabak Mar 11, 2024
d2fc02b
Port `ArraySort` to `function-arrays` subcrate (#9551)
erenavsarogullari Mar 12, 2024
9f06c6f
Review visited
metesynnada Mar 12, 2024
e27dfe8
Merge remote-tracking branch 'upstream/main' into configuration
metesynnada Mar 12, 2024
64 changes: 64 additions & 0 deletions .github/workflows/docs.yaml
@@ -0,0 +1,64 @@
on:
  push:
    branches:
      - main
    paths:
      - .asf.yaml
      - .github/workflows/docs.yaml
      - docs/**

name: Deploy DataFusion site

jobs:
  build-docs:
    name: Build docs
    runs-on: ubuntu-latest
    steps:
      - name: Checkout docs sources
        uses: actions/checkout@v4

      - name: Checkout asf-site branch
        uses: actions/checkout@v4
        with:
          ref: asf-site
          path: asf-site

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"

      - name: Install dependencies
        run: |
          set -x
          python3 -m venv venv
          source venv/bin/activate
          pip install -r docs/requirements.txt

      - name: Build docs
        run: |
          set -x
          source venv/bin/activate
          cd docs
          ./build.sh

      - name: Copy & push the generated HTML
        run: |
          set -x
          cd asf-site/
          rsync \
            -a \
            --delete \
            --exclude '/.git/' \
            ../docs/build/html/ \
            ./
          cp ../.asf.yaml .
          touch .nojekyll
          git status --porcelain
          if [ "$(git status --porcelain)" != "" ]; then
            git config user.name "github-actions[bot]"
            git config user.email "github-actions[bot]@users.noreply.github.com"
            git add --all
            git commit -m 'Publish built docs triggered by ${{ github.sha }}'
            git push || git push --force
          fi
10 changes: 8 additions & 2 deletions .github/workflows/rust.yml
Expand Up @@ -79,14 +79,20 @@ jobs:

# Ensure that the datafusion crate can be built with only a subset of the function
# packages enabled.
- name: Check function packages (array_expressions)
run: cargo check --no-default-features --features=array_expressions -p datafusion

- name: Check function packages (datetime_expressions)
run: cargo check --no-default-features --features=datetime_expressions -p datafusion

- name: Check function packages (encoding_expressions)
run: cargo check --no-default-features --features=encoding_expressions -p datafusion

- name: Check function packages (math_expressions)
run: cargo check --no-default-features --features=math_expressions -p datafusion

- name: Check function packages (array_expressions)
run: cargo check --no-default-features --features=array_expressions -p datafusion
- name: Check function packages (regex_expressions)
run: cargo check --no-default-features --features=regex_expressions -p datafusion

- name: Check Cargo.lock for datafusion-cli
run: |
Expand Down
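The per-feature `cargo check --no-default-features --features=...` steps in this hunk verify that the `datafusion` crate still compiles when only one function package is enabled. As a rough illustration of the kind of compile-time gating those checks protect, here is a minimal sketch. The helper `enabled_function_packages` is hypothetical and not part of DataFusion; only the feature names are taken from the workflow.

```rust
/// Hypothetical helper (not a DataFusion API): reports which function-package
/// features were switched on when this crate was compiled.
#[allow(unexpected_cfgs)] // the feature names are only declared in a real Cargo.toml
pub fn enabled_function_packages() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    // `cfg!` is resolved at compile time to a plain `true` or `false`,
    // so every branch must type-check no matter which features are set.
    if cfg!(feature = "array_expressions") {
        enabled.push("array_expressions");
    }
    if cfg!(feature = "datetime_expressions") {
        enabled.push("datetime_expressions");
    }
    if cfg!(feature = "encoding_expressions") {
        enabled.push("encoding_expressions");
    }
    if cfg!(feature = "math_expressions") {
        enabled.push("math_expressions");
    }
    if cfg!(feature = "regex_expressions") {
        enabled.push("regex_expressions");
    }
    enabled
}
```

In a crate that declares these features, building with `--no-default-features --features=regex_expressions` would leave only the last branch active. Code gated with `#[cfg(feature = "...")]` attributes, unlike `cfg!`, is removed entirely when the feature is off, which is why each package needs its own check step to catch missing imports or stray cross-package references.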
21 changes: 14 additions & 7 deletions Cargo.toml
Expand Up @@ -19,6 +19,7 @@
exclude = ["datafusion-cli"]
members = [
"datafusion/common",
"datafusion/common_runtime",
"datafusion/core",
"datafusion/expr",
"datafusion/execution",
Expand Down Expand Up @@ -51,6 +52,11 @@ rust-version = "1.72"
version = "36.0.0"

[workspace.dependencies]
# We turn off default-features for some dependencies here so the workspaces which inherit them can
# selectively turn them on if needed, since we can override default-features = true (from false)
# for the inherited dependency but cannot do the reverse (override from true to false).
#
# See for more details: https://github.com/rust-lang/cargo/issues/11329
arrow = { version = "50.0.0", features = ["prettyprint"] }
arrow-array = { version = "50.0.0", default-features = false, features = ["chrono-tz"] }
arrow-buffer = { version = "50.0.0", default-features = false }
Expand All @@ -60,19 +66,20 @@ arrow-ord = { version = "50.0.0", default-features = false }
arrow-schema = { version = "50.0.0", default-features = false }
arrow-string = { version = "50.0.0", default-features = false }
async-trait = "0.1.73"
bigdecimal = "0.4.1"
bigdecimal = "=0.4.1"
bytes = "1.4"
chrono = { version = "0.4.34", default-features = false }
ctor = "0.2.0"
dashmap = "5.4.0"
datafusion = { path = "datafusion/core", version = "36.0.0" }
datafusion-common = { path = "datafusion/common", version = "36.0.0" }
datafusion = { path = "datafusion/core", version = "36.0.0", default-features = false }
datafusion-common = { path = "datafusion/common", version = "36.0.0", default-features = false }
datafusion-common-runtime = { path = "datafusion/common_runtime", version = "36.0.0" }
datafusion-execution = { path = "datafusion/execution", version = "36.0.0" }
datafusion-expr = { path = "datafusion/expr", version = "36.0.0" }
datafusion-functions = { path = "datafusion/functions", version = "36.0.0" }
datafusion-functions-array = { path = "datafusion/functions-array", version = "36.0.0" }
datafusion-optimizer = { path = "datafusion/optimizer", version = "36.0.0" }
datafusion-physical-expr = { path = "datafusion/physical-expr", version = "36.0.0" }
datafusion-optimizer = { path = "datafusion/optimizer", version = "36.0.0", default-features = false }
datafusion-physical-expr = { path = "datafusion/physical-expr", version = "36.0.0", default-features = false }
datafusion-physical-plan = { path = "datafusion/physical-plan", version = "36.0.0" }
datafusion-proto = { path = "datafusion/proto", version = "36.0.0" }
datafusion-sql = { path = "datafusion/sql", version = "36.0.0" }
Expand All @@ -81,7 +88,7 @@ datafusion-substrait = { path = "datafusion/substrait", version = "36.0.0" }
doc-comment = "0.3"
env_logger = "0.11"
futures = "0.3"
half = "2.2.1"
half = { version = "2.2.1", default-features = false }
indexmap = "2.0.0"
itertools = "0.12"
log = "^0.4"
Expand All @@ -92,7 +99,7 @@ parquet = { version = "50.0.0", default-features = false, features = ["arrow", "
rand = "0.8"
rstest = "0.18.0"
serde_json = "1"
sqlparser = { version = "0.43.0", features = ["visitor"] }
sqlparser = { version = "0.44.0", features = ["visitor"] }
tempfile = "3"
thiserror = "1.0.44"
tokio = { version = "1.36", features = ["macros", "rt", "sync"] }
Expand Down
3 changes: 2 additions & 1 deletion README.md
Expand Up @@ -38,7 +38,7 @@
[API Docs](https://docs.rs/datafusion/latest/datafusion/) |
[Chat](https://discord.com/channels/885562378132000778/885562378132000781)

<img src="https://arrow.apache.org/datafusion/_images/DataFusion-Logo-Background-White.png" width="256" alt="logo"/>
<img src="./docs/source/_static/images/2x_bgwhite_original.png" width="512" alt="logo"/>

DataFusion is a very fast, extensible query engine for building high-quality data-centric systems in
[Rust](http://rustlang.org), using the [Apache Arrow](https://arrow.apache.org)
Expand Down Expand Up @@ -78,6 +78,7 @@ Default features:
- `array_expressions`: functions for working with arrays such as `array_to_string`
- `compression`: reading files compressed with `xz2`, `bzip2`, `flate2`, and `zstd`
- `crypto_expressions`: cryptographic functions such as `md5` and `sha256`
- `datetime_expressions`: date and time functions such as `to_timestamp`
- `encoding_expressions`: `encode` and `decode` functions
- `parquet`: support for reading the [Apache Parquet] format
- `regex_expressions`: regular expression functions, such as `regexp_match`
Expand Down
16 changes: 8 additions & 8 deletions benchmarks/Cargo.toml
Expand Up @@ -18,12 +18,12 @@
[package]
name = "datafusion-benchmarks"
description = "DataFusion Benchmarks"
version = "36.0.0"
version = { workspace = true }
edition = { workspace = true }
authors = ["Apache Arrow <[email protected]>"]
homepage = "https://github.com/apache/arrow-datafusion"
repository = "https://github.com/apache/arrow-datafusion"
license = "Apache-2.0"
authors = { workspace = true }
homepage = { workspace = true }
repository = { workspace = true }
license = { workspace = true }
rust-version = { workspace = true }

[features]
Expand All @@ -33,8 +33,8 @@ snmalloc = ["snmalloc-rs"]

[dependencies]
arrow = { workspace = true }
datafusion = { path = "../datafusion/core", version = "36.0.0" }
datafusion-common = { path = "../datafusion/common", version = "36.0.0" }
datafusion = { workspace = true, default-features = true }
datafusion-common = { workspace = true, default-features = true }
env_logger = { workspace = true }
futures = { workspace = true }
log = { workspace = true }
Expand All @@ -49,4 +49,4 @@ test-utils = { path = "../test-utils/", version = "0.1.0" }
tokio = { workspace = true, features = ["rt-multi-thread", "parking_lot"] }

[dev-dependencies]
datafusion-proto = { path = "../datafusion/proto", version = "36.0.0" }
datafusion-proto = { workspace = true }
3 changes: 2 additions & 1 deletion benchmarks/src/clickbench.rs
Expand Up @@ -16,13 +16,14 @@
// under the License.

use std::path::Path;
use std::{path::PathBuf, time::Instant};
use std::path::PathBuf;

use datafusion::{
error::{DataFusionError, Result},
prelude::SessionContext,
};
use datafusion_common::exec_datafusion_err;
use datafusion_common::instant::Instant;
use structopt::StructOpt;

use crate::{BenchmarkRun, CommonOpt};
Expand Down
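This hunk swaps `std::time::Instant` for `datafusion_common::instant::Instant`, which the commit list describes as a wasm-compatible wrapper (#9189). The sketch below shows one way such a wrapper can be shaped, as a type alias selected by target. It is an assumption about the general pattern, not a copy of the DataFusion module; the `instant` module and the `time_it` helper here are illustrative names only.

```rust
use std::time::Duration;

/// Sketch of a platform-switching alias; assumed to resemble the idea behind
/// `datafusion_common::instant::Instant`, not its actual source.
pub mod instant {
    // On native targets the standard monotonic clock is used directly.
    #[cfg(not(target_family = "wasm"))]
    pub type Instant = std::time::Instant;
    // On wasm targets a browser-backed clock would be aliased here instead
    // (assumption; omitted so the sketch needs no external crates).
}

/// Runs `work` once and returns its result together with the elapsed time.
/// Call sites only name `instant::Instant`, so they stay target-agnostic.
#[cfg(not(target_family = "wasm"))]
pub fn time_it<T>(work: impl FnOnce() -> T) -> (T, Duration) {
    let start = instant::Instant::now();
    let value = work();
    (value, start.elapsed())
}
```

Because the alias keeps the `now()` and `elapsed()` calls unchanged, a benchmark file only needs its `use` line updated, which matches the one-line import changes seen in the benchmark diffs.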
12 changes: 7 additions & 5 deletions benchmarks/src/parquet_filter.rs
Expand Up @@ -15,17 +15,19 @@
// specific language governing permissions and limitations
// under the License.

use crate::AccessLogOpt;
use crate::{BenchmarkRun, CommonOpt};
use std::path::PathBuf;

use crate::{AccessLogOpt, BenchmarkRun, CommonOpt};

use arrow::util::pretty;
use datafusion::common::Result;
use datafusion::logical_expr::utils::disjunction;
use datafusion::logical_expr::{lit, or, Expr};
use datafusion::physical_plan::collect;
use datafusion::prelude::{col, SessionContext};
use datafusion::test_util::parquet::{ParquetScanOptions, TestParquetFile};
use std::path::PathBuf;
use std::time::Instant;
use datafusion_common::instant::Instant;

use structopt::StructOpt;

/// Test performance of parquet filter pushdown
Expand Down Expand Up @@ -179,7 +181,7 @@ async fn exec_scan(
debug: bool,
) -> Result<(usize, std::time::Duration)> {
let start = Instant::now();
let exec = test_file.create_scan(Some(filter)).await?;
let exec = test_file.create_scan(ctx, Some(filter)).await?;

let task_ctx = ctx.task_ctx();
let result = collect(exec, task_ctx).await?;
Expand Down
15 changes: 8 additions & 7 deletions benchmarks/src/sort.rs
Expand Up @@ -15,19 +15,20 @@
// specific language governing permissions and limitations
// under the License.

use crate::AccessLogOpt;
use crate::BenchmarkRun;
use crate::CommonOpt;
use std::path::PathBuf;
use std::sync::Arc;

use crate::{AccessLogOpt, BenchmarkRun, CommonOpt};

use arrow::util::pretty;
use datafusion::common::Result;
use datafusion::physical_expr::PhysicalSortExpr;
use datafusion::physical_plan::collect;
use datafusion::physical_plan::sorts::sort::SortExec;
use datafusion::prelude::{SessionConfig, SessionContext};
use datafusion::test_util::parquet::TestParquetFile;
use std::path::PathBuf;
use std::sync::Arc;
use std::time::Instant;
use datafusion_common::instant::Instant;

use structopt::StructOpt;

/// Test performance of sorting large datasets
Expand Down Expand Up @@ -174,7 +175,7 @@ async fn exec_sort(
debug: bool,
) -> Result<(usize, std::time::Duration)> {
let start = Instant::now();
let scan = test_file.create_scan(None).await?;
let scan = test_file.create_scan(ctx, None).await?;
let exec = Arc::new(SortExec::new(expr.to_owned(), scan));
let task_ctx = ctx.task_ctx();
let result = collect(exec, task_ctx).await?;
Expand Down
4 changes: 2 additions & 2 deletions benchmarks/src/tpch/convert.rs
Expand Up @@ -15,12 +15,12 @@
// specific language governing permissions and limitations
// under the License.

use datafusion_common::instant::Instant;
use std::fs;
use std::path::{Path, PathBuf};
use std::time::Instant;

use datafusion::common::not_impl_err;
use datafusion::error::DataFusionError;

use datafusion::error::Result;
use datafusion::prelude::*;
use parquet::basic::Compression;
Expand Down
2 changes: 1 addition & 1 deletion benchmarks/src/tpch/mod.rs
Expand Up @@ -21,7 +21,7 @@ use arrow::datatypes::SchemaBuilder;
use datafusion::{
arrow::datatypes::{DataType, Field, Schema},
common::plan_err,
error::{DataFusionError, Result},
error::Result,
};
use std::fs;
mod run;
Expand Down
28 changes: 15 additions & 13 deletions benchmarks/src/tpch/run.rs
Expand Up @@ -15,8 +15,14 @@
// specific language governing permissions and limitations
// under the License.

use super::get_query_sql;
use std::path::PathBuf;
use std::sync::Arc;

use super::{
get_query_sql, get_tbl_tpch_table_schema, get_tpch_table_schema, TPCH_TABLES,
};
use crate::{BenchmarkRun, CommonOpt};

use arrow::record_batch::RecordBatch;
use arrow::util::pretty::{self, pretty_format_batches};
use datafusion::datasource::file_format::csv::CsvFormat;
Expand All @@ -26,21 +32,16 @@ use datafusion::datasource::listing::{
ListingOptions, ListingTable, ListingTableConfig, ListingTableUrl,
};
use datafusion::datasource::{MemTable, TableProvider};
use datafusion::error::Result;
use datafusion::physical_plan::display::DisplayableExecutionPlan;
use datafusion::physical_plan::{collect, displayable};
use datafusion::prelude::*;
use datafusion_common::instant::Instant;
use datafusion_common::{DEFAULT_CSV_EXTENSION, DEFAULT_PARQUET_EXTENSION};
use log::info;

use std::path::PathBuf;
use std::sync::Arc;
use std::time::Instant;

use datafusion::error::Result;
use datafusion::prelude::*;
use log::info;
use structopt::StructOpt;

use super::{get_tbl_tpch_table_schema, get_tpch_table_schema, TPCH_TABLES};

/// Run the tpch benchmark.
///
/// This benchmarks is derived from the [TPC-H][1] version
Expand Down Expand Up @@ -253,7 +254,7 @@ impl RunOpt {
}
"parquet" => {
let path = format!("{path}/{table}");
let format = ParquetFormat::default().with_enable_pruning(Some(true));
let format = ParquetFormat::default().with_enable_pruning(true);

(Arc::new(format), path, DEFAULT_PARQUET_EXTENSION)
}
Expand Down Expand Up @@ -298,11 +299,12 @@ struct QueryResult {
// Only run with "ci" mode when we have the data
#[cfg(feature = "ci")]
mod tests {
use std::path::Path;

use super::*;

use datafusion::common::exec_err;
use datafusion::error::{DataFusionError, Result};
use std::path::Path;

use datafusion_proto::bytes::{
logical_plan_from_bytes, logical_plan_to_bytes, physical_plan_from_bytes,
physical_plan_to_bytes,
Expand Down
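One hunk in this file changes `ParquetFormat::default().with_enable_pruning(Some(true))` to `with_enable_pruning(true)`, so the setter now appears to take a plain `bool` rather than an `Option<bool>`. The sketch below illustrates that builder shape on a stand-in type. `SketchFormat`, its field, and its default value are hypothetical and chosen only for illustration; they are not the real `ParquetFormat` definition.

```rust
/// Hypothetical stand-in for a file-format builder (not the real `ParquetFormat`).
#[derive(Debug, Clone, PartialEq)]
pub struct SketchFormat {
    enable_pruning: bool,
}

impl Default for SketchFormat {
    fn default() -> Self {
        // Assumed default: the value lives here instead of being encoded
        // as `None` and resolved later against a session-level setting.
        Self { enable_pruning: true }
    }
}

impl SketchFormat {
    /// Builder-style setter taking a plain `bool`, mirroring the new call shape.
    pub fn with_enable_pruning(mut self, enable: bool) -> Self {
        self.enable_pruning = enable;
        self
    }

    /// Returns the effective setting; no `Option` fallback is needed.
    pub fn enable_pruning(&self) -> bool {
        self.enable_pruning
    }
}
```

With this shape a caller writes `SketchFormat::default().with_enable_pruning(false)` and reads the value back directly, without an "unset" state to handle.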
4 changes: 1 addition & 3 deletions ci/scripts/rust_example.sh
Expand Up @@ -20,15 +20,13 @@
set -ex
cd datafusion-examples/examples/
cargo fmt --all -- --check
cargo check --examples

files=$(ls .)
for filename in $files
do
example_name=`basename $filename ".rs"`
# Skip tests that rely on external storage and flight
# todo: Currently, catalog.rs is placed in the external-dependence directory because there is a problem parsing
# the parquet file of the external parquet-test that it currently relies on.
# We will wait for this issue[https://github.com/apache/arrow-datafusion/issues/8041] to be resolved.
if [ ! -d $filename ]; then
cargo run --example $example_name
fi
Expand Down