Configuration adjustment in Datafusion #4

Closed
wants to merge 152 commits into from

metesynnada

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

github-actions bot added the core label Feb 15, 2024
metesynnada and others added 6 commits February 26, 2024 14:19
* support FixedSizeList Type Coercion

* add allow null type coercion parameter

* support null column in FixedSizeList

* Add test

* Add tests for cardinality with fixed size lists

* chore

* fix ci

* add comment

* Fix array_element function signature

* Remove unused imports and simplify code

* Fix array function signatures and behavior

* fix conflict

* fix conflict

* add tests for FixedSizeList

* remove unreachable null check

* simplify the code

* remove null checking

* reformat output

* simplify code

* add tests for array_dims

* Refactor type coercion functions in datafusion/expr module
…ache#9342)

* feat: expand `unnest`  to accept any single array expression

* unnest null

* review feedback
codonnell and others added 5 commits February 26, 2024 09:16
* fix: downgrade tonic for arrow compatibility

Tonic 0.10 and 0.11 are not API compatible.
Arrow 50 depends on tonic 0.10, and datafusion must match that dependency for compatibility reasons.

* feat: make nested examples runnable

cargo run --example doesn't support nested examples. Nested examples need an explicit [[example]] block in Cargo.toml to be runnable.

* fix: fix custom catalog typo and formatting

* docs: add note about upgrading tonic with arrow

* ci: add cargo check for all examples
…pache#9310)

* docs: update parquet_sql_multiple_files.rs with a relative path ex

* style: run cargo fmt

* docs: update comment

* docs: better
* tests: adds tests associated with apache#9237

* style: clippy
* feature: support nvl(ifnull) function

* add sqllogictest

* add docs entry

* Update docs/source/user-guide/sql/scalar_functions.md

Co-authored-by: Jonah Gao <[email protected]>

* fix some code

* fix docs

---------

Co-authored-by: Jonah Gao <[email protected]>
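
For reference, nvl(a, b) (alias ifnull) returns a when it is not null and b otherwise. The sketch below shows those semantics element-wise, using Vec<Option<T>> as a stand-in for Arrow arrays; the nvl function here is a hypothetical illustration, not the DataFusion implementation.

```rust
/// Element-wise nvl(a, b): keep `a` where it is non-null, otherwise take `b`.
/// Plain `Option` slices stand in for Arrow arrays; this is only a sketch.
fn nvl<T: Clone>(a: &[Option<T>], b: &[Option<T>]) -> Vec<Option<T>> {
    assert_eq!(a.len(), b.len(), "nvl arguments must have equal length");
    a.iter()
        .zip(b)
        // `or_else` only evaluates the fallback when `x` is None (null).
        .map(|(x, y)| x.clone().or_else(|| y.clone()))
        .collect()
}

fn main() {
    let a = [Some(1), None, Some(3), None];
    let b = [Some(10), Some(20), None, None];
    // Nulls in `a` are filled from `b`; a row that is null in both stays null.
    assert_eq!(nvl(&a, &b), vec![Some(1), Some(20), Some(3), None]);
}
```
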
* feat: move abs to datafusion_functions

* fix proto

* fix proto

* fix CI vendored code

* Fix proto

* add support type

* fix signature

* fix typo

* fix test cases

* disable a test case

* remove old code from math_expressions

* feat: add test

* fix clippy

* use unknown for proto

* fix unknown proto
@@ -926,9 +928,30 @@ config_field!(String);
config_field!(bool);
config_field!(usize);
config_field!(f64);
config_field!(u8);
Collaborator

Why do we need to update the default implementation for u8?
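
The diff context above registers each primitive type with a one-line config_field!(T); call. As a rough illustration of what such a per-type macro can expand to, here is a std-only sketch; SimpleConfigField and simple_config_field! are hypothetical names, and this is neither DataFusion's actual config_field! definition nor the u8 change the reviewer is asking about.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for a config field trait (not DataFusion's `ConfigField`).
trait SimpleConfigField {
    fn set(&mut self, value: &str) -> Result<(), String>;
}

/// Implements `SimpleConfigField` for any `FromStr` type, mirroring the
/// one-line-per-type `config_field!(T);` pattern in the diff.
macro_rules! simple_config_field {
    ($t:ty) => {
        impl SimpleConfigField for $t {
            fn set(&mut self, value: &str) -> Result<(), String> {
                // On a parse error we return early and leave `self` unchanged.
                *self = <$t as FromStr>::from_str(value)
                    .map_err(|e| format!("cannot parse {:?}: {}", value, e))?;
                Ok(())
            }
        }
    };
}

simple_config_field!(bool);
simple_config_field!(usize);
simple_config_field!(f64);
simple_config_field!(u8);

fn main() {
    let mut batch_size: usize = 0;
    batch_size.set("10").unwrap();
    assert_eq!(batch_size, 10);

    let mut level: u8 = 0;
    // "300" does not fit in a u8, so parsing fails and the value is unchanged.
    assert!(level.set("300").is_err());
    assert_eq!(level, 0);
}
```

With this shape, every type gets the same parse-from-string behaviour, so a type-specific override would only be needed if one type has to parse differently from its FromStr impl.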

let (key, rem) = key.split_once('.').unwrap_or((key, ""));
assert_eq!(key, "kafka");
self.properties.insert(rem.to_owned(), value.to_owned());
println!("key: {}, value: {}", rem, value);
Collaborator

This print statement is a leftover.
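
The snippet above routes a dotted key such as kafka.bootstrap.servers to an extension by splitting off the first segment with split_once('.'). Below is a minimal sketch of the same idea without the debug print, assuming a hypothetical KafkaOptions type that stores the remaining key and the value as strings. Unlike the snippet, it returns an error instead of asserting on the prefix; that is a design choice for the sketch, not what the PR does.

```rust
use std::collections::HashMap;

/// Hypothetical extension that collects every `kafka.*` option as a raw string.
#[derive(Debug, Default)]
struct KafkaOptions {
    properties: HashMap<String, String>,
}

impl KafkaOptions {
    /// Stores `value` under the part of `key` after the first `.`.
    /// Returns an error instead of panicking when the prefix is wrong.
    fn set(&mut self, key: &str, value: &str) -> Result<(), String> {
        let (prefix, rest) = key.split_once('.').unwrap_or((key, ""));
        if prefix != "kafka" {
            return Err(format!("unexpected prefix {prefix:?} in key {key:?}"));
        }
        // No debug print: the map itself records what was set.
        self.properties.insert(rest.to_owned(), value.to_owned());
        Ok(())
    }
}

fn main() {
    let mut opts = KafkaOptions::default();
    opts.set("kafka.bootstrap.servers", "localhost:9092").unwrap();
    assert_eq!(
        opts.properties.get("bootstrap.servers").map(String::as_str),
        Some("localhost:9092")
    );
    assert!(opts.set("s3.region", "us-east-1").is_err());
}
```
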

.set("format.parquet.write_batch_size", "10")
.unwrap();
assert_eq!(table_config.format.parquet.global.write_batch_size, 10);
table_config.set("kafka.bootstrap.servers", "mete").unwrap();
Collaborator

This variable may be a leftover.
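
The test above sets format.parquet.write_batch_size through a dotted string key and then reads the typed field back. One way to get that behaviour is to split the key into segments and match the full path. The sketch below uses hypothetical nested structs that only mirror the field path in the test; they are not DataFusion's real table configuration types.

```rust
/// Hypothetical nested options mirroring `format.parquet.global.write_batch_size`.
#[derive(Debug, Default)]
struct ParquetGlobal {
    write_batch_size: usize,
}

#[derive(Debug, Default)]
struct Parquet {
    global: ParquetGlobal,
}

#[derive(Debug, Default)]
struct Format {
    parquet: Parquet,
}

#[derive(Debug, Default)]
struct TableConfig {
    format: Format,
}

impl TableConfig {
    /// Splits the dotted key into segments, matches the full path,
    /// and parses the value into the leaf field's type.
    fn set(&mut self, key: &str, value: &str) -> Result<(), String> {
        match key.split('.').collect::<Vec<_>>().as_slice() {
            ["format", "parquet", "write_batch_size"] => {
                self.format.parquet.global.write_batch_size = value
                    .parse::<usize>()
                    .map_err(|e| format!("invalid write_batch_size {value:?}: {e}"))?;
                Ok(())
            }
            _ => Err(format!("unknown config key {key:?}")),
        }
    }
}

fn main() {
    let mut table_config = TableConfig::default();
    table_config
        .set("format.parquet.write_batch_size", "10")
        .unwrap();
    assert_eq!(table_config.format.parquet.global.write_batch_size, 10);
}
```
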

copy_options,
Default::default(),
Collaborator

I think you can use HashMap::new() here instead, to be more explicit.
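
Assuming the parameter is a HashMap (as the suggestion implies), Default::default() and HashMap::new() build the same empty map; the explicit form just names the type at the call site. count_options below is a hypothetical stand-in for the real call.

```rust
use std::collections::HashMap;

/// Hypothetical function standing in for a call site that takes an options map.
fn count_options(options: HashMap<String, String>) -> usize {
    options.len()
}

fn main() {
    // Both produce an empty map; `HashMap::new()` makes the type visible at the call site.
    let implicit: HashMap<String, String> = Default::default();
    let explicit = HashMap::<String, String>::new();
    assert_eq!(implicit, explicit);

    assert_eq!(count_options(Default::default()), 0);
    assert_eq!(count_options(HashMap::new()), 0);
}
```
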

copy_options,
Default::default(),
Collaborator

Same as the comment above.

copy_options,
Default::default(),
Collaborator

Same as the comment above.

Ted-Jiang and others added 27 commits March 9, 2024 11:28
* [minor] extract collect file statistics method and add doc

* fix clippy

* fix doc
)

* adding prototype

a

* fix clippy

* modify test

* fix clippy

* fix:clippy

* optimize code

* change tests
* Minor: Improve log expr description

* Minor: Improve log expr description
* Add projection to HashJoinExec.

* fix hashjoin display

* pushdown coalesce batch & fix try_new

* fix slt

* fix test

* fmt & clippy

* fix into_proto

* rm coalesce_batches_pushdown & add one pushdown before CoalesceBatches

* fix slt.

* fmt & clippy

* apply projection to equivalence_properties and output_ordering.

* fix proto.

* fix merge.

* Add more comments and some tests.

* fix test & projection stats

* use collect_columns & fix schema usage
* `FunctionFactory` usage example

* update test to use the same function factory

* Add entry to examples/README.md

* Add SessionContext::with_function_factory

* Update doc and example

* clippy

---------

Co-authored-by: Andrew Lamb <[email protected]>
apache#9435)

* Move date_part, date_trunc, date_bin functions to datafusion-functions

* I do not understand why the logical plan changed, but I am updating the explain text to reflect the change. The physical plan is unchanged.

* Fix fmt

* Improvements to remove datafusion-functions dependency from sql and physical-expr

* Fix function arguments for date_bin, date_trunc and date_part.

* Fix projection change. Add new test date_bin monotonicity

---------

Co-authored-by: Mustafa Akur <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
* fix: support two argument TRIM

* refactor: fix auto space remove

* test: fix test
…ement` and `array_slice` functions (apache#9492)

* remove physical range and index

Signed-off-by: jayzhan211 <[email protected]>

* remove proto

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
* feat: function name hints for UDFs

* refactor: rebase fn to xxx_names()

* style: fix clippy

* style: fix clippy

* Add test

---------

Co-authored-by: Andrew Lamb <[email protected]>
)

* Minor: Improve documentation for registering `AnalyzerRule`

* Apply suggestions from code review

Co-authored-by: comphead <[email protected]>

* Update datafusion/core/src/physical_optimizer/optimizer.rs

---------

Co-authored-by: comphead <[email protected]>
* Extend argument types for udf return type function

Signed-off-by: jayzhan211 <[email protected]>

* rm incorrect assumption

Signed-off-by: jayzhan211 <[email protected]>

* possible empty types

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
…atafusion-functions-array crate (apache#9504)

* move array function

* fix rebase

Signed-off-by: jayzhan211 <[email protected]>

* cleanup to trigger rerun

Signed-off-by: jayzhan211 <[email protected]>

* split functions to different files

Signed-off-by: jayzhan211 <[email protected]>

* fix

Signed-off-by: jayzhan211 <[email protected]>

* fix conflict

Signed-off-by: jayzhan211 <[email protected]>

* clippy

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>
Co-authored-by: jayzhan211 <[email protected]>
* Issue-9497 - Port StringToArray to function-arrays

* Issue-9497 - Fix formatting issues

* Issue-9497 - Format expressions.md documentation
* initial port

* fix clippy

* update cargo in cli

* remove dependency

* resolve conflict

* cargo update in CLI
* UDAF and UDWF support aliases

* Add tests for udaf and udwf aliases

* Fix clippy lint
* Port arrowtypeof

* fmt

* fix test case

* revert test change

---------

Co-authored-by: Andrew Lamb <[email protected]>
…che#9517)

* feat: convert Expr to SQL string

* fix: add license headers

* fix: make Unparser and Dialect public

* Update datafusion/sql/src/unparser/dialect.rs

* fmt

---------

Co-authored-by: Andrew Lamb <[email protected]>
* Issue-9550 - Port ArraySort to function-arrays subcrate

* Issue-9550 - Add test coverage on roundtrip_logical_plan

* Issue-9550 - Address review comments
@ozankabak
Collaborator

Merged upstream.

ozankabak closed this Mar 18, 2024
mustafasrepo pushed a commit that referenced this pull request Aug 9, 2024
* Make `CommonSubexprEliminate` top-down like

* fix top-down recursion, fix unit tests to use a real Optimizer to verify behavior on plans

* Extract result of `find_common_exprs` into a struct (#4)

* Extract the result of find_common_exprs into a struct

* Make naming consistent

---------

Co-authored-by: Andrew Lamb <[email protected]>
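
As a rough illustration of what a top-down common subexpression search can look like: count every subtree, then walk each expression from the root and stop at the first non-leaf subtree that occurs more than once, so only the largest shared subexpressions are reported. The Expr enum and the count/find_common helpers below are hypothetical and far simpler than DataFusion's CommonSubexprEliminate rule.

```rust
use std::collections::HashMap;

/// Tiny expression tree used only for this sketch.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Expr {
    Col(&'static str),
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

fn children(e: &Expr) -> Vec<&Expr> {
    match e {
        Expr::Add(l, r) | Expr::Mul(l, r) => vec![l.as_ref(), r.as_ref()],
        _ => vec![],
    }
}

/// Bottom-up pass: counts how often each subtree appears.
fn count<'a>(e: &'a Expr, counts: &mut HashMap<&'a Expr, usize>) {
    *counts.entry(e).or_insert(0) += 1;
    for c in children(e) {
        count(c, counts);
    }
}

/// Top-down pass: records the first repeated non-leaf subtree on each path
/// and does not descend into it, so only maximal common subexpressions remain.
fn find_common<'a>(e: &'a Expr, counts: &HashMap<&'a Expr, usize>, out: &mut Vec<&'a Expr>) {
    let is_leaf = children(e).is_empty();
    if !is_leaf && counts[e] > 1 {
        if !out.contains(&e) {
            out.push(e);
        }
        return;
    }
    for c in children(e) {
        find_common(c, counts, out);
    }
}

fn main() {
    // (a + b) * 2 and (a + b) + c share the subexpression a + b.
    let a_plus_b = Expr::Add(Box::new(Expr::Col("a")), Box::new(Expr::Col("b")));
    let e1 = Expr::Mul(Box::new(a_plus_b.clone()), Box::new(Expr::Lit(2)));
    let e2 = Expr::Add(Box::new(a_plus_b.clone()), Box::new(Expr::Col("c")));

    let mut counts = HashMap::new();
    for e in [&e1, &e2] {
        count(e, &mut counts);
    }
    let mut common = Vec::new();
    for e in [&e1, &e2] {
        find_common(e, &counts, &mut common);
    }
    assert_eq!(common, vec![&a_plus_b]);
}
```

Stopping at the first repeated subtree keeps a + b as a single candidate instead of also reporting its repeated children a and b separately.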