Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnwrapCastInComparison produces incorrect results #14303

Open
jonahgao opened this issue Jan 26, 2025 · 6 comments
Open

UnwrapCastInComparison produces incorrect results #14303

jonahgao opened this issue Jan 26, 2025 · 6 comments
Assignees
Labels
bug Something isn't working

Comments

@jonahgao
Copy link
Member

jonahgao commented Jan 26, 2025

Describe the bug

I found that UnwrapCastInComparison always assumes the cast operation can succeed, but when it cannot, it results in incorrect optimization results.

To Reproduce

Run query in CLI (compiled from the latest main: f775791)

DataFusion CLI v44.0.0
> with t as (select 1000000 as a) select try_cast(a as smallint) > 1 from t;
+----------------+
| t.a > Int64(1) |
+----------------+
| true           |
+----------------+
1 row(s) fetched.
Elapsed 0.008 seconds.

> with t as (select 1000000 as a) select cast(a as smallint) > 1 from t;
+----------------+
| t.a > Int64(1) |
+----------------+
| true           |
+----------------+
1 row(s) fetched.
Elapsed 0.007 seconds.

Expected behavior

When optimizations are disabled, the above queries will produce different results, which are correct.

> set datafusion.optimizer.max_passes=0;
0 row(s) fetched.
Elapsed 0.003 seconds.

> with t as (select 1000000 as a) select try_cast(a as smallint) > 1 from t;
+----------------+
| t.a > Int64(1) |
+----------------+
| NULL           |
+----------------+
1 row(s) fetched.
Elapsed 0.006 seconds.

> with t as (select 1000000 as a) select cast(a as smallint) > 1 from t;
Arrow error: Cast error: Can't cast value 1000000 to type Int16

Additional context

I don't think this is a very urgent bug because both Spark and DuckDB have similar issues.

@jonahgao jonahgao added the bug Something isn't working label Jan 26, 2025
@Spaarsh
Copy link
Contributor

Spaarsh commented Jan 26, 2025

I have analyzed the code in unwrap_cast_in_comparison.rs. I think adding conditional statement in OptimizerRule implementation of the UnwrapCastInComparison should fix this.

@Spaarsh
Copy link
Contributor

Spaarsh commented Jan 26, 2025

take

@alamb
Copy link
Contributor

alamb commented Jan 28, 2025

@Spaarsh
Copy link
Contributor

Spaarsh commented Feb 6, 2025

@alamb thanks! In order to test if that PR is causing this issue, I made a separate branch and reverted the commit (e0f9f65) that merged that change into main. The incorrect results persisted!

@jonahgao
Copy link
Member Author

jonahgao commented Feb 6, 2025

We might need to restrict this optimization to infallible casts, such as when the cast target type is wider than the input expression.

@Spaarsh
Copy link
Contributor

Spaarsh commented Feb 6, 2025

I think I have found out the main problem here. I added a few debugging statements to print the DataTypes as the Optimizer code is running, here is what I found:

> with t as (select 1000000 as a) select try_cast(a as smallint) > 1 from t;
Input value: Int64(1), Input type: Int64, Target type: Int16
Target type: Int16, Target min: -32768, Target max: 32767
Int64(1) to Int16
Checking if 1 is in range [-32768, 32767]
Input value: Int16(1), Input type: Int16, Target type: Int64
Target type: Int64, Target min: -9223372036854775808, Target max: 9223372036854775807
Int16(1) to Int64
Checking if 1 is in range [-9223372036854775808, 9223372036854775807]
+----------------+
| t.a > Int64(1) |
+----------------+
| true           |
+----------------+
1 row(s) fetched. 
Elapsed 0.002 seconds.

Instead of checking if the value held by a can be converted to smallint, it checks if the 1 can converted to smallint which returns true!

I further verified this by running a command where the right-hand side value is incompatible with the target datatype, and I got this:

> with t as (select 1000000 as a) select try_cast(a as smallint) > 10000000 from t;
Input value: Int64(10000000), Input type: Int64, Target type: Int16
Target type: Int16, Target min: -32768, Target max: 32767
Int64(10000000) to Int16
Checking if 10000000 is in range [-32768, 32767]
Input value: Int64(10000000), Input type: Int64, Target type: Int16
Target type: Int16, Target min: -32768, Target max: 32767
Int64(10000000) to Int16
Checking if 10000000 is in range [-32768, 32767]
+-----------------------+
| t.a > Int64(10000000) |
+-----------------------+
| NULL                  |
+-----------------------+
1 row(s) fetched. 
Elapsed 0.002 seconds.

Since 10000000 can't be converted to smallint, the expected output is seen now.

Same for cast :

> with t as (select 1000000 as a) select cast(a as smallint) > 1 from t;
Input value: Int64(1), Input type: Int64, Target type: Int16
Target type: Int16, Target min: -32768, Target max: 32767
Int64(1) to Int16
Checking if 1 is in range [-32768, 32767]
Input value: Int16(1), Input type: Int16, Target type: Int64
Target type: Int64, Target min: -9223372036854775808, Target max: 9223372036854775807
Int16(1) to Int64
Checking if 1 is in range [-9223372036854775808, 9223372036854775807]
+----------------+
| t.a > Int64(1) |
+----------------+
| true           |
+----------------+
1 row(s) fetched. 
Elapsed 0.002 seconds.
> with t as (select 1000000 as a) select cast(a as smallint) > 10000000 from t;
Input value: Int64(10000000), Input type: Int64, Target type: Int16
Target type: Int16, Target min: -32768, Target max: 32767
Int64(10000000) to Int16
Checking if 10000000 is in range [-32768, 32767]
Input value: Int64(10000000), Input type: Int64, Target type: Int16
Target type: Int16, Target min: -32768, Target max: 32767
Int64(10000000) to Int16
Checking if 10000000 is in range [-32768, 32767]
Arrow error: Cast error: Can't cast value 1000000 to type Int16

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants