-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decimal multiply kernel should not cause precision loss #5980
Decimal multiply kernel should not cause precision loss #5980
Conversation
Different to #5675, this doesn't add new expression node |
54397f9
to
343ca79
Compare
There is a compilation error. Going to fix it at #6029. |
cb7e326
to
0a88516
Compare
Some(99193548387), // 0.99193548387 | ||
None, | ||
None, | ||
Some(100813008130), // 1.0081300813 | ||
Some(100000000000), // 1.0 | ||
], | ||
21, | ||
11, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, this division losses precision. Now we get it back.
@@ -4801,18 +4878,18 @@ mod tests { | |||
|
|||
// subtract: decimal array subtract int32 array | |||
let schema = Arc::new(Schema::new(vec![ | |||
Field::new("b", DataType::Int32, true), | |||
Field::new("a", DataType::Decimal128(10, 2), true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously the field order is incorrect. But as we did coerce type on both side of the op anyway, so it still worked before. Now we don't coerce the decimal field (which is wrongly bound to Int32Array) before into binary expression, so wrong field causes an error.
query TTRRRRRRRI | ||
select | ||
l_returnflag, | ||
l_linestatus, | ||
sum(l_quantity) as sum_qty, | ||
sum(l_extendedprice) as sum_base_price, | ||
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, | ||
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, | ||
sum(cast(l_extendedprice as decimal(12,2)) * (1 - l_discount) * (1 + l_tax)) as sum_charge, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cast(cast(sum(case | ||
when nation = 'BRAZIL' then volume | ||
else 0 | ||
end) as decimal(12,2)) / cast(sum(volume) as decimal(12,2)) as decimal(15,2)) as mkt_share |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
big_decimal_to_str( | ||
BigDecimal::from_str(&Decimal::from_i128_with_scale(value, scale).to_string()) | ||
BigDecimal::from_str(&Decimal128Type::format_decimal(value, *precision, *scale)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This deals with the decimal precision issue without additional |
I wonder if this already fixes #4024 |
Yea, just verified locally that this can pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
Thanks @Dandandan |
Let's wait ~24hrs so other reviewers can have a chance. |
FYI @mingmwang @andygrove this PR also has some effect on performance, as casting is changed (mostly reduced). |
Ran the benchmarks for TPCH(SF=1) in memory. Performance is mostly the same, except a ~30% improvement for q1 compared to main 🚀 |
🎉 |
Which issue does this PR close?
Closes #5674.
Closes #3387.
Closes #4024.
Rationale for this change
Currently decimal multiplication in DataFusion silently truncates precision of result. It happens generally for regular decimal multiplication which doesn't overflow. Looks like DataFusion uses incomplete decimal precision coercion rule from Spark to coerce sides of decimal multiplication (and other arithmetic operators). The coerced type on two sides of decimal multiplication is not the resulting decimal type of multiplication. This (and how we computes decimal multiplication in the kernels) leads to truncated precision in the result decimal type.
What changes are included in this PR?
TypeCoercion
to physical binary operatorAre these changes tested?
Are there any user-facing changes?