
Decimal multiply kernel should not cause precision loss #5675

Closed
wants to merge 17 commits into from

Conversation

viirya
Member

@viirya viirya commented Mar 21, 2023

Which issue does this PR close?

Closes #5674.

Rationale for this change

Currently, decimal multiplication in DataFusion silently truncates the precision of the result. This happens even for regular decimal multiplications that don't overflow. It looks like DataFusion uses an incomplete decimal precision coercion rule from Spark to coerce the sides of decimal multiplication (and other arithmetic operators). The coerced type on the two sides of a decimal multiplication is not the resulting decimal type of the multiplication. This (and how we compute decimal multiplication in the kernels) leads to a truncated precision in the result decimal type.
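For context, the SQL-standard result type for decimal multiplication (the rule Spark follows) needs more digits than either input. A minimal sketch, assuming the usual `p1 + p2 + 1` precision / `s1 + s2` scale rule and the 38-digit cap of `Decimal128` (the function name and constant here are illustrative, not DataFusion's actual code):

```rust
// Illustrative sketch of the decimal multiply result-type rule.
const MAX_PRECISION: u8 = 38; // Decimal128 cap

fn multiply_result_type(p1: u8, s1: u8, p2: u8, s2: u8) -> (u8, u8) {
    // The exact product of a (p1, s1) and a (p2, s2) decimal needs
    // p1 + p2 + 1 digits of precision and scale s1 + s2.
    let precision = (p1 + p2 + 1).min(MAX_PRECISION);
    let scale = s1 + s2;
    (precision, scale)
}

fn main() {
    // decimal(12,2) * decimal(12,2) -> decimal(25,4): fits, no truncation.
    assert_eq!(multiply_result_type(12, 2, 12, 2), (25, 4));
    // decimal(38,10) * decimal(38,10) exceeds the cap: the kernel must
    // either error or round (precision loss).
    assert_eq!(multiply_result_type(38, 10, 38, 10), (38, 20));
}
```

The point of the rule: the result type of the multiplication is wider than the coerced type of either side, which is why coercing both sides to a common type and reusing that as the result type truncates precision.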

What changes are included in this PR?

This patch updates the type coercion rule for decimal arithmetic operators. To prevent the type coercion rule from being re-applied to the already-coerced sides of decimal operators, we need an expression node that wraps the coerced sides.

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the physical-expr Physical Expressions label Mar 21, 2023
@viirya viirya marked this pull request as draft March 21, 2023 21:42
@viirya
Member Author

viirya commented Mar 23, 2023

There is a broader issue with type coercion. I'm working on it and will update.

@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sql SQL Planner sqllogictest SQL Logic Tests (.slt) substrait labels Mar 24, 2023
@viirya viirya force-pushed the fix_decimal_multiply_precision_loss branch 2 times, most recently from e5e3bcb to b108159 Compare March 24, 2023 06:25
@github-actions github-actions bot removed the sqllogictest SQL Logic Tests (.slt) label Mar 24, 2023
@viirya viirya force-pushed the fix_decimal_multiply_precision_loss branch from b108159 to 9ba3285 Compare March 24, 2023 07:19
@viirya
Member Author

viirya commented Mar 27, 2023

rust-decimal only supports precision <= 28...
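The 28-digit limit follows from rust-decimal's representation: its mantissa is 96 bits, and `2^96 - 1` has 29 decimal digits, so only 28 digits are always representable. A quick dependency-free check of that arithmetic:

```rust
// Why a 96-bit mantissa caps guaranteed precision at 28 decimal digits.
fn main() {
    let max_96bit: u128 = (1u128 << 96) - 1; // largest 96-bit value
    // 2^96 - 1 = 79228162514264337593543950335 -> 29 digits
    assert_eq!(max_96bit.to_string().len(), 29);
    // Every 28-digit value fits...
    assert!(10u128.pow(28) - 1 < max_96bit);
    // ...but not every 29-digit value does (e.g. 29 nines).
    assert!(10u128.pow(29) - 1 > max_96bit);
}
```

This is why precisions above 28 can't round-trip through rust-decimal, while arrow-rs's `Decimal128` (a full 128-bit integer) goes up to 38 digits.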

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Mar 27, 2023
@@ -125,7 +125,7 @@ select
   sum(l_quantity) as sum_qty,
   sum(l_extendedprice) as sum_base_price,
   sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
-  sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
+  sum(cast(l_extendedprice as decimal(12,2)) * (1 - l_discount) * (1 + l_tax)) as sum_charge,
Member Author

This decimal calculation actually causes an overflow in the decimal multiplication because the kernel DataFusion uses doesn't allow precision loss. Previously it accidentally worked because we implicitly truncated all input decimal values before multiplying...

Member Author

@viirya viirya Mar 27, 2023

So I added a cast to keep the test result unchanged. To get rid of this cast later, we can use the new multiply kernel in arrow-rs which allows precision loss.
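For readers unfamiliar with what "a multiply kernel that allows precision loss" means here, a sketch of the idea on raw `i128` decimal representations (the function name and rounding mode are illustrative assumptions; see arrow-rs for the real kernel): compute the full product, then divide away the excess scale with rounding instead of erroring.

```rust
// Hypothetical precision-loss multiply: keep the result at a smaller scale
// by rounding, instead of failing when the exact product doesn't fit.
fn multiply_with_precision_loss(a: i128, b: i128, excess_scale: u32) -> i128 {
    let product = a * b; // exact product at scale s1 + s2
    let divisor = 10i128.pow(excess_scale);
    let half = divisor / 2;
    // round half away from zero while dropping excess_scale digits
    if product >= 0 {
        (product + half) / divisor
    } else {
        (product - half) / divisor
    }
}

fn main() {
    // 1.23 * 4.56 = 5.6088; keeping only scale 2 rounds to 5.61
    assert_eq!(multiply_with_precision_loss(123, 456, 2), 561);
}
```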

Contributor

perhaps we can add a comment here referencing the new arrow-rs kernel / work. Otherwise this context will likely get forgotten

big_decimal_to_str(
BigDecimal::from_str(&Decimal::from_i128_with_scale(value, scale).to_string())
Member Author

rust-decimal doesn't allow precision > 28. We can simply use Decimal128Type from arrow-rs to get the string representation of a decimal.

@viirya viirya force-pushed the fix_decimal_multiply_precision_loss branch from 32e5dfc to e727f5a Compare March 29, 2023 05:34
Comment on lines +3 to +6
cast(cast(sum(case
        when nation = 'BRAZIL' then volume
        else 0
end) as decimal(12,2)) / cast(sum(volume) as decimal(12,2)) as decimal(15,2)) as mkt_share
Member Author

The current decimal divide kernel fails to compute this. The result decimal type has scale 38, and divide_dyn_opt_decimal multiplies the left array by 10.pow(scale), which obviously overflows because the multiply kernel doesn't allow precision-loss multiplication.
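The overflow mechanism described above can be sketched in a few lines (this is an illustration of the failure mode, not the actual divide_dyn_opt_decimal code): to divide decimals, the numerator is first scaled up by `10^result_scale`, and with a large result scale that scaled numerator no longer fits in an `i128`.

```rust
// Sketch: decimal divide via scale-up-then-integer-divide.
// Checked ops surface the overflow instead of wrapping.
fn scaled_divide(a: i128, b: i128, result_scale: u32) -> Option<i128> {
    10i128
        .checked_pow(result_scale)
        .and_then(|factor| a.checked_mul(factor))
        .map(|scaled| scaled / b)
}

fn main() {
    // A modest result scale works: 1.00 / 3.00 at scale 4 -> 0.3333
    assert_eq!(scaled_divide(100, 300, 4), Some(3333));
    // At result scale 38, multiplying by 10^38 overflows i128
    // (i128::MAX is about 1.7e38) for almost any numerator.
    assert_eq!(scaled_divide(200, 300, 38), None);
}
```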

Member Author

With inner casts wrapping the two sums, we can use a result scale that won't cause overflow. The resulting decimal is correct.

But the answer file fixes the type as decimal(15, 2), so the result still fails the check. I can only add another cast to produce the same decimal type and pass this result check.

@viirya viirya force-pushed the fix_decimal_multiply_precision_loss branch from b1292ba to fb9a27a Compare March 30, 2023 16:04
@viirya viirya marked this pull request as ready for review March 30, 2023 17:25
@viirya
Member Author

viirya commented Mar 30, 2023

More changes than I expected, but finally all DataFusion tests are happy...

@alamb
Contributor

alamb commented Mar 30, 2023

I'll try and find time to review this PR tomorrow

@alamb
Contributor

alamb commented Mar 30, 2023

cc @liukun4515

Contributor

@alamb alamb left a comment


Thanks @viirya

I took a look through this code -- the coercion rule changes make sense to me. However, I don't understand the need for Expr::PromotePrecision or the new data_type field on Expr. They don't seem to be related to improving the coercion for decimals.

@@ -385,6 +408,20 @@ impl Cast {
}
}

/// Cast expression
Contributor

Can we please add documentation about what PromotePrecision does and in what circumstances it is needed?


@@ -39,6 +40,13 @@ pub trait ExprSchemable {

/// cast to a type with respect to a schema
fn cast_to<S: ExprSchema>(self, cast_to_type: &DataType, schema: &S) -> Result<Expr>;

/// promote to a type with respect to a schema
Contributor

Can you please add some comments to clarify what the difference between "promote" and "cast" is?

@@ -234,12 +236,33 @@ pub struct BinaryExpr {
pub op: Operator,
/// Right-hand side of the expression
pub right: Box<Expr>,
/// The data type of the expression, if known
Contributor

I don't understand the reason for this change -- is it no longer possible to calculate the output type of a BinaryExpr from its inputs?

If the desired output type is different then couldn't we cast the expr to the type?

@viirya
Member Author

viirya commented Apr 3, 2023

the coercion rule changes make sense to me. However, I don't understand the need for Expr::PromotePrecision or the new data_type field on Expr. They don't seem to be related to improving the coercion for decimals

The coercion rule that modifies the sides of an arithmetic op is not idempotent. Running the rule multiple times changes the expression to an incorrect result, so we need something to prevent the rule from touching already-coerced sides. PromotePrecision is that something; it's just a wrapper for this purpose.

As for the new data_type on BinaryExpr: the coerced type of a decimal arithmetic op is not the same as its result type, as you can see. So we cannot simply take the coerced type of the left/right sides and use it as the result type. We also cannot compute the result type on-the-fly in the physical BinaryExpr, because it depends on the original data types of the op's sides, and at that point we only have the coerced types. So we need to record the result type so we can get it when computing the decimal arithmetic result.
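The two mechanisms described in this comment can be sketched with a toy expression tree (hypothetical types, not DataFusion's real `Expr`): a `PromotePrecision` wrapper that marks already-coerced sides so the rule becomes idempotent, and a result type recorded on the binary node because it cannot be recomputed from the coerced inputs alone.

```rust
// Toy model of the PromotePrecision idea.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    /// Wrapper marking an already-coerced side; the rule skips it.
    PromotePrecision(Box<Expr>),
    Binary {
        left: Box<Expr>,
        right: Box<Expr>,
        /// Result (precision, scale) computed from the ORIGINAL operand
        /// types, stored because only coerced types survive to execution.
        result_type: (u8, u8),
    },
}

/// An idempotent coercion step: wrapping is applied at most once.
fn coerce_once(e: Expr) -> Expr {
    match e {
        // Already coerced: leave untouched, making repeated runs a no-op.
        Expr::PromotePrecision(_) => e,
        other => Expr::PromotePrecision(Box::new(other)),
    }
}

fn main() {
    let col = Expr::Column("l_extendedprice".into());
    let once = coerce_once(col);
    // Running the rule a second time must not change the expression.
    assert_eq!(coerce_once(once.clone()), once);
}
```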

@alamb
Contributor

alamb commented Apr 4, 2023

The coercion rule that modifies the sides of an arithmetic op is not idempotent. Running the rule multiple times changes the expression to an incorrect result, so we need something to prevent the rule from touching already-coerced sides. PromotePrecision is that something; it's just a wrapper for this purpose.

As for the new data_type on BinaryExpr: the coerced type of a decimal arithmetic op is not the same as its result type, as you can see. So we cannot simply take the coerced type of the left/right sides and use it as the result type. We also cannot compute the result type on-the-fly in the physical BinaryExpr, because it depends on the original data types of the op's sides, and at that point we only have the coerced types. So we need to record the result type so we can get it when computing the decimal arithmetic result.

This makes sense -- thank you

My biggest concern with the approach in this PR is that it adds a special case for decimal (which seems reasonable in itself), but that special case bleeds out over the rest of the code / plans, which is probably why the PR got so big.

I wonder whether it would be possible to use PromotePrecision only and avoid modifying Expr::Binary, to keep the behavior more localized to coercion.

Perhaps something like the following.

/// The target decimal type the expression should be output to
struct PromotePrecision {
  expr: Box<Expr>,
  precision: u8,
  scale: u8,
}

One of the downsides of this approach is that PromotePrecision is now clearly special purpose for Decimal. I am not sure if it would have other uses in the future 🤔

@viirya
Member Author

viirya commented Apr 4, 2023

Keeping precision/scale in PromotePrecision seems okay. I don't think PromotePrecision will have other uses. It is a special-purpose wrapper (as its name suggests) for decimal arithmetic type coercion only.

@alamb
Contributor

alamb commented Apr 4, 2023

Keeping precision/scale in PromotePrecision seems okay. I don't think PromotePrecision will have other uses. It is a special-purpose wrapper (as its name suggests) for decimal arithmetic type coercion only.

I think it would then be best to keep the data type in Expr::PromotePrecision rather than adding it to BinaryExpr.

@liukun4515
Contributor

cc @liukun4515

Thanks for the mention.

I will take a careful look at it tomorrow.

@jackwener has moved the type coercion to the analysis phase.

Maybe we will have some of the same comments about the type coercion rule.

@viirya
Member Author

viirya commented Apr 5, 2023

Yeah, it seems I need to rebase this.

@mingmwang
Contributor

Does this new PromotePrecision expr only apply to the Decimal type?
I remember SparkSQL had a similar PromotePrecision; not sure whether it is for the same purpose.
And it looks like in the latest Spark it has been removed or renamed.

@viirya
Member Author

viirya commented Apr 7, 2023

Does this new PromotePrecision expr only apply to the Decimal type?
I remember SparkSQL had a similar PromotePrecision; not sure whether it is for the same purpose.
And it looks like in the latest Spark it has been removed or renamed.

It is for the Decimal type only. Yeah, the idea is from Spark.

It was removed in the latest Spark because the precision promotion was moved to where the computation happens. I was not sure whether moving it into the decimal kernels is a good idea for DataFusion. Let me rethink it and maybe refactor this.

@alamb
Contributor

alamb commented Apr 11, 2023

marking as draft while it gets worked on

@alamb alamb marked this pull request as draft April 11, 2023 18:01
@viirya
Member Author

viirya commented Apr 11, 2023

Thanks @alamb. I'll find some time on this. :)

@viirya
Member Author

viirya commented Apr 12, 2023

This new approach is proposed at #5980.
