Refactor Decimal128 averaging code to be vectorizable (and easier to read) #6810
@@ -37,45 +37,107 @@ pub fn get_accum_scalar_values_as_arrays(
.collect::<Vec<_>>())
}

pub fn calculate_result_decimal_for_avg(
This function is called once per output row in the current code. I want to be able to perform the setup calculations once and then call only the minimal code per row in a loop. Hence this refactor.
Is it called per output row? Isn't it called in `evaluate`? I think `evaluate` is called only to get the final aggregate value?
> Isn't it called in `evaluate`?
Yes
> I think `evaluate` is called only to get final aggregate value?
Correct -- it is called for each group to get the final aggregate value. The ASCII art on this comment #4973 (comment) might help to understand why
/// (passed to `Self::try_new`)
/// * count: total count, stored as a i128 (*NOT* a Decimal128 value)
#[inline(always)]
pub fn avg(&self, sum: i128, count: i128) -> Result<i128> {
My goal is to call this function in a loop
I guess that you mean you will call this in a loop in new grouping code?
Exactly -- #6810 (comment)
I'll review this soon, or later over the weekend.
Thanks @viirya -- no rush from my perspective
Refactoring looks good to me. I think the purpose is to call the refactored function (`avg`) in other code (not this change) in a vectorizable way?
That is right -- in case you are interested, the location is here (and it shows very promising results)
Co-authored-by: Liang-Chi Hsieh <[email protected]>
Which issue does this PR close?
Related to #4973
Rationale for this change
While working on a POC for new grouping code, I found I needed to call the decimal averaging code, but could not do so in a performant (vectorized) way because it created a `ScalarValue`
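To illustrate the idea of doing the setup once and keeping the per-row work minimal, here is a rough, self-contained sketch. It is not the code from this PR: `DecimalAverager`, `average_groups`, the `String` error type, and the exact overflow rules are assumptions made for illustration. Only the shape of `avg(&self, sum: i128, count: i128)` follows the signature shown in the diff.

```rust
/// Hypothetical helper (not the PR's type): computes scale factors once so
/// that `avg` only does a multiply, a divide, and a range check per call.
#[derive(Debug)]
pub struct DecimalAverager {
    /// 10^(target_scale - sum_scale): rescales a sum to the target scale.
    scale_factor: i128,
    /// Largest magnitude representable at the target precision.
    target_max: i128,
}

impl DecimalAverager {
    /// One-time setup. Assumes the target scale is at least the sum's scale.
    pub fn try_new(
        sum_scale: u32,
        target_precision: u32,
        target_scale: u32,
    ) -> Result<Self, String> {
        if target_scale < sum_scale {
            return Err("target scale must be >= sum scale".to_string());
        }
        let scale_factor = 10_i128
            .checked_pow(target_scale - sum_scale)
            .ok_or_else(|| "scale factor overflows i128".to_string())?;
        let target_max = 10_i128
            .checked_pow(target_precision)
            .ok_or_else(|| "precision overflows i128".to_string())?
            - 1;
        Ok(Self { scale_factor, target_max })
    }

    /// Per-row work: `sum` is the unscaled decimal sum, `count` a plain
    /// integer. Integer division truncates toward zero.
    #[inline]
    pub fn avg(&self, sum: i128, count: i128) -> Result<i128, String> {
        let scaled = sum
            .checked_mul(self.scale_factor)
            .ok_or_else(|| "overflow while rescaling sum".to_string())?;
        // checked_div rejects both count == 0 and i128::MIN / -1.
        let value = scaled
            .checked_div(count)
            .ok_or_else(|| "invalid count".to_string())?;
        if value >= -self.target_max && value <= self.target_max {
            Ok(value)
        } else {
            Err("average does not fit the target precision".to_string())
        }
    }
}

/// Calls `avg` in a tight loop over plain `i128` slices, one entry per group.
/// `zip` stops at the shorter slice, so callers should pass equal lengths.
pub fn average_groups(
    averager: &DecimalAverager,
    sums: &[i128],
    counts: &[i128],
) -> Result<Vec<i128>, String> {
    sums.iter()
        .zip(counts)
        .map(|(&sum, &count)| averager.avg(sum, count))
        .collect()
}
```

With a sum scale of 2 and a target scale of 6, a sum of `1000` (10.00) over 4 rows would come back as `2_500_000` (2.500000). The loop body touches only primitive `i128` values, which is the property that makes it a candidate for vectorization.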
What changes are included in this PR?
calculate_result_decimal_for_avg
Are these changes tested?
Are there any user-facing changes?