-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Enhancement] Throttle scan to wait for topn filter #55660
base: main
Are you sure you want to change the base?
Conversation
} | ||
} | ||
return accept; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most risky bug in this code is:
Division by zero when calculating backPressureThrottleTime
.
You can modify the code like this:
this.backPressureThrottleTime =
this.backPressureThrottleTimeUpperBound / Math.max(this.backPressureNumRows, 1);
This change prevents division by zero by using Math.max(this.backPressureNumRows, 1)
instead of Math.min(this.backPressureNumRows, 1)
, ensuring the denominator is never zero.
double _current_selectivity{1.0}; | ||
}; | ||
|
||
} // namespace starrocks::pipeline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most risky bug in this code is:
Incorrect calculation of _current_throttle_deadline
in should_throttle()
, which results in the use of milliseconds instead of correctly casting steady_clock::now
.
You can modify the code like this:
#include <chrono> // add this to the includes to use std::chrono_literals
if (_phase == PH_UNTHROTTLE) {
if (_current_num_rows <= _num_rows_limiter.limit()) {
return false;
}
_phase = PH_THROTTLE;
_current_throttle_deadline =
(steady_clock::now() + milliseconds(_throttle_time_limiter.limit())).time_since_epoch().count();
return true;
}
@@ -261,7 +316,6 @@ class ScanOperatorFactory : public SourceOperatorFactory { | |||
|
|||
protected: | |||
ScanNode* const _scan_node; | |||
|
|||
std::shared_ptr<workgroup::ScanTaskGroup> _scan_task_group; | |||
}; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most risky bug in this code is:
Potential segmentation fault due to a null pointer dereference when accessing runtime_bloom_filters()
.
You can modify the code like this:
void evaluate_topn_runtime_filters(Chunk* chunk) {
if (chunk == nullptr || chunk->is_empty() || !_topn_filter_back_pressure) {
return;
}
auto* topn_runtime_filters = runtime_bloom_filters();
if (topn_runtime_filters != nullptr) {
auto input_num_rows = chunk->num_rows();
_init_topn_runtime_filter_counters();
topn_runtime_filters->evaluate(chunk, _topn_filter_eval_context);
_topn_filter_back_pressure->inc_num_rows(input_num_rows);
if (_topn_filter_eval_context.selectivity.empty()) {
_topn_filter_back_pressure->update_selectivity(1.0);
} else {
double selectivity = _topn_filter_eval_context.selectivity.begin()->first;
if (input_num_rows > 1024) {
_topn_filter_back_pressure->update_selectivity(selectivity);
}
}
}
}
32ac329
to
0bc798d
Compare
Signed-off-by: satanson <[email protected]>
0bc798d
to
e7db69b
Compare
|
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[BE Incremental Coverage Report]❌ fail : 16 / 124 (12.90%) file detail
|
[FE Incremental Coverage Report]❌ fail : 22 / 49 (44.90%) file detail
|
Why I'm doing:
High associative hash join is time-consuming, since it magnifies data volume many times. if there is a topn operator above it, then we can use this topn filter generated by topn operator to reduce input data volume of the hash join; however, when perform tests on this, the scan operator below hash join always transfers data to the hash join so fast that make the topn filter take effects on scan operator too late, so input data volume of the hash join is not reduced successfully, so we design a back pressure mechanism that works as follows:
Test
when topn filter back pressure mechanism is opened,data volume of left side of hash join is reduced to 1/60.
![image](https://private-user-images.githubusercontent.com/1721321/410801008-19598b26-7205-4ffa-8a47-ba162220b516.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5OTY1MjUsIm5iZiI6MTczODk5NjIyNSwicGF0aCI6Ii8xNzIxMzIxLzQxMDgwMTAwOC0xOTU5OGIyNi03MjA1LTRmZmEtOGE0Ny1iYTE2MjIyMGI1MTYucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIwOCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMDhUMDYzMDI1WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9YTI2MDlhMGRjNWNhYzk2NWRhMDI1ZWI2MTEwOWU5M2UzYWM5NGNjNWFhMzEzYjZhYzgzYjA4ZjRkNmUxNWVjNyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.1JfbCaDsYJ1QbxQb_FnLsyFvtWTxzEQmT9yW6t0sMXU)
when it is closed
![image](https://private-user-images.githubusercontent.com/1721321/410801195-aadb0d4e-1507-43b3-8512-c8c30035e7dd.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5OTY1MjUsIm5iZiI6MTczODk5NjIyNSwicGF0aCI6Ii8xNzIxMzIxLzQxMDgwMTE5NS1hYWRiMGQ0ZS0xNTA3LTQzYjMtODUxMi1jOGMzMDAzNWU3ZGQucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIwOCUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMDhUMDYzMDI1WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9MzQ3OTZiMGMyMDNkMjdhZjcyMTRmNzAwODhiZTg4NTcwY2Y2YzdjNzQ5MTY3ZjFjNTJiYTI5NTI5ZWZlYjgzOCZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.h2EkEq6w12CpKwxWWDw5T9_c6EKvYVFlQuMA3so_TQk)
data volume of left side of hash join is reduced to 1/60.
disable opt means turn off the optimization;
enable opt(60ms) means turn on the optimization; and back_pressure_throttle_time_upper_bound=60, i.e. total throttle time does not exceeds 60ms.
What I'm doing:
Fixes #issue
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: