[Enhancement] Throttle scan to wait for topn filter (backport #55660) #55880

Merged
merged 1 commit into from
Feb 14, 2025

Conversation

satanson
Contributor

@satanson satanson commented Feb 13, 2025

(cherry picked from commit f86795b)

Why I'm doing:

A highly associative hash join is time-consuming because it multiplies the data volume many times over. If a topn operator sits above the join, the topn filter it generates can be used to reduce the input data volume of the hash join. In tests, however, the scan operator below the hash join always fed data to the join so quickly that the topn filter took effect on the scan operator too late, so the input data volume of the hash join was not actually reduced. We therefore designed a back pressure mechanism that works as follows:

  1. The scan operator lets up to 10 times limit+offset rows (limit and offset of the topn operator) pass through to the hash join operator, then waits for a short period (e.g. 100ms). We call this period the throttle period.
  2. During the throttle period, has_output of the scan operator returns false, so the scan operator transfers no data and gives the topn operator a chance to generate a topn filter.
  3. When the current throttle period ends, the scan operator uses the topn filter to filter its output data. If the topn filter is highly selective, the scan operator terminates the back pressure mechanism and simply keeps filtering incoming data with this topn filter.
  4. Otherwise, the scan operator begins another throttle period.
  5. The scan operator may begin a throttle period several times. The number of rounds is controlled by the session variable back_pressure_back_rounds, and each throttle period equals back_pressure_throttle_time_upper_bound/back_pressure_back_rounds.
  6. topn_filter_back_pressure_mode turns the back pressure mechanism on or off.
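
The steps above can be sketched as a small state machine. This is only an illustrative model, not the code in this PR: the class ScanThrottle, its method names, and the selectivity_threshold cut-off for "highly selective" are all assumptions, and the sketch assumes that a new throttle period starts immediately when the previous one ends without a selective filter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScanThrottle:
    """Hypothetical model of the scan-side back pressure (not the real operator)."""
    limit_plus_offset: int                # limit + offset of the topn operator
    throttle_time_upper_bound_ms: float   # back_pressure_throttle_time_upper_bound
    back_rounds: int                      # back_pressure_back_rounds
    selectivity_threshold: float = 0.5    # assumed cut-off for "highly selective"

    rows_passed: int = 0
    rounds_used: int = 0
    throttle_until_ms: Optional[float] = None
    finished: bool = False                # True once back pressure is terminated

    @property
    def throttle_period_ms(self) -> float:
        # Step 5: one period is the upper bound divided by the number of rounds.
        return self.throttle_time_upper_bound_ms / self.back_rounds

    def _begin_period(self, now_ms: float) -> None:
        if self.rounds_used >= self.back_rounds:
            self.finished = True          # all rounds used up, stop throttling
            return
        self.rounds_used += 1
        self.throttle_until_ms = now_ms + self.throttle_period_ms

    def on_rows_emitted(self, n: int, now_ms: float) -> None:
        # Step 1: let 10 * (limit + offset) rows through, then start throttling.
        if self.finished or self.throttle_until_ms is not None:
            return
        self.rows_passed += n
        if self.rows_passed >= 10 * self.limit_plus_offset:
            self._begin_period(now_ms)

    def has_output(self, now_ms: float, pass_ratio: Optional[float]) -> bool:
        """pass_ratio: fraction of rows the topn filter keeps, None if no filter yet."""
        if self.finished or self.throttle_until_ms is None:
            return True
        if now_ms < self.throttle_until_ms:
            return False                  # Step 2: inside the throttle period
        self.throttle_until_ms = None     # the current period has ended
        if pass_ratio is not None and pass_ratio <= self.selectivity_threshold:
            self.finished = True          # Step 3: filter is selective enough
            return True
        self._begin_period(now_ms)        # Step 4: begin another period
        return self.finished
```

With limit_plus_offset=10, an upper bound of 300ms and 3 rounds, this model throttles after 100 rows, holds output for 100ms per round, and stops throttling either when a selective filter arrives or after the third round.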

Test

When the topn filter back pressure mechanism is enabled, the data volume of the left side of the hash join is reduced to 1/60.
image

When it is disabled:
image

  1. pipeline_dop=0, concurrency=20, back_pressure_back_rounds=3
+===================+==============+
| cases             | latency(sec) |
+===================+==============+
| disable opt       | 11.692       |
| enable opt(60ms)  | 5.920        |
| enable opt(100ms) | 5.853        |
| enable opt(300ms) | 5.959        |
| enable opt(600ms) | 6.279        |
+-------------------+--------------+

disable opt means the optimization is turned off;
enable opt(60ms) means the optimization is turned on with back_pressure_throttle_time_upper_bound=60, i.e. the total throttle time does not exceed 60ms.
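
For reference, the per-round throttle period implied by each row of the table above can be worked out from the formula back_pressure_throttle_time_upper_bound/back_pressure_back_rounds. The helper name throttle_period_ms below is hypothetical and only illustrates the arithmetic.

```python
def throttle_period_ms(upper_bound_ms: float, back_rounds: int) -> float:
    """Length of one throttle period: upper bound divided by number of rounds."""
    if back_rounds <= 0:
        raise ValueError("back_rounds must be positive")
    return upper_bound_ms / back_rounds


# Settings from the first test: back_pressure_back_rounds=3.
for upper_bound in (60, 100, 300, 600):
    period = throttle_period_ms(upper_bound, 3)
    print(f"upper bound {upper_bound}ms -> {period:.1f}ms per round")
```

So enable opt(60ms) with 3 rounds means at most three throttle periods of 20ms each, and enable opt(600ms) means at most three periods of 200ms each.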

  2. pipeline_dop=1, back_pressure_throttle_time_upper_bound=300, back_pressure_back_rounds=10
+=============+==================+=================+=========+
| concurrency | disable opt(sec) | enable opt(sec) | speedup |
+=============+==================+=================+=========+
| 1           | 0.991            | 0.953           | 1.0X    |
| 10          | 4.089            | 2.831           | 1.4X    |
| 20          | 7.735            | 5.034           | 1.5X    |
| 40          | 15.210           | 9.688           | 1.5X    |
| 60          | 22.760(OOM)      | 14.600          | 1.5X    |
+-------------+------------------+-----------------+---------+
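
The speedup column can be rechecked from the two latency columns as disable opt / enable opt. The sketch below assumes the ratio is truncated (not rounded) to one decimal place. That is an assumption which happens to reproduce the column above; rounding to nearest would give 1.6X for the last two rows.

```python
import math

# (concurrency, disable opt latency in sec, enable opt latency in sec)
rows = [
    (1, 0.991, 0.953),
    (10, 4.089, 2.831),
    (20, 7.735, 5.034),
    (40, 15.210, 9.688),
    (60, 22.760, 14.600),
]


def speedup(disabled_sec: float, enabled_sec: float) -> float:
    """Latency ratio truncated to one decimal place (assumed convention)."""
    return math.floor(disabled_sec / enabled_sec * 10) / 10


for concurrency, disabled_sec, enabled_sec in rows:
    print(f"concurrency={concurrency}: {speedup(disabled_sec, enabled_sec)}X")
```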

What I'm doing:

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Signed-off-by: satanson <[email protected]>
(cherry picked from commit f86795b)
Signed-off-by: satanson <[email protected]>
@andyziye andyziye merged commit 49c216e into branch-3.4 Feb 14, 2025
37 checks passed
@andyziye andyziye deleted the branch-3.4_topn_filter_throttle_scan branch February 14, 2025 06:48