[fix][broker] PIP-322 Fix issue with rate limiters where rates can exceed limits initially and consumption pauses until token balance is positive #24012
Conversation
Should we block this PR for the same reason you provided in another PR, #24002, where you said it's fixed as part of #23930, and you're not prepared to take anyone's feedback and concerns?
@rdhabalia I apologize for the communication issues on my end. I'm definitely taking all feedback and concerns seriously, though it does take me some time to address everything properly. Last week I was focused on preparing the upcoming 3.0.10, 3.3.5, and 4.0.3 release candidates, which are urgent since many users are waiting for a Pulsar release without the critical vulnerability, CVE-2025-24970. After taking a deeper look into this issue, I've found that the pre-PIP-322 rate limiter implementation for dispatch rate limiting actually provides more consistent behavior due to several details in how dispatchers work and how AsyncTokenBucket works. The test case in this PR demonstrates this from an end-to-end perspective. While the excessive negative tokens problem is solved when using AsyncTokenBucket, there's another challenging issue that would require major changes to how dispatchers acquire tokens.
@rdhabalia I completely agree with you! I'll revert the changes in the PIP-322 rate limiter related to using AsyncTokenBucket in dispatch rate limiters and handle it in this PR. We'll postpone the 3.3.5 and 4.0.3 releases until this is resolved. I've already aborted the 3.3.5-candidate-1 and 4.0.3-candidate-1 release votes.
It seems that there's some other regression in both branch-3.3 and master that causes the first-second rate to be 2x the configured rate. Now that the pre-PIP-322 rate limiter is used in this PR, it shows that this isn't related to the use of AsyncTokenBucket or branch-4.0 changes, since the problem is also present in branch-3.3 (when I cherry-pick the changes).
I pushed changes to add
If that's true, then please rethink this rate limiter. The token-bucket algorithm can be exploited by malicious or greedy clients, and your implementation to prevent it by introducing negative tokens has serious implications of delaying or making topics unavailable without any reason. I implemented the dispatch rate limiter 3-4 years ago and we have been using it without any issue, with the highest accuracy and efficiency, including system stability. Yes, I felt that it was very hard for you to take feedback even while discussing this PIP and another PIP proposed by some user. That's why I am currently not proposing the solution, but let's revisit this implementation, because in my opinion it won't work: if negative tokens stay, they will impact the topic's availability, and even if we put a cap, tokens are refreshed so slowly that users will not be able to get a consistent rate; instead publish/dispatch happens in chunks, which is not the expected behavior. So please step back, rethink, and let's revisit.
@rdhabalia Just to clarify: there's no issue in the AsyncTokenBucket core algorithm itself. The remaining problems after #23930 were caused by the eventual consistency of getting the token balance. It was a mistake to assume that an eventually consistent token balance would be useful; it's not useful for most use cases. I'll be addressing that usability issue with AsyncTokenBucket, since the token balance should always be consistent externally. As mentioned in the earlier PR comments, there is some change since branch-3.0 which causes the current test to fail. The rate for the first second is about 2x the configured rate. I'll dig more into that.
@rdhabalia One of the key behavior differences has been that, by default, AsyncTokenBucket adds tokens to the bucket very frequently (every 16 milliseconds) based on the consumed time, while the "classic" RateLimiterImpl adds tokens once per rate period (1 second). In commit 9c0e4cf, I made changes so that the AsyncTokenBucket implementation has similar behavior, making it very similar to RateLimiterImpl. The rate during the first second results in a 2x rate. You can see the behavior if you wish by
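For illustration, here is a minimal, self-contained Java sketch of the two refill styles described above: crediting tokens every small interval (e.g. 16 ms) versus once per rate period (1 s). The class and field names are hypothetical; this is not the actual AsyncTokenBucket or RateLimiterImpl code.

```java
public class TokenBucketSketch {
    private final long ratePerSecond;
    private final long resolutionNanos; // interval at which new tokens are credited
    private long tokens;
    private long lastRefillNanos;

    TokenBucketSketch(long ratePerSecond, long resolutionNanos) {
        this.ratePerSecond = ratePerSecond;
        this.resolutionNanos = resolutionNanos;
        this.tokens = ratePerSecond; // start with one second's worth of tokens
        this.lastRefillNanos = System.nanoTime();
    }

    // Credits tokens only when at least one full resolution interval has elapsed.
    private void refill() {
        long now = System.nanoTime();
        long elapsed = now - lastRefillNanos;
        if (elapsed >= resolutionNanos) {
            long intervals = elapsed / resolutionNanos;
            long newTokens = ratePerSecond * intervals * resolutionNanos / 1_000_000_000L;
            // cap the balance so an idle period can't build up an unbounded burst
            tokens = Math.min(tokens + newTokens, ratePerSecond);
            lastRefillNanos += intervals * resolutionNanos;
        }
    }

    // Consumes tokens; the balance may go negative, which pauses further consumption
    // until enough tokens have been credited back.
    public boolean tryConsume(long permits) {
        refill();
        tokens -= permits;
        return tokens >= 0;
    }

    public static void main(String[] args) {
        // 16 ms resolution: tokens trickle in almost continuously.
        TokenBucketSketch frequent = new TokenBucketSketch(1000, 16_000_000L);
        // 1 s resolution: tokens arrive in one chunk per rate period, like the classic behavior.
        TokenBucketSketch perPeriod = new TokenBucketSketch(1000, 1_000_000_000L);
        System.out.println(frequent.tryConsume(100) + " " + perPeriod.tryConsume(100));
    }
}
```

The long-run rate is the same in both cases; the resolution only changes how often (and in how large chunks) tokens become available again.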
Seems like a use case for a pluggable rate-limiter interface. Please find below the discussion where someone again refused to take the feedback. We have made an interface for taking time
Since the tests prove that the problems in AsyncTokenBucket are fixed after all, I'll make the AsyncTokenBucket implementation the default. Users that would like to switch to the previous implementation can configure it with
Please don't mix two things: a bug fix and new configuration. Can you please create a separate PR for the async token algorithm fix and a new PR to add the previous implementation back? Also, you might have to create a PIP for the 2nd PR as it will add new configuration.
@rdhabalia I understand your concern about separating bug fixes from configuration changes, which is generally good practice. However, given our time constraints with two urgent releases blocked by this PR, I'd really appreciate it if we could review it as is. The AsyncTokenBucket class is relatively small (303 lines including the license header), and the actual fix for the consistency issue is quite targeted - just changing the return value. During #23930, I made some changes based on incorrect assumptions that I've now removed in this PR. While I understand that splitting this into multiple PRs might seem cleaner, it could actually create more confusion about review order and cause further delays. If you feel strongly about separating parts like the configurability of the dispatch rate implementation or bringing back the pre-PIP-322 implementation, I'm open to discussing it - I just want to make sure we can move forward efficiently given our release timeline.
@lhotari they are two different changes: 1. fixing a bug and 2. adding a new implementation. *oh.. I must admit, even getting simple things out with you is quite challenging.
And don't worry about delay, because we did rush to merge this rate limiting and made Pulsar unstable. There is no argument: we can't mix two things. Also, you have to create a PIP as it has a configuration change, and we can't break that process. @codelipenghui @eolivelli is it unreasonable to ask for two separate PRs for two different things?
👍 Nice work, and thanks for adding new unit tests to cover the existing behaviors.
@merlimat First, this PR had a configuration change that required the creation of a PIP. Second, it had two separate things: a. fix a bug in the async-token implementation and b. revert to the old implementation. This PR changes 52 files; it was super difficult to find out which file has the bug fix. I have been spending time to help in reporting and fixing the issue. You didn't even care to listen and see the feedback, because of which @lhotari started working on it again. I was raising many concerns but everything was completely ignored and you just went ahead and merged the PR. It's completely unacceptable. You block PRs from others without any reason, and here I tried to avoid that destructive practice and didn't block it, but you just merged it.
I see the documents here, but I cannot see what the difference between them is. The difference should be documented in a PIP from a high-level perspective. Or at least, if this is a missed part of PIP-322, we should update https://github.com/apache/pulsar/blob/master/pip/pip-322.md to explain why we decided to add such a config. I didn't have a chance to go through all the conversations above, but it seems that the reason is that the implementation of PIP-322 was reported to have many bugs. Though they seem to be fixed by this PR, how can we know this fix doesn't bring a new bug? The key point is that this feature might not be used in production for some time, so it should still be considered "unstable". Then we should provide an option for users to switch back to the legacy rate limiter.
I can't go through all the conversations above either, but I agree that we should be cautious about critical and huge changes that modify tens of files and thousands of lines of code. This PR was created just a few days ago; more review is needed, especially when there is still unresolved controversy.
Fixes #23920
Fixes #24001
Motivation
"PIP-322: Pulsar Rate Limiting Refactoring" changes caused a regression in dispatcher rate limiter and publish rate limiter behavior. When consumption starts, the rate can initially exceed limits, and consumption pauses until the token balance has caught up.
Due to the design of AsyncTokenBucket, the token balance is updated periodically as part of the usage calls. There is no separate scheduler required to update the token balance.
In the initial PIP-322 implementation, consistent token balance calculation was avoided, based on the assumption that calculating a consistent token balance every 16 milliseconds (configurable) would be sufficient. This assumption caused problems: in 16 milliseconds, the rate limiter could let a lot of traffic through, which would then be accounted for by throttling once the token balance got updated (a sketch of this pitfall follows the Modifications list below). The tests DispatchRateLimiterOverconsumingTest and PublishRateLimiterOverconsumingTest added in this PR reproduce this issue and will help prevent future regressions in this area.
Modifications
- addTokensResolutionNanos, which configures the interval when new tokens get added to the token balance.
- System.nanoTime() is now called directly for the monotonic clock (see the JMH benchmark notes below).
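To make the Motivation concrete, here is a simplified, single-threaded Java sketch (hypothetical names, not the actual AsyncTokenBucket code, and deliberately not thread-safe) of how an eventually consistent token balance can let traffic through and then pause it: reads between updates return a stale balance, and once pending consumption is finally folded in, the balance can drop far below zero.

```java
import java.util.concurrent.atomic.AtomicLong;

public class EventuallyConsistentBucketSketch {
    private final long ratePerSecond;
    private final long addTokensResolutionNanos;
    private final AtomicLong pendingConsumed = new AtomicLong();
    private volatile long tokens;
    private volatile long lastUpdateNanos;

    EventuallyConsistentBucketSketch(long ratePerSecond, long addTokensResolutionNanos) {
        this.ratePerSecond = ratePerSecond;
        this.addTokensResolutionNanos = addTokensResolutionNanos;
        this.tokens = ratePerSecond;
        this.lastUpdateNanos = System.nanoTime();
    }

    // Records consumption, but only folds it into the balance when the resolution interval elapses.
    public void consume(long permits) {
        pendingConsumed.addAndGet(permits);
        maybeUpdateBalance(false);
    }

    // Eventually consistent read: between updates it can report a stale, overly optimistic balance.
    public long tokensEventuallyConsistent() {
        maybeUpdateBalance(false);
        return tokens;
    }

    // Consistent read: always folds pending consumption and elapsed time into the balance.
    public long tokensConsistent() {
        maybeUpdateBalance(true);
        return tokens;
    }

    private void maybeUpdateBalance(boolean force) {
        long now = System.nanoTime();
        long elapsed = now - lastUpdateNanos;
        if (force || elapsed >= addTokensResolutionNanos) {
            long newTokens = ratePerSecond * elapsed / 1_000_000_000L;
            tokens = Math.min(tokens + newTokens, ratePerSecond) - pendingConsumed.getAndSet(0);
            lastUpdateNanos = now;
        }
    }

    public static void main(String[] args) {
        EventuallyConsistentBucketSketch bucket =
                new EventuallyConsistentBucketSketch(1000, 16_000_000L);
        // A burst within one resolution interval: the eventually consistent read still
        // reports the stale, positive balance, so callers keep sending.
        for (int i = 0; i < 50; i++) {
            bucket.consume(100);
        }
        System.out.println("eventually consistent view: " + bucket.tokensEventuallyConsistent());
        System.out.println("consistent view: " + bucket.tokensConsistent());
    }
}
```

A dispatcher checking the eventually consistent view keeps dispatching during the burst; once the balance is finally reconciled it goes deeply negative, and dispatching then pauses until the tokens recover, which is the behavior described in #23920 and #24001.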
Verifying this change
This PR adds a new test case, DispatchRateLimiterOverconsumingTest, which attempts to reproduce the issue described in #24001. The test runs against both the PIP-322 implementation and the old dispatch rate limiter implementation. The behavior is currently similar for both implementations: the message receive rate during the first second is 2x the configured rate. This issue doesn't reproduce in exactly the same way on branch-3.0 (experimental commit: lhotari@a602428); however, also on branch-3.0, the behavior of the dispatch rate limiter is not always stable for the first 2 seconds. The test has been modified to allow this behavior, and the problem can be investigated later since it's not a regression in the dispatch rate limiters.
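As a rough illustration of what the over-consuming tests measure (not the actual test code; the receive call here is simulated), per-second receive counts can be collected and compared against the configured rate, with extra tolerance for the first second:

```java
import java.util.ArrayList;
import java.util.List;

public class DispatchRateMeasurementSketch {
    public static void main(String[] args) throws InterruptedException {
        int configuredRatePerSecond = 100; // hypothetical dispatch rate limit
        List<Integer> perSecondCounts = new ArrayList<>();
        for (int second = 0; second < 5; second++) {
            long end = System.currentTimeMillis() + 1000;
            int received = 0;
            while (System.currentTimeMillis() < end) {
                // in the real test this would be a consumer receive call returning a message or null
                if (simulatedReceive()) {
                    received++;
                }
            }
            perSecondCounts.add(received);
        }
        System.out.println("per-second receive counts: " + perSecondCounts);
        // The real tests assert that the counts stay close to configuredRatePerSecond,
        // allowing extra headroom for the first second(s) where the rate can overshoot.
    }

    private static boolean simulatedReceive() throws InterruptedException {
        Thread.sleep(10); // stand-in for a rate-limited receive call
        return true;
    }
}
```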
This PR also reproduces the issue #23920 in a new test case, PublishRateLimiterOverconsumingTest, and demonstrates that this PR addresses the issue.
JMH benchmark results
XPS 15 7590 (2019), i9-9980HK CPU @ 2.40GHz
About 440 million ops for the AsyncTokenBucketBenchmark with 100 threads.
On a Mac M3 Max, the results are now poor, possibly due to a System.nanoTime() bottleneck. The DefaultMonotonicClockBenchmark now measures System.nanoTime(). On macOS, performance degrades when more threads are added, which is why the AsyncTokenBucketBenchmark numbers are also bad on macOS. Previously, the implementation avoided calling System.nanoTime, which added complexity. Since Pulsar deployments mainly target Linux, there isn't a need to address the issue now.
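For reference, a minimal JMH sketch (assuming JMH is on the classpath) that can be used to observe how System.nanoTime() throughput scales with thread count; this is illustrative only and not the actual DefaultMonotonicClockBenchmark source.

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;

@Fork(1)
@Warmup(iterations = 2, time = 1)
@Measurement(iterations = 3, time = 1)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class NanoTimeScalabilityBenchmark {

    @Benchmark
    @Threads(1)
    public long nanoTimeSingleThread() {
        // returning the value lets JMH consume it and prevents dead-code elimination
        return System.nanoTime();
    }

    @Benchmark
    @Threads(100)
    public long nanoTimeHundredThreads() {
        // on macOS the shared clock source becomes a contention point as threads increase;
        // on Linux (POSIX clock_gettime) it scales much better
        return System.nanoTime();
    }
}
```

Comparing the single-thread and 100-thread throughput on Linux versus macOS should reproduce the scalability difference described above.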
The source of slowness of MacOS System.nanoTime can be found in the native code: https://github.com/openjdk/jdk21u/blob/b7d92cd0ae781de6c51ea965c16bd6e0c396e9f7/src/hotspot/os/bsd/os_bsd.cpp#L760-L782
Explained in https://serce.me/posts/2019-05-16-the-matter-of-time#the-nanos-of-the-current-time
Linux uses the Posix implementation in Java 21 and doesn't have such a scalability issue: https://github.com/openjdk/jdk21u/blob/b7d92cd0ae781de6c51ea965c16bd6e0c396e9f7/src/hotspot/os/posix/os_posix.cpp#L1414-L1438
This PR also adds instructions for how to run the JMH benchmarks with async-profiler. Initial profiling confirms the assumption that the macOS System.nanoTime implementation is the bottleneck.
Documentation
doc
doc-required
doc-not-needed
doc-complete