
3.x: TokenAwarePolicy fix bad perf of ReplicaOrdering.RANDOM #427

Merged

Conversation

@dkropachev (Collaborator) commented Jan 27, 2025

`ReplicaOrdering.RANDOM` shows up to 20% worse throughput. Switching from `java.util.Random` to `ThreadLocalRandom` showed a 20% improvement (a minimal sketch of the change is included after the results below).

After this PR:

```
=======================
Results:
Op rate                   :   28,232 op/s  [WRITE: 28,232 op/s]
Partition rate            :   28,232 pk/s  [WRITE: 28,232 pk/s]
Row rate                  :   28,232 row/s [WRITE: 28,232 row/s]
Latency mean              :    0.3 ms [WRITE: 0.3 ms]
Latency median            :    0.2 ms [WRITE: 0.2 ms]
Latency 95th percentile   :    1.7 ms [WRITE: 1.7 ms]
Latency 99th percentile   :    2.2 ms [WRITE: 2.2 ms]
Latency 99.9th percentile :    3.7 ms [WRITE: 3.7 ms]
Latency max               :   29.0 ms [WRITE: 29.0 ms]
Total partitions          :  1,125,000 [WRITE: 1,125,000]
Total errors              :          0 [WRITE: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:00:39
```

Before this PR:

```
=======================
Results:
Op rate                   :   26,094 op/s  [WRITE: 26,094 op/s]
Partition rate            :   26,094 pk/s  [WRITE: 26,094 pk/s]
Row rate                  :   26,094 row/s [WRITE: 26,094 row/s]
Latency mean              :    0.4 ms [WRITE: 0.4 ms]
Latency median            :    0.2 ms [WRITE: 0.2 ms]
Latency 95th percentile   :    1.9 ms [WRITE: 1.9 ms]
Latency 99th percentile   :    2.5 ms [WRITE: 2.5 ms]
Latency 99.9th percentile :    4.0 ms [WRITE: 4.0 ms]
Latency max               :   22.0 ms [WRITE: 22.0 ms]
Total partitions          :  1,125,000 [WRITE: 1,125,000]
Total errors              :          0 [WRITE: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             :    0.0 seconds
Avg GC time               :    NaN ms
StdDev GC time            :    0.0 ms
Total operation time      : 00:00:43
```

Fixes: scylladb/cassandra-stress#38
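
For context, a minimal sketch of the pattern this PR describes (not the driver's actual code; class and method names here are hypothetical). `java.util.Random` serializes its state updates through a single shared `AtomicLong` seed updated via compare-and-set, so many query threads shuffling replicas through one instance contend on it; `ThreadLocalRandom.current()` returns a per-thread generator and avoids that contention:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of randomized replica ordering on the query hot path.
public class ReplicaShuffleSketch {

    // Before: one Random shared by all query threads. Each call to next()
    // does a compareAndSet loop on the shared seed, causing CAS retries
    // under contention.
    private static final Random SHARED_RANDOM = new Random();

    static List<String> shuffledBefore(List<String> replicas) {
        List<String> copy = new ArrayList<>(replicas);
        Collections.shuffle(copy, SHARED_RANDOM);
        return copy;
    }

    // After: ThreadLocalRandom.current() hands each thread its own
    // generator, so concurrent shuffles no longer contend on shared state.
    static List<String> shuffledAfter(List<String> replicas) {
        List<String> copy = new ArrayList<>(replicas);
        Collections.shuffle(copy, ThreadLocalRandom.current());
        return copy;
    }
}
```

Since `ThreadLocalRandom` extends `Random`, it drops into `Collections.shuffle` (and similar call sites) without any other changes.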

@dkropachev requested a review from @Bouncheck on January 27, 2025 02:42

@dkropachev force-pushed the dk/3.x-fix-bad-performance-of-replica-ordering-random branch from fb59761 to cc98fd0 on January 27, 2025 02:44
@mykaul commented Jan 27, 2025

Do we have a similar optimization potential with 4.x, or is this 3.x-specific? (did not look at the code yet)

@Bouncheck (Collaborator) left a review:

LGTM

@Bouncheck (Collaborator) commented

> Do we have a similar optimization potential with 4.x, or is this 3.x-specific? (did not look at the code yet)

Looks like 4.x already uses `ThreadLocalRandom`. There are two uses of `Collections.shuffle` without it, but those are related to reconnection, so they won't have much impact.
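
For reference, a sketch of the distinction being drawn here (illustrative only, not the 4.x driver's code): the no-argument `Collections.shuffle` overload uses a lazily created, JVM-wide shared `Random` internally, which is fine on a cold path but would contend on a hot one:

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch of the two Collections.shuffle overloads.
class ShuffleVariants {
    static void demo(List<String> nodes) {
        // Uses a shared, lazily initialized java.util.Random internally.
        // Acceptable on a cold path, e.g. ordering reconnection attempts.
        Collections.shuffle(nodes);

        // Explicit per-thread source; the contention-free choice for hot
        // paths such as per-query replica ordering.
        Collections.shuffle(nodes, ThreadLocalRandom.current());
    }
}
```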

@roydahan (Collaborator) commented

Thanks @dkropachev, great work identifying the root cause.

@juliayakovlev this should solve the issue you have with c-s 3.17.0; we will need a new c-s release.

@dkropachev merged commit 538fe6c into scylla-3.x on Jan 27, 2025 (10 of 11 checks passed)
@dkropachev (Collaborator, Author) commented

> Thanks @dkropachev, great work identifying the root cause.
>
> @juliayakovlev this should solve the issue you have with c-s 3.17.0; we will need a new c-s release.

We need to release the driver first.

@roydahan (Collaborator) commented

Of course.
