Improve Gateway Server throughput #7077

Merged
merged 4 commits into from
May 15, 2024

adriansmares
Contributor

@adriansmares adriansmares commented May 14, 2024

Summary

This PR is a follow-up to some issues observed while deploying Redis-based rate limiting.

Changes

  • Make the source address firewall safe for concurrent use.
    • The protocol used to store the last seen timestamp is now concurrency safe.
  • Do not put extra pressure on Redis when rate limiting is triggered: logs are now always rate limited in memory.
  • Make packet filtering (rate limiting and firewalling) parallel.
    • Currently, filtering runs on a single goroutine. Any throughput limitation there (such as rate limiting moving from memory to Redis, which adds round-trip latency) leads to packet loss, because UDP has no retransmissions when the socket buffer is full.
    • Now, the firewalls and packet filtering run in a worker pool. This has the added advantage of making packet loss visible in the worker pool metrics: if we receive packets faster than we can filter them, we register a drop.

Testing

Steps

This will be tested on staging1 and other environments. There are no actual tests to be done manually, as this is mainly an optimization PR.

Results

We've been able to significantly reduce the number of Gateway Servers in use, because the single-goroutine packet processing bottleneck has been relieved.

Regressions

N/A.

Checklist

  • Scope: The referenced issue is addressed, there are no unrelated changes.
  • Compatibility: The changes are backwards compatible with existing API, storage, configuration and CLI, according to the compatibility commitments in README.md for the chosen target branch.
  • Documentation: Relevant documentation is added or updated.
  • Testing: The steps/process to test this feature are clearly explained including testing for regressions.
  • Infrastructure: If infrastructural changes (e.g., new RPC, configuration) are needed, a separate issue is created in the infrastructural repositories.
  • Changelog: Significant features, behavior changes, deprecations and fixes are added to CHANGELOG.md.
  • Commits: Commit messages follow guidelines in CONTRIBUTING.md, there are no fixup commits left.

@adriansmares adriansmares added the performance and scalability labels May 14, 2024
@adriansmares adriansmares added this to the v3.30.2 milestone May 14, 2024
@adriansmares adriansmares self-assigned this May 14, 2024
@github-actions github-actions bot added the c/gateway server label May 14, 2024
@adriansmares adriansmares force-pushed the fix/gs-udp-throughput branch from 6e1dfd9 to 8645cfb Compare May 14, 2024 15:05
@adriansmares adriansmares force-pushed the fix/gs-udp-throughput branch from 8645cfb to dc7e194 Compare May 14, 2024 15:06
@adriansmares adriansmares requested a review from KrishnaIyer May 14, 2024 15:48
@adriansmares adriansmares marked this pull request as ready for review May 14, 2024 15:48
@adriansmares adriansmares requested a review from a team as a code owner May 14, 2024 15:48
@adriansmares adriansmares merged commit ba8be7d into v3.30 May 15, 2024
13 checks passed
@adriansmares adriansmares deleted the fix/gs-udp-throughput branch May 15, 2024 08:48
@adriansmares adriansmares mentioned this pull request May 15, 2024
7 tasks
Labels
  • c/gateway server: This is related to the Gateway Server
  • performance: Something is slow or takes too much CPU/Memory/...
  • scalability: This could become a problem at scale
2 participants