fix(perf-test): restore reachable backends, update node logic, and improve observability setup #262

bartsmykla · 2025-01-28T12:31:55Z

This PR fixes issues with the performance tests that were broken in #224.

Changes:

- Restored reachable backends in the service graph to fix failing tests.
- Updated logic to allocate enough nodes for 1000 services with 2 instances each.
- Added a new "observability" node group in EKS to keep Prometheus and other tools separate.
  - Added an 80GB PersistentVolumeClaim for Prometheus to avoid storage issues when there are many workloads.
  - Ensured observability components run on the right node group using tolerations and nodeSelector.
- Increased timeout in the test that checks certificate distribution to 360s, as generating certificates for 2000 services takes longer than before.

- Added back reachable backends in the service graph to fix failing tests.
- Updated node count logic to handle resource requests for 1000 services with 2 instances each. The old logic didn't provide enough nodes.

Signed-off-by: Bart Smykla <[email protected]>
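As a rough illustration of the node count logic described above, here is a minimal sketch of sizing a node group from per-pod resource requests. It is not the code from this PR: the function name `nodesNeeded`, the per-pod CPU request, the allocatable CPU per node, and the per-node pod limit are all hypothetical placeholders chosen only to show the arithmetic.

```go
package main

import (
	"errors"
	"fmt"
)

// nodesNeeded estimates how many nodes are required to schedule
// services*instances pods. All parameters are illustrative assumptions:
// podMilliCPU is the CPU request of a single pod, allocatableMilliCPU is
// the CPU a node can hand out to pods, and maxPodsPerNode is the per-node
// pod limit.
func nodesNeeded(services, instances, podMilliCPU, allocatableMilliCPU, maxPodsPerNode int) (int, error) {
	if services <= 0 || instances <= 0 || podMilliCPU <= 0 {
		return 0, errors.New("services, instances and podMilliCPU must be positive")
	}

	// How many pods fit on one node when only CPU requests are considered.
	perNode := allocatableMilliCPU / podMilliCPU
	// The per-node pod limit can be the tighter constraint.
	if maxPodsPerNode < perNode {
		perNode = maxPodsPerNode
	}
	if perNode <= 0 {
		return 0, errors.New("a single pod does not fit on a node")
	}

	pods := services * instances
	// Ceiling division: a partially filled node still counts as a node.
	return (pods + perNode - 1) / perNode, nil
}

func main() {
	// 1000 services with 2 instances each, as in the PR description.
	// The remaining numbers are placeholders, not values from the repository.
	n, err := nodesNeeded(1000, 2, 100, 4000, 110)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("nodes needed:", n)
}
```

The sketch takes the tighter of the CPU-based and pod-limit-based capacities and rounds up, so an undersized estimate (the failure mode mentioned in the commit message) would show up as too few nodes for the total pod count.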

lukidzi · 2025-01-28T12:56:41Z

> Increased timeout in the test that checks certificate distribution to 360s, as generating certificates for 2000 services takes longer than before.

Should this be investigated in Kuma?

Added a new "observability" node group in EKS to keep Prometheus and other monitoring tools separate from other workloads. This helps ensure Prometheus has enough resources, especially when monitoring many services.

Updated Prometheus setup to:
- Use an 80GB PersistentVolumeClaim to avoid running out of space when monitoring large workloads.
- Add tolerations and nodeSelector to make sure observability components run on the right node group.

Increased timeout in a test from 60s to 600s, as generating certificates for 2000 services takes significantly more time.

Signed-off-by: Bart Smykla <[email protected]>

This logic is not used, but was helpful when I was making sure locally that there is no huge difference between reachable services with legacy `kuma.io/service` labels and reachable backends with `MeshServices`.

Signed-off-by: Bart Smykla <[email protected]>

Signed-off-by: Bart Smykla <[email protected]>

bartsmykla requested a review from lukidzi January 28, 2025 12:31

bartsmykla requested review from Automaat, jakubdyszkiewicz and lobkovilya as code owners January 28, 2025 12:31

bartsmykla removed the request for review from jakubdyszkiewicz January 28, 2025 12:32

lukidzi approved these changes Jan 28, 2025

bartsmykla force-pushed the fix/configure-reachable-backends branch from ce73ed2 to 53bdca4 Compare January 28, 2025 13:23

bartsmykla added 9 commits January 29, 2025 13:22

chore: upgrade kuma

0a86ae5

Signed-off-by: Bart Smykla <[email protected]>

chore: go mod tidy

e1a9e44

Signed-off-by: Bart Smykla <[email protected]>

chore: install metrics server in a cluster

d683ad9

Signed-off-by: Bart Smykla <[email protected]>

chore: try to change used token in helm provider

808b2d6

Signed-off-by: Bart Smykla <[email protected]>

chore: change removed resources at namespace termination

fa8171b

Signed-off-by: Bart Smykla <[email protected]>

chore: split terraform files and configure cluster access entry for us

1fc8139

Signed-off-by: Bart Smykla <[email protected]>

chore: add depends_on for aws_eks_cluster

16f2b6b

Signed-off-by: Bart Smykla <[email protected]>

chore: improve make targets

3dde319

Signed-off-by: Bart Smykla <[email protected]>
