Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] latency is 0 with 800 connections for reencrypt sample 1 #26

Open
qiliRedHat opened this issue Aug 3, 2023 · 2 comments
Open
Assignees
Labels
bug Something isn't working

Comments

@qiliRedHat
Copy link

qiliRedHat commented Aug 3, 2023

ingress-perf version or commit ID
HAProxy version 2.6.13-234aa6d 2023/05/02

Describe the bug
Run ingress-test with 800 connections on 9 worker nodes, the first reencrypt sample had latency as 0. I reproduced this twice.

To Reproduce
Steps to reproduce the behavior:

  1. Run ingress-perf with config file https://github.com/qiliRedHat/e2e-benchmarking/blob/ingress-config/workloads/ingress-perf/config/800connections.yml

Expected behavior

  1. Should be able to return correct response time.
  2. Shall we have a inggress-perf version or better an parameter to output the whole report as defined in https://github.com/cloud-bulldozer/ingress-perf/blob/main/assets/json.lua?
    Now we only output Rps, avgLatency and P99Latency
    log.Infof("Summary for %s: Rps=%.0f avgLatency=%.0f μs P99Latency=%.0f μs", cfg.Termination, result.TotalAvgRps, result.AvgLatency, result.P99Latency)

Screenshots or output
First reproduce

08-03 19:39:48.450  ./ingress-perf run --cfg config/800connections.yml --loglevel=info --es-server=https://ocp-qe:****@search-ocp-qe-perf-scale-test-elk-hcm7wtsqpxy7xogbu72bor4uve.us-east-1.es.amazonaws.com/ --es-index=ingress-performance
08-03 19:39:48.450  time="2023-08-03 11:39:48" level=info msg="Running ingress performance 2c849357-8554-4b0a-b202-dd576979f73f" file="ingress-perf.go:67"
08-03 19:39:48.450  time="2023-08-03 11:39:48" level=info msg="Creating elastic indexer" file="ingress-perf.go:75"
08-03 19:39:48.451  time="2023-08-03 11:39:48" level=info msg="Starting ingress-perf" file="runner.go:57"
08-03 19:39:48.706  time="2023-08-03 11:39:48" level=info msg="Deploying benchmark assets" file="runner.go:177"
08-03 19:39:50.070  W0803 11:39:49.727008   27198 warnings.go:70] would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true)
08-03 19:39:50.070  time="2023-08-03 11:39:49" level=info msg="Running test 1/5 " file="runner.go:88"
08-03 19:39:50.070  time="2023-08-03 11:39:49" level=info msg="Tool:wrk termination:http servers:45 concurrency:18 procs:1 connections:800 duration:5m0s" file="runner.go:89"
08-03 19:39:52.581  W0803 11:39:52.005255   27198 warnings.go:70] would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true)
08-03 19:40:10.610  time="2023-08-03 11:40:08" level=info msg="Applying tuning patch to ingress controller: {\"spec\":{\"nodePlacement\": {\"nodeSelector\": {\"matchLabels\": {\"node-role.kubernetes.io/infra\": \"\"}}}, \"replicas\": 2}}" file="tuning.go:30"
08-03 19:40:13.873  time="2023-08-03 11:40:13" level=info msg="Running sample 1/5: 5m0s" file="exec.go:82"
08-03 19:45:20.306  time="2023-08-03 11:45:13" level=info msg="Summary for http: Rps=81651 avgLatency=178026 μs P99Latency=255362 μs" file="exec.go:91"
08-03 19:45:20.306  time="2023-08-03 11:45:13" level=info msg="Sleeping for 10s" file="exec.go:94"
08-03 19:45:24.466  time="2023-08-03 11:45:23" level=info msg="Running sample 2/5: 5m0s" file="exec.go:82"
08-03 19:50:30.911  time="2023-08-03 11:50:24" level=info msg="Summary for http: Rps=81733 avgLatency=195559 μs P99Latency=272787 μs" file="exec.go:91"
08-03 19:50:30.912  time="2023-08-03 11:50:24" level=info msg="Sleeping for 10s" file="exec.go:94"
08-03 19:50:34.174  time="2023-08-03 11:50:34" level=info msg="Running sample 3/5: 5m0s" file="exec.go:82"
08-03 19:55:40.662  time="2023-08-03 11:55:34" level=info msg="Summary for http: Rps=83680 avgLatency=188490 μs P99Latency=267085 μs" file="exec.go:91"
08-03 19:55:40.662  time="2023-08-03 11:55:34" level=info msg="Sleeping for 10s" file="exec.go:94"
08-03 19:55:44.908  time="2023-08-03 11:55:44" level=info msg="Running sample 4/5: 5m0s" file="exec.go:82"
08-03 20:00:51.364  time="2023-08-03 12:00:44" level=info msg="Summary for http: Rps=83513 avgLatency=178705 μs P99Latency=254183 μs" file="exec.go:91"
08-03 20:00:51.364  time="2023-08-03 12:00:44" level=info msg="Sleeping for 10s" file="exec.go:94"
08-03 20:00:55.526  time="2023-08-03 12:00:54" level=info msg="Running sample 5/5: 5m0s" file="exec.go:82"
08-03 20:06:01.986  time="2023-08-03 12:05:55" level=info msg="Summary for http: Rps=83448 avgLatency=192505 μs P99Latency=280170 μs" file="exec.go:91"
08-03 20:06:01.986  time="2023-08-03 12:05:55" level=info msg="Sleeping for 10s" file="exec.go:94"
08-03 20:06:06.150  time="2023-08-03 12:06:05" level=info msg="Warmup is enabled, skipping indexing" file="runner.go:140"
08-03 20:06:06.150  time="2023-08-03 12:06:05" level=info msg="Running test 2/5 " file="runner.go:88"
08-03 20:06:06.150  time="2023-08-03 12:06:05" level=info msg="Tool:wrk termination:http servers:45 concurrency:18 procs:1 connections:800 duration:5m0s" file="runner.go:89"
08-03 20:06:06.150  W0803 12:06:05.431992   27198 warnings.go:70] would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true)
08-03 20:06:06.150  time="2023-08-03 12:06:05" level=info msg="Running sample 1/2: 5m0s" file="exec.go:82"
08-03 20:11:13.155  time="2023-08-03 12:11:05" level=info msg="Summary for http: Rps=81167 avgLatency=177154 μs P99Latency=245372 μs" file="exec.go:91"
08-03 20:11:13.155  time="2023-08-03 12:11:05" level=info msg="Sleeping for 10s" file="exec.go:94"
08-03 20:11:16.419  time="2023-08-03 12:11:15" level=info msg="Running sample 2/2: 5m0s" file="exec.go:82"
08-03 20:16:22.857  time="2023-08-03 12:16:16" level=info msg="Summary for http: Rps=82789 avgLatency=175167 μs P99Latency=247899 μs" file="exec.go:91"
08-03 20:16:22.857  time="2023-08-03 12:16:16" level=info msg="Sleeping for 10s" file="exec.go:94"
08-03 20:16:27.022  time="2023-08-03 12:16:26" level=info msg="Indexing finished in 105ms: created=2" file="runner.go:120"
08-03 20:16:27.023  time="2023-08-03 12:16:26" level=info msg="Running test 3/5 " file="runner.go:88"
08-03 20:16:27.023  time="2023-08-03 12:16:26" level=info msg="Tool:wrk termination:edge servers:45 concurrency:18 procs:1 connections:800 duration:5m0s" file="runner.go:89"
08-03 20:16:27.023  W0803 12:16:26.755218   27198 warnings.go:70] would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true)
08-03 20:16:27.023  time="2023-08-03 12:16:26" level=info msg="Running sample 1/2: 5m0s" file="exec.go:82"
08-03 20:21:33.468  time="2023-08-03 12:21:27" level=info msg="Summary for edge: Rps=41513 avgLatency=93210 μs P99Latency=424816 μs" file="exec.go:91"
08-03 20:21:33.468  time="2023-08-03 12:21:27" level=info msg="Sleeping for 10s" file="exec.go:94"
08-03 20:21:37.632  time="2023-08-03 12:21:37" level=info msg="Running sample 2/2: 5m0s" file="exec.go:82"
08-03 20:26:44.068  time="2023-08-03 12:26:37" level=info msg="Summary for edge: Rps=54770 avgLatency=259216 μs P99Latency=380808 μs" file="exec.go:91"
08-03 20:26:44.068  time="2023-08-03 12:26:37" level=info msg="Sleeping for 10s" file="exec.go:94"
08-03 20:26:50.600  time="2023-08-03 12:26:50" level=info msg="Indexing finished in 110ms: created=2" file="runner.go:120"
08-03 20:26:50.600  time="2023-08-03 12:26:50" level=info msg="Running test 4/5 " file="runner.go:88"
08-03 20:26:50.600  time="2023-08-03 12:26:50" level=info msg="Tool:wrk termination:reencrypt servers:45 concurrency:18 procs:1 connections:800 duration:5m0s" file="runner.go:89"
08-03 20:26:50.855  W0803 12:26:50.687996   27198 warnings.go:70] would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true)
08-03 20:26:50.855  time="2023-08-03 12:26:50" level=info msg="Running sample 1/2: 5m0s" file="exec.go:82"
08-03 20:31:57.314  time="2023-08-03 12:31:51" level=info msg="Summary for reencrypt: Rps=11008 avgLatency=0 μs P99Latency=0 μs" file="exec.go:91"
08-03 20:31:57.314  time="2023-08-03 12:31:51" level=info msg="Sleeping for 10s" file="exec.go:94"
08-03 20:32:01.472  time="2023-08-03 12:32:01" level=info msg="Running sample 2/2: 5m0s" file="exec.go:82"
08-03 20:37:07.913  time="2023-08-03 12:37:01" level=info msg="Summary for reencrypt: Rps=31169 avgLatency=33742 μs P99Latency=46910 μs" file="exec.go:91"
08-03 20:37:07.913  time="2023-08-03 12:37:01" level=info msg="Sleeping for 10s" file="exec.go:94"
08-03 20:37:13.152  time="2023-08-03 12:37:12" level=info msg="Indexing finished in 111ms: created=2" file="runner.go:120"
08-03 20:37:13.152  time="2023-08-03 12:37:12" level=info msg="Running test 5/5 " file="runner.go:88"
08-03 20:37:13.152  time="2023-08-03 12:37:12" level=info msg="Tool:wrk termination:passthrough servers:45 concurrency:18 procs:1 connections:800 duration:5m0s" file="runner.go:89"
08-03 20:37:13.152  W0803 12:37:12.900880   27198 warnings.go:70] would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true)
08-03 20:37:13.152  time="2023-08-03 12:37:13" level=info msg="Running sample 1/2: 5m0s" file="exec.go:82"
08-03 20:42:19.599  time="2023-08-03 12:42:13" level=info msg="Summary for passthrough: Rps=124928 avgLatency=119179 μs P99Latency=226514 μs" file="exec.go:91"
08-03 20:42:19.599  time="2023-08-03 12:42:13" level=info msg="Sleeping for 10s" file="exec.go:94"
08-03 20:42:23.758  time="2023-08-03 12:42:23" level=info msg="Running sample 2/2: 5m0s" file="exec.go:82"
08-03 20:47:30.609  time="2023-08-03 12:47:24" level=info msg="Summary for passthrough: Rps=124336 avgLatency=111646 μs P99Latency=237050 μs" file="exec.go:91"
08-03 20:47:30.609  time="2023-08-03 12:47:24" level=info msg="Sleeping for 10s" file="exec.go:94"
08-03 20:47:34.770  time="2023-08-03 12:47:34" level=info msg="Indexing finished in 100ms: created=2" file="runner.go:120"
08-03 20:47:34.770  time="2023-08-03 12:47:34" level=info msg="Cleaning up resources" file="runner.go:156"

Partial of the second reproduce

08-03 22:00:00.441  time="2023-08-03 13:59:59" level=info msg="Tool:wrk termination:reencrypt servers:45 concurrency:18 procs:1 connections:800 duration:5m0s" file="runner.go:89"
08-03 22:00:00.441  W0803 13:59:59.689817  233968 warnings.go:70] would violate PodSecurity "restricted:latest": host namespaces (hostNetwork=true)
08-03 22:00:00.441  time="2023-08-03 13:59:59" level=info msg="Running sample 1/2: 5m0s" file="exec.go:82"
08-03 22:05:06.879  time="2023-08-03 14:05:00" level=info msg="Summary for reencrypt: Rps=10800 avgLatency=0 μs P99Latency=0 μs" file="exec.go:91"
08-03 22:05:06.879  time="2023-08-03 14:05:00" level=info msg="Sleeping for 10s" file="exec.go:94"
08-03 22:05:11.040  time="2023-08-03 14:05:10" level=info msg="Running sample 2/2: 5m0s" file="exec.go:82"
08-03 22:10:17.476  time="2023-08-03 14:10:10" level=info msg="Summary for reencrypt: Rps=27812 avgLatency=1706 μs P99Latency=3826 μs" file="exec.go:91"
08-03 22:10:17.476  time="2023-08-03 14:10:10" level=info msg="Sleeping for 10s" file="exec.go:94"
08-03 22:10:21.643  time="2023-08-03 14:10:20" level=info msg="Indexing finished in 111ms: created=2" file="runner.go:120"
08-03 22:10:21.643  time="2023-08-03 14:10:20" level=info msg="Running test 5/5 " file="runner.go:88"
08-03 22:10:21.643  time="2023-08-03 14:10:20" level=info msg="Tool:wrk termination:passthrough servers:45 concurrency:18 procs:1 connections:800 duration:5m0s" file="runner.go:89"
@qiliRedHat qiliRedHat added the bug Something isn't working label Aug 3, 2023
@rsevilla87
Copy link
Member

I haven't been able to reproduce the issue in a 4.11.17 cluster, although I have a couple of thoughts.

I recommend you to use increase concurrency/procs rather than increasing the number of parallel connections beyond 250 aprox as I have seen that wrk doesn't perform very well when the number of connections if very high.

What ingress-perf prints is the aggregated value of some of the metrics observed across all client pod replicas, for that reason printing the output of every pod in the screen can be very noisy... but thinking it twice, I think we can add this output in a "trace" log level.

You should find out the raw metrics (including a per pod break down) in the corresponding document of elasticsearch instance

@rsevilla87
Copy link
Member

Managed to reproduce it finally, under different conditions though, I set the test duration in 5s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants