Automate longevity test (nginx#1657)
Problem: NFR tests are a burden to run manually, taking a lot of time and effort.

Solution: Automate the longevity test so that a developer can run it more easily and quickly. Because the test is long-running, it is run separately from the other NFR tests and should not be run in the pipeline. Collecting the dashboard results remains a manual step.

Also separated the functional and NFR tests in the Makefile and README to make the distinction between the two types clearer. These changes require NFR tests to be run in a GKE environment.
sjberman authored and amimimor committed Apr 3, 2024
1 parent bd4dbe5 commit deb1cec
Showing 36 changed files with 390 additions and 243 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/nfr.yml
@@ -144,9 +144,9 @@ jobs:
working-directory: ./tests
run: |
if ${{ inputs.test_label != 'all' }}; then
sed -i '/^GINKGO_LABEL=/s/=.*/="${{ inputs.test_label }}"/' "scripts/vars.env" && make run-tests-on-vm;
sed -i '/^GINKGO_LABEL=/s/=.*/="${{ inputs.test_label }}"/' "scripts/vars.env" && make nfr-test;
else
make run-tests-on-vm;
make nfr-test;
fi
- name: Cleanup
3 changes: 3 additions & 0 deletions .gitignore
@@ -51,3 +51,6 @@ internal/mode/static/nginx/modules/coverage

# Credential files
**/gha-creds-*.json

# SSH config files
*.ssh
2 changes: 1 addition & 1 deletion .yamllint.yaml
@@ -41,7 +41,7 @@ rules:
.github/
deploy/manifests/nginx-gateway.yaml
deploy/manifests/crds
tests/longevity/manifests/cronjob.yaml
tests/suite/manifests/longevity/cronjob.yaml
.goreleaser.yml
new-line-at-end-of-file: enable
new-lines: enable
78 changes: 53 additions & 25 deletions tests/Makefile
@@ -32,6 +32,10 @@ help: Makefile ## Display this help
create-kind-cluster: ## Create a kind cluster
cd .. && make create-kind-cluster

.PHONY: delete-kind-cluster
delete-kind-cluster: ## Delete kind cluster
kind delete cluster

.PHONY: build-images
build-images: ## Build NGF and NGINX images
cd .. && make PREFIX=$(PREFIX) TAG=$(TAG) build-images
@@ -48,51 +52,75 @@ load-images: ## Load NGF and NGINX images on configured kind cluster
load-images-with-plus: ## Load NGF and NGINX Plus images on configured kind cluster
cd .. && make PREFIX=$(PREFIX) TAG=$(TAG) load-images-with-plus

test: ## Run the system tests against your default k8s cluster
go test -v ./suite $(GINKGO_FLAGS) -args --gateway-api-version=$(GW_API_VERSION) \
--gateway-api-prev-version=$(GW_API_PREV_VERSION) --image-tag=$(TAG) --version-under-test=$(NGF_VERSION) \
--plus-enabled=$(PLUS_ENABLED) --ngf-image-repo=$(PREFIX) --nginx-image-repo=$(NGINX_PREFIX) \
--pull-policy=$(PULL_POLICY) --k8s-version=$(K8S_VERSION) --service-type=$(GW_SERVICE_TYPE) \
--is-gke-internal-lb=$(GW_SVC_GKE_INTERNAL)
.PHONY: setup-gcp-and-run-tests
setup-gcp-and-run-tests: create-gke-router create-and-setup-vm run-tests-on-vm ## Create and setup a GKE router and GCP VM for tests and run the functional tests

.PHONY: delete-kind-cluster
delete-kind-cluster: ## Delete kind cluster
kind delete cluster
.PHONY: setup-gcp-and-run-nfr-tests
setup-gcp-and-run-nfr-tests: create-gke-router create-and-setup-vm nfr-test ## Create and setup a GKE router and GCP VM for tests and run the NFR tests

.PHONY: run-tests-on-vm
run-tests-on-vm: ## Run the tests on a GCP VM
bash scripts/run-tests-gcp-vm.sh
.PHONY: create-gke-cluster
create-gke-cluster: ## Create a GKE cluster
bash scripts/create-gke-cluster.sh $(CI)

.PHONY: create-and-setup-vm
create-and-setup-vm: ## Create and setup a GCP VM for tests
bash scripts/create-and-setup-gcp-vm.sh

.PHONY: cleanup-vm
cleanup-vm: ## Delete the test GCP VM and delete the firewall rule
bash scripts/cleanup-vm.sh

.PHONY: create-gke-router
create-gke-router: ## Create a GKE router to allow egress traffic from private nodes (allows for external image pulls)
bash scripts/create-gke-router.sh

.PHONY: cleanup-router
cleanup-router: ## Delete the GKE router
bash scripts/cleanup-router.sh
.PHONY: sync-files-to-vm
sync-files-to-vm: ## Syncs your local NGF files with the NGF repo on the VM
bash scripts/sync-files-to-vm.sh

.PHONY: setup-gcp-and-run-tests
setup-gcp-and-run-tests: create-gke-router create-and-setup-vm run-tests-on-vm ## Create and setup a GKE router and GCP VM for tests and run the tests
.PHONY: run-tests-on-vm
run-tests-on-vm: ## Run the functional tests on a GCP VM
bash scripts/run-tests-gcp-vm.sh

.PHONY: nfr-test
nfr-test: ## Run the NFR tests on a GCP VM
NFR=true bash scripts/run-tests-gcp-vm.sh

.PHONY: start-longevity-test
start-longevity-test: ## Start the longevity test to run for 4 days in GKE
START_LONGEVITY=true $(MAKE) nfr-test

.PHONY: stop-longevity-test
stop-longevity-test: ## Stops the longevity test and collects results
STOP_LONGEVITY=true $(MAKE) nfr-test

.PHONY: .vm-nfr-test
.vm-nfr-test: ## Runs the NFR tests on the GCP VM (called by `nfr-test`)
go test -v ./suite -ginkgo.label-filter "nfr" $(GINKGO_FLAGS) -ginkgo.v -args --gateway-api-version=$(GW_API_VERSION) \
--gateway-api-prev-version=$(GW_API_PREV_VERSION) --image-tag=$(TAG) --version-under-test=$(NGF_VERSION) \
--plus-enabled=$(PLUS_ENABLED) --ngf-image-repo=$(PREFIX) --nginx-image-repo=$(NGINX_PREFIX) \
--pull-policy=$(PULL_POLICY) --k8s-version=$(K8S_VERSION) --service-type=$(GW_SERVICE_TYPE) \
--is-gke-internal-lb=$(GW_SVC_GKE_INTERNAL)

.PHONY: test
test: ## Runs the functional tests on your default k8s cluster
go test -v ./suite -ginkgo.label-filter "functional" $(GINKGO_FLAGS) -args --gateway-api-version=$(GW_API_VERSION) \
--gateway-api-prev-version=$(GW_API_PREV_VERSION) --image-tag=$(TAG) --version-under-test=$(NGF_VERSION) \
--plus-enabled=$(PLUS_ENABLED) --ngf-image-repo=$(PREFIX) --nginx-image-repo=$(NGINX_PREFIX) \
--pull-policy=$(PULL_POLICY) --k8s-version=$(K8S_VERSION) --service-type=$(GW_SERVICE_TYPE) \
--is-gke-internal-lb=$(GW_SVC_GKE_INTERNAL)

.PHONY: cleanup-gcp
cleanup-gcp: cleanup-router cleanup-vm delete-gke-cluster ## Cleanup all GCP resources

.PHONY: create-gke-cluster
create-gke-cluster: ## Create a GKE cluster
bash scripts/create-gke-cluster.sh $(CI)
.PHONY: cleanup-router
cleanup-router: ## Delete the GKE router
bash scripts/cleanup-router.sh

.PHONY: cleanup-vm
cleanup-vm: ## Delete the test GCP VM and delete the firewall rule
bash scripts/cleanup-vm.sh

.PHONY: delete-gke-cluster
delete-gke-cluster: ## Delete the GKE cluster
bash scripts/delete-gke-cluster.sh

.PHONY: add-local-ip-to-cluster
add-local-ip-to-cluster: ## Add local IP to the GKE cluster master-authorized-networks
bash scripts/add-local-ip-to-cluster.sh
bash scripts/add-local-ip-auth-networks.sh
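
The `start-longevity-test` and `stop-longevity-test` targets above both call `nfr-test` and differ only in the `START_LONGEVITY` or `STOP_LONGEVITY` variable they set. How `scripts/run-tests-gcp-vm.sh` and the test suite consume those variables is not shown in this diff, so the following is only a minimal sketch of one way a Go process could map them to a phase. The names `longevityPhase` and `phaseFromEnv` are hypothetical and do not come from the repository.

```go
package main

import (
	"fmt"
	"os"
)

// longevityPhase is a hypothetical type for this sketch; it is not part of
// the NGF test framework.
type longevityPhase string

const (
	phaseNone  longevityPhase = "none"
	phaseStart longevityPhase = "start"
	phaseStop  longevityPhase = "stop"
)

// phaseFromEnv maps the START_LONGEVITY and STOP_LONGEVITY variables set by
// the Makefile targets to a phase. The lookup function is injected so the
// logic can be exercised without touching the real environment.
func phaseFromEnv(lookup func(string) string) (longevityPhase, error) {
	start := lookup("START_LONGEVITY") == "true"
	stop := lookup("STOP_LONGEVITY") == "true"

	switch {
	case start && stop:
		// Assumption: the two phases are never requested together.
		return phaseNone, fmt.Errorf("START_LONGEVITY and STOP_LONGEVITY are mutually exclusive")
	case start:
		return phaseStart, nil
	case stop:
		return phaseStop, nil
	default:
		return phaseNone, nil
	}
}

func main() {
	phase, err := phaseFromEnv(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("longevity phase:", phase)
}
```

Rejecting the case where both variables are set is a design choice of this sketch, not behavior confirmed by the commit.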
99 changes: 83 additions & 16 deletions tests/README.md
@@ -4,33 +4,36 @@ The tests in this directory are meant to be run on a live Kubernetes environment
are similar to the existing [conformance tests](../conformance/README.md), but will verify things such as:

- NGF-specific functionality
- Non-Functional requirements testing (such as performance, scale, etc.)
- Non-Functional requirements (NFR) testing (such as performance, scale, etc.)

When running locally, the tests create a port-forward from your NGF Pod to localhost using a port chosen by the
test framework. Traffic is sent over this port. If running on a GCP VM targeting a GKE cluster, the tests will create an
internal LoadBalancer service which will receive the test traffic.

**Important**: NFR tests can only be run on a GKE cluster.

Directory structure is as follows:

- `framework`: contains utility functions for running the tests
- `suite`: contains the test files
- `results`: contains the results files
- `scripts`: contain scripts used to set up the environment and run the tests
- `suite`: contains the test files

**Note**: Existing NFR tests will be migrated into this testing `suite` and results stored in the `results` directory.
> Note: Existing NFR tests will be migrated into this testing `suite` and results stored in the `results` directory.
## Prerequisites

- Kubernetes cluster.
- Docker.
- Golang.

If running the tests on a VM (`make create-vm-and-run-tests` or `make run-tests-on-vm`):
If running NFR tests, or running functional tests in GKE:

- The [gcloud CLI](https://cloud.google.com/sdk/docs/install)
- A GKE cluster (if `master-authorized-networks` is enabled, please set `ADD_VM_IP_AUTH_NETWORKS=true` in your vars.env file)
- Access to GCP Service Account with Kubernetes admin permissions

**Note**: all commands in steps below are executed from the `tests` directory
> Note: all commands in steps below are executed from the `tests` directory
```shell
make
@@ -52,9 +55,14 @@ delete-kind-cluster Delete kind cluster
help Display this help
load-images-with-plus Load NGF and NGINX Plus images on configured kind cluster
load-images Load NGF and NGINX images on configured kind cluster
run-tests-on-vm Run the tests on a GCP VM
setup-gcp-and-run-tests Create and setup a GKE router and GCP VM for tests and run the tests
test Run the system tests against your default k8s cluster
nfr-test Run the NFR tests on a GCP VM
run-tests-on-vm Run the functional tests on a GCP VM
setup-gcp-and-run-nfr-tests Create and setup a GKE router and GCP VM for tests and run the NFR tests
setup-gcp-and-run-tests Create and setup a GKE router and GCP VM for tests and run the functional tests
start-longevity-test Start the longevity test to run for 4 days in GKE
stop-longevity-test Stops the longevity test and collects results
sync-files-to-vm Syncs your local NGF files with the NGF repo on the VM
test Runs the functional tests on your default k8s cluster
```

**Note:** The following variables are configurable when running the below `make` commands:
@@ -78,6 +86,8 @@ test Run the system tests against your default k8s clu

This can be done in a cloud provider of choice, or locally using `kind`.

**Important**: NFR tests can only be run on a GKE cluster.

To create a local `kind` cluster:

```makefile
@@ -128,7 +138,7 @@ make build-images-with-plus load-images-with-plus TAG=$(whoami)

## Step 3 - Run the tests

### 3a - Run the tests locally
### 3a - Run the functional tests locally

```makefile
make test TAG=$(whoami)
@@ -142,9 +152,9 @@ make test TAG=$(whoami) PLUS_ENABLED=true

### 3b - Run the tests on a GKE cluster from a GCP VM

This step only applies if you would like to run the tests on a GKE cluster from a GCP based VM.
This step only applies if you are running the NFR tests, or would like to run the functional tests on a GKE cluster from a GCP based VM.

Before running the below `make` command, copy the `scripts/vars.env-example` file to `scripts/vars.env` and populate the
Before running the below `make` commands, copy the `scripts/vars.env-example` file to `scripts/vars.env` and populate the
required env vars. `GKE_SVC_ACCOUNT` needs to be the name of a service account that has Kubernetes admin permissions.

In order to run the tests in GCP, you need a few things:
@@ -153,30 +163,85 @@ In order to run the tests in GCP, you need a few things:
- this assumes that your GKE cluster is using private nodes. If using public nodes, you don't need this.
- GCP VM and firewall rule to send ingress traffic to GKE

To just set up the VM with no router (this will not run the tests):

```makefile
make create-and-setup-vm
```

Otherwise, you can set up the VM, router, and run the tests with a single command. See the options in the sections below.

By default, the tests run using the version of NGF that was `git cloned` during the setup. If you want to make
incremental changes and copy your local changes to the VM to test, you can run

```makefile
make sync-files-to-vm
```

#### Functional Tests

To set up the GCP environment with the router and VM and then run the tests, run the following command:

```makefile
make setup-gcp-and-run-tests
```

If you just need a VM and no router (this will not run the tests):
To use an existing VM to run the tests, run the following

```makefile
make create-and-setup-vm
make run-tests-on-vm
```

#### NFR tests

To set up the GCP environment with the router and VM and then run the tests, run the following command:


```makefile
make setup-gcp-and-run-nfr-tests
```

To use an existing VM to run the tests, run the following

```makefile
make run-tests-on-vm
make nfr-test
```

##### Longevity testing

This test is run on its own (and also not in a pipeline) due to its long-running nature. It will run for 4 days before
the tester must collect the results and complete the test.

To start the longevity test, set up your VM (`create-and-setup-vm`) and run

```makefile
make start-longevity-test
```

<!-- -->
> Note: If you want to change the time period for which the test runs, update the `wrk` commands in `suite/scripts/longevity-wrk.sh` to the time period you want, and run `make sync-files-to-vm`.
<!-- -->
> Note: If you want to re-run the longevity test, you need to clear out the `cafe.example.com` entry from the `/etc/hosts` file on your VM.
You can verify the test is working by checking nginx logs to see traffic flow, and check that the cronjob is running and redeploying apps.

After 4 days (96h), you can complete the longevity tests and collect results. To ensure that the traffic has stopped flowing, you can ssh to the VM using `gcloud compute ssh` and run `ps aux | grep wrk` to verify the `wrk` commands are no longer running. Then, visit the [GCP Monitoring Dashboards](https://console.cloud.google.com/monitoring/dashboards) page and select the `NGF Longevity Test` dashboard. Take PNG screenshots of each chart for the time period in which your test ran, and save those to be added to the results file.

Finally, run

```makefile
make stop-longevity-test
```

This will tear down the test and collect results into a file, where you can add the PNGs of the dashboard.

### Common test amendments

To run all tests with the label "performance", use the GINKGO_LABEL variable:
To run all tests with the label "my-label", use the GINKGO_LABEL variable:

```makefile
make test TAG=$(whoami) GINKGO_LABEL=performance
make test TAG=$(whoami) GINKGO_LABEL=my-label
```

or to pass a specific flag, e.g. run a specific test, use the GINKGO_FLAGS variable:
@@ -185,6 +250,8 @@ or to pass a specific flag, e.g. run a specific test, use the GINKGO_FLAGS varia
make test TAG=$(whoami) GINKGO_FLAGS='-ginkgo.focus "writes the system info to a results file"'
```

> Note: if filtering on NFR tests (or functional tests on GKE), set the filter in the appropriate field in your `vars.env` file.
If you are running the tests in GCP, add your required label/ flags to `scripts/var.env`.

You can also modify the tests code for a similar outcome. To run a specific test, you can "focus" it by adding the `F`
9 changes: 9 additions & 0 deletions tests/framework/results.go
@@ -77,6 +77,15 @@ func WriteResults(resultsFile *os.File, metrics *Metrics) error {
return reporter.Report(resultsFile)
}

// WriteContent writes basic content to the results file.
func WriteContent(resultsFile *os.File, content string) error {
if _, err := fmt.Fprintln(resultsFile, content); err != nil {
return err
}

return nil
}

// NewCSVEncoder returns a vegeta CSV encoder.
func NewCSVEncoder(w io.Writer) vegeta.Encoder {
return vegeta.NewCSVEncoder(w)
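
The new `WriteContent` helper in this hunk writes one string followed by a newline to the results file. A minimal usage sketch follows, with `WriteContent` copied from the diff so the example is self-contained. The `writeAndRead` helper, the temporary file, and the sample lines are assumptions made for illustration and are not how the test suite itself creates or names its results files.

```go
package main

import (
	"fmt"
	"os"
)

// WriteContent mirrors the helper added in this commit: it writes content
// followed by a newline to the results file.
func WriteContent(resultsFile *os.File, content string) error {
	if _, err := fmt.Fprintln(resultsFile, content); err != nil {
		return err
	}

	return nil
}

// writeAndRead is a hypothetical helper for this sketch. It writes each line
// to a temporary file with WriteContent and returns the file's contents.
func writeAndRead(lines ...string) (string, error) {
	f, err := os.CreateTemp("", "results-*.md")
	if err != nil {
		return "", err
	}
	// Deferred calls run in reverse order: the file is closed, then removed.
	defer os.Remove(f.Name())
	defer f.Close()

	for _, line := range lines {
		if err := WriteContent(f, line); err != nil {
			return "", err
		}
	}

	data, err := os.ReadFile(f.Name())
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	out, err := writeAndRead("## Longevity results", "- traffic stopped: yes")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

Because `fmt.Fprintln` appends the newline, callers pass each line without a trailing `\n`.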