functionaltests: add first test case #14935

endorama · 2024-12-12T17:35:48Z

Motivation/summary

Implement first test verifying upgrade from 8.15.4 to 8.16.0 for a fresh Elastic Cloud deployment.

Provides a new folder structure and helpers to quickly implement functional tests for APM Server on Elastic Cloud.

Checklist

None

How to test these changes

Run cd functionaltests && go test -v ./

Related issues

Part of #14100

mergify · 2024-12-12T17:36:27Z

This pull request does not have a backport label. Could you fix it @endorama? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-7.17 is the label to automatically backport to the 7.17 branch.
backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
backport-8.x is the label to automatically backport to the 8.x branch.

mergify · 2024-12-12T17:36:27Z

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label.

v1v · 2024-12-13T12:35:08Z

.github/workflows/functional-tests.yml

+          secrets: |-
+            EC_API_KEY:elastic-observability/elastic-cloud-observability-team-${{ matrix.environment }}-api-key
+
+      - run: cd functionaltests && go test -v ./


It failed with

go: downloading github.com/apparentlymart/go-textseg/v15 v15.0.0
# github.com/elastic/apm-server/functionaltests
FAIL github.com/elastic/apm-server/functionaltests [setup failed]
Error: main_test.go:26:2: github.com/elastic/[email protected][33](https://github.com/elastic/apm-server/actions/runs/12315688177/job/34374403286#step:6:34)854-c5983a7ff908: replacement directory ../../apm-perf does not exist
FAIL

See https://github.com/elastic/apm-server/actions/runs/12315688177/job/34374403286

@v1v thanks for jumping on this so quickly! The test requires elastic/apm-perf#197; I updated it in 320535a to include the dependency from that PR. I'll trigger the workflow again.

@v1v how can I run the pipeline? I see it has a workflow_dispatch trigger but I don't seem to be able to run it from the UI?

Good question, I created a test branch:

https://github.com/elastic/apm-server/tree/test/funcionaltests

and the workflow is now available:

https://github.com/elastic/apm-server/actions/workflows/functional-tests.yml

workflow_dispatch will be honoured as soon as the workflow exists in main. Meanwhile, you can use the test branch for testing, or, if you prefer, you can modify this PR and add the push event for the functionaltests branch.

I've just synced the test branch:

https://github.com/elastic/apm-server/actions/runs/12318045129

v1v · 2024-12-13T15:16:20Z

functionaltests/main_test.go

+	// TODO: how to get these from Elastic Cloud? Is it possible?
+
+	// cleanup
+	t.Log("cleanup")


Will this always run, regardless of failures? Just wondering if the cleanup step could also be exposed as a separate command we can use in the GH actions.

That's handy if something went wrong, so no leftovers are kept

It was not, I've fixed it in 4532fe5 (#14935)

I added a flag to disable cleanup, but it requires manual action. This way automated tests always clean up, while cleanup can still be disabled manually (for example for testing or troubleshooting).

To follow up on running the cleanup manually: each test is expected to have a tfstate in its own folder, so it's possible to "manually" (in CI) run terraform destroy from that folder.
In any case cleanup now always runs in CI, so leftovers can only happen due to failures in deleting resources (I'm not sure how frequent this may be).
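The "always clean up unless explicitly disabled" behaviour discussed in this thread can be illustrated with a minimal sketch. It is not the code from 4532fe5: `runWithCleanup`, `skipCleanup` and `errBody` are hypothetical names, and the real test may well rely on `t.Cleanup` instead of a plain `defer`.

```go
package main

import (
	"errors"
	"fmt"
)

// errBody is a stand-in for a failure in the test body (hypothetical).
var errBody = errors.New("upgrade failed")

// runWithCleanup runs body and then cleanup, even when body fails.
// skipCleanup is a hypothetical opt-out meant for manual troubleshooting.
func runWithCleanup(skipCleanup bool, body, cleanup func() error) (err error) {
	defer func() {
		if skipCleanup {
			return
		}
		// Join both errors so a cleanup failure does not hide the body failure.
		err = errors.Join(err, cleanup())
	}()
	return body()
}

func main() {
	destroyed := false
	err := runWithCleanup(false,
		func() error { return errBody },
		func() error { destroyed = true; return nil },
	)
	fmt.Println("error:", err, "cleanup ran:", destroyed)
}
```

With this shape the cleanup function is the only place where leftovers can originate, which matches the remark above that leftovers can only come from failures while deleting resources.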

functionaltests/TestUpgrade_8_15_4_to_8_16_0/main.tf

Add utility packages to implement functional tests and Terraform code to bootstrap an Elastic Cloud deployment. Implement first test verifying upgrade from 8.15.4 to 8.16.0 for a fresh Elastic Cloud deployment.

endorama · 2024-12-31T16:39:59Z

The latest run was successful in both qa and production: https://github.com/elastic/apm-server/actions/runs/12561390912, so I'm marking this PR as ready for review!

.github/workflows/functional-tests.yml

Co-authored-by: Victor Martinez <[email protected]>

marclop

I think this is a good step forward, thanks for creating the go-based functional test framework.

marclop · 2025-01-08T03:27:54Z

functionaltests/utils_test.go

+// Functional tests are expected to run Terraform code to operate
+// on infrastructure required for each tests and to query Elastic
+// Cloud APIs. In both cases a valid API key is required.
+func ecAPICheck(t *testing.T) error {


It would be best to use require.NotEmpty() inside this function, since *testing.T is being passed, and that we're unlikely to do anything else but error if EC_API_KEY is unset.

func ecAPICheck(t *testing.T) {
	t.Helper()
	require.NotEmpty(t, os.Getenv("EC_API_KEY"), "EC_API_KEY env var not set")
}

Addressed in e182ae3 (#14935)

marclop · 2025-01-08T03:34:29Z

functionaltests/main_test.go

+	case "pro":
+		return testRegionProduction


I think this should be production, given that target defaults to production, although I see that the execution script has a pro abbreviation. I've never seen that one, but prod is widely used as a shorter version of production.

I think it'd be good to settle on one and avoid invalid values as defaults or valid choices.

Settling on pro as that is what must be used in CI. Adding a comment to clarify this though as I understand the confusion.

Addressed in a031acb
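To illustrate the point about avoiding invalid values, here is a minimal sketch of a target-to-region mapping that rejects anything it does not know. It is not the code from a031acb: `regionFor` is a hypothetical helper, the `qa` target name is an assumption based on the environments mentioned earlier in this PR, and the region strings are placeholders.

```go
package main

import "fmt"

// Placeholder region values; the real constants in main_test.go are not shown in this thread.
const (
	regionQA         = "example-qa-region"
	regionProduction = "example-production-region"
)

// regionFor maps a target environment to a region and rejects unknown targets,
// so a typo such as "prod" or "production" fails fast instead of silently
// falling back to a default.
func regionFor(target string) (string, error) {
	switch target {
	case "qa":
		return regionQA, nil
	case "pro": // the abbreviation that must be used in CI
		return regionProduction, nil
	default:
		return "", fmt.Errorf("unknown target %q (want qa or pro)", target)
	}
}

func main() {
	for _, target := range []string{"pro", "production"} {
		region, err := regionFor(target)
		fmt.Println(target, region, err)
	}
}
```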

marclop · 2025-01-08T03:40:12Z

functionaltests/main_test.go

+		if expected.PreferIlm {
+			assert.True(t, v.PreferIlm)
+		} else {
+			assert.False(t, v.PreferIlm)
+		}
+		assert.Equal(t, expected.DSManagedBy, v.NextGenerationManagedBy.Name)
+		assert.Len(t, v.Indices, expected.IndicesPerDs)
+
+		for i, index := range v.Indices {
+			assert.Equal(t, expected.IndicesManagedBy[i], index.ManagedBy.Name)
+		}


The PreferIlm check can be simplified and I think it'd be good to add some extra debugging information, given that we're checking individual fields:

Suggested change

-		if expected.PreferIlm {
-			assert.True(t, v.PreferIlm)
-		} else {
-			assert.False(t, v.PreferIlm)
-		}
-		assert.Equal(t, expected.DSManagedBy, v.NextGenerationManagedBy.Name)
-		assert.Len(t, v.Indices, expected.IndicesPerDs)
-		for i, index := range v.Indices {
-			assert.Equal(t, expected.IndicesManagedBy[i], index.ManagedBy.Name)
-		}
+		assert.Equal(t, expected.PreferIlm, v.PreferIlm,
+			"datastream %s should prefer ILM", v.Name,
+		)
+		assert.Equal(t, expected.DSManagedBy, v.NextGenerationManagedBy.Name,
+			`datastream %s should be managed by "%s"`, v.Name, expected.DSManagedBy,
+		)
+		assert.Len(t, v.Indices, expected.IndicesPerDs,
+			"datastream %s should have %d indices", v.Name, expected.IndicesPerDs,
+		)
+		for i, index := range v.Indices {
+			assert.Equal(t, expected.IndicesManagedBy[i], index.ManagedBy.Name,
+				`index %s should be managed by "%s"`, index.IndexName,
+				expected.IndicesManagedBy[i],
+			)
+		}

I added more debug information in e00ef0a (#14935)

But I did not replace assert.True and False with Equal to provide tailored debugging info for each case.

marclop · 2025-01-08T03:45:32Z

functionaltests/main_test.go

+	g.Logger = zaptest.NewLogger(t, zaptest.Level(zap.InfoLevel))
+	require.NoError(t, err)
+
+	err = g.RunBlocking(context.Background())


This error isn't returned, shouldn't this be returned to the caller?

I addressed this as part of the refactoring to wait for documents to be stored in ES, in fd73107 (#14935)

marclop · 2025-01-08T03:48:58Z

functionaltests/internal/esclient/config.go

+// NewConfig returns a Config intialised from environment variables.
+// func NewConfig() (Config, error) {
+// 	cfg := Config{}
+// 	return cfg, err
+// }


Leftover?

Suggested change

-// NewConfig returns a Config intialised from environment variables.
-// func NewConfig() (Config, error) {
-// 	cfg := Config{}
-// 	return cfg, err
-// }

marclop · 2025-01-08T03:50:57Z

functionaltests/8_15_test.go

+	// Manual tests had failures due to only 4 data streams being reported
+	// when no delay was used. Manual inspection always revealed the correct
+	// number of data streams.
+	time.Sleep(1 * time.Minute)


I'm not sure I love the 1m wait approach. If we know ahead of time how many documents we are going to end up with, wouldn't it be best to ensure that the aggregate result of ApmDocCount() is at least that number? That'll avoid arbitrary waiting times here and there.

The upside of this approach was that it had a very simple implementation. I refactored it to verify the docs count in some data streams in fd73107 (#14935). Only data streams not related to aggregations are checked, as the aggregation ones have different numbers of documents across runs.

I also had to add a check on aggregation data streams, as tests were sometimes failing. Added in c48e063 (#14935). As the number of documents isn't known in advance, I used the docs count not changing as a proxy. In my tests this prevented the failure on 1m aggregation data streams from happening again.

marclop · 2025-01-08T03:54:27Z

functionaltests/8_15_test.go

+	t.Log("check number of documents")
+	newCount, err := ac.ApmDocCount(ctx)
+	require.NoError(t, err)
+	assertDocCountEqual(t, oldCount, newCount)


Are we asserting that the previously indexed data isn't lost somehow? If that's the case, or if I've misunderstood, can you add a comment to explain why this is necessary?

Yes, that's the reason. It's not necessary, but it helps ensure that the state didn't change after the upgrade and that we can safely proceed with further ingestion and assertions. I added a clarifying comment in fd73107 (#14935)

We never expect this to fail unless something broke during the upgrade that affected APM data.

marclop · 2025-01-08T03:55:25Z

functionaltests/8_15_test.go

+	t.Log("check number of documents")
+	newCount2, err := ac.ApmDocCount(ctx)
+	require.NoError(t, err)
+	assertDocCountGreaterThan(t, oldCount, newCount2)


Greater than is fine but wouldn't it be best to assert the exact number of documents? Or is this something that we expect to change over time?

As part of the refactoring in fd73107 (#14935) I also changed this to assert the delta in the number of documents after the second ingestion run.
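Asserting on the delta rather than on "greater than" can be sketched as follows. This is only an illustration under assumptions: `docCount` and `checkDocCount` are hypothetical names, not the `assertDocCount` helper from fd73107, and the numbers are made up.

```go
package main

import "fmt"

// docCount maps a data stream name to its document count (hypothetical type).
type docCount map[string]int

// checkDocCount verifies that every data stream listed in want grew by exactly
// want[ds] documents between the prev and curr snapshots.
func checkDocCount(prev, curr, want docCount) error {
	for ds, delta := range want {
		if got := curr[ds] - prev[ds]; got != delta {
			return fmt.Errorf("data stream %s: got %d new documents, want %d", ds, got, delta)
		}
	}
	return nil
}

func main() {
	// Example values only; real counts depend on the ingestion run.
	prev := docCount{"traces-apm-default": 100}
	curr := docCount{"traces-apm-default": 150}
	fmt.Println(checkDocCount(prev, curr, docCount{"traces-apm-default": 50}))
	fmt.Println(checkDocCount(prev, curr, docCount{"traces-apm-default": 60}))
}
```

Passing the previous snapshot explicitly keeps the expectation for each ingestion run independent of whatever was indexed before the upgrade.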

Refactor ingest helper into a separate package that wraps apm-perf telemetrygen. The new package provides an interface similar to telemetrygen's and adds RunBlockingWait. This function runs telemetry generation and polls the ES cluster until the expected doc count is reached or the timeout expires, whichever comes first. NOTE that RunBlockingWait only waits for the docs count in data streams that have a fixed number of documents after ingestion, ignoring aggregation-related data streams, which have different document counts on different runs.

It updates (esclient.Client).ApmDocCount() to return a value that is easier to assert on, which is then used in RunBlockingWait. To avoid a circular dependency, telemetrygen does not use the type alias but the underlying data type.

It replaces assertDocCountGreaterThan and assertDocCountEqual with a single assertDocCount helper that uses the new data structure and supports considering previous results in the comparison.

ubuntu-latest is now Ubuntu 24.04 and the terraform CLI is no longer installed (see https://github.com/actions/runner-images/blob/ubuntu24/20250105.1/images/ubuntu/Ubuntu2404-Readme.md). The manual step requires manual updates to the Terraform version but avoids similar scenarios in the future.

endorama · 2025-01-09T11:09:55Z

I ran the relevant GitHub Actions and discovered an issue with terraform.

Fixed with latest commit: https://github.com/elastic/apm-server/actions/runs/12688157468/job/35364084828

Aggregation data streams may receive data slightly later than the data streams we are checking in RunBlockingWait. This leads to proceeding before data streams have stabilized, resulting in failures for docs count on 1m aggregation data streams down the line. To avoid this, RunBlockingWait now also checks that the docs count does not change for 3 consecutive iterations before considering the wait completed.

mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Dec 12, 2024

endorama mentioned this pull request Dec 12, 2024

Add functional tests for ILM & Data Stream Lifecycle #14100

Open

v1v reviewed Dec 13, 2024

functionaltests/TestUpgrade_8_15_4_to_8_16_0/main.tf (outdated)

endorama force-pushed the functionaltests branch from 8111836 to 20c7e57 Compare December 13, 2024 18:46

v1v and others added 2 commits December 27, 2024 17:22

github-actions: support for functional-tests

874c46a

add functionaltests framework

5ba3747

Add utility packages to implement functional tests and Terraform code to bootstrap an Elastic Cloud deployment. Implement first test verifying upgrade from 8.15.4 to 8.16.0 for a fresh Elastic Cloud deployment.

endorama force-pushed the functionaltests branch from 20c7e57 to 5ba3747 Compare December 27, 2024 16:38

endorama added 7 commits December 27, 2024 17:42

goimports

1707372

add license

265c38f

remove apm-perf override

81f1b8b

remove comment

8edc826

support prod region

a490d4f

production -> pro

999d77a

fix production region

c14da87

endorama marked this pull request as ready for review December 31, 2024 16:40

endorama requested a review from a team as a code owner December 31, 2024 16:40

Merge branch 'main' into functionaltests

b34e570

v1v reviewed Jan 7, 2025

.github/workflows/functional-tests.yml

.github/workflows/functional-tests.yml (outdated)

Apply suggestions from code review

8747630

Co-authored-by: Victor Martinez <[email protected]>

marclop reviewed Jan 8, 2025

endorama added 5 commits January 9, 2025 09:44

fix target cli flag

a031acb

remove NewConfig

bcc83e3

return errs from ingest

06dd665

require in ecAPICheck

e182ae3

endorama added 4 commits January 9, 2025 09:44

reuse ec_deployment infra tf module

410fab3

goimports

991a9f1

fix RunBlockingWait

34530df

endorama added 2 commits January 9, 2025 17:00

add debugging info to assertDatastreams

e00ef0a

endorama requested a review from marclop January 9, 2025 16:05

v1v approved these changes Jan 9, 2025
