
[0011] Local Integration Extension for Lucy #40

Open · wants to merge 13 commits into main
Conversation


@MartinMinkov MartinMinkov commented Nov 20, 2023

This RFC introduces an extension to the Lucy integration testing framework for the Mina protocol, enabling local integration testing using Docker containers. This new approach is designed to be lightweight, easy to use, and fast, allowing developers and contributors to run integration tests on their local machines without needing a cloud environment.

@MartinMinkov MartinMinkov marked this pull request as ready for review November 21, 2023 17:04

By using Docker containers, we can run the network configuration on a local machine without having to worry about the specifics of running the network. Containers are also portable and consistent across all machines, which makes it easy to run the network on any machine (e.g. inside CI).

One could argue that we could leverage the existing Kubernetes infrastructure and use Kubernetes locally instead of targeting the cloud. This is a valid alternative, but the existing Kubernetes backend is tightly coupled to GKE, which makes it difficult to port the existing backend to target a local cluster instead. Furthermore, using containers for local network deployment offers a simpler mental model for how the network is deployed and managed compared to Kubernetes.

@SanabriaRusso

I would like to introduce a Platform Engineering perspective regarding this point 👀 :

While I fully support the idea that developers should use testing tools closer to their domain, I wonder if the ease of configuration offered by docker-compose may lure us away from providing a test environment closer to what is used in a real production/testing environment 🤔.

For instance, creating a local Kubernetes environment and deploying networks with a Helm infrastructure backend would:

  • (bad) increase the resource footprint of the environment, but
  • (nice) provide an environment leveraging the existing Charts used in main/test-nets, that is, workflows closer to what is actually done for deployments.
  • (bad?) require teams to get involved in the design of Helm Charts (with Plat. Eng. Team support) for what is to be deployed, but...
  • (nice!) ...that's the point! IMHO this is key for enabling efficient Continuous Deployment (CD) workflows, with added benefits like idempotent deployments, Git as the source of truth and GitOps, ownership, same tooling, etc.

In summary, the goal of this perspective is to highlight that investing in an efficient local Kubernetes setup leveraging a Helm infrastructure backend would produce desirable by-products: it would optimize existing infra components (e.g., Helm Charts) for local tests, and it would ease the transition to CD by introducing it early in the development process (i.e., shift-left).

If this seems interesting, I can help identify some of the requirement gaps 💯.


@MartinMinkov MartinMinkov Nov 23, 2023


Fair points!

The idea of adopting Kubernetes or Docker was discussed early on when this tool was first conceived. To answer the question of which technology we need, I think it's important to establish what we really want this tool to do. To me (and I could be wrong here, this is why we need this RFC!), this tool is meant to be something engineers can run on their local machine to do a "quick pass" on features as they're written, with a fast feedback loop. My current understanding is that it's not intended to run in a cluster on CI (or if it is, only on one machine); I was assuming that's what the cloud integration of Lucy is attempting to solve. If we do want to run this in CI, is it intended to run on a single server or on a cluster? If we start zooming out and thinking about using this local integration across a cluster, it gets close to what we already do in GKE. Maybe this is something that @bkase @nholland94 and @deepthiskumar can shed some light on?

Assuming this tool is intended to run on a developer's machine (and/or a single server in CI), I think we should optimize for developer experience over similarity to the deployment environment. Using a Kubernetes backend to deploy a network locally and run tests against it makes things more complicated for the developer, including understanding where tests fail. We would have to introduce more moving parts, like Helm charts and Kubernetes networking, which in my opinion leads to a worse developer experience. Also, if we plan to use this in CI on just one machine, I'm not sure the choice between Kubernetes and Docker matters much; what are your thoughts?

If we use Docker containers, things stay lean and simple: a single docker compose file serves as the deployment, which is easy to understand, and little additional tooling is needed. Things will be faster, easier to install and set up, and carry much less overhead to think about.

What sorts of functional requirements are important for this tool to achieve, in your opinion? That is really the thing we want to reach consensus on.

Another question I have: given your experience deploying our nodes in a Kubernetes environment, are there implementation details we have to worry about due to networking differences? I was going under the assumption that Docker's local networking will just work without issues, but I could be wrong here 😅


@SanabriaRusso SanabriaRusso Nov 24, 2023


I was thinking that even if the introduction of docker-compose deployments could be very simple, it would still end up adding, well, something more to maintain and be aware of.

I think that with fairly few fixes the current Helm Charts could be made as generic as needed to be deployable on local Kubernetes instances on a dev computer (e.g., KiND, K3s, Minikube, or other alternatives would do).

Anyway, this is an interesting discussion because it deals with a tradeoff, as you have identified: dev experience vs. deployment reliability. The former aims at offering simplicity via heavy abstraction for fast feature tests, while the latter attempts to extend and reuse existing infra artifacts.

Would love to spark a discussion here 😄; in particular, to derive more specific requirements about this tool and its introduction as a test tool in CI. Any insight @stevenplatt @dkijania ?


Good call out @SanabriaRusso

From a higher level, I think this brings us back to the question of boundaries. In a perfect world, when platform-eng isn't busy with other important work, we could argue this "execution layer" should be owned by platform-eng; though even then it's not clear: in practice on this particular project, an ownership boundary that sat too high up caused the integration testing framework to evolve more slowly than we would have liked. Anyway, this is why we're experimenting with pushing the ownership boundary between teams slightly downwards from where it was before.

Also, pragmatically, @MartinMinkov has a PR that is 85% complete implementing the docker backend! So we can bring this to production within a couple of weeks rather than taking on the much larger task of starting from "scratch". This doesn't mean we can't throw it out later if the Helm Charts get to the point where they run successfully locally.

So I'd suggest: let's finish this project that's 85% done and start relying on running the tests locally and on a single CI server. Then, separately, prioritize cleaning up the Helm Charts. Once that work is done and is cleaner than the docker-compose solution (i.e. behaves at least as well, without confusing devs as they iterate locally), we can evaluate throwing out the docker-compose backend.


A further optimization we can do is apply [`logproc`](https://github.com/MinaProtocol/mina/blob/compatible/src/app/logproc/logproc.ml) to all container output before it's written to the pipe. `logproc` can filter logs by log level, reducing both the volume of logs written to the pipe and the number of logs Lucy has to parse.

For example, we could write an entrypoint script for each container that redirects stdout and stderr to the named pipe, prefixing each line with the container name. This will let us filter logs by container name and deal with duplicate logs.
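
As a concrete sketch (not the final design), the Lucy-side reader could tail the pipe and route lines by their prefix; the `name| message` format and the pipe path below are illustrative assumptions:

```ocaml
(* Sketch only: tail the shared named pipe and route each line by its
   container-name prefix. The "name| message" format and the pipe path
   are illustrative assumptions. *)
let read_pipe ~pipe_path ~handle_line =
  let ic = open_in pipe_path in
  try
    while true do
      let line = input_line ic in
      match String.index_opt line '|' with
      | Some i ->
          let container = String.sub line 0 i in
          let message =
            String.sub line (i + 1) (String.length line - i - 1)
          in
          handle_line ~container ~message:(String.trim message)
      | None -> () (* skip lines without a container prefix *)
    done
  with End_of_file -> close_in ic
```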

The last three sections have:

  1. "Furthermore, we can tag all container logs..."
  2. "A further optimization we can do is..."
  3. "For example, we could write..."

Can you be more specific about what you will do rather than what you may do? Either say "this will happen" or, for the ones we are not scoping in, group them in a "Future optimizations" section and drop the hedging language. It's okay to include optimizations that are conditional on criteria such as "if, when I run test X, it takes longer than Y, then we will do Z".


Sounds good! I rewrote the text to be clearer about what we will be building in this section. These were all intended to be built 👍



Using Docker Swarm allows us to handle the orchestration of containers without having to worry about the specifics of running containers on a local machine. Docker Swarm will handle all the networking, log collection, and resource management for us.
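
As an illustration of how little glue this requires, deploying the generated file could shell out to the standard Docker CLI; the function and argument names below are placeholders:

```ocaml
(* Sketch: hand the generated file to Docker Swarm via the standard CLI.
   [stack_name] and [compose_file] are placeholders. *)
let deploy_stack ~stack_name ~compose_file =
  let cmd =
    Printf.sprintf "docker stack deploy --compose-file %s %s"
      compose_file stack_name
  in
  match Sys.command cmd with
  | 0 -> Ok ()
  | code -> Error (Printf.sprintf "docker stack deploy exited with %d" code)
```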

### Extending Lucy

Can you add one more layer of detail here? Since you've already done a good chunk of the work, it should be fairly easy for you to figure out how it works.

i.e. turn

For the local backend, it will generate a docker swarm file that specifies the network topology configuration.

into something like:

"The Local_network_config will have a type like

module Config_record = struct
  type t = { foo : int; bar : int }
end

and will include logic for encoding it into a docker-swarm file by doing a single pass and pretty-printing the YAML using library X. Here's a code snippet of one such function... etc.

And can you do the same treatment for how the pipe will be shared for the logging?


Sure thing! I added a note about the config type here

The section for the pipe is already included; it will be shared using volumes across each service configuration.
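
For illustration, here is a minimal sketch of what the config record and the single-pass YAML emission could look like; the field names, image name, and YAML shape are placeholders rather than the final `Local_network_config`:

```ocaml
(* Illustrative sketch only: field names, the image name, and the YAML
   shape are placeholders, not the final Local_network_config. *)
module Config_record = struct
  type t =
    { network_name : string
    ; block_producers : int
    ; snark_workers : int
    }
end

(* Single pass over the record, pretty-printing a compose/swarm-style
   file with Buffer/Printf; a YAML library could be swapped in later. *)
let to_swarm_yaml (config : Config_record.t) : string =
  let open Config_record in
  let buf = Buffer.create 256 in
  Buffer.add_string buf "version: \"3.8\"\nservices:\n" ;
  for i = 1 to config.block_producers do
    Printf.bprintf buf
      "  block-producer-%d:\n    image: mina-daemon\n    networks: [%s]\n"
      i config.network_name
  done ;
  for i = 1 to config.snark_workers do
    Printf.bprintf buf
      "  snark-worker-%d:\n    image: mina-daemon\n    networks: [%s]\n"
      i config.network_name
  done ;
  Printf.bprintf buf "networks:\n  %s:\n" config.network_name ;
  Buffer.contents buf
```

Generating the file in a single pass keeps the topology encoding easy to inspect: a developer can read the emitted YAML directly when a test misbehaves.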
