Summary
Currently, the test integration framework has the capability to deploy and test a network in a cloud environment (specifically GCP), but in addition to this, we want to add functionality to deploy and test a network on a local user machine. The cloud integration uses Terraform and Helm to deploy the network and specified nodes to a GKE environment. While this method works for cloud environments, we would like a more lightweight solution that runs locally. Thus, we have chosen Docker Swarm as the tool of choice for container orchestration.
Docker Swarm can configure a "swarm" on a local machine and deploy and manage containers on that specific swarm. Docker Swarm takes as input a docker-compose file in which all container information is specified and handles the deployment of all containers on that local swarm. When we want to run a network test locally, we can create a swarm and have all the containers deploy via a docker-compose.json file built from the specified network configuration. Docker Swarm also gives us the ability to get the logs of all running containers in an aggregated way, meaning we do not have to query individual containers for their logs. This gives us a way to apply event filters to specific node types (block producers, snark workers, seed nodes, etc.) and check for test success/failure in a portable way.
Requirements
The new local testing framework should be run on a user's local system using Docker as its main engine to create a network and spawn nodes. This feature will be built on top of the existing Test Executive which runs our cloud integration tests. By implementing the interface specified in src/lib/integration_test_lib/intf.ml, we will have an abstract way to specify different testing engines when running the Test Executive.
The specific interface to implement would be:
(** The signature of integration test engines. An integration test engine
 *  provides the core functionality for deploying, monitoring, and
 *  interacting with networks. *)
module type S = sig
  (* unique name identifying the engine (used in test executive cli) *)
  val name : string

  module Network_config : Network_config_intf

  module Network : Network_intf

  module Network_manager :
    Network_manager_intf
    with module Network_config := Network_config
     and module Network := Network

  module Log_engine : Log_engine_intf with module Network := Network
end
To implement this interface, a new subdirectory will be created in src/lib named integration_local_engine to hold all implementation details for the local engine.
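As a rough sketch, the new directory would expose a module satisfying the engine signature above; every submodule name below is a placeholder for an implementation described later in this design, so this is illustrative structure rather than compilable code:

(* Sketch only: S is the engine signature from src/lib/integration_test_lib/intf.ml
   shown above, and the Docker_* submodules are placeholders to be implemented
   in integration_local_engine. *)
module Integration_local_engine : S = struct
  let name = "local"

  module Network_config = Docker_network_config          (* Network_config_intf *)
  module Network = Docker_network                        (* Network_intf *)
  module Network_manager = Docker_swarm_network_manager  (* Network_manager_intf *)
  module Log_engine = Docker_swarm_log_engine            (* Log_engine_intf *)
end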
The new local testing engine must implement all existing features which include:
Starting/Stopping nodes dynamically
Sending GraphQL queries to running nodes
Streaming event logs from nodes for further processing
Spawning nodes based on a test configuration
Additionally, the test engine should take a Docker image as input in the CLI.
An example command of using the local testing framework could look like this:
$ test_executive local send-payment --mina-image codaprotocol/coda-daemon-puppeteered:1.1.5-compatible --debug | tee test.log | logproc -i inline -f '!(.level in ["Spam", "Debug"])'
Note that this is very similar to the current command of calling the cloud testing framework.
Detailed Design
Orchestration:
To handle container orchestration, we will be utilizing Docker Swarm to spawn and manage containers. Docker Swarm lets us create a cluster and run containers on it while managing their availability. We have opted for Docker Swarm instead of other orchestration tools like Kubernetes because Docker is much easier to run on a local machine while still giving us many of the same benefits. Kubernetes is more complex and is somewhat overkill for what we are trying to achieve with the local testing framework. Both Docker Swarm and Kubernetes can handle container orchestration, but the complexity of dealing with Kubernetes does not give much payoff. Additionally, if we want community members to also use this tool, setting up Kubernetes on end-user systems would be even more of a hassle.
Docker Swarm takes a docker-compose file from which it generates the desired network state. A cluster is created in Docker Swarm by issuing docker swarm init, which sets up the environment on which all containers will be orchestrated. In the context of our system, we do not need to take advantage of different machines to run these containers on; rather, we will run all containers on the local system. Thus, the end result of the swarm will be all containers running locally while Docker Swarm provides availability and other resource management options.
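For reference, the Swarm lifecycle the local engine would drive amounts to only a few standard Docker commands (the stack and file names here are placeholders):

$ docker swarm init
$ docker stack deploy -c local-docker-compose.json testnet
$ docker stack services testnet          # list the running services
$ docker stack rm testnet                # tear the network down
$ docker swarm leave --force             # destroy the local swarm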
Creating a docker-compose file for local instead of terraform on cloud
In the current cloud architecture, we launch a given network with Terraform. We specify a Network_config.t data structure which holds all the information needed to create the network, and it is then transformed into a Terraform file like so:
type terraform_config =
  { k8s_context: string
  ; cluster_name: string
  ; cluster_region: string
  ; aws_route53_zone_id: string
  ; testnet_name: string
  ; deploy_graphql_ingress: bool
  ; coda_image: string
  ; coda_agent_image: string
  ; coda_bots_image: string
  ; coda_points_image: string
  ; coda_archive_image: string
        (* this field needs to be sent as a string to terraform, even though it's a json encoded value *)
  ; runtime_config: Yojson.Safe.t
        [@to_yojson fun j -> `String (Yojson.Safe.to_string j)]
  ; block_producer_configs: block_producer_config list
  ; log_precomputed_blocks: bool
  ; archive_node_count: int
  ; mina_archive_schema: string
  ; snark_worker_replicas: int
  ; snark_worker_fee: string
  ; snark_worker_public_key: string }
[@@deriving to_yojson]

type t =
  { coda_automation_location: string
  ; debug_arg: bool
  ; keypairs: Network_keypair.t list
  ; constants: Test_config.constants
  ; terraform: terraform_config }
[@@deriving to_yojson]
Once all configuration has been applied, we launch the network by running terraform apply.
We can leverage some of this existing work by specifying a config for Docker Swarm instead. Docker Swarm can consume a docker-compose file (which can be specified as a .json file; see https://docs.docker.com/compose/faq/#can-i-use-json-instead-of-yaml-for-my-compose-file) to launch containers on a given swarm environment. The interface can look mostly the same while cutting out a lot of the information specific to Terraform.
By taking a Network_config.t record, we can transform the data structure into a corresponding docker-compose file that specifies all containers to run as well as any other configuration.
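A minimal, self-contained sketch of that transformation is shown below. The record fields here are stand-ins for whatever the local Network_config.t ends up carrying, not the existing cloud type; the output mirrors how terraform_config is serialized with to_yojson today.

(* Sketch only; requires the yojson library. Field names are assumptions. *)
type local_network_config =
  { docker_image : string
  ; block_producers : string list  (* one service name per block producer *)
  }

let to_docker_compose_json (config : local_network_config) : Yojson.Safe.t =
  let service name =
    (name, `Assoc [ ("image", `String config.docker_image) ])
  in
  `Assoc
    [ ("version", `String "3.8")
    ; ("services", `Assoc (List.map service config.block_producers)) ]

let () =
  { docker_image = "codaprotocol/coda-daemon-puppeteered:1.1.5-compatible"
  ; block_producers = [ "block-producer-1"; "block-producer-2" ] }
  |> to_docker_compose_json
  |> Yojson.Safe.to_file "local-docker-compose.json"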
After computing the corresponding docker-compose file, we can simply call docker stack deploy -c local-docker-compose.json testnet_name
The resulting docker-compose.json file can have a service for each type of node that we want to spawn. Services in Docker Swarm are similar to pods in Kubernetes as they will schedule containers to nodes to run specified tasks.
A very generic example of what the docker-compose.json could look like is as follows:
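(The snippet below is purely illustrative; the service names, replica counts, and image tag are placeholders rather than the final schema we will generate.)

{
  "version": "3.8",
  "services": {
    "seed": {
      "image": "codaprotocol/coda-daemon-puppeteered:1.1.5-compatible",
      "deploy": { "replicas": 1 }
    },
    "block-producer": {
      "image": "codaprotocol/coda-daemon-puppeteered:1.1.5-compatible",
      "deploy": { "replicas": 2 }
    },
    "snark-worker": {
      "image": "codaprotocol/coda-daemon-puppeteered:1.1.5-compatible",
      "deploy": { "replicas": 1 }
    }
  }
}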
Logging:
Docker Swarm aggregates all logs from containers based on the running services. This makes it easy for us to parse all of a service's logs without having to address specific containers.
The following is an example of the logs aggregated by Docker Swarm with 2 containers running the ping command.
$ docker service create --name ping --replicas 2 alpine ping 8.8.8.8
$ docker service logs ping
ping.2.odlt7ajje64e@node1    | PING 8.8.8.8 (8.8.8.8): 56 data bytes
...
ping.1.egjtdoz7tvkt@node1    | PING 8.8.8.8 (8.8.8.8): 56 data bytes
...
For our use case, we can make each node type a separate service. For example, in our docker-compose configuration, we could specify a service for seed nodes, block producers, and snark workers and parse out the logs individually for each service. We can additionally do further computation on the logs to determine which container emitted them, for a more granular view.
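For example, assuming the stack is deployed as testnet with one service per node type (names hypothetical), per-service logs would be retrieved with:

$ docker service logs testnet_seed
$ docker service logs testnet_block-producer
$ docker service logs testnet_snark-worker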
These logs can be polled on an interval and processed by a filter as they come in.
Interface To Develop:
The current logging for the cloud framework is done by creating a Google Stackdriver subscription and issuing poll requests for logs while doing some pre-defined filtering.
A similar interface can be written for Docker Swarm instead. By defining a Service.pull function with a given logger, we can leverage a lot of the work already done, modifying only the parts of the code where the log formats diverge. All logs can be directed to an output stream, such as stdout or a file specified by the user on their local system.
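A rough sketch of that pull step, assuming we shell out to docker service logs and apply an event-filter predicate to each line (the function name and filtering hook are hypothetical, not the existing cloud-engine API):

(* Sketch only; requires the unix library. Pull logs for one Swarm service
   and keep only the lines accepted by an event filter. *)
let pull_service_logs ~(service : string) ~(accept : string -> bool) : string list =
  let ic =
    Unix.open_process_in (Printf.sprintf "docker service logs --raw %s" service)
  in
  let rec read acc =
    match input_line ic with
    | line -> read (if accept line then line :: acc else acc)
    | exception End_of_file -> List.rev acc
  in
  let lines = read [] in
  ignore (Unix.close_process_in ic) ;
  lines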
Work Breakdown/Prio
The following will be a work breakdown of what needs to be done to see this feature to completion:
Implement the Network_Config interface to accept a network configuration and create a corresponding docker-compose.json file.
Implement the Network_manager interface to take a corresponding docker-compose.json file and create a local swarm with the specified container configuration
Implement functionality to log all container logs into a single stream (stdout or a file; maybe this can be specified at startup?)
Implement filter-on-event functionality
Ensure that current integration test specs are able to run on the local framework with success
Unresolved Questions
Is compiling a docker-compose file the right approach for scheduling the containers? The nice thing about using a docker-compose file is that all network management should be automatic.
Is using a different service for each type of node the most effective approach? Would it be better to launch all nodes under the same service in the docker-compose file?
Is polling each service and then aggregating those logs the best approach? Would it be better to do filtering before aggregating?
Does this plan capture the overall direction we want the local testing framework to go?