[Local Test Framework] - Docker Network Config #9000
Conversation
Imports the engine interface that will be implemented for the local engine. Additionally, adds the local engine as a CLI flag to the test_executive.
This adds the functionality to create a base docker-compose file from the specified test configs in the integration framework. Each node in the network is assigned to a service that runs a container for that node. This implementation currently supports creating a docker-compose file, creating a docker stack to run each container, and basic monitoring before continuing the rest of the test_executive execution. A lot of the code in this PR has been stubbed and copied to make the compiler happy; much of it will be refactored to support the changes needed for the local engine later.
let deploy t =
  if t.deployed then failwith "network already deployed" ;
  [%log' info t.logger] "Deploying network" ;
You could add let logger = t.logger in ... and then just use %log instead of %log' throughout this function.
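A minimal sketch of that suggestion, assuming Mina's logging ppx where the %log form picks up a value named logger from the enclosing scope:

let deploy t =
  (* bind the logger once so the shorter %log form can be used below *)
  let logger = t.logger in
  if t.deployed then failwith "network already deployed" ;
  [%log info] "Deploying network"
  (* ...the rest of the function can keep using [%log ...] without threading t.logger *)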
Util.run_cmd_exn cwd "sh" ["-c"; kubectl_cmd]

(*
Maybe we get rid of this? We are redirecting all logs to stdout which the test executive will pick up
the redirected logs are the ones from the container?
Sorry, this comment is a little misleading. It was more of a reminder to myself that the logging infrastructure for the local engine will be different from the cloud engine's, and that this function is subject to change later down the road.
Adds support for a snark coordinator and snark workers based on the test configurations passed into the test executive. Right now, we only implement one snark coordinator shared by all snark workers. Additionally, adds a new file for all the default Mina node commands and environment variables.
run_cmd_exn t "docker"
  ["stack"; "deploy"; "-c"; "compose.json"; t.stack_name]
in
[%log' info t.logger] "DEPLOYED STRING: %s" s ;
This snuck in while debugging, will remove.
@@ -0,0 +1,19 @@
open Cmdliner |
Looks like you don't need any cli inputs for the local engine (at least not yet). You can just replace this file with:
open Cmdliner
type t = unit
let term = Term.const ()
in
let all_stacks = String.split ~on:'\n' all_stacks_str in
let testnet_dir =
  network_config.mina_automation_location
You can probably just make this a constant directory that is relative to where the executable is run. I don't think you will need the mina_automation_location for anything else.
module Compose = struct
  module DockerMap = struct
    type 'a t = (string, 'a, String.comparator_witness) Map.t
This is better summarized as type 'a t = 'a String.Map.t. Given that this module only has this type def and the empty value, you should probably just delete this module and use String.Map directly (it has both 'a t and empty on it already).
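A small sketch of what that would look like, assuming Core is open; the names docker_map and empty_map are illustrative, not from the PR:

(* String.Map.t is already (string, 'a, String.comparator_witness) Map.t *)
type 'a docker_map = 'a String.Map.t

(* and String.Map.empty replaces DockerMap.empty *)
let empty_map : 'a docker_map = String.Map.empty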
let create name =
  {type_= "bind"; source= "." ^/ name; target= "/root" ^/ name}

let to_yojson {type_; source; target} =
You shouldn't need a custom deriver for this. [@@deriving to_yojson] should work on this type def if you use the [@key "type"] decorator for the type_ field (see https://github.com/ocaml-ppx/ppx_deriving_yojson#key).
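A sketch of the derived serializer the comment describes, assuming ppx_deriving_yojson is in the preprocessing set; the [@key "type"] attribute renames the type_ field in the emitted JSON, so no hand-written to_yojson is needed:

type t =
  { type_: string [@key "type"]  (* rendered as "type" in the JSON *)
  ; source: string
  ; target: string }
[@@deriving to_yojson]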
type t = string DockerMap.t

let create =
  List.fold ~init:DockerMap.empty ~f:(fun accum env ->
This can just be String.Map.of_alist
open Mina_base
open Integration_test_lib

let version = "3.9"
Can we call this docker_version so it's more obvious what this string is versioning?
@@ -0,0 +1,462 @@
open Core |
File name nit: It's a little confusing to call this module mina_automation given that it doesn't interact with our automation/ codebase. Recommend renaming to Docker_network or Mina_docker or something like that.
; docker: docker_config }
[@@deriving to_yojson]

let expand ~logger ~test_name ~(cli_inputs : Cli_inputs.t) ~(debug : bool)
We can work later to improve the code-sharing between this and the existing cloud implementation (there are some shared steps).
let open Node_constants in
let runtime_config = Service.Volume.create "runtime_config" in
let blocks_seed_map =
  List.fold network_config.docker.block_producer_configs
Instead of folding over empty maps to fill them up, you can also build an associative list and construct the map from that. Eg, List.map ... ~f:(fun ... -> (k, v)) |> String.Map.of_alist.
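A rough sketch of that shape; the config.name field is a placeholder for whatever key the real block producer config carries:

(* collect (name, config) pairs, then build the map in one step;
   of_alist_exn assumes the producer names are unique *)
let blocks_seed_map =
  network_config.docker.block_producer_configs
  |> List.map ~f:(fun config -> (config.name, config))
  |> String.Map.of_alist_exn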
open Async
open Integration_test_lib

(* Parts of this implementation is still untouched from the Kubernetes implementation. This is done
We should work on extracting the graphql definitions from the engine layer. Those should be in the shared Integration_test_lib library, and the engine should just be in charge of the transport logic for sending the queries. We can revisit this later though, not in this PR.
; archive_node_count: int
; mina_archive_schema: string
; runtime_config: Yojson.Safe.t
      [@to_yojson fun j -> `String (Yojson.Safe.to_string j)] }
Do you know how the resource allocation works? Maybe that needs to be configurable as well. I know that the default memory allocated by docker on mac is not sufficient to run a node.
Good point! Will address in a follow up PR.
Addressed the feedback given in the Network_Config PR. Made the docker_compose.ml file simpler to read, with additional refactors to mina_docker.ml. Additionally, removed a lot of the copied-over code in swarm_network to clean up the file.
…e/local-engine-network-config
…Protocol/mina into feature/local-engine-network-config
let stack_name = "it-" ^ user ^ "-" ^ git_commit ^ "-" ^ test_name in
(* GENERATE ACCOUNTS AND KEYPAIRS *)
let num_block_producers = List.length block_producers in
let block_producer_keypairs, runtime_accounts =
we should probably stick this logic in some shared function that can be called
}
[@@deriving to_yojson]

let expand ~logger ~test_name ~cli_inputs:_ ~(debug : bool)
a large amount of this func doesn't look implementation specific, i feel like we should find a place in the /integration_test_lib dir for many of the pieces. keys, block producer configs, constants, names of things, etc. probably shoulda done this to begin with haha. if we wanna just merge this quicker then no worries but would be nice to rationalize it a bit more, if not now then later for sure
Totally, Nathan mentioned refactoring a lot of the test setup steps into something more reusable. I didn't go down that route just because I didn't want to be refactoring the cloud engine at the same time in this PR. This will be an easy enough change in a separate PR as long as we keep the new interface consistent.
let%bind.Deferred cwd = Unix.getcwd () in
Malleable_error.return (Util.run_cmd_exn cwd "sh" [ "-c"; docker_cmd ])

let start ~fresh_state node : unit Malleable_error.t =
all the stuff with the /stop /start puppeteer scripts was kind of a hacky solution that mostly had to do with problems specific to the GCP deployment. mb @nholland94 can give more background about the gory details. i think we should avoid this if we have time. ideally, it would be best for the local engine to just figure out some docker command to make docker shut the service down and start it back up again, or something like that.
I do agree that a better solution would be preferable, but I think the puppeteer scripts are actually extremely useful for getting this started. I'm not sure if we have plans to refactor the puppeteer scripts out of the cloud engine, but being able to start and stop all node containers via commands is really useful. I don't think there is a way to do this natively in docker-compose unless I missed something :/
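A hypothetical sketch of the docker-native restart path discussed above, reusing the Util.run_cmd_exn helper that appears elsewhere in this PR; the ~cwd and container_id arguments are assumptions, not part of the PR:

(* stop and restart a node's container via the docker CLI instead of the
   puppeteer start/stop scripts *)
let stop_node ~cwd container_id =
  Util.run_cmd_exn cwd "docker" [ "stop"; container_id ]

let start_node ~cwd container_id =
  Util.run_cmd_exn cwd "docker" [ "start"; container_id ]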
Added support for archive nodes in the local engine. If archive nodes are specified in the test plan, the test executive will download the archive schema to the working directory to use the schema as a bind mount for the postgres container. The archive node will then connect to the postgres container as usual.
Renamed node_config to docker_node_config and refactored some of the modules to have more static references baked in. Additionally refactored mina_docker to use these static references instead of string values in many places.
Added functionality for GraphQL networking to the docker containers. This work was mostly just adding the '--insecure-rest-server' flag to the nodes and opening up the corresponding rest port within the docker-compose file. To allow communication with the docker network from localhost, we map host ports starting at 7000 to port 3085 on each node, which is used for GraphQL communication.
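A small sketch of that port scheme, assuming Core; node_names is a placeholder list, and each generated entry would end up in the corresponding service's ports list in the compose file:

(* host ports 7000, 7001, ... forward to the GraphQL/REST port 3085 inside
   each node's container, producing "7000:3085"-style mapping strings *)
let graphql_port_mappings node_names =
  List.mapi node_names ~f:(fun i _name -> Printf.sprintf "%d:3085" (7000 + i))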
Moved to #14648
Adds the initial implementation of the local test framework. This PR includes the implementation of the Network_config interface, which defines a Docker Swarm for the specified network. It additionally has some logic for polling the status of the services and starting each puppeteered node if the services are live. An example command to invoke the local test framework after building is:

This will create a directory in the ./automation folder that contains the compose file with the specification for the Docker Swarm, plus some test configurations as files, such as the runtime config and block producer keys. The following is an example of such a file:

The local engine runs this file with docker stack deploy -c ... <swarm_name>. The local engine will initialize the Docker Swarm and start each puppeteered node once they pass a service health check.

TODO:
- Network_manager (most of the implementation has been copied from the cloud version)
- Implement archive node
- QA snark coordinator node