
[Local Test Framework] - Docker Network Config #9000

Closed
wants to merge 18 commits into compatible from feature/local-engine-network-config

Conversation

MartinMinkov
Contributor

@MartinMinkov MartinMinkov commented Jun 8, 2021

Adds the initial implementation of the local test framework. This PR implements the Network_config interface, which defines a Docker Swarm for the specified network. It also adds logic for polling the status of the services and starting each puppeteered node once the services are live.

An example command to invoke the local test framework after building is:

test_executive local payments --debug --coda-image codaprotocol/coda-daemon-puppeteered:1.1.5-compatible | logproc -i inline -f '!(.level in ["Spam", "Debug"])'

This will create a directory in the ./automation folder that contains the compose file specifying the Docker Swarm, along with test configuration files such as the runtime config and block producer keys. The following is an example compose file:

{
  "version": "3.9",
  "services": {
    "seed": {
      "image": "codaprotocol/coda-daemon-puppeteered:1.1.5-compatible",
      "volumes": [
        {
          "type": "bind",
          "source": "./runtime_config",
          "target": "/root/runtime_config"
        }
      ],
      "deploy": { "replicas": 1 },
      "command": [
        "daemon", "-log-level", "Debug", "-log-json",
        "-log-snark-work-gossip", "true", "-client-port", "8301",
        "-generate-genesis-proof", "true", "-peer",
        "/dns4/mina_seed-node/tcp/10401/p2p/12D3KooWCoGWacXE4FRwAX8VqhnWVKhz5TTEecWEuGmiNrDt2XLf",
        "-seed", "-config-file", "/root/runtime_config"
      ],
      "environment": {
        "CODA_LIBP2P_PASS": "",
        "DAEMON_CLIENT_PORT": "8301",
        "DAEMON_METRICS_PORT": "10001",
        "DAEMON_REST_PORT": "3085"
      }
    },
    "test-block-producer-1": {
      "image": "codaprotocol/coda-daemon-puppeteered:1.1.5-compatible",
      "volumes": [
        ...
      ],
      "deploy": { "replicas": 1 },
      "command": [
        ...
      ],
      "environment": {
       ...
      }
    },
    "test-block-producer-2": {
      "image": "codaprotocol/coda-daemon-puppeteered:1.1.5-compatible",
      "volumes": [
        ...
      ],
      "deploy": { "replicas": 1 },
      "command": [
        ...
      ],
      "environment": {
        ...
      }
    },
    "test-block-producer-3": {
      "image": "codaprotocol/coda-daemon-puppeteered:1.1.5-compatible",
      "volumes": [
        ...
      ],
      "deploy": { "replicas": 1 },
      "command": [
        ...
      ],
      "environment": {
        ...
      }
    }
  }
}

The local engine runs this file with docker stack deploy -c ... <swarm_name>. This initializes the Docker Swarm and starts each puppeteered node once its service passes a health check.
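Based on the deploy call and stack-name construction later in this PR, the concrete invocation looks like:

docker stack deploy -c compose.json it-<user>-<git_commit>-<test_name>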

TODO:

  • Implement a more sophisticated Network_manager, since most of the current implementation is copied from the cloud version.
  • Implement the archive node.
  • QA the snark coordinator node.

Import the engine interface that will be implemented for the local
engine. Additionally add the local engine as a CLI flag to the
test_executive
This adds the functionality to create a base docker-compose file from
the specified test configs in the integration framework. Each node in
the network is assigned to a service that runs a container for it.

This implementation currently supports creating a docker-compose file,
creating a docker stack to run each container, and basic monitoring
before continuing the rest of the test_executive execution.

Lots of code in this PR has been stubbed and copied to make the compiler
happy. Much of this code will be refactored to support the changes
needed for the local engine later.
@MartinMinkov MartinMinkov requested a review from a team as a code owner June 8, 2021 19:27
@MartinMinkov MartinMinkov linked an issue Jun 8, 2021 that may be closed by this pull request
@MartinMinkov MartinMinkov changed the base branch from develop to compatible June 8, 2021 22:18

let deploy t =
if t.deployed then failwith "network already deployed" ;
[%log' info t.logger] "Deploying network" ;
Member

@psteckler psteckler Jun 9, 2021

you could add let logger = t.logger in ... and then just use %log instead of %log' throughout this function
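For reference, a minimal sketch of that refactor applied to the snippet above:

(* bind the logger once so the shorter %log form can be used throughout *)
let deploy t =
  let logger = t.logger in
  if t.deployed then failwith "network already deployed" ;
  [%log info] "Deploying network"
  (* ... rest of the function body unchanged ... *)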

Util.run_cmd_exn cwd "sh" ["-c"; kubectl_cmd]

(*
Maybe we get rid of this? We are redirecting all logs to stdout which the test executive will pick up
Member

the redirected logs are the ones from the container?

Contributor Author

Sorry, this comment is a little misleading. It was more of a reminder to myself that the logging infrastructure for the local engine will be different from the cloud engine's, and that this function is subject to change later down the road.

Adds support for the snark coordinator and workers based on the test
configurations passed into the test executive. Right now, we only
implement one snark coordinator shared by all snark workers.

Additionally added a new file for all the default Mina node commands and
environment variables.
run_cmd_exn t "docker"
["stack"; "deploy"; "-c"; "compose.json"; t.stack_name]
in
[%log' info t.logger] "DEPLOYED STRING: %s" s ;
Contributor Author

This snuck in while debugging, will remove.

@MartinMinkov MartinMinkov linked an issue Jun 11, 2021 that may be closed by this pull request
@@ -0,0 +1,19 @@
open Cmdliner
Member

Looks like you don't need any cli inputs for the local engine (at least not yet). You can just replace this file with:

open Cmdliner
type t = unit
let term = Term.const ()

in
let all_stacks = String.split ~on:'\n' all_stacks_str in
let testnet_dir =
network_config.mina_automation_location
Member

You can probably just make this a constant directory that is relative to where the executable is run. I don't think you will need the mina_automation_location for anything else.


module Compose = struct
module DockerMap = struct
type 'a t = (string, 'a, String.comparator_witness) Map.t
Member

This is better summarized as type 'a t = 'a String.Map.t.

Given that this module only has this type def and the empty value, you should probably just delete this module and use String.Map directly (it has both 'a t and empty on it already).
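For reference, the first suggestion applied to the snippet above:

(* Core's String.Map already provides both the map type and empty *)
type 'a t = 'a String.Map.t

let empty : 'a t = String.Map.empty

Though, as noted, deleting DockerMap and using String.Map directly at each call site is the simpler route.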

let create name =
{type_= "bind"; source= "." ^/ name; target= "/root" ^/ name}

let to_yojson {type_; source; target} =
Member

You shouldn't need a custom deriver for this. [@@deriving to_yojson] should work on this type def if you use the [@key "type"] decorator for the type_ field (see https://github.com/ocaml-ppx/ppx_deriving_yojson#key).
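A sketch of the derived version, using the field names from the snippet above:

(* [@key "type"] serializes the type_ field under the JSON key "type",
   so no hand-written to_yojson is needed *)
type t =
  { type_ : string [@key "type"]
  ; source : string
  ; target : string }
[@@deriving to_yojson]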

type t = string DockerMap.t

let create =
List.fold ~init:DockerMap.empty ~f:(fun accum env ->
Member

This can just be String.Map.of_alist
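For instance, assuming the input is already a list of (name, value) pairs:

(* note: Core's String.Map.of_alist returns a variant to flag duplicate
   keys; of_alist_exn raises instead, which is reasonable for static
   configuration like this *)
let create envs = String.Map.of_alist_exn envs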

open Mina_base
open Integration_test_lib

let version = "3.9"
Member

Can we call this docker_version so it's more obvious what this string is versioning?

@@ -0,0 +1,462 @@
open Core
Member

File name nit: It's a little confusing to call this module mina_automation given that it doesn't interact with our automation/ codebase. Recommend renaming to Docker_network or Mina_docker or something like that.

; docker: docker_config }
[@@deriving to_yojson]

let expand ~logger ~test_name ~(cli_inputs : Cli_inputs.t) ~(debug : bool)
Member

We can work later to improve the code-sharing between this and the existing cloud implementation (there are some shared steps).

let open Node_constants in
let runtime_config = Service.Volume.create "runtime_config" in
let blocks_seed_map =
List.fold network_config.docker.block_producer_configs
Member

Instead of folding over empty maps to fill them up, you can also build an associative list and construct the map from that. Eg, List.map ... ~f:(fun ... -> (k, v)) |> String.Map.of_alist.
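A hypothetical sketch of that pattern applied to the fold above (the config field names here are assumptions, not necessarily the PR's):

(* build (service_name, keypair) pairs, then construct the map in one step *)
let blocks_seed_map =
  List.map network_config.docker.block_producer_configs
    ~f:(fun config -> (config.service_name, config.keypair))
  |> String.Map.of_alist_exn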

open Async
open Integration_test_lib

(* Parts of this implementation are still untouched from the Kubernetes implementation. This is done
Member

We should work on extracting the graphql definitions from the engine layer. Those should be in the shared Integration_test_lib library, and the engine should just be in charge of the transport logic for sending the queries. We can revisit this later though, not in this PR.

; archive_node_count: int
; mina_archive_schema: string
; runtime_config: Yojson.Safe.t
[@to_yojson fun j -> `String (Yojson.Safe.to_string j)] }
Member

Do you know how the resource allocation works? Maybe that needs to be configurable as well. I know that the default memory allocated by Docker on Mac is not sufficient to run a node.

Contributor Author

Good point! Will address in a follow-up PR.
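For reference, compose v3 supports per-service limits via deploy.resources; a possible shape in the generated JSON (the 8G figure is illustrative, not a tested value):

"deploy": {
  "replicas": 1,
  "resources": { "limits": { "memory": "8G" } }
}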

Addressed the feedback given in the Network_Config PR. Made the
docker_compose.ml file simpler to read, with additional refactors to
mina_docker.ml. Additionally removed a lot of the copied-over code in
swarm_network to clean up the file.
…Protocol/mina into feature/local-engine-network-config
let stack_name = "it-" ^ user ^ "-" ^ git_commit ^ "-" ^ test_name in
(* GENERATE ACCOUNTS AND KEYPAIRS *)
let num_block_producers = List.length block_producers in
let block_producer_keypairs, runtime_accounts =
Contributor

we should probably stick this logic in some shared function that can be called

}
[@@deriving to_yojson]

let expand ~logger ~test_name ~cli_inputs:_ ~(debug : bool)
Contributor

A large amount of this function doesn't look implementation-specific; I feel like we should find a place in the /integration_test_lib dir for many of the pieces: keys, block producer configs, constants, names of things, etc. We probably should have done this to begin with, haha. If we want to merge this quicker then no worries, but it would be nice to rationalize it a bit more, if not now then later for sure.

Contributor Author

Totally, Nathan mentioned refactoring a lot of the test setup steps into something more reusable. I didn't go down that route just because I didn't want to be refactoring the cloud engine at the same time in this PR. This will be an easy enough change in a separate PR as long as we keep the new interface consistent.

let%bind.Deferred cwd = Unix.getcwd () in
Malleable_error.return (Util.run_cmd_exn cwd "sh" [ "-c"; docker_cmd ])

let start ~fresh_state node : unit Malleable_error.t =
Contributor

All the stuff with the /stop /start puppeteer scripts was kind of a hacky solution that mostly had to do with problems with the GCP deployment; maybe @nholland94 can give more background on the gory details. I think we should avoid this if we have time. Ideally, the local engine would just figure out some docker command to make docker shut the service down and start it back up again, or something like that.

Contributor Author

@MartinMinkov MartinMinkov Jun 29, 2021

I do agree that a better solution would be preferable, but I think the puppeteer scripts are actually extremely useful for getting this started. I'm not sure if we have plans to refactor the puppeteer scripts out of the cloud engine, but being able to start and stop all node containers via start and stop commands is really useful. I don't think there is a way to do this natively in docker-compose unless I missed something :/
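For what it's worth, a hypothetical sketch of the docker-native approach using docker service scale (the run_cmd_exn helper and the surrounding monad are assumed from the snippets in this PR, so this is a sketch rather than drop-in code):

(* stop a swarm service by scaling it to zero replicas, then bring it
   back up, instead of calling the puppeteer start/stop scripts *)
let restart_service t service_id =
  let%bind _ =
    run_cmd_exn t "docker" [ "service"; "scale"; service_id ^ "=0" ]
  in
  run_cmd_exn t "docker" [ "service"; "scale"; service_id ^ "=1" ]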

Added support for archive nodes in the local engine. If archive nodes
are specified in the test plan, the test executive will download the
archive schema to the working directory to use the schema as a bind
mount for the postgres container. The archive node will then connect to
the postgres container as usual.
@MartinMinkov MartinMinkov linked an issue Jun 29, 2021 that may be closed by this pull request
Renamed node_config to docker_node_config and refactored some of the
modules to have more static references baked in. Additionally refactored
mina_docker to use these static references instead of string values in
many places.
@MartinMinkov MartinMinkov changed the title [Local Test Framework] [WIP] - Docker Network Config [Local Test Framework] - Docker Network Config Jul 7, 2021
MartinMinkov and others added 3 commits July 15, 2021 09:55
Added functionality for GraphQL networking to the docker containers.
This work was mostly just adding the '--insecure-rest-server' flag to
the nodes and opening up the corresponding rest port within the
docker-compose file.

To allow communication with the docker network from localhost, we map
host ports starting at 7000 to port 3085 on each node, which is used
for GraphQL communication.
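A small sketch of how those mappings could be generated (the names here are illustrative):

(* assign host ports 7000, 7001, ... each forwarding to the container's
   GraphQL rest port 3085 *)
let graphql_port_mappings node_names =
  List.mapi node_names ~f:(fun i name ->
      (name, Printf.sprintf "%d:3085" (7000 + i)))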
@MartinMinkov MartinMinkov requested a review from a team as a code owner December 21, 2021 13:14
@MartinMinkov
Contributor Author

Moved to #14648

@MartinMinkov MartinMinkov deleted the feature/local-engine-network-config branch January 11, 2024 02:59