
Initial draft of local test framework RFC #8896

Closed
wants to merge 2 commits

Conversation

MartinMinkov (Contributor)

Proposes a containerization-focused solution for the architecture and overall workflow of the local test framework.

@MartinMinkov MartinMinkov linked an issue May 17, 2021 that may be closed by this pull request
@nholland94 nholland94 changed the base branch from develop to compatible May 17, 2021 20:30
@nholland94 (Member) left a comment

This all looks great, with the exception of the approach for streaming logs back to the test executive. I think we need to think of a more robust solution upfront for this, since our goal here is to run local networks with snarks turned off (see my other comment for details of why this is relevant).


# Unresolved Questions

- Is compiling a docker-compose file the right approach for scheduling the containers? The nice thing about using a docker-compose file is that all network management should be automatic.
Member:

Yeah, I think this is the right way to go, especially for the network management bit.
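To make the trade-off concrete, here is a minimal sketch of what a compiled docker-compose file for such a network might look like. The service names, image tag, and daemon flags are illustrative assumptions, not taken from the RFC:

```yaml
version: "3.8"
services:
  seed:
    image: minaprotocol/mina-daemon:local-test   # hypothetical tag
    command: daemon -seed
  block-producer-1:
    image: minaprotocol/mina-daemon:local-test   # hypothetical tag
    command: daemon -peer seed
    depends_on:
      - seed

# docker-compose attaches every service to a shared default network,
# so containers can reach each other by service name without any
# manual network wiring by the test executive.
```

The automatic default network is the main draw here: peer addresses can be expressed as service names, and the test executive never has to manage bridges or port mappings itself.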


- Is using a different service for each type of node the most effective approach? Would it be better to launch all nodes under the same service in the docker-compose file?

- Is polling each service and then aggregating those logs the best approach? Would it be better to do filtering before aggregating?
Member:

I think polling logs in the local integration test engine is not the correct approach.

We take a pull-based approach for the cloud engine, and that works out fine for us right now because we pre-filter the logs (so the volume we handle per poll is very low), but more importantly because we are only running networks with snarks enabled. When snarks are enabled, the networks move pretty slowly, so small delays on the order of 5-10 seconds are not a problem.

However, with the local engine, the goal is to run snarkless networks, which move extremely quickly by comparison. In that world, a 5-10 second delay in logs could break the test. Furthermore, the log volume per second of a snarkless network is massive by comparison due to the speed at which the network operates. Because of this, I think we need both to pre-filter the logs in the docker containers (to bring the log volume the single-threaded test_executive needs to handle down to a fraction of the size) and to read those logs off of a pipe in a more push-based fashion.

Another advantage of a push-based pipe approach is that you don't need to deal with re-reading logs you have already handled, which appears to be a problem you would need to solve with the docker swarm logging interface. You might be able to get by with asking docker swarm to only return logs recorded since the last time you asked, but in that approach you have some risk of missing logs, and you are relying on docker swarm's ability to quickly perform that query on its log cache.
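The filter-then-push design described above can be sketched in a few lines of Python. This is a toy simulation, not the test_executive's actual implementation: the log lines, field names, and filter predicate are all illustrative assumptions, with a thread standing in for the node container on the write end of a FIFO.

```python
import os
import tempfile
import threading

# Hypothetical raw log lines as a node might emit them; the field
# names are illustrative, not an actual daemon log schema.
RAW_LOGS = [
    '{"level":"Trace","msg":"gossip received"}',
    '{"level":"Info","msg":"block produced"}',
    '{"level":"Trace","msg":"heartbeat"}',
    '{"level":"Info","msg":"breadcrumb added"}',
]

fifo = os.path.join(tempfile.mkdtemp(), "logs.fifo")
os.mkfifo(fifo)

def node_side():
    # Pre-filtering happens on the container side, so only a fraction
    # of the snarkless network's log volume ever reaches the pipe.
    with open(fifo, "w") as pipe:
        for line in RAW_LOGS:
            if '"Info"' in line:  # illustrative pre-filter predicate
                pipe.write(line + "\n")
                pipe.flush()

writer = threading.Thread(target=node_side)
writer.start()

# Test-executive side: the read blocks until data arrives, so there is
# no polling interval, and closed-pipe EOF means no risk of re-reading
# lines that were already handled.
with open(fifo) as pipe:
    received = [line.rstrip("\n") for line in pipe]
writer.join()

print(received)
```

Because the reader simply blocks on the pipe, log latency is bounded by scheduling rather than a polling interval, and the "give me only logs since my last query" bookkeeping disappears entirely.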

Contributor Author:

Followed up with a restructuring of the proposed polling solution to use the push-based pipe approach. 👍



- Does this plan capture the overall direction we want the local testing framework to go?
Member:

👍


The new local testing engine must implement all existing features, which include:

- Starting/Stopping nodes dynamically
Member:

Just wanted to mention as a note here: If you continue using the puppeteer container (which I recommend), then this is very easy to do. You can peek at how this works in the cloud engine, but starting/stopping a puppeteered node is as simple as running a shell script in the container which communicates with the puppeteer process.

A good point was made about restructuring the proposed logging architecture. Instead of polling on an interval with the Docker CLI, which can introduce latency issues, we can redirect all stdout to a pipe shared with the test executive. This commit updates the RFC to reflect that feedback.
@MartinMinkov (Contributor Author):

Moved to o1-labs/rfcs#40

@MartinMinkov MartinMinkov deleted the rfc/local-test-framework branch January 11, 2024 17:50
Successfully merging this pull request may close these issues: Local Execution Engine RFC