From e8393d59c2438ee26097f76b36f0b5d9d5ece5ca Mon Sep 17 00:00:00 2001
From: Jeffrey Shih <34355042+unityjeffrey@users.noreply.github.com>
Date: Tue, 4 Sep 2018 21:16:20 -0700
Subject: [PATCH] Documentation 0.5 Release Check List (Part 1) (#1154)
---
CODE_OF_CONDUCT.md | 5 +-
CONTRIBUTING.md | 60 +++---
MLAgentsSDK/README.md | 1 +
README.md | 94 ++++-----
docs/API-Reference.md | 2 +-
docs/Background-TensorFlow.md | 2 +-
docs/Basic-Guide.md | 32 +--
docs/FAQ.md | 18 +-
docs/Feature-Memory.md | 2 +-
docs/Feature-Monitor.md | 2 +-
docs/Getting-Started-with-Balance-Ball.md | 96 ++++-----
docs/Glossary.md | 4 +-
docs/Installation-Windows.md | 4 +-
docs/Installation.md | 29 +--
docs/Learning-Environment-Best-Practices.md | 4 +-
docs/Learning-Environment-Create-New.md | 77 ++++----
docs/Learning-Environment-Design-Academy.md | 6 +-
docs/Learning-Environment-Design-Agents.md | 147 +++++++-------
docs/Learning-Environment-Design-Brains.md | 42 ++--
...ronment-Design-External-Internal-Brains.md | 26 +--
...ing-Environment-Design-Heuristic-Brains.md | 14 +-
...arning-Environment-Design-Player-Brains.md | 24 +--
docs/Learning-Environment-Design.md | 82 ++++----
docs/Learning-Environment-Examples.md | 72 +++----
docs/Learning-Environment-Executable.md | 19 +-
docs/Limitations.md | 4 +-
docs/ML-Agents-Overview.md | 24 +--
docs/Migrating.md | 19 +-
docs/Python-API.md | 149 ++++++++++++++
docs/Readme.md | 4 +-
docs/Training-Curriculum-Learning.md | 8 +-
docs/Training-Imitation-Learning.md | 24 +--
docs/Training-ML-Agents.md | 13 +-
docs/Training-PPO.md | 2 +-
docs/Training-on-Amazon-Web-Service.md | 2 +-
docs/Training-on-Microsoft-Azure.md | 15 +-
docs/Using-TensorFlow-Sharp-in-Unity.md | 8 +-
docs/Using-Tensorboard.md | 8 +-
.../docs/Getting-Started-with-Balance-Ball.md | 26 +--
docs/localized/zh-CN/docs/Installation.md | 4 +-
.../docs/Learning-Environment-Create-New.md | 4 +-
.../zh-CN/docs/Learning-Environment-Design.md | 14 +-
.../docs/Learning-Environment-Examples.md | 42 ++--
.../zh-CN/docs/ML-Agents-Overview.md | 2 +-
gym-unity/{Readme.md => README.md} | 73 +++++--
ml-agents/README.md | 183 ++----------------
46 files changed, 775 insertions(+), 717 deletions(-)
create mode 100644 MLAgentsSDK/README.md
create mode 100644 docs/Python-API.md
rename gym-unity/{Readme.md => README.md} (58%)
diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
index f16f58d884..24853b18df 100644
--- a/CODE_OF_CONDUCT.md
+++ b/CODE_OF_CONDUCT.md
@@ -67,7 +67,8 @@ members of the project's leadership.
## Attribution
-This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
-available at https://www.contributor-covenant.org/version/1/4/code-of-conduct/
+This Code of Conduct is adapted from the [Contributor Covenant][homepage],
+version 1.4, available at
+https://www.contributor-covenant.org/version/1/4/code-of-conduct/
[homepage]: https://www.contributor-covenant.org
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 9372673ac2..2f1b5f5f16 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,54 +1,56 @@
# Contribution Guidelines
-Thank you for your interest in contributing to the ML-Agents toolkit! We are incredibly
-excited to see how members of our community will use and extend the ML-Agents toolkit.
-To facilitate your contributions, we've outlined a brief set of guidelines
-to ensure that your extensions can be easily integrated.
+Thank you for your interest in contributing to the ML-Agents toolkit! We are
+incredibly excited to see how members of our community will use and extend the
+ML-Agents toolkit. To facilitate your contributions, we've outlined a brief set
+of guidelines to ensure that your extensions can be easily integrated.
-### Communication
+## Communication
-First, please read through our [code of conduct](CODE_OF_CONDUCT.md),
-as we expect all our contributors to follow it.
+First, please read through our [code of conduct](CODE_OF_CONDUCT.md), as we
+expect all our contributors to follow it.
-Second, before starting on a project that you intend to contribute
-to the ML-Agents toolkit (whether environments or modifications to the codebase),
-we **strongly** recommend posting on our
-[Issues page](https://github.com/Unity-Technologies/ml-agents/issues) and
-briefly outlining the changes you plan to make. This will enable us to provide
-some context that may be helpful for you. This could range from advice and
-feedback on how to optimally perform your changes or reasons for not doing it.
+Second, before starting on a project that you intend to contribute to the
+ML-Agents toolkit (whether environments or modifications to the codebase), we
+**strongly** recommend posting on our
+[Issues page](https://github.com/Unity-Technologies/ml-agents/issues)
+and briefly outlining the changes you plan to make. This will enable us to
+provide some context that may be helpful for you. This could range from advice
+and feedback on how to optimally perform your changes or reasons for not doing
+it.
Lastly, if you're looking for input on what to contribute, feel free to
reach out to us directly at ml-agents@unity3d.com and/or browse the GitHub
issues with the `contributions welcome` label.
-### Git Branches
+## Git Branches
-Starting with v0.3, we adopted the
+Starting with v0.3, we adopted the
[Gitflow Workflow](http://nvie.com/posts/a-successful-git-branching-model/).
-Consequently, the `master` branch corresponds to the latest release of
+Consequently, the `master` branch corresponds to the latest release of
the project, while the `develop` branch corresponds to the most recent, stable,
version of the project.
Thus, when adding to the project, **please branch off `develop`**
and make sure that your Pull Request (PR) contains the following:
+
* Detailed description of the changes performed
-* Corresponding changes to documentation, unit tests and sample environments
-(if applicable)
+* Corresponding changes to documentation, unit tests and sample environments (if
+ applicable)
* Summary of the tests performed to validate your changes
* Issue numbers that the PR resolves (if any)
-### Environments
+## Environments
-We are also actively open to adding community contributed environments as
-examples, as long as they are small, simple, demonstrate a unique feature of
-the platform, and provide a unique non-trivial challenge to modern
+We are also actively open to adding community contributed environments as
+examples, as long as they are small, simple, demonstrate a unique feature of
+the platform, and provide a unique non-trivial challenge to modern
machine learning algorithms. Feel free to submit these environments with a
-PR explaining the nature of the environment and task.
+PR explaining the nature of the environment and task.
-### Style Guide
+## Style Guide
-When performing changes to the codebase, ensure that you follow the style
-guide of the file you're modifying. For Python, we follow
-[PEP 8](https://www.python.org/dev/peps/pep-0008/). For C#, we will soon be
-adding a formal style guide for our repository.
+When performing changes to the codebase, ensure that you follow the style guide
+of the file you're modifying. For Python, we follow
+[PEP 8](https://www.python.org/dev/peps/pep-0008/).
+For C#, we will soon be adding a formal style guide for our repository.
diff --git a/MLAgentsSDK/README.md b/MLAgentsSDK/README.md
new file mode 100644
index 0000000000..3919d98f18
--- /dev/null
+++ b/MLAgentsSDK/README.md
@@ -0,0 +1 @@
+# ML-Agents SDK
\ No newline at end of file
diff --git a/README.md b/README.md
index b8b235e735..8eac0b4f6c 100755
--- a/README.md
+++ b/README.md
@@ -4,75 +4,81 @@
# Unity ML-Agents Toolkit (Beta)
-**The Unity Machine Learning Agents Toolkit** (ML-Agents) is an open-source Unity plugin
-that enables games and simulations to serve as environments for training
-intelligent agents. Agents can be trained using reinforcement learning,
-imitation learning, neuroevolution, or other machine learning methods through
-a simple-to-use Python API. We also provide implementations (based on
-TensorFlow) of state-of-the-art algorithms to enable game developers
-and hobbyists to easily train intelligent agents for 2D, 3D and VR/AR games.
-These trained agents can be used for multiple purposes, including
-controlling NPC behavior (in a variety of settings such as multi-agent and
-adversarial), automated testing of game builds and evaluating different game
-design decisions pre-release. The ML-Agents toolkit is mutually beneficial for both game
-developers and AI researchers as it provides a central platform where advances
-in AI can be evaluated on Unity’s rich environments and then made accessible
-to the wider research and game developer communities.
+**The Unity Machine Learning Agents Toolkit** (ML-Agents) is an open-source
+Unity plugin that enables games and simulations to serve as environments for
+training intelligent agents. Agents can be trained using reinforcement learning,
+imitation learning, neuroevolution, or other machine learning methods through a
+simple-to-use Python API. We also provide implementations (based on TensorFlow)
+of state-of-the-art algorithms to enable game developers and hobbyists to easily
+train intelligent agents for 2D, 3D and VR/AR games. These trained agents can be
+used for multiple purposes, including controlling NPC behavior (in a variety of
+settings such as multi-agent and adversarial), automated testing of game builds
+and evaluating different game design decisions pre-release. The ML-Agents
+toolkit is mutually beneficial for both game developers and AI researchers as it
+provides a central platform where advances in AI can be evaluated on Unity’s
+rich environments and then made accessible to the wider research and game
+developer communities.
## Features
+
* Unity environment control from Python
* 10+ sample Unity environments
* Support for multiple environment configurations and training scenarios
-* Train memory-enhanced Agents using deep reinforcement learning
+* Train memory-enhanced agents using deep reinforcement learning
* Easily definable Curriculum Learning scenarios
-* Broadcasting of Agent behavior for supervised learning
+* Broadcasting of agent behavior for supervised learning
* Built-in support for Imitation Learning
-* Flexible Agent control with On Demand Decision Making
+* Flexible agent control with On Demand Decision Making
* Visualizing network outputs within the environment
* Simplified set-up with Docker
## Documentation
-* For more information, in addition to installation and usage
-instructions, see our [documentation home](docs/Readme.md).
-* If you have
-used a version of the ML-Agents toolkit prior to v0.4, we strongly recommend
-our [guide on migrating from earlier versions](docs/Migrating.md).
+* For more information, in addition to installation and usage instructions, see
+ our [documentation home](docs/Readme.md).
+* If you have used a version of the ML-Agents toolkit prior to v0.4, we strongly
+ recommend our [guide on migrating from earlier versions](docs/Migrating.md).
## References
We have published a series of blog posts that are relevant for ML-Agents:
-- Overviewing reinforcement learning concepts
-([multi-armed bandit](https://blogs.unity3d.com/2017/06/26/unity-ai-themed-blog-entries/)
-and [Q-learning](https://blogs.unity3d.com/2017/08/22/unity-ai-reinforcement-learning-with-q-learning/))
-- [Using Machine Learning Agents in a real game: a beginner’s guide](https://blogs.unity3d.com/2017/12/11/using-machine-learning-agents-in-a-real-game-a-beginners-guide/)
-- [Post](https://blogs.unity3d.com/2018/02/28/introducing-the-winners-of-the-first-ml-agents-challenge/) announcing the winners of our
-[first ML-Agents Challenge](https://connect.unity.com/challenges/ml-agents-1)
-- [Post](https://blogs.unity3d.com/2018/01/23/designing-safer-cities-through-simulations/)
-overviewing how Unity can be leveraged as a simulator to design safer cities.
+
+* Overviewing reinforcement learning concepts
+ ([multi-armed bandit](https://blogs.unity3d.com/2017/06/26/unity-ai-themed-blog-entries/)
+ and
+ [Q-learning](https://blogs.unity3d.com/2017/08/22/unity-ai-reinforcement-learning-with-q-learning/))
+* [Using Machine Learning Agents in a real game: a beginner’s guide](https://blogs.unity3d.com/2017/12/11/using-machine-learning-agents-in-a-real-game-a-beginners-guide/)
+* [Post](https://blogs.unity3d.com/2018/02/28/introducing-the-winners-of-the-first-ml-agents-challenge/)
+ announcing the winners of our
+ [first ML-Agents Challenge](https://connect.unity.com/challenges/ml-agents-1)
+* [Post](https://blogs.unity3d.com/2018/01/23/designing-safer-cities-through-simulations/)
+ overviewing how Unity can be leveraged as a simulator to design safer cities.
In addition to our own documentation, here are some additional, relevant articles:
-- [Unity AI - Unity 3D Artificial Intelligence](https://www.youtube.com/watch?v=bqsfkGbBU6k)
-- [A Game Developer Learns Machine Learning](https://mikecann.co.uk/machine-learning/a-game-developer-learns-machine-learning-intent/)
-- [Explore Unity Technologies ML-Agents Exclusively on Intel Architecture](https://software.intel.com/en-us/articles/explore-unity-technologies-ml-agents-exclusively-on-intel-architecture)
+
+* [Unity AI - Unity 3D Artificial Intelligence](https://www.youtube.com/watch?v=bqsfkGbBU6k)
+* [A Game Developer Learns Machine Learning](https://mikecann.co.uk/machine-learning/a-game-developer-learns-machine-learning-intent/)
+* [Explore Unity Technologies ML-Agents Exclusively on Intel Architecture](https://software.intel.com/en-us/articles/explore-unity-technologies-ml-agents-exclusively-on-intel-architecture)
## Community and Feedback
-The ML-Agents toolkit is an open-source project and we encourage and welcome contributions.
-If you wish to contribute, be sure to review our
-[contribution guidelines](CONTRIBUTING.md) and
+The ML-Agents toolkit is an open-source project and we encourage and welcome
+contributions. If you wish to contribute, be sure to review our
+[contribution guidelines](CONTRIBUTING.md) and
[code of conduct](CODE_OF_CONDUCT.md).
You can connect with us and the broader community
through Unity Connect and GitHub:
+
* Join our
-[Unity Machine Learning Channel](https://connect.unity.com/messages/c/035fba4f88400000)
-to connect with others using the ML-Agents toolkit and Unity developers enthusiastic
-about machine learning. We use that channel to surface updates
-regarding the ML-Agents toolkit (and, more broadly, machine learning in games).
-* If you run into any problems using the ML-Agents toolkit,
-[submit an issue](https://github.com/Unity-Technologies/ml-agents/issues) and
-make sure to include as much detail as possible.
+ [Unity Machine Learning Channel](https://connect.unity.com/messages/c/035fba4f88400000)
+ to connect with others using the ML-Agents toolkit and Unity developers
+ enthusiastic about machine learning. We use that channel to surface updates
+ regarding the ML-Agents toolkit (and, more broadly, machine learning in
+ games).
+* If you run into any problems using the ML-Agents toolkit,
+ [submit an issue](https://github.com/Unity-Technologies/ml-agents/issues) and
+ make sure to include as much detail as possible.
For any other questions or feedback, connect directly with the ML-Agents
team at ml-agents@unity3d.com.
@@ -86,7 +92,7 @@ of the documentation to one language (Chinese), but we hope to continue
translating more pages and to other languages. Consequently,
we welcome any enhancements and improvements from the community.
-- [Chinese](docs/localized/zh-CN/)
+* [Chinese](docs/localized/zh-CN/)
## License
diff --git a/docs/API-Reference.md b/docs/API-Reference.md
index ec9101ae66..49cb5a4c0a 100644
--- a/docs/API-Reference.md
+++ b/docs/API-Reference.md
@@ -1,7 +1,7 @@
# API Reference
Our developer-facing C# classes (Academy, Agent, Decision and Monitor) have been
-documented to be compatabile with
+documented to be compatible with
[Doxygen](http://www.stack.nl/~dimitri/doxygen/) for auto-generating HTML
documentation.
diff --git a/docs/Background-TensorFlow.md b/docs/Background-TensorFlow.md
index af5104bc7a..ce34d0143d 100644
--- a/docs/Background-TensorFlow.md
+++ b/docs/Background-TensorFlow.md
@@ -16,7 +16,7 @@ to TensorFlow-related tools that we leverage within the ML-Agents toolkit.
performing computations using data flow graphs, the underlying representation of
deep learning models. It facilitates training and inference on CPUs and GPUs in
a desktop, server, or mobile device. Within the ML-Agents toolkit, when you
-train the behavior of an Agent, the output is a TensorFlow model (.bytes) file
+train the behavior of an agent, the output is a TensorFlow model (.bytes) file
that you can then embed within an Internal Brain. Unless you implement a new
algorithm, the use of TensorFlow is mostly abstracted away and behind the
scenes.
diff --git a/docs/Basic-Guide.md b/docs/Basic-Guide.md
index 654dfdc55b..94e5b55fd9 100644
--- a/docs/Basic-Guide.md
+++ b/docs/Basic-Guide.md
@@ -1,6 +1,6 @@
# Basic Guide
-This guide will show you how to use a pretrained model in an example Unity
+This guide will show you how to use a pre-trained model in an example Unity
environment, and show you how to train the model yourself.
If you are not familiar with the [Unity Engine](https://unity3d.com/unity), we
@@ -13,7 +13,7 @@ the basic concepts of Unity.
In order to use the ML-Agents toolkit within Unity, you need to change some
Unity settings first. Also [TensorFlowSharp
plugin](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage)
-is needed for you to use pretrained model within Unity, which is based on the
+is needed for you to use a pre-trained model within Unity, which is based on the
[TensorFlowSharp repo](https://github.com/migueldeicaza/TensorFlowSharp).
1. Launch Unity
@@ -70,14 +70,14 @@ if you want to [use an executable](Learning-Environment-Executable.md) or to
`None` if you want to interact with the current scene in the Unity Editor.
More information and documentation is provided in the
-[Python API](../ml-agents/README.md) page.
+[Python API](Python-API.md) page.
## Training the Brain with Reinforcement Learning
### Setting the Brain to External
Since we are going to build this environment to conduct training, we need to set
-the brain used by the agents to **External**. This allows the agents to
+the Brain used by the Agents to **External**. This allows the Agents to
communicate with the external training process when making their decisions.
1. In the **Scene** window, click the triangle icon next to the Ball3DAcademy
@@ -90,17 +90,23 @@ communicate with the external training process when making their decisions.
### Training the environment
1. Open a command or terminal window.
-2. Nagivate to the folder where you installed the ML-Agents toolkit.
+2. Navigate to the folder where you cloned the ML-Agents toolkit repository.
+ **Note**: If you followed the default [installation](Installation.md), then
+ you should be able to run `mlagents-learn` from any directory.
3. Run `mlagents-learn <trainer-config-path> --run-id=<run-identifier> --train`
- Where:
+ where:
- `<trainer-config-path>` is the relative or absolute filepath of the
- trainer configuration. The defaults used by environments in the ML-Agents
- SDK can be found in `config/trainer_config.yaml`.
+ trainer configuration. The defaults used by example environments included
+ in `MLAgentsSDK` can be found in `config/trainer_config.yaml`.
- `<run-identifier>` is a string used to separate the results of different
training runs
- - And the `--train` tells `mlagents-learn` to run a training session (rather
+ - `--train` tells `mlagents-learn` to run a training session (rather
than inference)
-4. When the message _"Start training by pressing the Play button in the Unity
+4. If you cloned the ML-Agents repo, then you can simply run
+ ```sh
+ mlagents-learn config/trainer_config.yaml --run-id=firstRun --train
+ ```
+5. When the message _"Start training by pressing the Play button in the Unity
Editor"_ is displayed on the screen, you can press the :arrow_forward: button
in Unity to start training in the Editor.
@@ -143,6 +149,7 @@ INFO:mlagents.learn:{'--curriculum': 'None',
'--train': True,
'--worker-id': '0',
'<trainer-config-path>': 'config/trainer_config.yaml'}
+INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.
```
**Note**: If you're using Anaconda, don't forget to activate the ml-agents
@@ -152,7 +159,6 @@ If `mlagents-learn` runs correctly and starts training, you should see something
like this:
```console
-INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.
INFO:mlagents.envs:
'Ball3DAcademy' started successfully!
Unity Academy name: Ball3DAcademy
@@ -208,7 +214,7 @@ You can press Ctrl+C to stop the training, and your trained model will be at
`models/<run-id>/editor_<academy_name>_<run-id>.bytes` where
`<academy_name>` is the name of the Academy GameObject in the current scene.
This file corresponds to your model's latest checkpoint. You can now embed this
-trained model into your internal brain by following the steps below, which is
+trained model into your Internal Brain by following the steps below, which is
similar to the steps described
[above](#play-an-example-environment-using-pretrained-model).
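As a quick recap of the training workflow covered in the Basic Guide hunks above, here is a minimal command-line sketch; the `firstRun` run id comes from the example earlier in this file, and the exact model filename depends on the name of your Academy GameObject:

```sh
# Start training against the scene currently open in the Unity Editor
mlagents-learn config/trainer_config.yaml --run-id=firstRun --train

# Press Play in the Editor when prompted, then stop with Ctrl+C
# (or wait for the "Saved Model" message). The checkpoint is written under:
ls models/firstRun/          # e.g. editor_<academy_name>_firstRun.bytes
```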
@@ -229,7 +235,7 @@ similar to the steps described
page.
- For a more detailed walk-through of our 3D Balance Ball environment, check out
the [Getting Started](Getting-Started-with-Balance-Ball.md) page.
-- For a "Hello World" introduction to creating your own learning environment,
+- For a "Hello World" introduction to creating your own Learning Environment,
check out the [Making a New Learning
Environment](Learning-Environment-Create-New.md) page.
- For a series of Youtube video tutorials, checkout the
diff --git a/docs/FAQ.md b/docs/FAQ.md
index 1a1b0d5710..1097037a03 100644
--- a/docs/FAQ.md
+++ b/docs/FAQ.md
@@ -15,39 +15,39 @@ Unity](Installation.md#setting-up-ml-agent-within-unity) for solution.
## TensorFlowSharp flag not turned on
-If you have already imported the TensorFlowSharp plugin, but havn't set
+If you have already imported the TensorFlowSharp plugin, but haven't set
ENABLE_TENSORFLOW flag for your scripting define symbols, you will see the
following error message:
```console
-You need to install and enable the TensorFlowSharp plugin in order to use the internal brain.
+You need to install and enable the TensorFlowSharp plugin in order to use the Internal Brain.
```
This error message occurs because the TensorFlowSharp plugin won't be usable
without the ENABLE_TENSORFLOW flag; refer to [Setting Up The ML-Agents Toolkit
Within Unity](Installation.md#setting-up-ml-agent-within-unity) for solution.
-## Tensorflow epsilon placeholder error
+## TensorFlow epsilon placeholder error
-If you have a graph placeholder set in the internal Brain inspector that is not
+If you have a graph placeholder set in the Internal Brain inspector that is not
present in the TensorFlow graph, you will see some error like this:
```console
-UnityAgentsException: One of the Tensorflow placeholder could not be found. In brain <brain_name>, there are no FloatingPoint placeholder named <placeholder_name>.
+UnityAgentsException: One of the TensorFlow placeholder could not be found. In brain <brain_name>, there are no FloatingPoint placeholder named <placeholder_name>.
```
Solution: Go to all of your Brain object, find `Graph placeholders` and change
its `size` to 0 to remove the `epsilon` placeholder.
-Similarly, if you have a graph scope set in the internal Brain inspector that is
+Similarly, if you have a graph scope set in the Internal Brain inspector that is
not correctly set, you will see some error like this:
```console
UnityAgentsException: The node <graph_scope>/action could not be found. Please make sure the graphScope <graph_scope>/ is correct
```
-Solution: Make sure your Graph Scope field matches the corresponding brain
-object name in your Hierachy Inspector when there is multiple brain.
+Solution: Make sure your Graph Scope field matches the corresponding Brain
+object name in your Hierarchy Inspector when there are multiple Brains.
## Environment Permission Error
@@ -101,7 +101,7 @@ UnityEnvironment(file_name=filename, worker_id=X)
## Mean reward : nan
If you receive a message `Mean reward : nan` when attempting to train a model
-using PPO, this is due to the episodes of the learning environment not
+using PPO, this is due to the episodes of the Learning Environment not
terminating. In order to address this, set `Max Steps` for either the Academy or
Agents within the Scene Inspector to a value greater than 0. Alternatively, it
is possible to manually set `done` conditions for episodes from within scripts
diff --git a/docs/Feature-Memory.md b/docs/Feature-Memory.md
index 595d46273c..79b8eead50 100644
--- a/docs/Feature-Memory.md
+++ b/docs/Feature-Memory.md
@@ -1,4 +1,4 @@
-# Memory-enhanced Agents using Recurrent Neural Networks
+# Memory-enhanced agents using Recurrent Neural Networks
## What are memories for
diff --git a/docs/Feature-Monitor.md b/docs/Feature-Monitor.md
index 78bf6eac26..b87d22ebe0 100644
--- a/docs/Feature-Monitor.md
+++ b/docs/Feature-Monitor.md
@@ -7,7 +7,7 @@ process within a Unity scene.
You can track many different things both related and unrelated to the agents
themselves. By default, the Monitor is only active in the *inference* phase, so
-not during training. To change this behaviour, you can activate or deactivate it
+not during training. To change this behavior, you can activate or deactivate it
by calling `SetActive(boolean)`. For example to also show the monitor during
training, you can call it in the `InitializeAcademy()` method of your `Academy`:
diff --git a/docs/Getting-Started-with-Balance-Ball.md b/docs/Getting-Started-with-Balance-Ball.md
index 0dcd19840c..eaba166aa0 100644
--- a/docs/Getting-Started-with-Balance-Ball.md
+++ b/docs/Getting-Started-with-Balance-Ball.md
@@ -2,7 +2,7 @@
This tutorial walks through the end-to-end process of opening a ML-Agents
toolkit example environment in Unity, building the Unity executable, training an
-agent in it, and finally embedding the trained model into the Unity environment.
+Agent in it, and finally embedding the trained model into the Unity environment.
The ML-Agents toolkit includes a number of [example
environments](Learning-Environment-Examples.md) which you can examine to help
@@ -16,7 +16,7 @@ and build the example environments.
This walk-through uses the **3D Balance Ball** environment. 3D Balance Ball
contains a number of platforms and balls (which are all copies of each other).
Each platform tries to keep its ball from falling by rotating either
-horizontally or vertically. In this environment, a platform is an **agent** that
+horizontally or vertically. In this environment, a platform is an **Agent** that
receives a reward for every step that it balances the ball. An agent is also
penalized with a negative reward for dropping the ball. The goal of the training
process is to have the platforms learn to never drop the ball.
@@ -45,7 +45,7 @@ window. The Inspector shows every component on a GameObject.
The first thing you may notice after opening the 3D Balance Ball scene is that
it contains not one, but several platforms. Each platform in the scene is an
-independent agent, but they all share the same brain. 3D Balance Ball does this
+independent agent, but they all share the same Brain. 3D Balance Ball does this
to speed up training since all twelve agents contribute to training in parallel.
### Academy
@@ -56,7 +56,7 @@ properties that control how the environment works. For example, the **Training**
and **Inference Configuration** properties set the graphics and timescale
properties for the Unity application. The Academy uses the **Training
Configuration** during training and the **Inference Configuration** when not
-training. (*Inference* means that the agent is using a trained model or
+training. (*Inference* means that the Agent is using a trained model or
heuristics or direct control — in other words, whenever **not** training.)
Typically, you set low graphics quality and a high time scale for the **Training
configuration** and a high graphics quality and the timescale to `1.0` for the
@@ -73,32 +73,32 @@ three functions you can implement, though they are all optional:
* Academy.InitializeAcademy() — Called once when the environment is launched.
* Academy.AcademyStep() — Called at every simulation step before
- Agent.AgentAction() (and after the agents collect their observations).
+ agent.AgentAction() (and after the Agents collect their observations).
* Academy.AcademyReset() — Called when the Academy starts or restarts the
simulation (including the first time).
-The 3D Balance Ball environment does not use these functions — each agent resets
+The 3D Balance Ball environment does not use these functions — each Agent resets
itself when needed — but many environments do use these functions to control the
-environment around the agents.
+environment around the Agents.
### Brain
The Ball3DBrain GameObject in the scene, which contains a Brain component, is a
child of the Academy object. (All Brain objects in a scene must be children of
-the Academy.) All the agents in the 3D Balance Ball environment use the same
-Brain instance. A Brain doesn't store any information about an agent, it just
-routes the agent's collected observations to the decision making process and
-returns the chosen action to the agent. Thus, all agents can share the same
-brain, but act independently. The Brain settings tell you quite a bit about how
-an agent works.
-
-The **Brain Type** determines how an agent makes its decisions. The **External**
+the Academy.) All the Agents in the 3D Balance Ball environment use the same
+Brain instance. A Brain doesn't store any information about an Agent, it just
+routes the Agent's collected observations to the decision making process and
+returns the chosen action to the Agent. Thus, all Agents can share the same
+Brain, but act independently. The Brain settings tell you quite a bit about how
+an Agent works.
+
+The **Brain Type** determines how an Agent makes its decisions. The **External**
and **Internal** types work together — use **External** when training your
-agents; use **Internal** when using the trained model. The **Heuristic** brain
-allows you to hand-code the agent's logic by extending the Decision class.
-Finally, the **Player** brain lets you map keyboard commands to actions, which
+Agents; use **Internal** when using the trained model. The **Heuristic** Brain
+allows you to hand-code the Agent's logic by extending the Decision class.
+Finally, the **Player** Brain lets you map keyboard commands to actions, which
can be useful when testing your agents and environment. If none of these types
-of brains do what you need, you can implement your own CoreBrain to create your
+of Brains do what you need, you can implement your own CoreBrain to create your
own type.
In this tutorial, you will set the **Brain Type** to **External** for training;
@@ -113,22 +113,22 @@ contain relevant information for the agent to make decisions.
The Brain instance used in the 3D Balance Ball example uses the **Continuous**
vector observation space with a **State Size** of 8. This means that the feature
-vector containing the agent's observations contains eight elements: the `x` and
+vector containing the Agent's observations contains eight elements: the `x` and
`z` components of the platform's rotation and the `x`, `y`, and `z` components
of the ball's relative position and velocity. (The observation values are
-defined in the agent's `CollectObservations()` function.)
+defined in the Agent's `CollectObservations()` function.)
#### Vector Action Space
-An agent is given instructions from the brain in the form of *actions*.
+An Agent is given instructions from the Brain in the form of *actions*.
ML-Agents toolkit classifies actions into two types: the **Continuous** vector
action space is a vector of numbers that can vary continuously. What each
-element of the vector means is defined by the agent logic (the PPO training
+element of the vector means is defined by the Agent logic (the PPO training
process just learns what values are better given particular state observations
based on the rewards received when it tries different values). For example, an
-element might represent a force or torque applied to a `RigidBody` in the agent.
+element might represent a force or torque applied to a `Rigidbody` in the Agent.
The **Discrete** action vector space defines its actions as tables. An action
-given to the agent is an array of indeces into tables.
+given to the Agent is an array of indices into tables.
The 3D Balance Ball example is programmed to use both types of vector action
space. You can try training with both settings to observe whether there is a
@@ -142,39 +142,39 @@ the 3D Balance Ball environment, the Agent components are placed on the twelve
Platform GameObjects. The base Agent object has a few properties that affect its
behavior:
-* **Brain** — Every agent must have a Brain. The brain determines how an agent
- makes decisions. All the agents in the 3D Balance Ball scene share the same
- brain.
-* **Visual Observations** — Defines any Camera objects used by the agent to
+* **Brain** — Every Agent must have a Brain. The Brain determines how an Agent
+ makes decisions. All the Agents in the 3D Balance Ball scene share the same
+ Brain.
+* **Visual Observations** — Defines any Camera objects used by the Agent to
observe its environment. 3D Balance Ball does not use camera observations.
-* **Max Step** — Defines how many simulation steps can occur before the agent
- decides it is done. In 3D Balance Ball, an agent restarts after 5000 steps.
-* **Reset On Done** — Defines whether an agent starts over when it is finished.
- 3D Balance Ball sets this true so that the agent restarts after reaching the
+* **Max Step** — Defines how many simulation steps can occur before the Agent
+ decides it is done. In 3D Balance Ball, an Agent restarts after 5000 steps.
+* **Reset On Done** — Defines whether an Agent starts over when it is finished.
+ 3D Balance Ball sets this true so that the Agent restarts after reaching the
**Max Step** count or after dropping the ball.
-Perhaps the more interesting aspect of an agent is the Agent subclass
-implementation. When you create an agent, you must extend the base Agent class.
+Perhaps the more interesting aspect of an Agent is the Agent subclass
+implementation. When you create an Agent, you must extend the base Agent class.
The Ball3DAgent subclass defines the following methods:
-* Agent.AgentReset() — Called when the Agent resets, including at the beginning
+* agent.AgentReset() — Called when the Agent resets, including at the beginning
of a session. The Ball3DAgent class uses the reset function to reset the
platform and ball. The function randomizes the reset values so that the
training generalizes to more than a specific starting position and platform
attitude.
-* Agent.CollectObservations() — Called every simulation step. Responsible for
- collecting the agent's observations of the environment. Since the Brain
- instance assigned to the agent is set to the continuous vector observation
+* agent.CollectObservations() — Called every simulation step. Responsible for
+ collecting the Agent's observations of the environment. Since the Brain
+ instance assigned to the Agent is set to the continuous vector observation
space with a state size of 8, the `CollectObservations()` must call
`AddVectorObs` 8 times.
-* Agent.AgentAction() — Called every simulation step. Receives the action chosen
- by the brain. The Ball3DAgent example handles both the continuous and the
+* agent.AgentAction() — Called every simulation step. Receives the action chosen
+ by the Brain. The Ball3DAgent example handles both the continuous and the
discrete action space types. There isn't actually much difference between the
two state types in this environment — both vector action spaces result in a
small change in platform rotation at each step. The `AgentAction()` function
- assigns a reward to the agent; in this example, an agent receives a small
+ assigns a reward to the Agent; in this example, an Agent receives a small
positive reward for each step it keeps the ball on the platform and a larger,
- negative reward for dropping the ball. An agent is also marked as done when it
+ negative reward for dropping the ball. An Agent is also marked as done when it
drops the ball so that it will reset with a new ball for the next simulation
step.
@@ -193,7 +193,7 @@ has a recent [blog post](https://blog.openai.com/openai-baselines-ppo/)
explaining it.
To train the agents within the Ball Balance environment, we will be using the
-python package. We have provided a convenient script called `mlagents-learn`
+Python package. We have provided a convenient script called `mlagents-learn`
which accepts arguments used to configure both training and inference phases.
We can use `run_id` to identify the experiment and create a folder where the
@@ -219,8 +219,8 @@ environment first.
The `--train` flag tells the ML-Agents toolkit to run in training mode.
**Note**: You can train using an executable rather than the Editor. To do so,
-follow the intructions in [Using an
-Executable](Learning-Environment-Executable.md).
+follow the instructions in
+[Using an Executable](Learning-Environment-Executable.md).
### Observing Training Progress
@@ -264,7 +264,7 @@ From TensorBoard, you will see the summary statistics:
Once the training process completes, and the training process saves the model
(denoted by the `Saved Model` message) you can add it to the Unity project and
-use it with agents having an **Internal** brain type. **Note:** Do not just
+use it with Agents having an **Internal** Brain type. **Note:** Do not just
close the Unity Window once the `Saved Model` message appears. Either wait for
the training process to close the window or press Ctrl+C at the command-line
prompt. If you simply close the window manually, the .bytes file containing the
@@ -285,4 +285,4 @@ Basic Guide page.
To embed the trained model into Unity, follow the later part of [Training the
Brain with Reinforcement
Learning](Basic-Guide.md#training-the-brain-with-reinforcement-learning) section
-of the Basic Buides page.
+of the Basic Guide page.
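The Getting Started hunks above mention viewing summary statistics in TensorBoard; a minimal sketch, assuming training was launched from the repository root so summaries land in the default `summaries` folder:

```sh
# Point TensorBoard at the summaries written by mlagents-learn
tensorboard --logdir=summaries

# Then open the reported address (by default http://localhost:6006) in a browser
# to inspect the cumulative reward and loss curves.
```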
diff --git a/docs/Glossary.md b/docs/Glossary.md
index 59e5897dca..41524291f6 100644
--- a/docs/Glossary.md
+++ b/docs/Glossary.md
@@ -30,5 +30,5 @@
logic should not be placed here.
* **External Coordinator** - ML-Agents class responsible for communication with
outside processes (in this case, the Python API).
-* **Trainer** - Python class which is responsible for training a given external
- brain. Contains TensorFlow graph which makes decisions for external brain.
+* **Trainer** - Python class which is responsible for training a given External
+ Brain. Contains TensorFlow graph which makes decisions for External Brain.
diff --git a/docs/Installation-Windows.md b/docs/Installation-Windows.md
index 8ee6afde5f..e77e781370 100644
--- a/docs/Installation-Windows.md
+++ b/docs/Installation-Windows.md
@@ -100,7 +100,7 @@ You should see `(ml-agents)` prepended on the last line.
Next, install `tensorflow`. Install this package using `pip` - which is a
package management system used to install Python packages. Latest versions of
-Tensorflow won't work, so you will need to make sure that you install version
+TensorFlow won't work, so you will need to make sure that you install version
1.7.1. In the same Anaconda Prompt, type in the following command _(make sure
you are connected to the internet)_:
@@ -268,7 +268,7 @@ installed. _Please note that case sensitivity matters_.
Next, install `tensorflow-gpu` using `pip`. You'll need version 1.7.1. In an
Anaconda Prompt with the Conda environment ml-agents activated, type in the
-following command to uninstall the tensorflow for cpu and install the tensorflow
+following command to uninstall TensorFlow for cpu and install TensorFlow
for gpu _(make sure you are connected to the internet)_:
```sh
diff --git a/docs/Installation.md b/docs/Installation.md
index 4128238654..ac8adbb0ce 100644
--- a/docs/Installation.md
+++ b/docs/Installation.md
@@ -16,18 +16,22 @@ Build Support_ component when installing Unity.
width="500" border="10" />
-## Clone the Ml-Agents Repository
+## Clone the ML-Agents Toolkit Repository
Once installed, you will want to clone the ML-Agents Toolkit GitHub repository.
git clone https://github.com/Unity-Technologies/ml-agents.git
-The `UnitySDK` directory in this repository contains the Unity Assets to add
-to your projects. The `python` directory contains python packages which provide
-trainers, a python API to interface with Unity, and a package to interface with
-OpenAI Gym.
+The `UnitySDK` subdirectory contains the Unity Assets to add to your projects.
+It also contains many [example environments](Learning-Environment-Examples.md)
+that can be used to help get you familiar with Unity.
-## Install Python (with Dependencies)
+The `ml-agents` subdirectory contains Python packages which provide
+trainers and a Python API to interface with Unity.
+
+The `gym-unity` subdirectory contains a package to interface with OpenAI Gym.
+
+## Install Python and mlagents Package
In order to use ML-Agents toolkit, you need Python 3.6 along with the
dependencies listed in the [requirements file](../ml-agents/requirements.txt).
@@ -51,17 +55,20 @@ guide](Installation-Windows.md) to set up your Python environment.
### Mac and Unix Users
-[Download](https://www.python.org/downloads/) and install Python 3 if you do not
+[Download](https://www.python.org/downloads/) and install Python 3.6 if you do not
already have it.
-If your Python environment doesn't include `pip`, see these
+If your Python environment doesn't include `pip3`, see these
[instructions](https://packaging.python.org/guides/installing-using-linux-tools/#installing-pip-setuptools-wheel-with-linux-package-managers)
on installing it.
-To install dependencies, enter the `ml-agents/` directory and run from
-the command line:
+To install the dependencies and `mlagents` Python package, enter the
+`ml-agents/` subdirectory and run from the command line:
+
+ pip3 install .
- pip install .
+If you installed this correctly, you should be able to run
+`mlagents-learn --help`
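A minimal sketch of the installation flow described above, assuming the default clone location and that `pip3` is available on the PATH:

```sh
# Clone the toolkit, then install the mlagents Python package
# from the ml-agents subdirectory of the repository.
git clone https://github.com/Unity-Technologies/ml-agents.git
cd ml-agents/ml-agents
pip3 install .

# Verify the installation
mlagents-learn --help
```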
## Docker-based Installation
diff --git a/docs/Learning-Environment-Best-Practices.md b/docs/Learning-Environment-Best-Practices.md
index dce01037df..f52b466ac3 100644
--- a/docs/Learning-Environment-Best-Practices.md
+++ b/docs/Learning-Environment-Best-Practices.md
@@ -9,8 +9,8 @@
([learn more here](Training-Curriculum-Learning.md)).
* When possible, it is often helpful to ensure that you can complete the task by
using a Player Brain to control the agent.
-* It is often helpful to make many copies of the agent, and attach the brain to
- be trained to all of these agents. In this way the brain can get more feedback
+* It is often helpful to make many copies of the agent, and attach the Brain to
+ be trained to all of these agents. In this way the Brain can get more feedback
information from all of these agents, which helps it train faster.
## Rewards
diff --git a/docs/Learning-Environment-Create-New.md b/docs/Learning-Environment-Create-New.md
index f15f79db32..841dee398b 100644
--- a/docs/Learning-Environment-Create-New.md
+++ b/docs/Learning-Environment-Create-New.md
@@ -2,7 +2,7 @@
This tutorial walks through the process of creating a Unity Environment. A Unity
Environment is an application built using the Unity Engine which can be used to
-train Reinforcement Learning agents.
+train Reinforcement Learning Agents.

@@ -23,12 +23,12 @@ steps:
methods to update the scene independently of any agents. For example, you can
add, move, or delete agents and other entities in the environment.
3. Add one or more Brain objects to the scene as children of the Academy.
-4. Implement your Agent subclasses. An Agent subclass defines the code an agent
+4. Implement your Agent subclasses. An Agent subclass defines the code an Agent
uses to observe its environment, to carry out assigned actions, and to
calculate the rewards used for reinforcement training. You can also implement
- optional methods to reset the agent when it has finished or failed its task.
+ optional methods to reset the Agent when it has finished or failed its task.
5. Add your Agent subclasses to appropriate GameObjects, typically, the object
- in the scene that represents the agent in the simulation. Each Agent object
+ in the scene that represents the Agent in the simulation. Each Agent object
must be assigned a Brain object.
6. If training, set the Brain type to External and
[run the training process](Training-ML-Agents.md).
@@ -59,8 +59,8 @@ Your Unity **Project** window should contain the following assets:
Next, we will create a very simple scene to act as our ML-Agents environment.
The "physical" components of the environment include a Plane to act as the floor
-for the agent to move around on, a Cube to act as the goal or target for the
-agent to seek, and a Sphere to represent the agent itself.
+for the Agent to move around on, a Cube to act as the goal or target for the
+Agent to seek, and a Sphere to represent the Agent itself.
### Create the floor plane
@@ -194,7 +194,7 @@ Then, edit the new `RollerAgent` script:
leave it alone for now.
So far, these are the basic steps that you would use to add ML-Agents to any
-Unity project. Next, we will add the logic that will let our agent learn to roll
+Unity project. Next, we will add the logic that will let our Agent learn to roll
to the cube using reinforcement learning.
In this simple scenario, we don't use the Academy object to control the
@@ -206,8 +206,8 @@ it succeeds or falls trying.
### Initialization and Resetting the Agent
-When the agent reaches its target, it marks itself done and its agent reset
-function moves the target to a random location. In addition, if the agent rolls
+When the Agent reaches its target, it marks itself done and its Agent reset
+function moves the target to a random location. In addition, if the Agent rolls
off the platform, the reset function puts it back onto the floor.
To move the target GameObject, we need a reference to its Transform (which
@@ -215,7 +215,7 @@ stores a GameObject's position, orientation and scale in the 3D world). To get
this reference, add a public field of type `Transform` to the RollerAgent class.
Public fields of a component in Unity get displayed in the Inspector window,
allowing you to choose which GameObject to use as the target in the Unity
-Editor. To reset the agent's velocity (and later to apply force to move the
+Editor. To reset the Agent's velocity (and later to apply force to move the
agent) we need a reference to the Rigidbody component. A
[Rigidbody](https://docs.unity3d.com/ScriptReference/Rigidbody.html) is Unity's
primary element for physics simulation. (See
@@ -244,7 +244,7 @@ public class RollerAgent : Agent
{
if (this.transform.position.y < -1.0)
{
- // The agent fell
+ // The Agent fell
this.transform.position = Vector3.zero;
this.rBody.angularVelocity = Vector3.zero;
this.rBody.velocity = Vector3.zero;
@@ -265,17 +265,17 @@ Next, let's implement the Agent.CollectObservations() function.
### Observing the Environment
The Agent sends the information we collect to the Brain, which uses it to make a
-decision. When you train the agent (or use a trained model), the data is fed
-into a neural network as a feature vector. For an agent to successfully learn a
+decision. When you train the Agent (or use a trained model), the data is fed
+into a neural network as a feature vector. For an Agent to successfully learn a
task, we need to provide the correct information. A good rule of thumb for
deciding what information to collect is to consider what you would need to
calculate an analytical solution to the problem.
-In our case, the information our agent collects includes:
+In our case, the information our Agent collects includes:
* Position of the target. In general, it is better to use the relative position
of other objects rather than the absolute position for more generalizable
- training. Note that the agent only collects the x and z coordinates since the
+ training. Note that the Agent only collects the x and z coordinates since the
floor is aligned with the x-z plane and the y component of the target's
position never changes.
@@ -288,8 +288,8 @@ AddVectorObs(relativePosition.x / 5);
AddVectorObs(relativePosition.z / 5);
```
-* Position of the agent itself within the confines of the floor. This data is
- collected as the agent's distance from each edge of the floor.
+* Position of the Agent itself within the confines of the floor. This data is
+ collected as the Agent's distance from each edge of the floor.
```csharp
// Distance to edges of platform
@@ -299,7 +299,7 @@ AddVectorObs((this.transform.position.z + 5) / 5);
AddVectorObs((this.transform.position.z - 5) / 5);
```
-* The velocity of the agent. This helps the agent learn to control its speed so
+* The velocity of the Agent. This helps the Agent learn to control its speed so
it doesn't overshoot the target and roll off the platform.
```csharp
@@ -346,10 +346,10 @@ The decision of the Brain comes in the form of an action array passed to the
`AgentAction()` function. The number of elements in this array is determined by
the `Vector Action Space Type` and `Vector Action Space Size` settings of the
agent's Brain. The RollerAgent uses the continuous vector action space and needs
-two continuous control signals from the brain. Thus, we will set the Brain
+two continuous control signals from the Brain. Thus, we will set the Brain
`Vector Action Size` to 2. The first element,`action[0]` determines the force
applied along the x axis; `action[1]` determines the force applied along the z
-axis. (If we allowed the agent to move in three dimensions, then we would need
+axis. (If we allowed the Agent to move in three dimensions, then we would need
to set `Vector Action Size` to 3. Each of these values returned by the network
are between `-1` and `1.` Note the Brain really has no idea what the values in
the action array mean. The training process just adjusts the action values in
@@ -369,19 +369,19 @@ rBody.AddForce(controlSignal * speed);
### Rewards
Reinforcement learning requires rewards. Assign rewards in the `AgentAction()`
-function. The learning algorithm uses the rewards assigned to the agent at each
+function. The learning algorithm uses the rewards assigned to the Agent at each
step in the simulation and learning process to determine whether it is giving
-the agent the optimal actions. You want to reward an agent for completing the
-assigned task (reaching the Target cube, in this case) and punish the agent if
+the Agent the optimal actions. You want to reward an Agent for completing the
+assigned task (reaching the Target cube, in this case) and punish the Agent if
it irrevocably fails (falls off the platform). You can sometimes speed up
-training with sub-rewards that encourage behavior that helps the agent complete
+training with sub-rewards that encourage behavior that helps the Agent complete
the task. For example, the RollerAgent reward system provides a small reward if
-the agent moves closer to the target in a step and a small negative reward at
-each step which encourages the agent to complete its task quickly.
+the Agent moves closer to the target in a step and a small negative reward at
+each step which encourages the Agent to complete its task quickly.
The RollerAgent calculates the distance to detect when it reaches the target.
When it does, the code increments the Agent.reward variable by 1.0 and marks the
-agent as finished by setting the agent to done.
+Agent as finished by setting the Agent to done.
```csharp
float distanceToTarget = Vector3.Distance(this.transform.position,
@@ -394,14 +394,14 @@ if (distanceToTarget < 1.42f)
}
```
-**Note:** When you mark an agent as done, it stops its activity until it is
-reset. You can have the agent reset immediately, by setting the
+**Note:** When you mark an Agent as done, it stops its activity until it is
+reset. You can have the Agent reset immediately, by setting the
Agent.ResetOnDone property to true in the inspector or you can wait for the
Academy to reset the environment. This RollerBall environment relies on the
`ResetOnDone` mechanism and doesn't set a `Max Steps` limit for the Academy (so
it never resets the environment).
-It can also encourage an agent to finish a task more quickly to assign a
+It can also encourage an Agent to finish a task more quickly to assign a
negative reward at each step:
```csharp
@@ -409,8 +409,8 @@ negative reward at each step:
AddReward(-0.05f);
```
-Finally, to punish the agent for falling off the platform, assign a large
-negative reward and, of course, set the agent to done so that it resets itself
+Finally, to punish the Agent for falling off the platform, assign a large
+negative reward and, of course, set the Agent to done so that it resets itself
in the next step:
```csharp
@@ -471,7 +471,7 @@ window.
Now, that all the GameObjects and ML-Agent components are in place, it is time
to connect everything together in the Unity Editor. This involves assigning the
Brain object to the Agent, changing some of the Agent Components properties, and
-setting the Brain properties so that they are compatible with our agent code.
+setting the Brain properties so that they are compatible with our Agent code.
1. Expand the Academy GameObject in the Hierarchy window, so that the Brain
object is visible.
@@ -501,7 +501,7 @@ Now you are ready to test the environment before training.
It is always a good idea to test your environment manually before embarking on
an extended training run. The reason we have left the Brain set to the
-**Player** type is so that we can control the agent using direct keyboard
+**Player** type is so that we can control the Agent using direct keyboard
control. But first, you need to define the keyboard to action mapping. Although
the RollerAgent only has an `Action Size` of two, we will use one key to specify
positive values and one to specify negative values for each action, for a total
@@ -525,11 +525,11 @@ The **Index** value corresponds to the index of the action array passed to
`AgentAction()` function. **Value** is assigned to action[Index] when **Key** is
pressed.
-Press **Play** to run the scene and use the WASD keys to move the agent around
+Press **Play** to run the scene and use the WASD keys to move the Agent around
the platform. Make sure that there are no errors displayed in the Unity editor
-Console window and that the agent resets when it reaches its target or falls
+Console window and that the Agent resets when it reaches its target or falls
from the platform. Note that for more involved debugging, the ML-Agents SDK
-includes a convenient Monitor class that you can use to easily display agent
+includes a convenient Monitor class that you can use to easily display Agent
status information in the Game window.
One additional test you can perform is to first ensure that your environment and
@@ -557,7 +557,8 @@ to use Unity ML-Agents:
Keep in mind:
* There can only be one Academy game object in a scene.
-* You can have multiple Brain game objects but they must be child of the Academy game object.
+* You can have multiple Brain game objects but they must be children of the Academy
+ game object.
Here is an example of what your scene hierarchy should look like:
diff --git a/docs/Learning-Environment-Design-Academy.md b/docs/Learning-Environment-Design-Academy.md
index 733d4c566a..56420d9f8c 100644
--- a/docs/Learning-Environment-Design-Academy.md
+++ b/docs/Learning-Environment-Design-Academy.md
@@ -1,7 +1,7 @@
# Creating an Academy
An Academy orchestrates all the Agent and Brain objects in a Unity scene. Every
-scene containing agents must contain a single Academy. To use an Academy, you
+scene containing Agents must contain a single Academy. To use an Academy, you
must create your own subclass. However, all the methods you can override are
optional.
@@ -29,7 +29,7 @@ in your Academy subclass.
## Resetting an Environment
Implement an `AcademyReset()` function to alter the environment at the start of
-each episode. For example, you might want to reset an agent to its starting
+each episode. For example, you might want to reset an Agent to its starting
position or move a goal to a random position. An environment resets when the
Academy `Max Steps` count is reached.
@@ -42,7 +42,7 @@ one, particular maze, not mazes in general.
## Controlling an Environment
The `AcademyStep()` function is called at every step in the simulation before
-any agents are updated. Use this function to update objects in the environment
+any Agents are updated. Use this function to update objects in the environment
at every step or during the episode between environment resets. For example, if
you want to add elements to the environment at random intervals, you can put the
logic for creating them in the `AcademyStep()` function.
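Putting both callbacks together, a minimal Academy subclass might look like the
sketch below; the `goal` and `collectiblePrefab` fields and the spawn logic are
illustrative assumptions, not part of the ML-Agents API:
```csharp
using UnityEngine;
// Depending on your ML-Agents setup you may also need: using MLAgents;

public class ExampleAcademy : Academy
{
    // Hypothetical scene references, used only for illustration.
    public Transform goal;
    public GameObject collectiblePrefab;

    // Called at the start of each episode (and when Max Steps is reached).
    public override void AcademyReset()
    {
        // Move the goal to a random position on the platform.
        goal.position = new Vector3(Random.Range(-4f, 4f), 0.5f, Random.Range(-4f, 4f));
    }

    // Called every simulation step, before any Agent acts.
    public override void AcademyStep()
    {
        // Occasionally add a new element to the environment, as described above.
        if (Random.value < 0.01f)
        {
            Instantiate(collectiblePrefab, goal.position + Vector3.up, Quaternion.identity);
        }
    }
}
```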
diff --git a/docs/Learning-Environment-Design-Agents.md b/docs/Learning-Environment-Design-Agents.md
index 089277f2b6..f73cdb3602 100644
--- a/docs/Learning-Environment-Design-Agents.md
+++ b/docs/Learning-Environment-Design-Agents.md
@@ -1,13 +1,13 @@
# Agents
An agent is an actor that can observe its environment and decide on the best
-course of action using those observations. Create agents in Unity by extending
+course of action using those observations. Create Agents in Unity by extending
the Agent class. The most important aspects of creating agents that can
-successfully learn are the observations the agent collects and, for
-reinforcement learning, the reward you assign to estimate the value of the
+successfully learn are the observations the agent collects for
+reinforcement learning and the reward you assign to estimate the value of the
agent's current state toward accomplishing its tasks.
-An agent passes its observations to its brain. The brain, then, makes a decision
+An Agent passes its observations to its Brain. The Brain, then, makes a decision
and passes the chosen action back to the agent. Your agent code must execute the
action, for example, move the agent in one direction or another. In order to
[train an agent using reinforcement learning](Learning-Environment-Design.md),
@@ -15,13 +15,13 @@ your agent must calculate a reward value at each action. The reward is used to
discover the optimal decision-making policy. (A reward is not used by already
trained agents or for imitation learning.)
-The Brain class abstracts out the decision making logic from the agent itself so
-that you can use the same brain in multiple agents. How a brain makes its
-decisions depends on the type of brain it is. An **External** brain simply
-passes the observations from its agents to an external process and then passes
-the decisions made externally back to the agents. An **Internal** brain uses the
+The Brain class abstracts out the decision making logic from the Agent itself so
+that you can use the same Brain in multiple Agents. How a Brain makes its
+decisions depends on the type of Brain it is. An **External** Brain simply
+passes the observations from its Agents to an external process and then passes
+the decisions made externally back to the Agents. An **Internal** Brain uses the
trained policy parameters to make decisions (and no longer adjusts the
-parameters in search of a better decision). The other types of brains do not
+parameters in search of a better decision). The other types of Brains do not
directly involve training, but you might find them useful as part of a training
project. See [Brains](Learning-Environment-Design-Brains.md).
@@ -29,9 +29,9 @@ project. See [Brains](Learning-Environment-Design-Brains.md).
The observation-decision-action-reward cycle repeats after a configurable number
of simulation steps (the frequency defaults to once-per-step). You can also set
-up an agent to request decisions on demand. Making decisions at regular step
+up an Agent to request decisions on demand. Making decisions at regular step
intervals is generally most appropriate for physics-based simulations. Making
-decisions on demand is generally appropriate for situations where agents only
+decisions on demand is generally appropriate for situations where Agents only
respond to specific events or take actions of variable duration. For example, an
agent in a robotic simulator that must provide fine-control of joint torques
should make its decisions every step of the simulation. On the other hand, an
@@ -41,23 +41,23 @@ occur, should use on-demand decision making.
To control the frequency of step-based decision making, set the **Decision
Frequency** value for the Agent object in the Unity Inspector window. Agents
using the same Brain instance can use a different frequency. During simulation
-steps in which no decision is requested, the agent receives the same action
+steps in which no decision is requested, the Agent receives the same action
chosen by the previous decision.
### On Demand Decision Making
-On demand decision making allows agents to request decisions from their brains
+On demand decision making allows Agents to request decisions from their Brains
only when needed instead of receiving decisions at a fixed frequency. This is
useful when the agents commit to an action for a variable number of steps or
when the agents cannot make decisions at the same time. This is typically the case
for turn based games, games where agents must react to events or games where
agents can take actions of variable duration.
-When you turn on **On Demand Decisions** for an agent, your agent code must call
+When you turn on **On Demand Decisions** for an Agent, your agent code must call
the `Agent.RequestDecision()` function. This function call starts one iteration
-of the observation-decision-action-reward cycle. The Brain invokes the agent's
+of the observation-decision-action-reward cycle. The Brain invokes the Agent's
`CollectObservations()` method, makes a decision and returns it by calling the
-`AgentAction()` method. The Brain waits for the agent to request the next
+`AgentAction()` method. The Brain waits for the Agent to request the next
decision before starting another iteration.
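For instance, a turn-based Agent might only request a decision when its turn
starts. In the sketch below, the `myTurnStarted` flag is a hypothetical
game-specific signal; only `RequestDecision()` itself comes from the Agent API:
```csharp
public class TurnBasedAgent : Agent
{
    // Hypothetical flag set by your game logic when this Agent's turn begins.
    public bool myTurnStarted;

    void FixedUpdate()
    {
        // With "On Demand Decisions" checked, nothing happens until we ask.
        if (myTurnStarted)
        {
            myTurnStarted = false;
            // Starts one observation-decision-action-reward iteration.
            RequestDecision();
        }
    }
}
```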
## Observations
@@ -69,22 +69,22 @@ state of the world. A state observation can take the following forms:
point numbers.
* **Visual Observations** — one or more camera images.
-When you use vector observations for an agent, implement the
+When you use vector observations for an Agent, implement the
`Agent.CollectObservations()` method to create the feature vector. When you use
**Visual Observations**, you only need to identify which Unity Camera objects
will provide images and the base Agent class handles the rest. You do not need
-to implement the `CollectObservations()` method when your agent uses visual
+to implement the `CollectObservations()` method when your Agent uses visual
observations (unless it also uses vector observations).
### Vector Observation Space: Feature Vectors
For agents using a continuous state space, you create a feature vector to
represent the agent's observation at each step of the simulation. The Brain
-class calls the `CollectObservations()` method of each of its agents. Your
+class calls the `CollectObservations()` method of each of its Agents. Your
implementation of this function must call `AddVectorObs` to add vector
observations.
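A minimal sketch of such an implementation, in the spirit of the RollerAgent
tutorial; the `target` and `rBody` fields are assumptions you would define on
your own Agent:
```csharp
// Inside your Agent subclass; target and rBody are fields you define yourself.
public override void CollectObservations()
{
    // Relative position of the target (adds 3 floats to the feature vector).
    AddVectorObs(target.position - transform.position);

    // The Agent's own velocity (adds 2 floats), so it can learn to control speed.
    AddVectorObs(rBody.velocity.x);
    AddVectorObs(rBody.velocity.z);
}
```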
-The observation must include all the information an agent needs to accomplish
+The observation must include all the information an Agent needs to accomplish
its task. Without sufficient and relevant information, an agent may learn poorly
or may not learn at all. A reasonable approach for determining what information
should be included is to consider what you would need to calculate an analytical
@@ -123,7 +123,7 @@ with zeros for any missing entities in a specific observation or you can limit
an agent's observations to a fixed subset. For example, instead of observing
every enemy agent in an environment, you could only observe the closest five.
-When you set up an Agent's brain in the Unity Editor, set the following
+When you set up an Agent's Brain in the Unity Editor, set the following
properties to use a continuous vector observation:
* **Space Size** — The state size must match the length of your feature vector.
@@ -201,7 +201,7 @@ used in your normalization formula.
### Multiple Visual Observations
Camera observations use rendered textures from one or more cameras in a scene.
-The brain vectorizes the textures into a 3D Tensor which can be fed into a
+The Brain vectorizes the textures into a 3D Tensor which can be fed into a
convolutional neural network (CNN). For more information on CNNs, see [this
guide](http://cs231n.github.io/convolutional-networks/). You can use camera
observations alongside vector observations.
@@ -211,25 +211,25 @@ useful when the state is difficult to describe numerically. However, they are
also typically less efficient and slower to train, and sometimes don't succeed
at all.
-To add a visual observation to an agent, click on the `Add Camera` button in the
+To add a visual observation to an Agent, click on the `Add Camera` button in the
Agent inspector. Then drag the camera you want to add to the `Camera` field. You
-can have more than one camera attached to an agent.
+can have more than one camera attached to an Agent.

In addition, make sure that the Agent's Brain expects a visual observation. In
the Brain inspector, under **Brain Parameters** > **Visual Observations**,
-specify the number of Cameras the agent is using for its visual observations.
+specify the number of Cameras the Agent is using for its visual observations.
For each visual observation, set the width and height of the image (in pixels)
and whether or not the observation is color or grayscale (when `Black And White`
is checked).
## Vector Actions
-An action is an instruction from the brain that the agent carries out. The
-action is passed to the agent as a parameter when the Academy invokes the
+An action is an instruction from the Brain that the agent carries out. The
+action is passed to the Agent as a parameter when the Academy invokes the
agent's `AgentAction()` function. When you specify that the vector action space
-is **Continuous**, the action parameter passed to the agent is an array of
+is **Continuous**, the action parameter passed to the Agent is an array of
control signals with length equal to the `Vector Action Space Size` property.
When you specify a **Discrete** vector action space type, the action parameter
is an array containing integers. Each integer is an index into a list or table
@@ -238,13 +238,13 @@ is an array of indices. The number of indices in the array is determined by the
number of branches defined in the `Branches Size` property. Each branch
corresponds to an action table; you can specify the size of each table by
modifying the `Branches` property. Set the `Vector Action Space Size` and
-`Vector Action Space Type` properties on the Brain object assigned to the agent
+`Vector Action Space Type` properties on the Brain object assigned to the Agent
(using the Unity Editor Inspector window).
Neither the Brain nor the training algorithm knows anything about what the action
values themselves mean. The training algorithm simply tries different values for
the action list and observes the effect on the accumulated rewards over time and
-many training episodes. Thus, the only place actions are defined for an agent is
+many training episodes. Thus, the only place actions are defined for an Agent is
in the `AgentAction()` function. You simply specify the type of vector action
space, and, for the continuous vector action space, the number of values, and
then apply the received values appropriately (and consistently) in
@@ -253,16 +253,16 @@ then apply the received values appropriately (and consistently) in
For example, if you designed an agent to move in two dimensions, you could use
either continuous or discrete vector actions. In the continuous case, you
would set the vector action size to two (one for each dimension), and the
-agent's brain would create an action with two floating point values. In the
+agent's Brain would create an action with two floating point values. In the
discrete case, you would use one Branch with a size of four (one for each
-direction), and the brain would create an action array containing a single
+direction), and the Brain would create an action array containing a single
element with a value ranging from zero to three. Alternatively, you could create
two branches of size two (one for horizontal movement and one for vertical
-movement), and the brain would create an action array containing two elements
+movement), and the Brain would create an action array containing two elements
with values ranging from zero to one.
Note that when you are programming actions for an agent, it is often helpful to
-test your action logic using a **Player** brain, which lets you map keyboard
+test your action logic using a **Player** Brain, which lets you map keyboard
commands to actions. See [Brains](Learning-Environment-Design-Brains.md).
The [3DBall](Learning-Environment-Examples.md#3dball-3d-balance-ball) and
@@ -271,12 +271,12 @@ up to use either the continuous or the discrete vector action spaces.
### Continuous Action Space
-When an agent uses a brain set to the **Continuous** vector action space, the
-action parameter passed to the agent's `AgentAction()` function is an array with
+When an Agent uses a Brain set to the **Continuous** vector action space, the
+action parameter passed to the Agent's `AgentAction()` function is an array with
length equal to the Brain object's `Vector Action Space Size` property value.
The individual values in the array have whatever meanings that you ascribe to
-them. If you assign an element in the array as the speed of an agent, for
-example, the training process learns to control the speed of the agent though
+them. If you assign an element in the array as the speed of an Agent, for
+example, the training process learns to control the speed of the Agent through
this parameter.
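As a minimal sketch (not the Reacher code referenced below), an `AgentAction()`
for a continuous space of size 2 might clamp each value and apply it as a force;
the `rBody` Rigidbody field is an assumption, and the signature assumes the v0.5
`AgentAction(float[], string)` form:
```csharp
public override void AgentAction(float[] vectorAction, string textAction)
{
    // Clamp the raw values coming from the Brain before using them.
    float forceX = Mathf.Clamp(vectorAction[0], -1f, 1f);
    float forceZ = Mathf.Clamp(vectorAction[1], -1f, 1f);

    // Interpret the two values as a force on the horizontal plane.
    rBody.AddForce(new Vector3(forceX, 0f, forceZ) * 10f);
}
```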
The [Reacher example](Learning-Environment-Examples.md#reacher) defines a
@@ -306,15 +306,15 @@ As shown above, you can scale the control values as needed after clamping them.
### Discrete Action Space
-When an agent uses a brain set to the **Discrete** vector action space, the
-action parameter passed to the agent's `AgentAction()` function is an array
+When an Agent uses a Brain set to the **Discrete** vector action space, the
+action parameter passed to the Agent's `AgentAction()` function is an array
containing indices. With the discrete vector action space, `Branches` is an
array of integers; each value corresponds to the number of possibilities for
each branch.
-For example, if we wanted an agent that can move in an plane and jump, we could
+For example, if we wanted an Agent that can move in a plane and jump, we could
define two branches (one for motion and one for jumping) because we want our
-agent be able to move __and__ jump concurently. We define the first branch to
+Agent to be able to move __and__ jump concurrently. We define the first branch to
have 5 possible actions (don't move, go left, go right, go backward, go forward)
and the second one to have 2 possible actions (don't jump, jump). The
AgentAction method would look something like:
@@ -333,7 +333,7 @@ if (movement == 4) { directionZ = 1; }
// Look up the index in the jump action list:
if (jump == 1 && IsGrounded()) { directionY = 1; }
-// Apply the action results to move the agent
+// Apply the action results to move the Agent
gameObject.GetComponent<Rigidbody>().AddForce(
new Vector3(
directionX * 40f, directionY * 300f, directionZ * 40f));
@@ -346,9 +346,9 @@ continuous action spaces.
#### Masking Discrete Actions
When using Discrete Actions, it is possible to specify that some actions are
-impossible for the next decision. Then the agent is controlled by an External or
-Internal Brain, the agent will be unable to perform the specified action. Note
-that when the agent is controlled by a Player or Heuristic Brain, the agent will
+impossible for the next decision. When the Agent is controlled by an External or
+Internal Brain, the Agent will be unable to perform the specified action. Note
+that when the Agent is controlled by a Player or Heuristic Brain, the Agent will
still be able to decide to perform the masked action. In order to mask an
action, call the `SetActionMask()` method within the `CollectObservations()` method:
@@ -361,11 +361,11 @@ Where:
* `branch` is the index (starting at 0) of the branch on which you want to mask
the action
* `actionIndices` is a list of `int` or a single `int` corresponding to the
- index of theaction that the agent cannot perform.
+ index of the action that the Agent cannot perform.
-For example, if you have an agent with 2 branches and on the first branch
+For example, if you have an Agent with 2 branches and on the first branch
(branch 0) there are 4 possible actions : _"do nothing"_, _"jump"_, _"shoot"_
-and _"change weapon"_. Then with the code bellow, the agent will either _"do
+and _"change weapon"_. Then with the code bellow, the Agent will either _"do
nothing"_ or _"change weapon"_ for his next decision (since action index 1 and 2
are masked)
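Based on that description, the masking call is a single line inside
`CollectObservations()`; a sketch:
```csharp
// Mask "jump" (index 1) and "shoot" (index 2) on branch 0 for the next decision.
SetActionMask(0, new int[2] { 1, 2 });
```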
@@ -388,16 +388,16 @@ the choices an agent makes such that the agent earns the highest cumulative
reward over time. The better your reward mechanism, the better your agent will
learn.
-**Note:** Rewards are not used during inference by a brain using an already
+**Note:** Rewards are not used during inference by a Brain using an already
trained policy and are also not used during imitation learning.
Perhaps the best advice is to start simple and only add complexity as needed. In
general, you should reward results rather than actions you think will lead to
the desired results. To help develop your rewards, you can use the Monitor class
-to display the cumulative reward received by an agent. You can even use a Player
-brain to control the agent while watching how it accumulates rewards.
+to display the cumulative reward received by an Agent. You can even use a Player
+Brain to control the Agent while watching how it accumulates rewards.
-Allocate rewards to an agent by calling the `AddReward()` method in the
+Allocate rewards to an Agent by calling the `AddReward()` method in the
`AgentAction()` function. The reward assigned in any step should be in the range
[-1,1]. Values outside this range can lead to unstable training. The `reward`
value is reset to zero at every step.
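For example, a simple shaping scheme inside `AgentAction()` might look like the
sketch below; the `target` field, the distance check, and the values are
illustrative assumptions:
```csharp
// Inside AgentAction(), after applying the chosen action.
// Small per-step penalty to encourage finishing quickly.
AddReward(-0.005f);

// Hypothetical success condition, for illustration only.
if (Vector3.Distance(transform.position, target.position) < 1.5f)
{
    AddReward(1.0f);   // stays inside the recommended [-1, 1] per-step range
    Done();            // mark the Agent as finished for this episode
}
```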
@@ -465,7 +465,7 @@ if (IsDone() == false)
SetReward(0.1f);
}
-// When ball falls mark agent as done and give a negative penalty
+// When ball falls mark Agent as done and give a negative penalty
if ((ball.transform.position.y - gameObject.transform.position.y) < -2f ||
Mathf.Abs(ball.transform.position.x - gameObject.transform.position.x) > 3f ||
Mathf.Abs(ball.transform.position.z - gameObject.transform.position.z) > 3f)
@@ -482,15 +482,15 @@ platform.

-* `Brain` - The brain to register this agent to. Can be dragged into the
+* `Brain` - The Brain to register this Agent to. Can be dragged into the
inspector using the Editor.
* `Visual Observations` - A list of `Cameras` which will be used to generate
observations.
* `Max Step` - The per-agent maximum number of steps. Once this number is
- reached, the agent will be reset if `Reset On Done` is checked.
-* `Reset On Done` - Whether the agent's `AgentReset()` function should be called
- when the agent reaches its `Max Step` count or is marked as done in code.
-* `On Demand Decision` - Whether the agent requests decisions at a fixed step
+ reached, the Agent will be reset if `Reset On Done` is checked.
+* `Reset On Done` - Whether the Agent's `AgentReset()` function should be called
+ when the Agent reaches its `Max Step` count or is marked as done in code.
+* `On Demand Decision` - Whether the Agent requests decisions at a fixed step
interval or explicitly requests decisions by calling `RequestDecision()`.
* If not checked, the Agent will request a new decision every `Decision
Frequency` steps and perform an action every step. In the example above,
@@ -507,12 +507,13 @@ platform.
* `RequestAction()` Signals that the Agent is requesting an action. The
action provided to the Agent in this case is the same action that was
provided the last time it requested a decision.
-* `Decision Frequency` - The number of steps between decision requests. Not used if `On Demand Decision`, is true.
+* `Decision Frequency` - The number of steps between decision requests. Not used
+  if `On Demand Decision` is true.
## Monitoring Agents
We created a helpful `Monitor` class that enables visualizing variables within a
-Unity environment. While this was built for monitoring an Agent's value function
+Unity environment. While this was built for monitoring an agent's value function
throughout the training process, we imagine it can be more broadly useful. You
can learn more [here](Feature-Monitor.md).
@@ -521,27 +522,27 @@ can learn more [here](Feature-Monitor.md).
To add an Agent to an environment at runtime, use the Unity
`GameObject.Instantiate()` function. It is typically easiest to instantiate an
agent from a [Prefab](https://docs.unity3d.com/Manual/Prefabs.html) (otherwise,
-you have to instantiate every GameObject and Component that make up your agent
+you have to instantiate every GameObject and Component that make up your Agent
individually). In addition, you must assign a Brain instance to the new Agent
and initialize it by calling its `AgentReset()` method. For example, the
-following function creates a new agent given a Prefab, Brain instance, location,
+following function creates a new Agent given a Prefab, Brain instance, location,
and orientation:
```csharp
-private void CreateAgent(GameObject agentPrefab, Brain brain, Vector3 position, Quaternion orientation)
+private void CreateAgent(GameObject agentPrefab, Brain brain, Vector3 position, Quaternion orientation)
{
- GameObject agentObj = Instantiate(agentPrefab, position, orientation);
- Agent agent = agentObj.GetComponent();
- agent.GiveBrain(brain);
- agent.AgentReset();
+    GameObject agentObj = Instantiate(agentPrefab, position, orientation);
+    Agent agent = agentObj.GetComponent<Agent>();
+    agent.GiveBrain(brain);
+    agent.AgentReset();
}
```
## Destroying an Agent
Before destroying an Agent GameObject, you must mark it as done (and wait for
-the next step in the simulation) so that the Brain knows that this agent is no
-longer active. Thus, the best place to destroy an agent is in the
+the next step in the simulation) so that the Brain knows that this Agent is no
+longer active. Thus, the best place to destroy an Agent is in the
`Agent.AgentOnDone()` function:
```csharp
@@ -551,6 +552,6 @@ public override void AgentOnDone()
}
```
-Note that in order for `AgentOnDone()` to be called, the agent's `ResetOnDone`
-property must be false. You can set `ResetOnDone` on the agent's Inspector or in
+Note that in order for `AgentOnDone()` to be called, the Agent's `ResetOnDone`
+property must be false. You can set `ResetOnDone` on the Agent's Inspector or in
code.
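A minimal sketch of such a handler (with `Reset On Done` unchecked, as noted
above):
```csharp
public override void AgentOnDone()
{
    // The Brain has been told this Agent is done, so it is safe to remove it.
    Destroy(gameObject);
}
```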
diff --git a/docs/Learning-Environment-Design-Brains.md b/docs/Learning-Environment-Design-Brains.md
index 4566614a15..0bb115149d 100644
--- a/docs/Learning-Environment-Design-Brains.md
+++ b/docs/Learning-Environment-Design-Brains.md
@@ -12,24 +12,24 @@ Types:
* [External](Learning-Environment-Design-External-Internal-Brains.md) — The
**External** and **Internal** types typically work together; set **External**
- when training your agents. You can also use the **External** brain to
+ when training your Agents. You can also use the **External** Brain to
communicate with a Python script via the Python `UnityEnvironment` class
included in the Python portion of the ML-Agents SDK.
* [Internal](Learning-Environment-Design-External-Internal-Brains.md) – Set
**Internal** to make use of a trained model.
* [Heuristic](Learning-Environment-Design-Heuristic-Brains.md) – Set
- **Heuristic** to hand-code the agent's logic by extending the Decision class.
+ **Heuristic** to hand-code the Agent's logic by extending the Decision class.
* [Player](Learning-Environment-Design-Player-Brains.md) – Set **Player** to map
- keyboard keys to agent actions, which can be useful to test your agent code.
+ keyboard keys to Agent actions, which can be useful to test your Agent code.
-During training, set your agent's brain type to **External**. To use the trained
-model, import the model file into the Unity project and change the brain type to
+During training, set your Agent's Brain type to **External**. To use the trained
+model, import the model file into the Unity project and change the Brain type to
**Internal**.
The Brain class has several important properties that you can set using the
-Inspector window. These properties must be appropriate for the agents using the
-brain. For example, the `Vector Observation Space Size` property must match the
-length of the feature vector created by an agent exactly. See
+Inspector window. These properties must be appropriate for the Agents using the
+Brain. For example, the `Vector Observation Space Size` property must match the
+length of the feature vector created by an Agent exactly. See
[Agents](Learning-Environment-Design-Agents.md) for information about creating
agents and setting up a Brain instance correctly.
@@ -43,18 +43,18 @@ to a Brain component:
* `Brain Parameters` - Define vector observations, visual observation, and
vector actions for the Brain.
* `Vector Observation`
- * `Space Size` - Length of vector observation for brain.
+ * `Space Size` - Length of vector observation for Brain.
* `Stacked Vectors` - The number of previous vector observations that will
be stacked and used collectively for decision making. This results in the
- effective size of the vector observation being passed to the brain being:
+ effective size of the vector observation being passed to the Brain being:
_Space Size_ x _Stacked Vectors_.
* `Visual Observations` - Describes height, width, and whether to grayscale
visual observations for the Brain.
* `Vector Action`
* `Space Type` - Corresponds to whether action vector contains a single
integer (Discrete) or a series of real-valued floats (Continuous).
- * `Space Size` (Continuous) - Length of action vector for brain.
- * `Branches` (Discrete) - An array of integers, defines multiple concurent
+ * `Space Size` (Continuous) - Length of action vector for Brain.
+ * `Branches` (Discrete) - An array of integers, defines multiple concurrent
discrete actions. The values in the `Branches` array correspond to the
number of possible discrete values for each action branch.
* `Action Descriptions` - A list of strings used to name the available
@@ -69,8 +69,8 @@ to a Brain component:
## Using the Broadcast Feature
-The Player, Heuristic and Internal brains have been updated to support
-broadcast. The broadcast feature allows you to collect data from your agents
+The Player, Heuristic and Internal Brains have been updated to support
+broadcast. The broadcast feature allows you to collect data from your Agents
using a Python program without controlling them.
### How to use: Unity
@@ -82,21 +82,21 @@ To turn it on in Unity, simply check the `Broadcast` box as shown bellow:
### How to use: Python
When you launch your Unity Environment from a Python program, you can see what
-the agents connected to non-external brains are doing. When calling `step` or
-`reset` on your environment, you retrieve a dictionary mapping brain names to
+the Agents connected to non-External Brains are doing. When calling `step` or
+`reset` on your environment, you retrieve a dictionary mapping Brain names to
`BrainInfo` objects. The dictionary contains a `BrainInfo` object for each
-non-external brain set to broadcast as well as for any external brains.
+non-External Brain set to broadcast as well as for any External Brains.
-Just like with an external brain, the `BrainInfo` object contains the fields for
+Just like with an External Brain, the `BrainInfo` object contains the fields for
`visual_observations`, `vector_observations`, `text_observations`,
`memories`,`rewards`, `local_done`, `max_reached`, `agents` and
`previous_actions`. Note that `previous_actions` corresponds to the actions that
-were taken by the agents at the previous step, not the current one.
+were taken by the Agents at the previous step, not the current one.
Note that when you do a `step` on the environment, you cannot provide actions
-for non-external brains. If there are no external brains in the scene, simply
+for non-External Brains. If there are no External Brains in the scene, simply
call `step()` with no arguments.
You can use the broadcast feature to collect data generated by Player,
-Heuristics or Internal brains game sessions. You can then use this data to train
+Heuristics or Internal Brains game sessions. You can then use this data to train
an agent in a supervised context.
diff --git a/docs/Learning-Environment-Design-External-Internal-Brains.md b/docs/Learning-Environment-Design-External-Internal-Brains.md
index bc4368900b..461b83a722 100644
--- a/docs/Learning-Environment-Design-External-Internal-Brains.md
+++ b/docs/Learning-Environment-Design-External-Internal-Brains.md
@@ -1,19 +1,19 @@
# External and Internal Brains
The **External** and **Internal** types of Brains work in different phases of
-training. When training your agents, set their brain types to **External**; when
-using the trained models, set their brain types to **Internal**.
+training. When training your Agents, set their Brain types to **External**; when
+using the trained models, set their Brain types to **Internal**.
## External Brain
When [running an ML-Agents training algorithm](Training-ML-Agents.md), at least
one Brain object in a scene must be set to **External**. This allows the
-training process to collect the observations of agents using that brain and give
-the agents their actions.
+training process to collect the observations of Agents using that Brain and give
+the Agents their actions.
-In addition to using an External brain for training using the ML-Agents learning
-algorithms, you can use an External brain to control agents in a Unity
-environment using an external Python program. See [Python API](../ml-agents/README.md)
+In addition to using an External Brain for training using the ML-Agents learning
+algorithms, you can use an External Brain to control Agents in a Unity
+environment using an external Python program. See [Python API](Python-API.md)
for more information.
Unlike the other types, the External Brain has no properties to set in the Unity
@@ -30,7 +30,7 @@ that you can use with the Internal Brain type.
A __model__ is a mathematical relationship mapping an agent's observations to
its actions. TensorFlow is a software library for performing numerical
computation through data flow graphs. A TensorFlow model, then, defines the
-mathematical relationship between your agent's observations and its actions
+mathematical relationship between your Agent's observations and its actions
using a TensorFlow data flow graph.
### Creating a graph model
@@ -84,11 +84,11 @@ TensorFlow model and are not using an ML-Agents model:
* `Graph Scope` : If you set a scope while training your TensorFlow model, all
your placeholder names will have a prefix. You must specify that prefix here.
Note that if more than one Brain were set to external during training, you
- must give a `Graph Scope` to the internal Brain corresponding to the name of
+ must give a `Graph Scope` to the Internal Brain corresponding to the name of
the Brain GameObject.
* `Batch Size Node Name` : If the batch size is one of the inputs of your
- graph, you must specify the name if the placeholder here. The brain will make
- the batch size equal to the number of agents connected to the brain
+  graph, you must specify the name of the placeholder here. The Brain will make
+ the batch size equal to the number of Agents connected to the Brain
automatically.
* `State Node Name` : If your graph uses the state as an input, you must specify
the name of the placeholder here.
@@ -100,9 +100,9 @@ TensorFlow model and are not using an ML-Agents model:
of the output placeholder here.
* `Observation Placeholder Name` : If your graph uses observations as input, you
must specify it here. Note that the number of observations is equal to the
- length of `Camera Resolutions` in the brain parameters.
+ length of `Camera Resolutions` in the Brain parameters.
* `Action Node Name` : Specify the name of the placeholder corresponding to the
- actions of the brain in your graph. If the action space type is continuous,
+ actions of the Brain in your graph. If the action space type is continuous,
the output must be a one dimensional tensor of float of length `Action Space
Size`; if the action space type is discrete, the output must be a one
dimensional tensor of int of the same length as the `Branches` array.
diff --git a/docs/Learning-Environment-Design-Heuristic-Brains.md b/docs/Learning-Environment-Design-Heuristic-Brains.md
index 726d22dc52..aad74213c0 100644
--- a/docs/Learning-Environment-Design-Heuristic-Brains.md
+++ b/docs/Learning-Environment-Design-Heuristic-Brains.md
@@ -1,7 +1,7 @@
# Heuristic Brain
-The **Heuristic** brain type allows you to hand code an agent's decision making
-process. A Heuristic brain requires an implementation of the Decision interface
+The **Heuristic** Brain type allows you to hand code an Agent's decision making
+process. A Heuristic Brain requires an implementation of the Decision interface
to which it delegates the decision making process.
When you set the **Brain Type** property of a Brain to **Heuristic**, you must
@@ -25,9 +25,9 @@ public class HeuristicLogic : MonoBehaviour, Decision
The Decision interface defines two methods, `Decide()` and `MakeMemory()`.
-The `Decide()` method receives an agents current state, consisting of the
-agent's observations, reward, memory and other aspects of the agent's state, and
-must return an array containing the action that the agent should take. The
+The `Decide()` method receives an Agent's current state, consisting of the
+Agent's observations, reward, memory, and other aspects of the Agent's state, and
+must return an array containing the action that the Agent should take. The
format of the returned action array depends on the **Vector Action Space Type**.
When using a **Continuous** action space, the action array is just a float array
with a length equal to the **Vector Action Space Size** setting. When using a
@@ -38,8 +38,8 @@ function can return for each branch, which don't need to be consecutive
integers.
The `MakeMemory()` function allows you to pass data forward to the next
-iteration of an agent's decision making process. The array you return from
+iteration of an Agent's decision making process. The array you return from
`MakeMemory()` is passed to the `Decide()` function in the next iteration. You
-can use the memory to allow the agent's decision process to take past actions
+can use the memory to allow the Agent's decision process to take past actions
and observations into account when making the current decision. If your
heuristic logic does not require memory, just return an empty array.
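Putting the two methods together, a hand-coded Decision could look like the
following sketch; the parameter lists assume the v0.5 `Decision` interface
signature, and the balancing rule itself is made up for illustration:
```csharp
using System.Collections.Generic;
using UnityEngine;

public class SimpleDecision : MonoBehaviour, Decision
{
    public float[] Decide(
        List<float> vectorObs,
        List<Texture2D> visualObs,
        float reward,
        bool done,
        List<float> memory)
    {
        // Continuous action space of size 1: push against the first observation.
        return new float[] { -Mathf.Sign(vectorObs[0]) };
    }

    public List<float> MakeMemory(
        List<float> vectorObs,
        List<Texture2D> visualObs,
        float reward,
        bool done,
        List<float> memory)
    {
        // This heuristic is stateless, so return an empty memory.
        return new List<float>();
    }
}
```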
diff --git a/docs/Learning-Environment-Design-Player-Brains.md b/docs/Learning-Environment-Design-Player-Brains.md
index 860eb541c9..fadef35257 100644
--- a/docs/Learning-Environment-Design-Player-Brains.md
+++ b/docs/Learning-Environment-Design-Player-Brains.md
@@ -1,21 +1,21 @@
# Player Brain
-The **Player** brain type allows you to control an agent using keyboard
-commands. You can use Player brains to control a "teacher" agent that trains
-other agents during [imitation learning](Training-Imitation-Learning.md). You
-can also use Player brains to test your agents and environment before changing
-their brain types to **External** and running the training process.
+The **Player** Brain type allows you to control an Agent using keyboard
+commands. You can use Player Brains to control a "teacher" Agent that trains
+other Agents during [imitation learning](Training-Imitation-Learning.md). You
+can also use Player Brains to test your Agents and environment before changing
+their Brain types to **External** and running the training process.
## Player Brain properties
-The **Player** brain properties allow you to assign one or more keyboard keys to
+The **Player** Brain properties allow you to assign one or more keyboard keys to
each action and a unique value to send when a key is pressed.

Note the differences between the discrete and continuous action spaces. When a
-brain uses the discrete action space, you can send one integer value as the
-action per step. In contrast, when a brain uses the continuous action space you
+Brain uses the discrete action space, you can send one integer value as the
+action per step. In contrast, when a Brain uses the continuous action space you
can send any number of floating point values (up to the **Vector Action Space
Size** setting).
@@ -28,10 +28,10 @@ Size** setting).
action. (If you press both keys at the same time, deterministic results are not guaranteed.)|
||**Element 0–N**| The mapping of keys to action values. |
|| **Key** | The key on the keyboard. |
-|| **Index** | The element of the agent's action vector to set when this key is
+|| **Index** | The element of the Agent's action vector to set when this key is
pressed. The index value cannot exceed the size of the Action Space (minus 1,
since it is an array index).|
-|| **Value** | The value to send to the agent as its action for the specified
+|| **Value** | The value to send to the Agent as its action for the specified
index when the mapped key is pressed. All other members of the action vector
are set to 0. |
|**Discrete Player Actions**|| The mapping for the discrete vector action space.
@@ -39,10 +39,10 @@ Size** setting).
|| **Size** | The number of key commands defined. |
||**Element 0–N**| The mapping of keys to action values. |
|| **Key** | The key on the keyboard. |
-|| **Branch Index** |The element of the agent's action vector to set when this
+|| **Branch Index** |The element of the Agent's action vector to set when this
key is pressed. The index value cannot exceed the size of the Action Space
(minus 1, since it is an array index).|
-|| **Value** | The value to send to the agent as its action when the mapped key
+|| **Value** | The value to send to the Agent as its action when the mapped key
is pressed. Cannot exceed the max value for the associated branch (minus 1,
since it is an array index).|
diff --git a/docs/Learning-Environment-Design.md b/docs/Learning-Environment-Design.md
index 3f37cd6a00..bcab079fc3 100644
--- a/docs/Learning-Environment-Design.md
+++ b/docs/Learning-Environment-Design.md
@@ -29,30 +29,30 @@ Step-by-step procedures for running the training process are provided in the
Training and simulation proceed in steps orchestrated by the ML-Agents Academy
class. The Academy works with Agent and Brain objects in the scene to step
through the simulation. When either the Academy has reached its maximum number
-of steps or all agents in the scene are _done_, one training episode is
+of steps or all Agents in the scene are _done_, one training episode is
finished.
During training, the external Python training process communicates with the
Academy to run a series of episodes while it collects data and optimizes its
-neural network model. The type of Brain assigned to an agent determines whether
-it participates in training or not. The **External** brain communicates with the
+neural network model. The type of Brain assigned to an Agent determines whether
+it participates in training or not. The **External** Brain communicates with the
external process to train the TensorFlow model. When training is completed
successfully, you can add the trained model file to your Unity project for use
-with an **Internal** brain.
+with an **Internal** Brain.
The ML-Agents Academy class orchestrates the agent simulation loop as follows:
1. Calls your Academy subclass's `AcademyReset()` function.
-2. Calls the `AgentReset()` function for each agent in the scene.
-3. Calls the `CollectObservations()` function for each agent in the scene.
-4. Uses each agent's Brain class to decide on the agent's next action.
+2. Calls the `AgentReset()` function for each Agent in the scene.
+3. Calls the `CollectObservations()` function for each Agent in the scene.
+4. Uses each Agent's Brain class to decide on the Agent's next action.
5. Calls your subclass's `AcademyStep()` function.
-6. Calls the `AgentAction()` function for each agent in the scene, passing in
- the action chosen by the agent's brain. (This function is not called if the
- agent is done.)
-7. Calls the agent's `AgentOnDone()` function if the agent has reached its `Max
+6. Calls the `AgentAction()` function for each Agent in the scene, passing in
+ the action chosen by the Agent's Brain. (This function is not called if the
+ Agent is done.)
+7. Calls the Agent's `AgentOnDone()` function if the Agent has reached its `Max
Step` count or has otherwise marked itself as `done`. Optionally, you can set
- an agent to restart if it finishes before the end of an episode. In this
+ an Agent to restart if it finishes before the end of an episode. In this
case, the Academy calls the `AgentReset()` function.
8. When the Academy reaches its own `Max Step` count, it starts the next episode
again by calling your Academy subclass's `AcademyReset()` function.
@@ -65,7 +65,7 @@ whether you need to implement them or not depends on your specific scenario.
**Note:** The API used by the Python PPO training process to communicate with
and control the Academy during training can be used for other purposes as well.
For example, you could use the API to use Unity as the simulation engine for
-your own machine learning algorithms. See [Python API](../ml-agents/README.md) for more
+your own machine learning algorithms. See [Python API](Python-API.md) for more
information.
## Organizing the Unity Scene
@@ -74,18 +74,18 @@ To train and use the ML-Agents toolkit in a Unity scene, the scene must contain
a single Academy subclass along with as many Brain objects and Agent subclasses
as you need. Any Brain instances in the scene must be attached to GameObjects
that are children of the Academy in the Unity Scene Hierarchy. Agent instances
-should be attached to the GameObject representing that agent.
+should be attached to the GameObject representing that Agent.

-You must assign a brain to every agent, but you can share brains between
-multiple agents. Each agent will make its own observations and act
+You must assign a Brain to every Agent, but you can share Brains between
+multiple Agents. Each Agent will make its own observations and act
independently, but will use the same decision-making logic and, for **Internal**
-brains, the same trained TensorFlow model.
+Brains, the same trained TensorFlow model.
### Academy
-The Academy object orchestrates agents and their decision making processes. Only
+The Academy object orchestrates Agents and their decision making processes. Only
place a single Academy object in a scene.
You must create a subclass of the Academy class (since the base class is
@@ -93,14 +93,14 @@ abstract). When you create your Academy subclass, you can implement the
following methods (all are optional):
* `InitializeAcademy()` — Prepare the environment the first time it launches.
-* `AcademyReset()` — Prepare the environment and agents for the next training
+* `AcademyReset()` — Prepare the environment and Agents for the next training
episode. Use this function to place and initialize entities in the scene as
necessary.
* `AcademyStep()` — Prepare the environment for the next simulation step. The
base Academy class calls this function before calling any `AgentAction()`
methods for the current step. You can use this function to update other
- objects in the scene before the agents take their actions. Note that the
- agents have already collected their observations and chosen an action before
+ objects in the scene before the Agents take their actions. Note that the
+ Agents have already collected their observations and chosen an action before
the Academy invokes this method.
The base Academy class also defines several important properties that you can
@@ -119,17 +119,17 @@ children of the Academy in the Unity scene hierarchy. Every Agent must be
assigned a Brain, but you can use the same Brain with more than one Agent.
Use the Brain class directly, rather than a subclass. Brain behavior is
-determined by the brain type. During training, set your agent's brain type to
+determined by the Brain type. During training, set your Agent's Brain type to
**External**. To use the trained model, import the model file into the Unity
-project and change the brain type to **Internal**. See
+project and change the Brain type to **Internal**. See
[Brains](Learning-Environment-Design-Brains.md) for details on using the
-different types of brains. You can extend the CoreBrain class to create
-different brain types if the four built-in types don't do what you need.
+different types of Brains. You can extend the CoreBrain class to create
+different Brain types if the four built-in types don't do what you need.
The Brain class has several important properties that you can set using the
-Inspector window. These properties must be appropriate for the agents using the
-brain. For example, the `Vector Observation Space Size` property must match the
-length of the feature vector created by an agent exactly. See
+Inspector window. These properties must be appropriate for the Agents using the
+Brain. For example, the `Vector Observation Space Size` property must match the
+length of the feature vector created by an Agent exactly. See
[Agents](Learning-Environment-Design-Agents.md) for information about creating
agents and setting up a Brain instance correctly.
@@ -144,27 +144,27 @@ the scene that otherwise represents the actor — for example, to a player objec
in a football game or a car object in a vehicle simulation. Every Agent must be
assigned a Brain.
-To create an agent, extend the Agent class and implement the essential
+To create an Agent, extend the Agent class and implement the essential
`CollectObservations()` and `AgentAction()` methods:
-* `CollectObservations()` — Collects the agent's observation of its environment.
-* `AgentAction()` — Carries out the action chosen by the agent's brain and
+* `CollectObservations()` — Collects the Agent's observation of its environment.
+* `AgentAction()` — Carries out the action chosen by the Agent's Brain and
assigns a reward to the current state.
Your implementations of these functions determine how the properties of the
-Brain assigned to this agent must be set.
+Brain assigned to this Agent must be set.
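A bare-bones sketch of the two essential overrides; the `goal` field, the
movement logic, and the reward values are illustrative assumptions, and the
signatures assume the v0.5 API:
```csharp
using UnityEngine;

public class MyAgent : Agent
{
    // Assigned in the Inspector; used only for illustration.
    public Transform goal;

    public override void CollectObservations()
    {
        // Feature vector: the Agent's position relative to the goal (3 floats).
        AddVectorObs(goal.position - transform.position);
    }

    public override void AgentAction(float[] vectorAction, string textAction)
    {
        // Carry out the chosen action: here, a continuous space of size 2.
        transform.Translate(vectorAction[0] * 0.1f, 0f, vectorAction[1] * 0.1f);

        // Reward the resulting state and finish the episode on success.
        if (Vector3.Distance(transform.position, goal.position) < 1f)
        {
            SetReward(1.0f);
            Done();
        }
    }
}
```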
You must also determine how an Agent finishes its task or times out. You can
-manually set an agent to done in your `AgentAction()` function when the agent
-has finished (or irrevocably failed) its task. You can also set the agent's `Max
-Steps` property to a positive value and the agent will consider itself done
+manually set an Agent to done in your `AgentAction()` function when the Agent
+has finished (or irrevocably failed) its task. You can also set the Agent's `Max
+Steps` property to a positive value and the Agent will consider itself done
after it has taken that many steps. When the Academy reaches its own `Max Steps`
-count, it starts the next episode. If you set an agent's `ResetOnDone` property
-to true, then the agent can attempt its task several times in one episode. (Use
-the `Agent.AgentReset()` function to prepare the agent to start again.)
+count, it starts the next episode. If you set an Agent's `ResetOnDone` property
+to true, then the Agent can attempt its task several times in one episode. (Use
+the `Agent.AgentReset()` function to prepare the Agent to start again.)
See [Agents](Learning-Environment-Design-Agents.md) for detailed information
-about programing your own agents.
+about programming your own Agents.
## Environments
@@ -195,8 +195,8 @@ include:
* The training scene must start automatically when your Unity application is
launched by the training process.
-* The scene must include at least one **External** brain.
+* The scene must include at least one **External** Brain.
* The Academy must reset the scene to a valid starting point for each episode of
training.
* A training episode must have a definite end — either using `Max Steps` or by
- each agent setting itself to `done`.
+ each Agent setting itself to `done`.
diff --git a/docs/Learning-Environment-Examples.md b/docs/Learning-Environment-Examples.md
index 6152ee2d4f..a6c883b9cc 100644
--- a/docs/Learning-Environment-Examples.md
+++ b/docs/Learning-Environment-Examples.md
@@ -24,11 +24,11 @@ If you would like to contribute environments, please see our
* Set-up: A linear movement task where the agent must move left or right to
rewarding states.
* Goal: Move to the most rewarding state.
-* Agents: The environment contains one agent linked to a single brain.
+* Agents: The environment contains one agent linked to a single Brain.
* Agent Reward Function:
* +0.1 for arriving at suboptimal state.
* +1.0 for arriving at optimal state.
-* Brains: One brain with the following observation/action space.
+* Brains: One Brain with the following observation/action space.
* Vector Observation space: One variable corresponding to current state.
* Vector Action space: (Discrete) Two possible actions (Move left, move
right).
@@ -44,11 +44,11 @@ If you would like to contribute environments, please see our
* Goal: The agent must balance the platform in order to keep the ball on it for
as long as possible.
* Agents: The environment contains 12 agents of the same kind, all linked to a
- single brain.
+ single Brain.
* Agent Reward Function:
* +0.1 for every step the ball remains on the platform.
* -1.0 if the ball falls from the platform.
-* Brains: One brain with the following observation/action space.
+* Brains: One Brain with the following observation/action space.
* Vector Observation space: 8 variables corresponding to rotation of platform,
and position, rotation, and velocity of ball.
* Vector Observation space (Hard Version): 5 variables corresponding to
@@ -67,12 +67,12 @@ If you would like to contribute environments, please see our
and obstacles.
* Goal: The agent must navigate the grid to the goal while avoiding the
obstacles.
-* Agents: The environment contains one agent linked to a single brain.
+* Agents: The environment contains one agent linked to a single Brain.
* Agent Reward Function:
* -0.01 for every step.
* +1.0 if the agent navigates to the goal position of the grid (episode ends).
* -1.0 if the agent navigates to an obstacle (episode ends).
-* Brains: One brain with the following observation/action space.
+* Brains: One Brain with the following observation/action space.
* Vector Observation space: None
* Vector Action space: (Discrete) Size of 4, corresponding to movement in
cardinal directions. Note that for this environment,
@@ -93,13 +93,13 @@ If you would like to contribute environments, please see our
net.
* Goal: The agents must bounce the ball between one another while not dropping
  or sending the ball out of bounds.
-* Agents: The environment contains two agent linked to a single brain named
- TennisBrain. After training you can attach another brain named MyBrain to one
+* Agents: The environment contains two Agents linked to a single Brain named
+ TennisBrain. After training you can attach another Brain named MyBrain to one
of the agents to play against your trained model.
* Agent Reward Function (independent):
* +0.1 To agent when hitting ball over net.
* -0.1 To agent who let ball hit their ground, or hit ball out of bounds.
-* Brains: One brain with the following observation/action space.
+* Brains: One Brain with the following observation/action space.
* Vector Observation space: 8 variables corresponding to position and velocity
of ball and racket.
* Vector Action space: (Continuous) Size of 2, corresponding to movement
@@ -115,11 +115,11 @@ If you would like to contribute environments, please see our
* Set-up: A platforming environment where the agent can push a block around.
* Goal: The agent must push the block to the goal.
-* Agents: The environment contains one agent linked to a single brain.
+* Agents: The environment contains one agent linked to a single Brain.
* Agent Reward Function:
* -0.0025 for every step.
* +1.0 if the block touches the goal.
-* Brains: One brain with the following observation/action space.
+* Brains: One Brain with the following observation/action space.
* Vector Observation space: (Continuous) 70 variables corresponding to 14
ray-casts each detecting one of three possible objects (wall, goal, or
block).
@@ -137,20 +137,20 @@ If you would like to contribute environments, please see our
* Set-up: A platforming environment where the agent can jump over a wall.
* Goal: The agent must use the block to scale the wall and reach the goal.
-* Agents: The environment contains one agent linked to two different brains. The
- brain the agent is linked to changes depending on the height of the wall.
+* Agents: The environment contains one agent linked to two different Brains. The
+ Brain the agent is linked to changes depending on the height of the wall.
* Agent Reward Function:
* -0.0005 for every step.
* +1.0 if the agent touches the goal.
* -1.0 if the agent falls off the platform.
-* Brains: Two brains, each with the following observation/action space.
- * Vector Observation space: Size of 74, corresponding to 14 raycasts each
+* Brains: Two Brains, each with the following observation/action space.
+ * Vector Observation space: Size of 74, corresponding to 14 ray casts each
detecting 4 possible objects, plus the global position of the agent and
whether or not the agent is grounded.
* Vector Action space: (Discrete) 4 Branches:
* Forward Motion (3 possible actions: Forward, Backwards, No Action)
- * Rotation (3 possible acions: Rotate Left, Rotate Right, No Action)
- * Side Motion (3 possible acions: Left, Right, No Action)
+ * Rotation (3 possible actions: Rotate Left, Rotate Right, No Action)
+ * Side Motion (3 possible actions: Left, Right, No Action)
* Jump (2 possible actions: Jump, No Action)
* Visual Observations: None.
* Reset Parameters: 4, corresponding to the height of the possible walls.
@@ -162,10 +162,10 @@ If you would like to contribute environments, please see our
* Set-up: Double-jointed arm which can move to target locations.
* Goal: The agent must move its hand to the goal location, and keep it there.
-* Agents: The environment contains 10 agent linked to a single brain.
+* Agents: The environment contains 10 Agents linked to a single Brain.
* Agent Reward Function (independent):
* +0.1 Each step agent's hand is in goal location.
-* Brains: One brain with the following observation/action space.
+* Brains: One Brain with the following observation/action space.
* Vector Observation space: 26 variables corresponding to position, rotation,
velocity, and angular velocities of the two arm Rigidbodies.
* Vector Action space: (Continuous) Size of 4, corresponding to torque
@@ -182,11 +182,11 @@ If you would like to contribute environments, please see our
* Goal: The agent must move its body toward the goal direction without falling.
* `CrawlerStaticTarget` - Goal direction is always forward.
* `CrawlerDynamicTarget`- Goal direction is randomized.
-* Agents: The environment contains 3 agent linked to a single brain.
+* Agents: The environment contains 3 Agents linked to a single Brain.
* Agent Reward Function (independent):
* +0.03 times body velocity in the goal direction.
* +0.01 times body direction alignment with goal direction.
-* Brains: One brain with the following observation/action space.
+* Brains: One Brain with the following observation/action space.
* Vector Observation space: 117 variables corresponding to position, rotation,
velocity, and angular velocities of each limb plus the acceleration and
angular acceleration of the body.
@@ -203,19 +203,19 @@ If you would like to contribute environments, please see our
* Set-up: A multi-agent environment where agents compete to collect bananas.
* Goal: The agents must learn to move to as many yellow bananas as possible
while avoiding blue bananas.
-* Agents: The environment contains 5 agents linked to a single brain.
+* Agents: The environment contains 5 agents linked to a single Brain.
* Agent Reward Function (independent):
* +1 for interaction with yellow banana
* -1 for interaction with blue banana.
-* Brains: One brain with the following observation/action space.
+* Brains: One Brain with the following observation/action space.
* Vector Observation space: 53 corresponding to velocity of agent (2), whether
agent is frozen and/or shot its laser (2), plus ray-based perception of
objects around agent's forward direction (49; 7 raycast angles with 7
measurements for each).
* Vector Action space: (Discrete) 4 Branches:
* Forward Motion (3 possible actions: Forward, Backwards, No Action)
- * Side Motion (3 possible acions: Left, Right, No Action)
- * Rotation (3 possible acions: Rotate Left, Rotate Right, No Action)
+ * Side Motion (3 possible actions: Left, Right, No Action)
+ * Rotation (3 possible actions: Rotate Left, Rotate Right, No Action)
* Laser (2 possible actions: Laser, No Action)
* Visual Observations (Optional): First-person camera per-agent. Use
`VisualBanana` scene.
@@ -231,12 +231,12 @@ If you would like to contribute environments, please see our
remember it, and use it to move to the correct goal.
* Goal: Move to the goal which corresponds to the color of the block in the
room.
-* Agents: The environment contains one agent linked to a single brain.
+* Agents: The environment contains one agent linked to a single Brain.
* Agent Reward Function (independent):
* +1 For moving to correct goal.
* -0.1 For moving to incorrect goal.
* -0.0003 Existential penalty.
-* Brains: One brain with the following observation/action space:
+* Brains: One Brain with the following observation/action space:
* Vector Observation space: 30 corresponding to local ray-casts detecting
objects, goals, and walls.
* Vector Action space: (Discrete) 1 Branch, 4 actions corresponding to agent
@@ -254,12 +254,12 @@ If you would like to contribute environments, please see our
* Set-up: Environment where the agent needs on-demand decision making. The agent
  must decide how to perform its next bounce only when it touches the ground.
* Goal: Catch the floating banana. The agent only has a limited number of
  jumps.
-* Agents: The environment contains one agent linked to a single brain.
+* Agents: The environment contains one agent linked to a single Brain.
* Agent Reward Function (independent):
* +1 For catching the banana.
* -1 For bouncing out of bounds.
* -0.05 Times the action squared. Energy expenditure penalty.
-* Brains: One brain with the following observation/action space:
+* Brains: One Brain with the following observation/action space:
* Vector Observation space: 6 corresponding to local position of agent and
banana.
* Vector Action space: (Continuous) 3 corresponding to agent force applied for
@@ -276,7 +276,7 @@ If you would like to contribute environments, please see our
* Goal:
* Striker: Get the ball into the opponent's goal.
* Goalie: Prevent the ball from entering its own goal.
-* Agents: The environment contains four agents, with two linked to one brain
+* Agents: The environment contains four agents, with two linked to one Brain
(strikers) and two linked to another (goalies).
* Agent Reward Function (dependent):
* Striker:
@@ -287,7 +287,7 @@ If you would like to contribute environments, please see our
* -1 When ball enters team's goal.
    * +0.1 When ball enters opponent's goal.
* +0.001 Existential bonus.
-* Brains: Two brain with the following observation/action space:
+* Brains: Two Brains with the following observation/action space:
  * Vector Observation space: 112 corresponding to 14 local ray casts, each
detecting 7 possible object types, along with the object's distance.
Perception is in 180 degree view from front of agent.
@@ -306,17 +306,17 @@ If you would like to contribute environments, please see our
* Set-up: Physics-based Humanoid agents with 26 degrees of freedom. These DOFs
correspond to articulation of the following body-parts: hips, chest, spine,
- head, thighs, shins, feets, arms, forearms and hands.
+ head, thighs, shins, feet, arms, forearms and hands.
* Goal: The agents must move their bodies toward the goal direction as quickly as
possible without falling.
* Agents: The environment contains 11 independent agents linked to a single
- brain.
+ Brain.
* Agent Reward Function (independent):
* +0.03 times body velocity in the goal direction.
* +0.01 times head y position.
* +0.01 times body direction alignment with goal direction.
* -0.01 times head velocity difference from body velocity.
-* Brains: One brain with the following observation/action space.
+* Brains: One Brain with the following observation/action space.
* Vector Observation space: 215 variables corresponding to position, rotation,
velocity, and angular velocities of each limb, along with goal direction.
* Vector Action space: (Continuous) Size of 39, corresponding to target
@@ -333,10 +333,10 @@ If you would like to contribute environments, please see our
pyramid, then navigate to the pyramid, knock it over, and move to the gold
brick at the top.
* Goal: Move to the golden brick on top of the spawned pyramid.
-* Agents: The environment contains one agent linked to a single brain.
+* Agents: The environment contains one agent linked to a single Brain.
* Agent Reward Function (independent):
* +2 For moving to golden brick (minus 0.001 per step).
-* Brains: One brain with the following observation/action space:
+* Brains: One Brain with the following observation/action space:
* Vector Observation space: 148 corresponding to local ray-casts detecting
switch, bricks, golden brick, and walls, plus variable indicating switch
state.
diff --git a/docs/Learning-Environment-Executable.md b/docs/Learning-Environment-Executable.md
index e3e0afab59..829fea10b0 100644
--- a/docs/Learning-Environment-Executable.md
+++ b/docs/Learning-Environment-Executable.md
@@ -29,7 +29,7 @@ environment:
Make sure the Brains in the scene have the right type. For example, if you want
to be able to control your agents from Python, you will need to set the
-corresponding brain to **External**.
+corresponding Brain to **External**.
1. In the **Scene** window, click the triangle icon next to the Ball3DAcademy
object.
@@ -71,7 +71,7 @@ can interact with it.
## Interacting with the Environment
-If you want to use the [Python API](../ml-agents/README.md) to interact with your
+If you want to use the [Python API](Python-API.md) to interact with your
executable, you can pass the name of the executable with the argument
'file_name' of the `UnityEnvironment`. For instance:
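For example, a minimal sketch (assuming the executable was exported as `3DBall`
into your current working directory, and using the `mlagents.envs` package
described in the [Python API](Python-API.md) documentation):

```python
from mlagents.envs import UnityEnvironment

# Launch the built 3DBall executable and connect over the default worker port.
env = UnityEnvironment(file_name="3DBall")
```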
@@ -83,12 +83,13 @@ env = UnityEnvironment(file_name=)
## Training the Environment
1. Open a command or terminal window.
-2. Nagivate to the folder where you installed ML-Agents.
-3. Change to the python directory.
-4. Run
+2. Navigate to the folder where you installed the ML-Agents Toolkit. If you
+ followed the default [installation](Installation.md), then navigate to the
+ `ml-agents/` folder.
+3. Run
`mlagents-learn --env= --run-id= --train`
Where:
- * `` is the filepath of the trainer configuration yaml.
+ * `` is the file path of the trainer configuration yaml
* `` is the name and path to the executable you exported from Unity
(without extension)
* `` is a string used to separate the results of different
@@ -97,10 +98,10 @@ env = UnityEnvironment(file_name=)
than inference)
For example, if you are training with a 3DBall executable you exported to the
-ml-agents/python directory, run:
+directory where you installed the ML-Agents Toolkit, run:
```sh
-mlagents-learn config/trainer_config.yaml --env=3DBall --run-id=first-run --train
+mlagents-learn ../config/trainer_config.yaml --env=3DBall --run-id=firstRun --train
```
And you should see something like
@@ -205,7 +206,7 @@ INFO:mlagents.trainers: first-run-0: Ball3DBrain: Step: 10000. Mean Reward: 27.2
You can press Ctrl+C to stop the training, and your trained model will be at
`models//_.bytes`, which corresponds
to your model's latest checkpoint. You can now embed this trained model into
-your internal brain by following the steps below:
+your Internal Brain by following the steps below:
1. Move your model file into
`UnitySDK/Assets/ML-Agents/Examples/3DBall/TFModels/`.
diff --git a/docs/Limitations.md b/docs/Limitations.md
index cf567b63d5..e16a0335e5 100644
--- a/docs/Limitations.md
+++ b/docs/Limitations.md
@@ -11,7 +11,7 @@ from your agents.
Currently the speed of the game physics can only be increased to 100x real-time.
The Academy also moves in time with FixedUpdate() rather than Update(), so game
-behavior implemented in Update() may be out of sync with the Agent decision
+behavior implemented in Update() may be out of sync with the agent decision
making. See
[Execution Order of Event Functions](https://docs.unity3d.com/Manual/ExecutionOrder.html)
for more information.
@@ -22,7 +22,7 @@ for more information.
As of version 0.3, we no longer support Python 2.
-### Tensorflow support
+### TensorFlow support
Currently the ML-Agents toolkit uses TensorFlow 1.7.1 due to the version of the
TensorFlowSharp plugin we are using.
diff --git a/docs/ML-Agents-Overview.md b/docs/ML-Agents-Overview.md
index 8454734936..52776a4ed1 100644
--- a/docs/ML-Agents-Overview.md
+++ b/docs/ML-Agents-Overview.md
@@ -214,7 +214,7 @@ enables additional training modes.
border="10" />
-_An example of how a scene containing multiple Agents and Brains might be
+_An example of how a scene containing multiple Agents and Brains might be
configured._
## Training Modes
@@ -264,7 +264,7 @@ for both training and inferences phases and the behaviors of all the Agents in
the scene will be controlled within Python.
We do not currently have a tutorial highlighting this mode, but you can
-learn more about the Python API [here](../ml-agents/README.md).
+learn more about the Python API [here](Python-API.md).
### Curriculum Learning
@@ -333,34 +333,34 @@ kinds of novel and fun environments the community creates. For those new to
training intelligent agents, below are a few examples that can serve as
inspiration:
-- Single-Agent. A single Agent linked to a single Brain, with its own reward
+- Single-Agent. A single agent linked to a single Brain, with its own reward
signal. The traditional way of training an agent. An example is any
single-player game, such as Chicken. [Video
Link](https://www.youtube.com/watch?v=fiQsmdwEGT8&feature=youtu.be).
-- Simultaneous Single-Agent. Multiple independent Agents with independent reward
+- Simultaneous Single-Agent. Multiple independent agents with independent reward
signals linked to a single Brain. A parallelized version of the traditional
training scenario, which can speed-up and stabilize the training process.
Helpful when you have multiple versions of the same character in an
environment who should learn similar behaviors. An example might be training a
dozen robot-arms to each open a door simultaneously. [Video
Link](https://www.youtube.com/watch?v=fq0JBaiCYNA).
-- Adversarial Self-Play. Two interacting Agents with inverse reward signals
+- Adversarial Self-Play. Two interacting agents with inverse reward signals
linked to a single Brain. In two-player games, adversarial self-play can allow
an agent to become increasingly more skilled, while always having the
perfectly matched opponent: itself. This was the strategy employed when
training AlphaGo, and more recently used by OpenAI to train a human-beating
1-vs-1 Dota 2 agent.
-- Cooperative Multi-Agent. Multiple interacting Agents with a shared reward
+- Cooperative Multi-Agent. Multiple interacting agents with a shared reward
signal linked to either a single or multiple different Brains. In this
scenario, all agents must work together to accomplish a task that cannot be
done alone. Examples include environments where each agent only has access to
partial information, which needs to be shared in order to accomplish the task
or collaboratively solve a puzzle.
-- Competitive Multi-Agent. Multiple interacting Agents with inverse reward
+- Competitive Multi-Agent. Multiple interacting agents with inverse reward
signals linked to either a single or multiple different Brains. In this
- scenario, agents must compete with one another to either win a competition, or
+  scenario, agents must compete with one another to either win a competition, or
obtain some limited set of resources. All team sports fall into this scenario.
-- Ecosystem. Multiple interacting Agents with independent reward signals linked
+- Ecosystem. Multiple interacting agents with independent reward signals linked
to either a single or multiple different Brains. This scenario can be thought
of as creating a small world in which animals with different goals all
interact, such as a savanna in which there might be zebras, elephants and
@@ -390,11 +390,11 @@ training process.
learn more about enabling LSTM during training [here](Feature-Memory.md).
- **Monitoring Agent’s Decision Making** - Since communication in ML-Agents is a
- two-way street, we provide an agent Monitor class in Unity which can display
- aspects of the trained agent, such as the agents perception on how well it is
+ two-way street, we provide an Agent Monitor class in Unity which can display
+  aspects of the trained Agent, such as the Agent's perception of how well it is
doing (called **value estimates**) within the Unity environment itself. By
leveraging Unity as a visualization tool and providing these outputs in
- real-time, researchers and developers can more easily debug an agent’s
+ real-time, researchers and developers can more easily debug an Agent’s
behavior. You can learn more about using the Monitor class
[here](Feature-Monitor.md).
diff --git a/docs/Migrating.md b/docs/Migrating.md
index 47f7a68984..04768c717f 100644
--- a/docs/Migrating.md
+++ b/docs/Migrating.md
@@ -12,9 +12,12 @@
### Unity API
-* Discrete Actions now use [branches](https://arxiv.org/abs/1711.08946). You can now specify concurrent discrete
- actions. You will need to update the Brain Parameters in the Brain Inspector
- in all your environments that use discrete actions. Refer to the [discrete action documentation](Learning-Environment-Design-Agents.md#discrete-action-space) for more information.
+* Discrete Actions now use [branches](https://arxiv.org/abs/1711.08946). You can
+ now specify concurrent discrete actions. You will need to update the Brain
+ Parameters in the Brain Inspector in all your environments that use discrete
+ actions. Refer to the
+ [discrete action documentation](Learning-Environment-Design-Agents.md#discrete-action-space)
+ for more information.
### Python API
@@ -61,9 +64,9 @@
### Python API
-* We've changed some of the python packages dependencies in requirement.txt
- file. Make sure to run `pip install .` within your `ml-agents/python` folder
- to update your python packages.
+* We've changed some of the Python package dependencies in the requirements.txt
+ file. Make sure to run `pip3 install .` within your `ml-agents/python` folder
+ to update your Python packages.
## Migrating from ML-Agents toolkit v0.2 to v0.3
@@ -83,7 +86,7 @@ in order to ensure a smooth transition.
replaced with a single `learn.py` script as the launching point for training
with ML-Agents. For more information on using `learn.py`, see
[here](Training-ML-Agents.md#training-with-mlagents-learn).
-* Hyperparameters for training brains are now stored in the
+* Hyperparameters for training Brains are now stored in the
`trainer_config.yaml` file. For more information on using this file, see
[here](Training-ML-Agents.md#training-config-file).
@@ -100,7 +103,7 @@ in order to ensure a smooth transition.
* `AgentStep()` has been replaced by `AgentAction()`.
* `WaitTime()` has been removed.
* The `Frame Skip` field of the Academy is replaced by the Agent's `Decision
- Frequency` field, enabling agent to make decisions at different frequencies.
+ Frequency` field, enabling the Agent to make decisions at different frequencies.
* The names of the inputs in the Internal Brain have been changed. You must
replace `state` with `vector_observation` and `observation` with
`visual_observation`. In addition, you must remove the `epsilon` placeholder.
diff --git a/docs/Python-API.md b/docs/Python-API.md
new file mode 100644
index 0000000000..9e993aced2
--- /dev/null
+++ b/docs/Python-API.md
@@ -0,0 +1,149 @@
+# Unity ML-Agents Python Interface and Trainers
+
+The `mlagents` Python package is part of the [ML-Agents
+Toolkit](https://github.com/Unity-Technologies/ml-agents). `mlagents` provides a
+Python API that allows direct interaction with the Unity game engine as well as
+a collection of trainers and algorithms to train agents in Unity environments.
+
+The `mlagents` Python package contains two components: a low level API which
+allows you to interact directly with a Unity Environment (`mlagents.envs`) and
+an entry point to train (`mlagents-learn`) which allows you to train agents in
+Unity Environments using our implementations of reinforcement learning or
+imitation learning.
+
+## mlagents.envs
+
+The ML-Agents Toolkit provides a Python API for controlling the Agent simulation
+loop of an environment or game built with Unity. This API is used by the
+training algorithms inside the ML-Agents Toolkit, but you can also write your own
+Python programs using this API.
+
+The key objects in the Python API include:
+
+- **UnityEnvironment** — the main interface between the Unity application and
+ your code. Use UnityEnvironment to start and control a simulation or training
+ session.
+- **BrainInfo** — contains all the data from Agents in the simulation, such as
+ observations and rewards.
+- **BrainParameters** — describes the data elements in a BrainInfo object. For
+ example, provides the array length of an observation in BrainInfo.
+
+These classes are all defined in the `ml-agents/mlagents/envs` folder of
+the ML-Agents SDK.
+
+To communicate with an Agent in a Unity environment from a Python program, the
+Agent must either use an **External** Brain or use a Brain that is broadcasting
+(has its **Broadcast** property set to true). Your code is expected to return
+actions for Agents with external Brains, but can only observe broadcasting
+Brains (the information you receive for an Agent is the same in both cases).
+
+_Notice: Currently communication between Unity and Python takes place over an
+open socket without authentication. As such, please make sure that the network
+where training takes place is secure. This will be addressed in a future
+release._
+
+### Loading a Unity Environment
+
+Python-side communication happens through `UnityEnvironment` which is located in
+`ml-agents/mlagents/envs`. To load a Unity environment from a built binary
+file, put the file in the same directory as `envs`. For example, if the filename
+of your Unity environment is 3DBall.app, in Python, run:
+
+```python
+from mlagents.envs import UnityEnvironment
+env = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1)
+```
+
+- `file_name` is the name of the environment binary (located in the root
+ directory of the python project).
+- `worker_id` indicates which port to use for communication with the
+ environment. For use in parallel training regimes such as A3C.
+- `seed` indicates the seed to use when generating random numbers during the
+ training process. In environments which do not involve physics calculations,
+ setting the seed enables reproducible experimentation by ensuring that the
+ environment and trainers utilize the same random seed.
+
+If you want to directly interact with the Editor, you need to use
+`file_name=None`, then press the :arrow_forward: button in the Editor when the
+message _"Start training by pressing the Play button in the Unity Editor"_ is
+displayed on the screen.
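+
+As a minimal sketch, attaching to the Editor instead of a built binary looks
+like this (`worker_id` and `seed` are just the illustrative defaults used
+above):
+
+```python
+from mlagents.envs import UnityEnvironment
+
+# file_name=None makes the API wait for the Editor; press Play when prompted.
+env = UnityEnvironment(file_name=None, worker_id=0, seed=1)
+```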
+
+### Interacting with a Unity Environment
+
+A BrainInfo object contains the following fields:
+
+- **`visual_observations`** : A list of 4 dimensional numpy arrays. Matrix n of
+ the list corresponds to the nth observation of the Brain.
+- **`vector_observations`** : A two dimensional numpy array of dimension `(batch
+ size, vector observation size)`.
+- **`text_observations`** : A list of strings corresponding to the Agents' text
+ observations.
+- **`memories`** : A two dimensional numpy array of dimension `(batch size,
+ memory size)` which corresponds to the memories sent at the previous step.
+- **`rewards`** : A list as long as the number of Agents using the Brain
+ containing the rewards they each obtained at the previous step.
+- **`local_done`** : A list as long as the number of Agents using the Brain
+ containing `done` flags (whether or not the Agent is done).
+- **`max_reached`** : A list as long as the number of Agents using the Brain
+ containing true if the Agents reached their max steps.
+- **`agents`** : A list of the unique ids of the Agents using the Brain.
+- **`previous_actions`** : A two dimensional numpy array of dimension `(batch
+ size, vector action size)` if the vector action space is continuous and
+ `(batch size, number of branches)` if the vector action space is discrete.
+
+Once loaded, the UnityEnvironment object, referenced by the variable `env` in
+this example, can be used in the following ways (a combined sketch follows the
+list):
+
+- **Print : `print(str(env))`**
+ Prints all parameters relevant to the loaded environment and the external
+ Brains.
+- **Reset : `env.reset(train_mode=True, config=None)`**
+  Sends a reset signal to the environment, and provides a dictionary mapping
+ Brain names to BrainInfo objects.
+  - `train_mode` indicates whether to run the environment in train (`True`) or
+ test (`False`) mode.
+ - `config` is an optional dictionary of configuration flags specific to the
+ environment. For generic environments, `config` can be ignored. `config` is
+ a dictionary of strings to floats where the keys are the names of the
+ `resetParameters` and the values are their corresponding float values.
+ Define the reset parameters on the Academy Inspector window in the Unity
+ Editor.
+- **Step : `env.step(action, memory=None, text_action=None)`**
+ Sends a step signal to the environment using the actions. For each Brain :
+  - `action` can be a one dimensional array, or a two dimensional array if you
+    have multiple Agents per Brain.
+ - `memory` is an optional input that can be used to send a list of floats per
+    Agent to be retrieved at the next step.
+  - `text_action` is an optional input that can be used to send a single
+    string per Agent.
+
+ Returns a dictionary mapping Brain names to BrainInfo objects.
+
+ For example, to access the BrainInfo belonging to a Brain called
+ 'brain_name', and the BrainInfo field 'vector_observations':
+
+ ```python
+ info = env.step()
+ brainInfo = info['brain_name']
+ observations = brainInfo.vector_observations
+ ```
+
+ Note that if you have more than one external Brain in the environment, you
+ must provide dictionaries from Brain names to arrays for `action`, `memory`
+ and `value`. For example: If you have two external Brains named `brain1` and
+ `brain2` each with one Agent taking two continuous actions, then you can
+ have:
+
+ ```python
+ action = {'brain1':[1.0, 2.0], 'brain2':[3.0,4.0]}
+ ```
+
+- **Close : `env.close()`**
+ Sends a shutdown signal to the environment and closes the communication
+ socket.
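+
+Putting these calls together, a minimal interaction loop might look like the
+following sketch. It assumes a built 3DBall binary and a single external Brain
+with a continuous action of size 2 (as in 3DBall), and simply sends random
+actions for a few steps:
+
+```python
+import numpy as np
+from mlagents.envs import UnityEnvironment
+
+env = UnityEnvironment(file_name="3DBall")
+brain_name = env.external_brain_names[0]
+
+info = env.reset(train_mode=True)[brain_name]
+for _ in range(100):
+    # One random continuous action (size 2) per Agent using this Brain.
+    action = np.random.randn(len(info.agents), 2)
+    info = env.step(action)[brain_name]
+    print(info.rewards)
+env.close()
+```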
+
+## mlagents-learn
+
+For more detailed documentation on using `mlagents-learn`, check out
+[Training ML-Agents](Training-ML-Agents.md).
diff --git a/docs/Readme.md b/docs/Readme.md
index 07b073eece..088cbf6004 100644
--- a/docs/Readme.md
+++ b/docs/Readme.md
@@ -52,5 +52,5 @@
## API Docs
* [API Reference](API-Reference.md)
-* [How to use the Python API](../ml-agents/README.md)
-* [Wrapping Learning Environment as a Gym](../gym-unity/Readme.md)
\ No newline at end of file
+* [How to use the Python API](Python-API.md)
+* [Wrapping Learning Environment as a Gym](../gym-unity/README.md)
diff --git a/docs/Training-Curriculum-Learning.md b/docs/Training-Curriculum-Learning.md
index 4ee0cc7032..57a174140b 100644
--- a/docs/Training-Curriculum-Learning.md
+++ b/docs/Training-Curriculum-Learning.md
@@ -33,7 +33,7 @@ accomplish tasks otherwise much more difficult.
Each Brain in an environment can have a corresponding curriculum. These
curriculums are held in what we call a metacurriculum. A metacurriculum allows
-different brains to follow different curriculums within the same environment.
+different Brains to follow different curriculums within the same environment.
### Specifying a Metacurriculum
@@ -90,11 +90,11 @@ the BigWallBrain in the Wall Jump environment.
measure by previous values.
* If `true`, weighting will be 0.75 (new) 0.25 (old).
* `parameters` (dictionary of key:string, value:float array) - Corresponds to
- academy reset parameters to control. Length of each array should be one
+ Academy reset parameters to control. Length of each array should be one
greater than number of thresholds.
Once our curriculum is defined, we have to use the reset parameters we defined
-and modify the environment from the agent's `AgentReset()` function. See
+and modify the environment from the Agent's `AgentReset()` function. See
[WallJumpAgent.cs](https://github.com/Unity-Technologies/ml-agents/blob/master/UnitySDK/Assets/ML-Agents/Examples/WallJump/Scripts/WallJumpAgent.cs)
for an example. Note that if the Academy's __Max Steps__ is not set to some
positive number the environment will never be reset. The Academy must reset
@@ -102,7 +102,7 @@ for the environment to reset.
We will save this file into our metacurriculum folder with the name of its
corresponding Brain. For example, in the Wall Jump environment, there are two
-brains---BigWallBrain and SmallWallBrain. If we want to define a curriculum for
+Brains---BigWallBrain and SmallWallBrain. If we want to define a curriculum for
the BigWallBrain, we will save `BigWallBrain.json` into
`curricula/wall-jump/`.
diff --git a/docs/Training-Imitation-Learning.md b/docs/Training-Imitation-Learning.md
index ea2630fde4..f5f89068ae 100644
--- a/docs/Training-Imitation-Learning.md
+++ b/docs/Training-Imitation-Learning.md
@@ -23,33 +23,33 @@ Machine Learning tasks work.
1. In order to use imitation learning in a scene, the first thing you will need
is to create two Brains, one which will be the "Teacher," and the other which
- will be the "Student." We will assume that the names of the brain
+ will be the "Student." We will assume that the names of the Brain
`GameObject`s are "Teacher" and "Student" respectively.
-2. Set the "Teacher" brain to Player mode, and properly configure the inputs to
+2. Set the "Teacher" Brain to Player mode, and properly configure the inputs to
map to the corresponding actions. **Ensure that "Broadcast" is checked within
the Brain inspector window.**
-3. Set the "Student" brain to External mode.
-4. Link the brains to the desired agents (one agent as the teacher and at least
- one agent as a student).
-5. In `config/trainer_config.yaml`, add an entry for the "Student" brain. Set
+3. Set the "Student" Brain to External mode.
+4. Link the Brains to the desired Agents (one Agent as the teacher and at least
+ one Agent as a student).
+5. In `config/trainer_config.yaml`, add an entry for the "Student" Brain. Set
the `trainer` parameter of this entry to `imitation`, and the
- `brain_to_imitate` parameter to the name of the teacher brain: "Teacher".
+ `brain_to_imitate` parameter to the name of the teacher Brain: "Teacher".
Additionally, set `batches_per_epoch`, which controls how much training to do
each moment. Increase the `max_steps` option if you'd like to keep training
- the agents for a longer period of time.
+ the Agents for a longer period of time.
6. Launch the training process with `mlagents-learn config/trainer_config.yaml
--train --slow`, and press the :arrow_forward: button in Unity when the
message _"Start training by pressing the Play button in the Unity Editor"_ is
displayed on the screen
-7. From the Unity window, control the agent with the Teacher brain by providing
+7. From the Unity window, control the Agent with the Teacher Brain by providing
"teacher demonstrations" of the behavior you would like to see.
-8. Watch as the agent(s) with the student brain attached begin to behave
+8. Watch as the Agent(s) with the student Brain attached begin to behave
similarly to the demonstrations.
-9. Once the Student agents are exhibiting the desired behavior, end the training
+9. Once the Student Agents are exhibiting the desired behavior, end the training
   process with `CTRL+C` from the command line.
10. Move the resulting `*.bytes` file into the `TFModels` subdirectory of the
    Assets folder (or a subdirectory within Assets of your choosing), and use it
-    with `Internal` brain.
+    with an `Internal` Brain.
### BC Teacher Helper
diff --git a/docs/Training-ML-Agents.md b/docs/Training-ML-Agents.md
index 110cd01ded..17e7966fa6 100644
--- a/docs/Training-ML-Agents.md
+++ b/docs/Training-ML-Agents.md
@@ -22,7 +22,7 @@ with the `mlagents` package and its implementation can be found at
`ml-agents/mlagents/trainers/learn.py`. The [configuration file](#training-config-file),
`config/trainer_config.yaml` specifies the hyperparameters used during training.
You can edit this file with a text editor to add a specific configuration for
-each brain.
+each Brain.
For a broader overview of reinforcement learning, imitation learning and the
ML-Agents training process, see [ML-Agents Toolkit
@@ -48,7 +48,7 @@ mlagents-learn --env= --run-id=
where
-* `` is the filepath of the trainer configuration yaml.
+* `` is the file path of the trainer configuration yaml.
* ``__(Optional)__ is the name (including path) of your Unity
executable containing the agents to be trained. If `` is not passed,
the training will happen in the Editor. Press the :arrow_forward: button in
@@ -63,7 +63,7 @@ contains agents ready to train. To perform the training:
1. [Build the project](Learning-Environment-Executable.md), making sure that you
only include the training scene.
2. Open a terminal or console window.
-3. Navigate to the ml-agents `python` folder.
+3. Navigate to the directory where you installed the ML-Agents Toolkit.
4. Run the following to launch the training process using the path to the Unity
environment you built in step 1:
@@ -75,8 +75,7 @@ During a training session, the training program prints out and saves updates at
regular intervals (specified by the `summary_freq` option). The saved statistics
are grouped by the `run-id` value so you should assign a unique id to each
training run if you plan to view the statistics. You can view these statistics
-using TensorBoard during or after training by running the following command
-(from the ML-Agents python directory):
+using TensorBoard during or after training by running the following command:
```sh
tensorboard --logdir=summaries
@@ -159,8 +158,8 @@ after the GameObject containing the Brain component that should use these
settings. (This GameObject will be a child of the Academy in your scene.)
Sections for the example environments are included in the provided config file.
-| ** Setting ** | **Description** | **Applies To Trainer**|
-| :-- | :-- | :-- |
+| **Setting** | **Description** | **Applies To Trainer**|
+| :-- | :-- | :-- |
| batch_size | The number of experiences in each iteration of gradient descent.| PPO, BC |
| batches_per_epoch | In imitation learning, the number of batches of training examples to collect before training the model.| BC |
| beta | The strength of entropy regularization.| PPO, BC |
diff --git a/docs/Training-PPO.md b/docs/Training-PPO.md
index 3c6100c55f..7f9047ff5f 100644
--- a/docs/Training-PPO.md
+++ b/docs/Training-PPO.md
@@ -223,7 +223,7 @@ into the training process.
### Entropy
-This corresponds to how random the decisions of a brain are. This should
+This corresponds to how random the decisions of a Brain are. This should
consistently decrease during training. If it decreases too soon or not at all,
`beta` should be adjusted (when using discrete action space).
diff --git a/docs/Training-on-Amazon-Web-Service.md b/docs/Training-on-Amazon-Web-Service.md
index b762b86bf1..782a97cb3c 100644
--- a/docs/Training-on-Amazon-Web-Service.md
+++ b/docs/Training-on-Amazon-Web-Service.md
@@ -61,7 +61,7 @@ After launching your EC2 instance using the ami and ssh into it:
source activate python3
```
-2. Clone the ML-Agents repo and install the required python packages
+2. Clone the ML-Agents repo and install the required Python packages
```sh
git clone https://github.com/Unity-Technologies/ml-agents.git
diff --git a/docs/Training-on-Microsoft-Azure.md b/docs/Training-on-Microsoft-Azure.md
index 6aa60099cc..6651fc89b2 100644
--- a/docs/Training-on-Microsoft-Azure.md
+++ b/docs/Training-on-Microsoft-Azure.md
@@ -15,7 +15,7 @@ into your Azure subscription. Once your VM is deployed, SSH into it and run the
following command to complete dependency installation:
```sh
-pip install docopt
+pip3 install docopt
```
Note that, if you choose to deploy the image to an
@@ -66,8 +66,8 @@ To run your training on the VM:
1. [Move](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/copy-files-to-linux-vm-using-scp)
your built Unity application to your Virtual Machine.
-2. Set the `ml-agents` sub-folder of the ml-agents repo to your working
- directory.
+2. Set the directory where the ML-Agents Toolkit was installed to your
+ working directory.
3. Run the following command:
```sh
@@ -102,10 +102,9 @@ training](Using-Tensorboard.md).
2. Unless you started the training as a background process, connect to your VM
from another terminal instance.
-3. Set the `python` folder in ml-agents to your current working directory.
-4. Run the following command from your `tensorboard --logdir=summaries --host
- 0.0.0.0`
-5. You should now be able to open a browser and navigate to
+3. Run the following command from your terminal:
+ `tensorboard --logdir=summaries --host 0.0.0.0`
+4. You should now be able to open a browser and navigate to
   `:6006` to view the TensorBoard report.
## Running on Azure Container Instances
@@ -116,6 +115,6 @@ then be shut down. This ensures you aren't leaving a billable VM running when
it isn't needed. You can read more about
[The ML-Agents toolkit support for Docker containers here](Using-Docker.md).
Using ACI enables you to offload training of your models without needing to
-install Python and Tensorflow on your own computer. You can find instructions,
+install Python and TensorFlow on your own computer. You can find instructions,
including a pre-deployed image in DockerHub for you to use, available
[here](https://github.com/druttka/unity-ml-on-azure).
diff --git a/docs/Using-TensorFlow-Sharp-in-Unity.md b/docs/Using-TensorFlow-Sharp-in-Unity.md
index c2b38868e4..30a2fc3bf6 100644
--- a/docs/Using-TensorFlow-Sharp-in-Unity.md
+++ b/docs/Using-TensorFlow-Sharp-in-Unity.md
@@ -64,7 +64,7 @@ You can have additional placeholders for float or integers but they must be
placed in placeholders of dimension 1 and size 1. (Be sure to name them.)
It is important that the inputs and outputs of the graph are exactly the ones
-you receive and return when training your model with an `External` brain. This
+you receive and return when training your model with an `External` Brain. This
means you cannot have any operations such as reshaping outside of the graph. The
object you get by calling `step` or `reset` has fields `vector_observations`,
`visual_observations` and `memories` which must correspond to the placeholders
@@ -94,7 +94,7 @@ both the graph and associated weights. Note that you must save your graph as a
.bytes file so Unity can load it.
In the Unity Editor, you must specify the names of the nodes used by your graph
-in the **Internal** brain Inspector window. If you used a scope when defining
+in the **Internal** Brain Inspector window. If you used a scope when defining
your graph, specify it in the `Graph Scope` field.

@@ -103,8 +103,8 @@ See
[Internal Brain](Learning-Environment-Design-External-Internal-Brains.md#internal-brain)
for more information about using Internal Brains.
-If you followed these instructions well, the agents in your environment that use
-this brain will use your fully trained network to make decisions.
+If you followed these instructions well, the Agents in your environment that use
+this Brain will use your fully trained network to make decisions.
## iOS additional instructions for building
diff --git a/docs/Using-Tensorboard.md b/docs/Using-Tensorboard.md
index e309c9b2f5..a97313705c 100644
--- a/docs/Using-Tensorboard.md
+++ b/docs/Using-Tensorboard.md
@@ -11,10 +11,12 @@ In order to observe the training process, either during training or afterward,
start TensorBoard:
1. Open a terminal or console window:
-2. Navigate to the ml-agents/python folder.
+2. Navigate to the directory where the ML-Agents Toolkit is installed.
3. From the command line run :
- tensorboard --logdir=summaries
+ ```sh
+ tensorboard --logdir=summaries
+ ```
4. Open a browser window and navigate to [localhost:6006](http://localhost:6006).
@@ -34,7 +36,7 @@ When you run the training program, `mlagents-learn`, you can use the
## The ML-Agents toolkit training statistics
-The ML-agents training program saves the following statistics:
+The ML-Agents training program saves the following statistics:

diff --git a/docs/localized/zh-CN/docs/Getting-Started-with-Balance-Ball.md b/docs/localized/zh-CN/docs/Getting-Started-with-Balance-Ball.md
index 29a402b803..58b2798a3e 100755
--- a/docs/localized/zh-CN/docs/Getting-Started-with-Balance-Ball.md
+++ b/docs/localized/zh-CN/docs/Getting-Started-with-Balance-Ball.md
@@ -42,7 +42,7 @@ Inspector 窗口。Inspector 会显示游戏对象上的每个组件。
在打开 3D Balance Ball 场景后,您可能会首先注意到它包含的
不是一个平台,而是多个平台。场景中的每个平台都是
-独立的 agent,但它们全部共享同一个 brain。3D Balance Ball 通过
+独立的 agent,但它们全部共享同一个 Brain。3D Balance Ball 通过
这种方式可以加快训练速度,因为所有 12 个 agent 可以并行参与训练任务。
### Academy
@@ -86,16 +86,16 @@ Academy 的子级。)3D Balance Ball 环境中的所有 agent 使用
Brain 不存储关于 agent 的任何信息,
只是将 agent 收集的观测结果发送到决策过程,
然后将所选的动作返回给 agent。因此,所有 agent 可共享
-同一个 brain,但会独立行动。Brain 设置可以提供很多
+同一个 Brain,但会独立行动。Brain 设置可以提供很多
关于 agent 工作方式的信息。
**Brain Type** 决定了 agent 如何决策。
**External** 和 **Internal** 类型需要协同使用:训练 agent 时使用 **External**,
而在采用经过训练的模型时使用 **Internal**。
-**Heuristic** brain 允许您通过扩展 Decision 类来对 agent 的逻辑进行
-手动编码。最后,**Player** brain 可让您将键盘命令
+**Heuristic** Brain 允许您通过扩展 Decision 类来对 agent 的逻辑进行
+手动编码。最后,**Player** Brain 可让您将键盘命令
映射到动作,这样在测试 agent 和环境时
-会非常有用。如果这些类型的 brain 都不能满足您的需求,您可以
+会非常有用。如果这些类型的 Brain 都不能满足您的需求,您可以
实现自己的 CoreBrain 来创建自有的类型。
在本教程中,进行训练时,需要将 **Brain Type** 设置为 **External**;
@@ -120,12 +120,12 @@ Brain 不存储关于 agent 的任何信息,
**向量运动空间**
-brain 以*动作*的形式向 agent 提供指令。与状态
+Brain 以*动作*的形式向 agent 提供指令。与状态
一样,ML-Agents 将动作分为两种类型:**Continuous**
向量运动空间是一个可以连续变化的数字向量。向量
每个元素的含义都是由 agent 逻辑定义的(PPO 训练过程是一个了解agent的哪种状态更好的过程,这个过程是通过学习不同agent的不同状态会对应多少奖励来实现的)。
例如,一个元素可能表示施加到 agent 某个
-`RigidBody` 上的力或扭矩。**Discrete** 向量运动空间将其动作
+`Rigidbody` 上的力或扭矩。**Discrete** 向量运动空间将其动作
定义为一个表。提供给 agent 的具体动作是这个表的
索引。
@@ -142,9 +142,9 @@ Agent 是在环境中进行观测并采取动作的参与者。
平台游戏对象上。基础 Agent 对象有一些影响其行为的
属性:
-* **Brain** — 每个 Agent 必须有一个 Brain。brain 决定了 agent 如何
+* **Brain** — 每个 Agent 必须有一个 Brain。Brain 决定了 agent 如何
决策。3D Balance Ball 场景中的所有 agent 共享同一个
-brain。
+Brain。
* **Visual Observations** — 定义 agent 用来观测其环境的
任何 Camera 对象。3D Balance Ball 不使用摄像机观测。
* **Max Step** — 定义在 agent 决定自己完成之前可以发生多少个
@@ -167,7 +167,7 @@ Ball3DAgent 子类定义了以下方法:
agent 的 Brain 实例设置为状态大小为 8 的连续向量观测空间,
因此 `CollectObservations()` 必须调用 8 次
`AddVectorObs`。
-* Agent.AgentAction() — 在每个模拟步骤调用。接收 brain 选择的
+* Agent.AgentAction() — 在每个模拟步骤调用。接收 Brain 选择的
动作。Ball3DAgent 示例可以处理连续和离散
运动空间类型。在此环境中,两种状态类型之间实际上
没有太大的差别:这两种向量运动空间在每一步都会
@@ -195,7 +195,7 @@ Unity 场景:

由于我们要建立此环境来进行训练,因此我们需要
-将 agent 使用的 brain 设置为 **External**。这样 agent 在
+将 agent 使用的 Brain 设置为 **External**。这样 agent 在
进行决策时能够与外部训练过程进行通信。
1. 在 **Scene** 窗口中,单击 Ball3DAcademy 对象旁边的三角形
@@ -310,7 +310,7 @@ python3 python/learn.py --run-id= --train
一旦训练过程完成,并且训练过程保存了模型
(通过 `Saved Model` 消息可看出),您便可以将该模型添加到 Unity 项目中,
-然后将其用于 brain 类型为 **Internal** 的 agent。
+然后将其用于 Brain 类型为 **Internal** 的 agent。
### 设置 TensorFlowSharp 支持
@@ -320,7 +320,7 @@ python3 python/learn.py --run-id= --train
1. 确保 TensorFlowSharp 插件位于 `Assets` 文件夹中。
可在
-[此处](https://s3.amazonaws.com/unity-ml-agents/0.3/TFSharpPlugin.unitypackage)下载一个包含 TF# 的 Plugins 文件夹。
+[此处](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage)下载一个包含 TF# 的 Plugins 文件夹。
下载后,双击并将其导入。您可以在 Project 选项卡中
(位于 `Assets` > `ML-Agents` > `Plugins` > `Computer` 下)
检查 TensorFlow 的相关文件来查看是否安装成功
diff --git a/docs/localized/zh-CN/docs/Installation.md b/docs/localized/zh-CN/docs/Installation.md
index c6810264ff..4dda85d049 100755
--- a/docs/localized/zh-CN/docs/Installation.md
+++ b/docs/localized/zh-CN/docs/Installation.md
@@ -40,7 +40,7 @@ Unity Assets。`python` 目录包含训练代码。
### Mac 和 Unix 用户
-如果您的 Python 环境不包括 `pip`,请参阅这些
+如果您的 Python 环境不包括 `pip3`,请参阅这些
[说明](https://packaging.python.org/guides/installing-using-linux-tools/#installing-pip-setuptools-wheel-with-linux-package-managers)
以了解其安装方法。
@@ -56,7 +56,7 @@ Unity Assets。`python` 目录包含训练代码。
## Unity 包
-您可以通过 Unity 包的形式下载TensorFlowSharp 插件([AWS S3链接](https://s3.amazonaws.com/unity-ml-agents/0.3/TFSharpPlugin.unitypackage),[百度盘链接](https://pan.baidu.com/s/1s0mJN8lvuxTcYbs2kL2FqA))
+您可以通过 Unity 包的形式下载TensorFlowSharp 插件([AWS S3链接](https://s3.amazonaws.com/unity-ml-agents/0.4/TFSharpPlugin.unitypackage),[百度盘链接](https://pan.baidu.com/s/1s0mJN8lvuxTcYbs2kL2FqA))
## 帮助
diff --git a/docs/localized/zh-CN/docs/Learning-Environment-Create-New.md b/docs/localized/zh-CN/docs/Learning-Environment-Create-New.md
index 541bc5485d..efee4fa1ea 100755
--- a/docs/localized/zh-CN/docs/Learning-Environment-Create-New.md
+++ b/docs/localized/zh-CN/docs/Learning-Environment-Create-New.md
@@ -256,7 +256,7 @@ Agent 代码的最后一部分是 Agent.AgentAction() 函数,此函数接收 B
**动作**
-Brain 的决策以动作数组的形式传递给 `AgentAction()` 函数。此数组中的元素数量由 agent 的 Brain 的 `Vector Action Space Type` 和 `Vector Action Space Size` 设置确定。RollerAgent 使用连续向量运动空间,并需要 brain 提供的两个连续控制信号。因此,我们要将 Brain `Vector Action Size` 设置为 2。第一个元素 `action[0]` 确定沿 x 轴施加的力;`action[1]` 确定沿 z 轴施加的力。(如果我们允许 agent 以三维方式移动,那么我们需要将 `Vector Action Size` 设置为 3。)注意,Brain 并不知道动作数组中的值是什么意思。训练过程只是根据观测输入来调整动作值,然后看看会得到什么样的奖励。
+Brain 的决策以动作数组的形式传递给 `AgentAction()` 函数。此数组中的元素数量由 agent 的 Brain 的 `Vector Action Space Type` 和 `Vector Action Space Size` 设置确定。RollerAgent 使用连续向量运动空间,并需要 Brain 提供的两个连续控制信号。因此,我们要将 Brain `Vector Action Size` 设置为 2。第一个元素 `action[0]` 确定沿 x 轴施加的力;`action[1]` 确定沿 z 轴施加的力。(如果我们允许 agent 以三维方式移动,那么我们需要将 `Vector Action Size` 设置为 3。)注意,Brain 并不知道动作数组中的值是什么意思。训练过程只是根据观测输入来调整动作值,然后看看会得到什么样的奖励。
RollerAgent 使用 `Rigidbody.AddForce` 函数将 action[] 数组中的值应用到其 Rigidbody 组件 `rBody`:
@@ -392,7 +392,7 @@ public override void AgentAction(float[] vectorAction, string textAction)
1. 选择 Brain 游戏对象以便在 Inspector 中查看该对象的属性。
2. 将 **Brain Type** 设置为 **Player**。
-3. 展开 **Continuous Player Actions**(仅在使用 **Player* brain 时可见)。
+3. 展开 **Continuous Player Actions**(仅在使用 **Player** Brain 时可见)。
4. 将 **Size** 设置为 4。
5. 设置以下映射:
diff --git a/docs/localized/zh-CN/docs/Learning-Environment-Design.md b/docs/localized/zh-CN/docs/Learning-Environment-Design.md
index 7dcdafbb48..3e6be8f3b7 100755
--- a/docs/localized/zh-CN/docs/Learning-Environment-Design.md
+++ b/docs/localized/zh-CN/docs/Learning-Environment-Design.md
@@ -10,7 +10,7 @@ ML-Agents 使用一种称为 [Proximal Policy Optimization (PPO)](https://blog.o
训练和模拟过程以 ML-Agents Academy 类编排的步骤进行。Academy 与场景中的 Agent 和 Brain 对象一起协作逐步完成模拟。当 Academy 已达到其最大步数或场景中的所有 agent 均_完成_时,一个训练场景即完成。
-在训练期间,处于外部的 Python 进程会在训练过程中与 Academy 不断进行通信以便运行一系列场景,同时会收集数据并优化其神经网络模型。分配给 agent 的 Brain 类型决定了我们是否进行训练。**External** brain 会与外部过程进行通信以训练 TensorFlow 模型。成功完成训练后,您可以将经过训练的模型文件添加到您的 Unity 项目中,以便提供给 **Internal** brain 来控制agent的行为。
+在训练期间,处于外部的 Python 进程会在训练过程中与 Academy 不断进行通信以便运行一系列场景,同时会收集数据并优化其神经网络模型。分配给 agent 的 Brain 类型决定了我们是否进行训练。**External** Brain 会与外部过程进行通信以训练 TensorFlow 模型。成功完成训练后,您可以将经过训练的模型文件添加到您的 Unity 项目中,以便提供给 **Internal** Brain 来控制agent的行为。
ML-Agents Academy 类按如下方式编排 agent 模拟循环:
@@ -19,7 +19,7 @@ ML-Agents Academy 类按如下方式编排 agent 模拟循环:
3. 对场景中的每个 agent 调用 `CollectObservations()` 函数。
4. 使用每个 agent 的 Brain 类来决定 agent 的下一动作。
5. 调用您的子类的 `AcademyAct()` 函数。
-6. 对场景中的每个 agent 调用 `AgentAction()` 函数,传入由 agent 的 brain 选择的动作。(如果 agent 已完成,则不调用此函数。)
+6. 对场景中的每个 agent 调用 `AgentAction()` 函数,传入由 agent 的 Brain 选择的动作。(如果 agent 已完成,则不调用此函数。)
7. 如果 agent 已达到其 `Max Step` 计数或者已将其自身标记为 `done`,则调用 agent 的 `AgentOnDone()` 函数。或者,如果某个 agent 在场景结束之前已完成,您可以将其设置为重新开始。在这种情况下,Academy 会调用 `AgentReset()` 函数。
8. 当 Academy 达到其自身的 `Max Step` 计数时,它会通过调用您的 Academy 子类的 `AcademyReset()` 函数来再次开始下一场景。
@@ -33,7 +33,7 @@ ML-Agents Academy 类按如下方式编排 agent 模拟循环:
[Screenshot of scene hierarchy]
-您必须为每个 agent 分配一个 brain,但可以在多个 agent 之间共享 brain。每个 agent 都将进行自己的观测并独立行动,但会使用相同的决策逻辑,而对于 **Internal** brain,则会使用相同的经过训练的 TensorFlow 模型。
+您必须为每个 agent 分配一个 Brain,但可以在多个 agent 之间共享 Brain。每个 agent 都将进行自己的观测并独立行动,但会使用相同的决策逻辑,而对于 **Internal** Brain,则会使用相同的经过训练的 TensorFlow 模型。
### Academy
@@ -53,9 +53,9 @@ Academy 基类还定义了若干可以在 Unity Editor Inspector 中设置的重
Brain 内部封装了决策过程。Brain 对象必须放在 Hierarchy 视图中的 Academy 的子级。我们必须为每个 Agent 分配一个 Brain,但可以在多个 Agent 之间共享同一个 Brain。
-当我们使用 Brain 类的时候不需要使用其子类,而应该直接使用 Brain 这个类。Brain 的行为取决于 brain 的类型。在训练期间,应将 agent 上连接的 Brain 的 Brain Type 设置为 **External**。要使用经过训练的模型,请将模型文件导入 Unity 项目,并将对应 Brain 的 Brain Type 更改为 **Internal**。请参阅 [Brain](/docs/Learning-Environment-Design-Brains.md) 以了解有关使用不同类型的 Brain 的详细信息。如果四种内置的类型不能满足您的需求,您可以扩展 CoreBrain 类以创建其它的 Brain 类型。
+当我们使用 Brain 类的时候不需要使用其子类,而应该直接使用 Brain 这个类。Brain 的行为取决于 Brain 的类型。在训练期间,应将 agent 上连接的 Brain 的 Brain Type 设置为 **External**。要使用经过训练的模型,请将模型文件导入 Unity 项目,并将对应 Brain 的 Brain Type 更改为 **Internal**。请参阅 [Brain](/docs/Learning-Environment-Design-Brains.md) 以了解有关使用不同类型的 Brain 的详细信息。如果四种内置的类型不能满足您的需求,您可以扩展 CoreBrain 类以创建其它的 Brain 类型。
-Brain 类有若干可以使用 Inspector 窗口进行设置的重要属性。对于使用 brain 的 agent,这些属性必须恰当。例如,`Vector Observation Space Size` 属性必须与 agent 创建的特征向量的长度完全匹配。请参阅 [Agent](/docs/Learning-Environment-Design-Agents.md) 以获取有关创建 agent 和正确设置 Brain 实例的信息。
+Brain 类有若干可以使用 Inspector 窗口进行设置的重要属性。对于使用 Brain 的 agent,这些属性必须恰当。例如,`Vector Observation Space Size` 属性必须与 agent 创建的特征向量的长度完全匹配。请参阅 [Agent](/docs/Learning-Environment-Design-Agents.md) 以获取有关创建 agent 和正确设置 Brain 实例的信息。
请参阅 [Brain](/docs/Learning-Environment-Design-Brains.md) 以查看 Brain 属性的完整列表。
@@ -66,7 +66,7 @@ Agent 类代表场景中负责收集观测结果并采取动作的一个参与
要创建 agent,请扩展 Agent 类并实现基本的 `CollectObservations()` 和 `AgentAction()` 方法:
* `CollectObservations()` — 收集 agent 对其环境的观测结果。
-* `AgentAction()` — 执行由 agent 的 brain 选择的动作,并为当前状态分配奖励。
+* `AgentAction()` — 执行由 agent 的 Brain 选择的动作,并为当前状态分配奖励。
这些函数的实现决定了分配给此 agent 的 Brain 的属性要如何设置。
@@ -83,7 +83,7 @@ ML-Agents 中的_环境_可以是 Unity 中构建的任何场景。Unity 场景
在 Unity 中创建训练环境时,必须设置场景以便可以通过外部训练过程来控制场景。注意以下几点:
* 在训练程序启动后,Unity 可执行文件会被自动打开,然后训练场景会自动开始训练。
-* 场景中至少须包括一个 **External** brain。
+* 场景中至少须包括一个 **External** Brain。
* Academy 必须在每一轮训练后将场景重置为有效的初始状态。
* 训练场景必须有明确的结束状态,为此需要使用 `Max Steps`,或让每个 agent 将自身设置为 `done`。
diff --git a/docs/localized/zh-CN/docs/Learning-Environment-Examples.md b/docs/localized/zh-CN/docs/Learning-Environment-Examples.md
index 718f491780..be1e772313 100644
--- a/docs/localized/zh-CN/docs/Learning-Environment-Examples.md
+++ b/docs/localized/zh-CN/docs/Learning-Environment-Examples.md
@@ -19,11 +19,11 @@ Unity ML-Agents 工具包中内置了一些搭建好的学习环境的示例,
* 训练环境:一种线性移动任务,在此任务中 agent 必须向左或向右移动到奖励状态。
* 目标:移动到最高奖励状态。
-* Agent设置:环境包含一个 agent,上面附带了单个 brain。
+* Agent设置:环境包含一个 agent,上面附带了单个 Brain。
* Agent 奖励函数设置:
* 达到次优状态时 +0.1。
* 达到最优状态时 +1.0。
-* Brain 设置:一个有以下观测/运动空间的 brain。
+* Brain 设置:一个有以下观测/运动空间的 Brain。
* 向量观测空间:(离散变量)一个变量,对应于当前状态。
* 向量运动空间:(离散变量)两个可能的动作(向左移动、向右移动)。
* 视觉观测:0
@@ -35,11 +35,11 @@ Unity ML-Agents 工具包中内置了一些搭建好的学习环境的示例,
* 训练环境:一种平衡球任务,在此任务中 agent 需要控制平台。
* 目标:agent 必须平衡平台,以尽可能长时间在平台上保持球不掉落。
-* Agent设置:环境包含 12 个全部链接到单个 brain 的同类 agent。
+* Agent设置:环境包含 12 个全部链接到单个 Brain 的同类 agent。
* Agent 奖励函数设置:
* 球在平台上保持不掉下的每一步都 +0.1。
* 球掉下平台时 -1.0。
-* Brain 设置:一个有以下观测/运动空间的 brain。
+* Brain 设置:一个有以下观测/运动空间的 Brain。
* 向量观测空间:(连续变量)8 个,对应于平台的旋转以及球的位置、旋转和速度。
* 向量观测空间(困难版本,因为观测到的信息减少了):(连续变量)5 个变量,对应于平台的旋转以及球的位置和旋转。
* 向量运动空间:(连续变量)2 个,其中一个值对应于 X 旋转,而另一个值对应于 Z 旋转。
@@ -52,12 +52,12 @@ Unity ML-Agents 工具包中内置了一些搭建好的学习环境的示例,
* 训练环境:某一个典型版本的的grid-world任务。场景包含 agent、目标和障碍。
* 目标:agent 必须在网格中避开障碍的同时移动到目标。
-* Agent设置:环境包含一个链接到单个 brain 的 agent。
+* Agent设置:环境包含一个链接到单个 Brain 的 agent。
* Agent 奖励函数设置:
* 每一步 -0.01。
* agent 导航到目标网格位置时 +1.0(场景结束)。
* agent 移动到障碍物时 -1.0(场景结束)。
-* Brain 设置:一个有以下观测/运动空间的 brain。
+* Brain 设置:一个有以下观测/运动空间的 Brain。
* 向量观测空间:无
* 向量运动空间:(离散变量)4 个,对应于基本方向的移动。
* 视觉观测:一个对应于 GridWorld 自上而下的视图。
@@ -70,11 +70,11 @@ Unity ML-Agents 工具包中内置了一些搭建好的学习环境的示例,
* 训练环境:agent 控制球拍将球弹过球网的双人游戏。
* 目标:agent 必须在彼此之间弹起网球,同时不能丢球或击球出界。
-* Agent设置:环境包含两个链接到单个 brain(名为 TennisBrain)的 agent。在训练之后,您可以将另一个名为 MyBrain 的 brain 附加到其中一个 agent,从而与经过训练的模型进行游戏比赛。
+* Agent设置:环境包含两个链接到单个 Brain(名为 TennisBrain)的 agent。在训练之后,您可以将另一个名为 MyBrain 的 Brain 附加到其中一个 agent,从而与经过训练的模型进行游戏比赛。
* Agent 奖励函数设置(agent互相之间独立):
* agent 击球过网时 +0.1。
* agent 让球落入自己的范围或者击球出界时 -0.1。
-* Brain 设置:一个有以下观测/运动空间的 brain。
+* Brain 设置:一个有以下观测/运动空间的 Brain。
* 向量观测空间:(连续变量)8 个,分别对应于球和球拍的位置和速度。
* 向量运动空间:(连续变量)2 个,分别对应于朝向球网或远离球网的运动,以及上下的运动。
* 视觉观测:无
@@ -86,11 +86,11 @@ Unity ML-Agents 工具包中内置了一些搭建好的学习环境的示例,
* 训练环境:一个平台,agent 可以在该平台上推动方块。
* 目标:agent 必须将方块推向目标。
-* Agent设置:环境包含一个链接到单个 brain 的 agent。
+* Agent设置:环境包含一个链接到单个 Brain 的 agent。
* Agent 奖励函数设置:
* 每一步 -0.0025。
* 方块接触到目标时 +1.0。
-* Brain 设置:一个有以下观测/运动空间的 brain。
+* Brain 设置:一个有以下观测/运动空间的 Brain。
* 向量观测空间:(连续变量)15 个,分别对应于 agent、方块和目标的位置和速度。
* 向量运动空间:(连续变量)2 个,分别对应于 X 和 Z 方向的移动。
* 视觉观测:无。
@@ -102,12 +102,12 @@ Unity ML-Agents 工具包中内置了一些搭建好的学习环境的示例,
* 训练环境:一个平台环境,agent 可以在该环境中跳过墙。
* 目标:agent 必须利用一个方块越过墙并到达目标。
-* Agent设置:环境包含一个链接到两个不同 brain 的 agent。agent 链接到的 brain 根据墙的高度而变化。
+* Agent设置:环境包含一个链接到两个不同 Brain 的 agent。agent 链接到的 Brain 根据墙的高度而变化。
* Agent 奖励函数设置:
* 每一步 -0.0005。
* agent 接触到目标时 +1.0。
* agent 掉下平台时 -1.0。
-* Brain 设置:一个有以下观测/运动空间的 brain。
+* Brain 设置:一个有以下观测/运动空间的 Brain。
* 向量观测空间:(连续变量)16 个,分别对应于 agent、方块和目标的位置和速度以及墙的高度。
* 向量运动空间:(离散变量)74 个,分别对应于 14 个射线投射,每个射线投射可检测 4 个可能的物体,加上 agent 的全局位置以及 agent 是否落地。
* 视觉观测:无。
@@ -119,10 +119,10 @@ Unity ML-Agents 工具包中内置了一些搭建好的学习环境的示例,
* 训练环境:可以移动到目标位置的双关节臂。
* 目标:agent 必须将手移动到目标位置,并保持在此处。
-* Agent设置:环境包含 32 个链接到单个 brain 的 agent。
+* Agent设置:环境包含 32 个链接到单个 Brain 的 agent。
* Agent 奖励函数设置(agent互相之间独立):
* 当 agent 的手处于目标位置时,每过一步 +0.1。
-* Brain 设置:一个有以下观测/运动空间的 brain。
+* Brain 设置:一个有以下观测/运动空间的 Brain。
* 向量观测空间:(连续变量)26 个,对应于两个机械臂 Rigidbody 的位置、旋转、速度和角速度。
* 向量运动空间:(连续变量)4 个,对应于两个关节的两个方向上的转动。
* 视觉观测:无
@@ -134,14 +134,14 @@ Unity ML-Agents 工具包中内置了一些搭建好的学习环境的示例,
* 训练环境:一种有 4 个手臂的生物,每个手臂分两节
* 目标:agent 必须沿 x 轴移动其身体,并且保持不跌倒。
-* Agent设置:环境包含 3 个链接到单个 brain 的 agent。
+* Agent设置:环境包含 3 个链接到单个 Brain 的 agent。
* Agent 奖励函数设置(agent互相之间独立):
* +1 乘以 x 方向的速度
* 跌倒时 -1。
* -0.01 乘以动作平方
* -0.05 乘以 y 位置变化
* -0.05 乘以 z 方向的速度
-* Brain 设置:一个有以下观测/运动空间的 brain。
+* Brain 设置:一个有以下观测/运动空间的 Brain。
* 向量观测空间:(连续变量)117 个,对应于每个肢体的位置、旋转、速度和角速度以及身体的加速度和角速度。
* 向量运动空间:(连续变量)12 个,对应于适用于 12 个关节的扭矩。
* 视觉观测:无
@@ -153,11 +153,11 @@ Unity ML-Agents 工具包中内置了一些搭建好的学习环境的示例,
* 训练环境:一个包含多个 agent 的环境,这些 agent 争相收集香蕉。
* 目标:agent 必须学习尽可能接近更多的黄色香蕉,同时避开红色香蕉。
-* Agent设置:环境包含 10 个链接到单个 brain 的 agent。
+* Agent设置:环境包含 10 个链接到单个 Brain 的 agent。
* Agent 奖励函数设置(agent互相之间独立):
* 接触黄色香蕉时 +1
* 接触红色香蕉时 -1。
-* Brain 设置:一个有以下观测/运动空间的 brain。
+* Brain 设置:一个有以下观测/运动空间的 Brain。
* 向量观测空间:(连续变量)51 个,对应于 agent 的速度, agent 前进方向,以及 agent 对周围物体进行基于射线的感知。
* 向量运动空间:(连续变量)3 个,对应于向前移动,绕 y 轴旋转,以及是否使用激光使其他 agent 瘫痪。
* 视觉观测(可选):每个 agent 的第一人称视图。
@@ -169,7 +169,7 @@ Unity ML-Agents 工具包中内置了一些搭建好的学习环境的示例,
* 训练环境:在一个环境中,agent 需要在房间内查找信息、记住信息并使用信息移动到正确目标。
* 目标:移动到与房间内的方块的颜色相对应的目标。
-* Agent设置:环境包含一个链接到单个 brain 的 agent。
+* Agent设置:环境包含一个链接到单个 Brain 的 agent。
* Agent 奖励函数设置(agent互相之间独立):
* 移动到正确目标时 +1。
* 移动到错误目标时 -0.1。
@@ -186,7 +186,7 @@ Unity ML-Agents 工具包中内置了一些搭建好的学习环境的示例,
* 训练环境:在一个环境中,agent 需要按需决策。agent 必须决定在接触地面时如何进行下一次弹跳。
* 目标:抓住漂浮的香蕉。跳跃次数有限。
-* Agent设置:环境包含一个链接到单个 brain 的 agent。
+* Agent设置:环境包含一个链接到单个 Brain 的 agent。
* Agent 奖励函数设置(agent互相之间独立):
* 抓住香蕉时 +1。
* 弹跳出界时 -1。
@@ -205,7 +205,7 @@ Unity ML-Agents 工具包中内置了一些搭建好的学习环境的示例,
* 目标:
* 前锋:让球进入对手的球门。
* 守门员:防止球进入自己的球门。
-* Agent设置:环境包含四个 agent,其中两个链接到一个 brain(前锋),两个链接到另一个 brain(守门员)。
+* Agent设置:环境包含四个 agent,其中两个链接到一个 Brain(前锋),两个链接到另一个 Brain(守门员)。
* Agent 奖励函数设置(agent互相之间非独立):
* 前锋:
* 球进入对手球门时 +1。
diff --git a/docs/localized/zh-CN/docs/ML-Agents-Overview.md b/docs/localized/zh-CN/docs/ML-Agents-Overview.md
index 7c3c899020..f488a74eb7 100644
--- a/docs/localized/zh-CN/docs/ML-Agents-Overview.md
+++ b/docs/localized/zh-CN/docs/ML-Agents-Overview.md
@@ -228,7 +228,7 @@ Brain 类型在训练期间设置为 External,在预测期间设置为 Interna
我们将 Brain 类型切换为 Internal,并加入从训练阶段
生成的 TensorFlow 模型。现在,在预测阶段,军医
仍然继续生成他们的观测结果,但不再将结果发送到
-Python API,而是送入他们的嵌入了的 Tensorflow 模型,
+Python API,而是送入他们的嵌入了的 TensorFlow 模型,
以便生成每个军医在每个时间点上要采取的_最佳_动作。
总结一下:我们的实现是基于 TensorFlow 的,因此,
diff --git a/gym-unity/Readme.md b/gym-unity/README.md
similarity index 58%
rename from gym-unity/Readme.md
rename to gym-unity/README.md
index 92f5ed989b..c83d0bb81f 100755
--- a/gym-unity/Readme.md
+++ b/gym-unity/README.md
@@ -1,26 +1,32 @@
# Unity ML-Agents Gym Wrapper
-A common way in which machine learning researchers interact with simulation environments is via a wrapper provided by OpenAI called `gym`. For more information on the gym interface, see [here](https://github.com/openai/gym).
+A common way in which machine learning researchers interact with simulation
+environments is via a wrapper provided by OpenAI called `gym`. For more
+information on the gym interface, see [here](https://github.com/openai/gym).
-We provide a a gym wrapper, and instructions for using it with existing machine learning algorithms which utilize gyms. Both wrappers provide interfaces on top of our `UnityEnvironment` class, which is the default way of interfacing with a Unity environment via Python.
+We provide a gym wrapper, and instructions for using it with existing machine
+learning algorithms which utilize gyms. The wrapper provides an interface on
+top of our `UnityEnvironment` class, which is the default way of interfacing
+with a Unity environment via Python.
## Installation
The gym wrapper can be installed using:
-```
+```sh
pip install gym_unity
```
or by running the following from the `/gym-unity` directory of the repository:
-```
+```sh
pip install .
```
-
## Using the Gym Wrapper
-The gym interface is available from `gym_unity.envs`. To launch an environmnent from the root of the project repository use:
+
+The gym interface is available from `gym_unity.envs`. To launch an environment
+from the root of the project repository, use:
```python
from gym_unity.envs import UnityEnv
@@ -29,19 +35,24 @@ env = UnityEnv(environment_filename, worker_id, default_visual, multiagent)
```
* `environment_filename` refers to the path to the Unity environment.
-* `worker_id` refers to the port to use for communication with the environment. Defaults to `0`.
-* `use_visual` refers to whether to use visual observations (True) or vector observations (False) as the default observation provided by the `reset` and `step` functions. Defaults to `False`.
-* `multiagent` refers to whether you intent to launch an environment which contains more than one agent. Defaults to `False`.
+* `worker_id` refers to the port to use for communication with the environment.
+ Defaults to `0`.
+* `use_visual` refers to whether to use visual observations (True) or vector
+ observations (False) as the default observation provided by the `reset` and
+ `step` functions. Defaults to `False`.
+* `multiagent` refers to whether you intend to launch an environment which
+ contains more than one agent. Defaults to `False`.
The returned environment `env` will function as a gym.
-For more on using the gym interface, see our [Jupyter Notebook tutorial](../notebooks/getting-started-gym.ipynb).
+For more on using the gym interface, see our
+[Jupyter Notebook tutorial](../notebooks/getting-started-gym.ipynb).
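+
+As a quick sketch (assuming a GridWorld binary built into an `envs/` folder, as
+in the Baselines example below), the wrapped environment can then be driven
+like any other gym:
+
+```python
+from gym_unity.envs import UnityEnv
+
+env = UnityEnv("envs/GridWorld", worker_id=0, use_visual=True)
+observation = env.reset()
+for _ in range(10):
+    # Sample a random action from the gym action space and step the environment.
+    observation, reward, done, info = env.step(env.action_space.sample())
+    if done:
+        observation = env.reset()
+env.close()
+```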
## Limitations
* It is only possible to use an environment with a single Brain.
* By default the first visual observation is provided as the `observation`, if
- present. Otherwise vector observations are provided.
+ present. Otherwise vector observations are provided.
* All `BrainInfo` output from the environment can still be accessed from the
`info` provided by `env.step(action)`.
* Stacked vector observations are not supported.
@@ -49,15 +60,26 @@ For more on using the gym interface, see our [Jupyter Notebook tutorial](../note
## Running OpenAI Baselines Algorithms
-OpenAI provides a set of open-source maintained and tested Reinforcement Learning algorithms called the [Baselines](https://github.com/openai/baselines).
+OpenAI provides a set of open-source maintained and tested Reinforcement
+Learning algorithms called the [Baselines](https://github.com/openai/baselines).
-Using the provided Gym wrapper, it is possible to train ML-Agents environments using these algorithms. This requires the creation of custom training scripts to launch each algorithm. In most cases these scripts can be created by making slightly modifications to the ones provided for Atari and Mujoco environments.
+Using the provided Gym wrapper, it is possible to train ML-Agents environments
+using these algorithms. This requires the creation of custom training scripts to
+launch each algorithm. In most cases these scripts can be created by making
+slight modifications to the ones provided for Atari and Mujoco environments.
### Example - DQN Baseline
-In order to train an agent to play the `GridWorld` environment using the Baselines DQN algorithm, create a file called `train_unity.py` within the `baselines/deepq/experiments` subfolder of the baselines repository. This file will be a modification of the `run_atari.py` file within the same folder. Then create and `/envs/` directory within the repository, and build the GridWorld environment to that directory. For more information on building Unity environments, see [here](../docs/Learning-Environment-Executable.md). Add the following code to the `train_unity.py` file:
+In order to train an agent to play the `GridWorld` environment using the
+Baselines DQN algorithm, create a file called `train_unity.py` within the
+`baselines/deepq/experiments` subfolder of the baselines repository. This file
+will be a modification of the `run_atari.py` file within the same folder. Then
+create an `/envs/` directory within the repository, and build the GridWorld
+environment to that directory. For more information on building Unity
+environments, see [here](../docs/Learning-Environment-Executable.md). Add the
+following code to the `train_unity.py` file:
-```
+```python
import gym
from baselines import deepq
@@ -88,20 +110,29 @@ if __name__ == '__main__':
main()
```
+To start the training process, run the following from the root of the baselines
+repository:
-To start the training process, run the following from the root of the baselines repository:
-
-```
+```sh
python -m baselines.deepq.experiments.train_unity
```
### Other Algorithms
-Other algorithms in the Baselines repository can be run using scripts similar to the example provided above. In most cases, the primary changes needed to use a Unity environment are to import `UnityEnv`, and to replace the environment creation code, typically `gym.make()`, with a call to `UnityEnv(env_path)` passing the environment binary path.
+Other algorithms in the Baselines repository can be run using scripts similar to
+the example provided above. In most cases, the primary changes needed to use a
+Unity environment are to import `UnityEnv`, and to replace the environment
+creation code, typically `gym.make()`, with a call to `UnityEnv(env_path)`
+passing the environment binary path.
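As a rough sketch (the binary path below is a placeholder), the environment
creation code in such a script changes along these lines:

```python
from gym_unity.envs import UnityEnv

# Before: env = gym.make("PongNoFrameskip-v4")
# After: point UnityEnv at the built Unity environment binary (placeholder path).
env = UnityEnv("./envs/GridWorld", worker_id=0, use_visual=True)
```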
-A typical rule of thumb is that for vision-based environments, modification should be done to Atari training scripts, and for vector observation environments, modification should be done to Mujoco scripts.
+A typical rule of thumb is to start from the Atari training scripts for
+vision-based environments, and from the Mujoco scripts for vector observation
+environments.
-Some algorithms will make use of `make_atari_env()` or `make_mujoco_env()` functions. These are defined in `baselines/common/cmd_util.py`. In order to use Unity environments for these algorithms, add the following import statement and function to `cmd_utils.py`:
+Some algorithms will make use of `make_atari_env()` or `make_mujoco_env()`
+functions. These are defined in `baselines/common/cmd_util.py`. In order to use
+Unity environments for these algorithms, add the following import statement and
+function to `cmd_util.py`:
```python
from gym_unity.envs import UnityEnv
diff --git a/ml-agents/README.md b/ml-agents/README.md
index 277cd44d6c..905688d03e 100644
--- a/ml-agents/README.md
+++ b/ml-agents/README.md
@@ -1,177 +1,26 @@
-# Unity ml-agents interface and trainers
+# Unity ML-Agents Python Interface and Trainers
-The `mlagents` package contains two components : The low level API which allows
-you to interact directly with a Unity Environment and a training component whcih
-allows you to train agents in Unity Environments using our implementations of
-reinforcement learning or imitation learning.
+The `mlagents` Python package is part of the
+[ML-Agents Toolkit](https://github.com/Unity-Technologies/ml-agents).
+`mlagents` provides a Python API that allows direct interaction with the Unity
+game engine as well as a collection of trainers and algorithms to train agents
+in Unity environments.
+
+The `mlagents` Python package contains two components: the low-level API, which
+allows you to interact directly with a Unity Environment (`mlagents.envs`), and
+a training entry point (`mlagents-learn`) which allows you to train agents in
+Unity Environments using our implementations of reinforcement learning or
+imitation learning.
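As a brief, illustrative sketch of the low-level API (the binary name `3DBall`
is a placeholder, and argument names may differ slightly across releases):

```python
from mlagents.envs import UnityEnvironment

# Placeholder binary name; pass file_name=None to connect to a scene running
# in the Unity Editor instead.
env = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1)

info = env.reset(train_mode=True)  # dict mapping brain names to BrainInfo
env.close()
```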
## Installation
-The `ml-agents` package can be installed using:
+Install `mlagents` with:
```sh
pip install mlagents
```
-or by running the following from the `ml-agents` directory of the repository:
-
-```sh
-pip install .
-```
-
-## `mlagents.envs`
-
-The ML-Agents toolkit provides a Python API for controlling the agent simulation
-loop of a environment or game built with Unity. This API is used by the ML-Agent
-training algorithms (run with `mlagents-learn`), but you can also write your
-Python programs using this API.
-
-The key objects in the Python API include:
-
-- **UnityEnvironment** — the main interface between the Unity application and
- your code. Use UnityEnvironment to start and control a simulation or training
- session.
-- **BrainInfo** — contains all the data from agents in the simulation, such as
- observations and rewards.
-- **BrainParameters** — describes the data elements in a BrainInfo object. For
- example, provides the array length of an observation in BrainInfo.
-
-These classes are all defined in the `ml-agents/mlagents/envs` folder of
-the ML-Agents SDK.
-
-To communicate with an agent in a Unity environment from a Python program, the
-agent must either use an **External** brain or use a brain that is broadcasting
-(has its **Broadcast** property set to true). Your code is expected to return
-actions for agents with external brains, but can only observe broadcasting
-brains (the information you receive for an agent is the same in both cases).
-
-_Notice: Currently communication between Unity and Python takes place over an
-open socket without authentication. As such, please make sure that the network
-where training takes place is secure. This will be addressed in a future
-release._
-
-### Loading a Unity Environment
-
-Python-side communication happens through `UnityEnvironment` which is located in
-`ml-agents/mlagents/envs`. To load a Unity environment from a built binary
-file, put the file in the same directory as `envs`. For example, if the filename
-of your Unity environment is 3DBall.app, in python, run:
-
-```python
-from mlagents.env import UnityEnvironment
-env = UnityEnvironment(file_name="3DBall", worker_id=0, seed=1)
-```
-
-- `file_name` is the name of the environment binary (located in the root
- directory of the python project).
-- `worker_id` indicates which port to use for communication with the
- environment. For use in parallel training regimes such as A3C.
-- `seed` indicates the seed to use when generating random numbers during the
- training process. In environments which do not involve physics calculations,
- setting the seed enables reproducible experimentation by ensuring that the
- environment and trainers utilize the same random seed.
-
-If you want to directly interact with the Editor, you need to use
-`file_name=None`, then press the :arrow_forward: button in the Editor when the
-message _"Start training by pressing the Play button in the Unity Editor"_ is
-displayed on the screen
-
-### Interacting with a Unity Environment
-
-A BrainInfo object contains the following fields:
-
-- **`visual_observations`** : A list of 4 dimensional numpy arrays. Matrix n of
- the list corresponds to the nth observation of the brain.
-- **`vector_observations`** : A two dimensional numpy array of dimension `(batch
- size, vector observation size)`.
-- **`text_observations`** : A list of string corresponding to the agents text
- observations.
-- **`memories`** : A two dimensional numpy array of dimension `(batch size,
- memory size)` which corresponds to the memories sent at the previous step.
-- **`rewards`** : A list as long as the number of agents using the brain
- containing the rewards they each obtained at the previous step.
-- **`local_done`** : A list as long as the number of agents using the brain
- containing `done` flags (whether or not the agent is done).
-- **`max_reached`** : A list as long as the number of agents using the brain
- containing true if the agents reached their max steps.
-- **`agents`** : A list of the unique ids of the agents using the brain.
-- **`previous_actions`** : A two dimensional numpy array of dimension `(batch
- size, vector action size)` if the vector action space is continuous and
- `(batch size, number of branches)` if the vector action space is discrete.
-
-Once loaded, you can use your UnityEnvironment object, which referenced by a
-variable named `env` in this example, can be used in the following way:
-
-- **Print : `print(str(env))`**
- Prints all parameters relevant to the loaded environment and the external
- brains.
-- **Reset : `env.reset(train_model=True, config=None)`**
- Send a reset signal to the environment, and provides a dictionary mapping
- brain names to BrainInfo objects.
- - `train_model` indicates whether to run the environment in train (`True`) or
- test (`False`) mode.
- - `config` is an optional dictionary of configuration flags specific to the
- environment. For generic environments, `config` can be ignored. `config` is
- a dictionary of strings to floats where the keys are the names of the
- `resetParameters` and the values are their corresponding float values.
- Define the reset parameters on the Academy Inspector window in the Unity
- Editor.
-- **Step : `env.step(action, memory=None, text_action=None)`**
- Sends a step signal to the environment using the actions. For each brain :
- - `action` can be one dimensional arrays or two dimensional arrays if you have
- multiple agents per brains.
- - `memory` is an optional input that can be used to send a list of floats per
- agents to be retrieved at the next step.
- - `text_action` is an optional input that be used to send a single string per
- agent.
-
- Returns a dictionary mapping brain names to BrainInfo objects.
-
- For example, to access the BrainInfo belonging to a brain called
- 'brain_name', and the BrainInfo field 'vector_observations':
-
- ```python
- info = env.step()
- brainInfo = info['brain_name']
- observations = brainInfo.vector_observations
- ```
-
- Note that if you have more than one external brain in the environment, you
- must provide dictionaries from brain names to arrays for `action`, `memory`
- and `value`. For example: If you have two external brains named `brain1` and
- `brain2` each with one agent taking two continuous actions, then you can
- have:
-
- ```python
- action = {'brain1':[1.0, 2.0], 'brain2':[3.0,4.0]}
- ```
-
- Returns a dictionary mapping brain names to BrainInfo objects.
-- **Close : `env.close()`**
- Sends a shutdown signal to the environment and closes the communication
- socket.
-
-## `mlagents.trainers`
-
-1. Open a command or terminal window.
-2. Run
-
-```sh
-mlagents-learn --run-id= --train
-```
-
-Where:
-
-- `` is the relative or absolute filepath of the trainer
- configuration. The defaults used by environments in the ML-Agents SDK can be
- found in `config/trainer_config.yaml`.
-- `` is a string used to separate the results of different
- training runs
-- The `--train` flag tells `mlagents-learn` to run a training session (rather
- than inference)
-- `` __(Optional)__ is the path to the Unity executable you
- want to train. __Note:__ If this argument is not passed, the training
- will be made through the editor.
+## Usage & More Information
-For more detailled documentation, check out the
-[ML-Agents toolkit documentation.](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Readme.md)
+For more detailed documentation, check out the
+[ML-Agents Toolkit documentation.](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Readme.md)
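
As a quick, illustrative example (the run ID is arbitrary and the config path
assumes the repository's default trainer configuration), a training session is
typically launched with:

```sh
mlagents-learn config/trainer_config.yaml --run-id=first-run --train
```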