Enhance documentation
- Moved "Model integration" section to separate page.
- Updated README to include a new "Model integration" link and improved the structure of the "Supported models" section.
- Introduced environment variable examples in the CLI documentation for better user guidance.
- Removed unused dotenv variable handling from the CLI utils.
- Minor adjustments to the evaluation script and example files for consistency.

These changes aim to improve user experience and clarity in the documentation and CLI usage.
krasserm committed Jan 23, 2025
1 parent 46aa683 commit 7993f3c
Showing 10 changed files with 165 additions and 128 deletions.
14 changes: 6 additions & 8 deletions README.md
@@ -15,11 +15,9 @@ A lightweight library for code-action based agents.

- [Introduction](#introduction)
- [Key capabilities](#key-capabilities)
- [Supported models](#supported-models)
- [Quickstart](#quickstart)
- [Evaluation](#evaluation)

The `freeact` documentation is available [here](https://gradion-ai.github.io/freeact/).
- [Supported models](#supported-models)

## Introduction

@@ -35,10 +33,6 @@ The library's architecture emphasizes extensibility and transparency, avoiding t

`freeact` executes all code actions within [`ipybox`](https://gradion-ai.github.io/ipybox/), a secure execution environment built on IPython and Docker that can also be deployed locally. This ensures safe execution of dynamically generated code while maintaining full access to the Python ecosystem. Combined with its lightweight and extensible architecture, `freeact` provides a robust foundation for building adaptable AI agents that can tackle real-world challenges requiring dynamic problem-solving approaches.

## Supported models

In addition to the models we [evaluated](#evaluation), `freeact` also supports any model from any provider that is compatible with the [OpenAI Python SDK](https://github.com/openai/openai-python), including open models deployed locally on [ollama](https://ollama.com/) or [TGI](https://huggingface.co/docs/text-generation-inference/index), for example. See [Model integration](https://gradion-ai.github.io/freeact/models/#model-integration) for details.

## Quickstart

Install `freeact` using pip:
@@ -57,7 +51,7 @@ ANTHROPIC_API_KEY=...
GOOGLE_API_KEY=...
```

Launch a `freeact` agent with generative Google Search skill using the CLI
Launch a `freeact` agent with generative Google Search skill using the [CLI](https://gradion-ai.github.io/freeact/cli/):

```bash
python -m freeact.cli \
@@ -117,3 +111,7 @@ When comparing our results with smolagents using Claude 3.5 Sonnet on [m-ric/age
[<img src="docs/eval/eval-plot-comparison.png" alt="Performance comparison" width="60%">](docs/eval/eval-plot-comparison.png)

Interestingly, these results were achieved using zero-shot prompting in `freeact`, while the smolagents implementation utilizes few-shot prompting. To ensure a fair comparison, we employed identical evaluation protocols and tools. You can find all evaluation details [here](evaluation).

## Supported models

In addition to the models we [evaluated](#evaluation), `freeact` also supports the [integration](https://gradion-ai.github.io/freeact/integration/) of new models from any provider that is compatible with the [OpenAI Python SDK](https://github.com/openai/openai-python), including open models deployed locally with [ollama](https://ollama.com/) or [TGI](https://huggingface.co/docs/text-generation-inference/index).
68 changes: 68 additions & 0 deletions docs/cli.md
@@ -18,3 +18,71 @@ The `freeact` CLI supports entering messages that span multiple lines in two way
To submit a multiline message, simply press `Enter`.

![Multiline input](img/multiline.png)

## Environment variables

The CLI reads environment variables from a `.env` file in the current directory and passes them to the [execution environment](installation.md#execution-environment). API keys required for an agent's code action model must be defined in the `.env` file, passed as command-line arguments, or set directly as variables in the shell.
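The third option, setting the variables directly in the shell before launching the CLI, can be sketched as follows (the key values are placeholders):

```bash
# Placeholder values, substitute your real keys
export ANTHROPIC_API_KEY=your-anthropic-api-key
export GOOGLE_API_KEY=your-google-api-key
```

Variables exported this way are inherited by a subsequent `python -m freeact.cli` invocation in the same shell session.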

### Example 1

The [quickstart](quickstart.md) example requires `ANTHROPIC_API_KEY` and `GOOGLE_API_KEY` to be defined in a `.env` file in the current directory. The `ANTHROPIC_API_KEY` is needed for the `claude-3-5-sonnet-20241022` code action model, while the `GOOGLE_API_KEY` is required for the `freeact_skills.search.google.stream.api` skill in the execution environment. Given a `.env` file with the following content:

```env title=".env"
# Required for Claude 3.5 Sonnet
ANTHROPIC_API_KEY=your-anthropic-api-key
# Required for generative Google Search via Gemini 2
GOOGLE_API_KEY=your-google-api-key
```

the following command will launch an agent with `claude-3-5-sonnet-20241022` as the code action model, configured with a generative Google Search skill implemented by the module `freeact_skills.search.google.stream.api`:

```bash
python -m freeact.cli \
--model-name=claude-3-5-sonnet-20241022 \
--ipybox-tag=ghcr.io/gradion-ai/ipybox:basic \
--skill-modules=freeact_skills.search.google.stream.api
```

The API key can alternatively be passed as a command-line argument:

```bash
python -m freeact.cli \
--model-name=claude-3-5-sonnet-20241022 \
--api-key=your-anthropic-api-key \
--ipybox-tag=ghcr.io/gradion-ai/ipybox:basic \
--skill-modules=freeact_skills.search.google.stream.api
```

### Example 2

To use models from other providers, such as [accounts/fireworks/models/deepseek-v3](https://fireworks.ai/models/fireworks/deepseek-v3) hosted by [Fireworks](https://fireworks.ai/), you can either provide all required environment variables in a `.env` file:

```env title=".env"
# Required for DeepSeek V3 hosted by Fireworks
DEEPSEEK_BASE_URL=https://api.fireworks.ai/inference/v1
DEEPSEEK_API_KEY=your-deepseek-api-key
# Required for generative Google Search via Gemini 2
GOOGLE_API_KEY=your-google-api-key
```

and launch the agent with

```bash
python -m freeact.cli \
--model-name=accounts/fireworks/models/deepseek-v3 \
--ipybox-tag=ghcr.io/gradion-ai/ipybox:basic \
--skill-modules=freeact_skills.search.google.stream.api
```

or pass the base URL and API key directly as command-line arguments:

```bash
python -m freeact.cli \
--model-name=accounts/fireworks/models/deepseek-v3 \
--base-url=https://api.fireworks.ai/inference/v1 \
--api-key=your-deepseek-api-key \
--ipybox-tag=ghcr.io/gradion-ai/ipybox:basic \
--skill-modules=freeact_skills.search.google.stream.api
```
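Conceptually, the `.env` handling in both examples boils down to reading key-value pairs from a file and passing them through as a dictionary. The stdlib-only sketch below illustrates the idea; `parse_dotenv` is a hypothetical helper written for illustration, while the actual CLI relies on the python-dotenv package:

```python
def parse_dotenv(text: str) -> dict[str, str]:
    """Minimal .env parser: skips blank lines and comments,
    splits each remaining line on the first '='."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


dotenv_text = """\
# Required for DeepSeek V3 hosted by Fireworks
DEEPSEEK_BASE_URL=https://api.fireworks.ai/inference/v1
DEEPSEEK_API_KEY=your-deepseek-api-key
"""
print(parse_dotenv(dotenv_text))
```

The resulting dictionary is what ends up in the execution environment, so any key defined in the file becomes visible to skills running there.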
10 changes: 5 additions & 5 deletions docs/index.md
@@ -19,16 +19,16 @@ The library's architecture emphasizes extensibility and transparency, avoiding t
## Next steps

- [Quickstart](quickstart.md) - Launch your first `freeact` agent and interact with it on the command line
- [Installation](installation.md) - Installation instructions and configuration of execution environments
- [Building blocks](blocks.md) - Learn about the essential components of a `freeact` agent system
- [Tutorials](tutorials/index.md) - Tutorials demonstrating the `freeact` building blocks

## Further reading

- [Installation](installation.md) - Detailed instructions for building custom execution environments
- [Command line](cli.md) - Minimalistic command-line interface for running `freeact` agents
- [Supported models](models.md) - Overview of evaluated models and how to [integrate new ones](models.md#model-integration).
- [Streaming protocol](streaming.md) - Protocol for streaming model responses and execution results
- [Evaluation results](evaluation.md) - Evaluation of `freeact` performance incl. a smolagents comparison
- [Command line interface](cli.md) - Guide to using `freeact` agents on the command line
- [Supported models](models.md) - Overview of models [evaluated](evaluation.md) with `freeact`
- [Model integration](integration.md) - Guidelines for integrating new models into `freeact`
- [Streaming protocol](streaming.md) - Specification for streaming model responses and execution results

## Status

80 changes: 80 additions & 0 deletions docs/integration.md
@@ -0,0 +1,80 @@
# Model integration

`freeact` provides both a low-level and high-level API for integrating new models.

- The [low-level API](api/model.md) defines the `CodeActModel` interface and related abstractions
- The [high-level API](api/generic.md) provides a `GenericModel` class based on the [OpenAI Python SDK](https://github.com/openai/openai-python)

## Low-level API

The low-level API is not further described here. For implementation examples, see the [`freeact.model.claude`](https://github.com/gradion-ai/freeact/tree/main/freeact/model/claude) or [`freeact.model.gemini`](https://github.com/gradion-ai/freeact/tree/main/freeact/model/gemini) packages.

## High-level API

The high-level API supports models from any provider that is compatible with the [OpenAI Python SDK](https://github.com/openai/openai-python). To use a model, you need to provide prompt templates that guide it to generate code actions. You can reuse existing templates or create your own, and then either instantiate `GenericModel` directly or subclass it.

The following subsections demonstrate this using Qwen 2.5 Coder 32B Instruct as an example, showing how to use it both via the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index) and locally with [ollama](https://ollama.com/).

### Prompt templates

Start with model-specific prompt templates that guide Qwen 2.5 Coder Instruct models to generate code actions. For example:

```python title="freeact/model/qwen/prompt.py"
--8<-- "freeact/model/qwen/prompt.py"
```

!!! Note

These prompt templates are still experimental. They work reasonably well for larger Qwen 2.5 Coder models, but need optimization for smaller ones.

!!! Tip

While tested with Qwen 2.5 Coder Instruct, these prompt templates can also serve as a good starting point for other models (as we did for DeepSeek V3, for example).

### Model definition

Although we could instantiate `GenericModel` directly with these prompt templates, `freeact` provides a `QwenCoder` subclass for convenience:

```python title="freeact/model/qwen/model.py"
--8<-- "freeact/model/qwen/model.py"
```

### Model usage

Here's a Python example that uses `QwenCoder` as the code action model in a `freeact` agent. The model is accessed via the Hugging Face Inference API:

```python title="freeact/examples/qwen.py"
--8<-- "freeact/examples/qwen.py"
```

1. Your Hugging Face [user access token](https://huggingface.co/docs/hub/en/security-tokens)

2. Interact with the agent via a CLI

Run it with:

```bash
HF_TOKEN=<your-huggingface-token> python -m freeact.examples.qwen
```

Alternatively, use the default `freeact` [CLI](cli.md) directly:

```bash
python -m freeact.cli \
--model-name=Qwen/Qwen2.5-Coder-32B-Instruct \
--base-url=https://api-inference.huggingface.co/v1/ \
--api-key=<your-huggingface-token> \
--ipybox-tag=ghcr.io/gradion-ai/ipybox:basic \
--skill-modules=freeact_skills.search.google.stream.api
```

To use the same model deployed locally with [ollama](https://ollama.com/), modify `--model-name`, `--base-url`, and `--api-key` to match your local deployment:

```bash
python -m freeact.cli \
--model-name=qwen2.5-coder:32b-instruct-fp16 \
--base-url=http://localhost:11434/v1 \
--api-key=ollama \
--ipybox-tag=ghcr.io/gradion-ai/ipybox:basic \
--skill-modules=freeact_skills.search.google.stream.api
```
83 changes: 3 additions & 80 deletions docs/models.md
@@ -1,9 +1,5 @@
# Supported models

In addition to the models we evaluated, `freeact` also supports any model from any provider that is compatible with the [OpenAI Python SDK](https://github.com/openai/openai-python), including open models deployed locally on [ollama](https://ollama.com/) or [TGI](https://huggingface.co/docs/text-generation-inference/index), for example. See [Model integration](#model-integration) for details.

## Evaluated models

The following models have been [evaluated](evaluation.md) with `freeact`:

- Claude 3.5 Sonnet (20241022)
@@ -12,85 +8,12 @@ The following models have been [evaluated](evaluation.md) with `freeact`:
- Qwen 2.5 Coder 32B Instruct
- DeepSeek V3

For these models, `freeact` provides model-specific prompt templates.

!!! Tip

For best performance, we recommend using Claude 3.5 Sonnet. Support for Gemini 2.0 Flash, Qwen 2.5 Coder and DeepSeek V3 is still experimental. The Qwen 2.5 Coder integration is described in [Model integration](#model-integration). The DeepSeek V3 integration follows the same pattern using a custom model class.

## Model integration

`freeact` provides both a low-level and high-level API for integrating new models.

- The [low-level API](api/model.md) defines the `CodeActModel` interface and related abstractions
- The [high-level API](api/generic.md) provides a `GenericModel` implementation of `CodeActModel` using the [OpenAI Python SDK](https://github.com/openai/openai-python)

### Low-level API

The low-level API is not further described here. For implementation examples see packages [claude](https://github.com/gradion-ai/freeact/tree/main/freeact/model/claude) or [gemini](https://github.com/gradion-ai/freeact/tree/main/freeact/model/gemini).

### High-level API

The high-level API support usage of any model from any provider that is compatible with the [OpenAI Python SDK](https://github.com/openai/openai-python), including models deployed locally on [ollama](https://ollama.com/) or [TGI](https://huggingface.co/docs/text-generation-inference/index), for example. This is shown in the following for Qwen 2.5 Coder 32B Instruct.

#### Prompt templates

Start with model-specific prompt templates that guide Qwen 2.5 Coder Instruct models to generate code actions:

```python title="freeact/model/qwen/prompt.py"
--8<-- "freeact/model/qwen/prompt.py"
```
For these models, `freeact` provides model-specific prompt templates.

!!! Note

These prompt templates are still experimental.
In addition to the models we evaluated, `freeact` also supports the [integration](integration.md) of new models from any provider that is compatible with the [OpenAI Python SDK](https://github.com/openai/openai-python), including open models deployed locally with [ollama](https://ollama.com/) or [TGI](https://huggingface.co/docs/text-generation-inference/index).

!!! Tip

While tested with Qwen 2.5 Coder Instruct, these prompt templates can also serve as a good starting point for other models (as we did for DeepSeek V3, for example).

#### Model definition

Although we could instantiate `GenericModel` directly with these prompt templates, `freeact` provides a `QwenCoder` subclass for convenience.

```python title="freeact/model/qwen/model.py"
--8<-- "freeact/model/qwen/model.py"
```

#### Model usage

Here's a Python example that uses `QwenCoder` in an interactive CLI:

```python title="freeact/examples/qwen.py"
--8<-- "freeact/examples/qwen.py"
```

1. Your Hugging Face [user access token](https://huggingface.co/docs/hub/en/security-tokens)

Run it with:

```bash
HF_TOKEN=<your-huggingface-token> python -m freeact.examples.qwen
```

Or use the `freeact` CLI directly:

```bash
python -m freeact.cli \
--model-name=Qwen/Qwen2.5-Coder-32B-Instruct \
--base-url=https://api-inference.huggingface.co/v1/ \
--api-key=<your-huggingface-token> \
--ipybox-tag=ghcr.io/gradion-ai/ipybox:basic \
--skill-modules=freeact_skills.search.google.stream.api
```

For using the same model deployed locally on [ollama](https://ollama.com/), for example, change `--model-name`, `--base-url` and `--api-key` to match your local deployment:

```bash
python -m freeact.cli \
--model-name=qwen2.5-coder:32b-instruct-fp16 \
--base-url=http://localhost:11434/v1 \
--api-key=ollama \
--ipybox-tag=ghcr.io/gradion-ai/ipybox:basic \
--skill-modules=freeact_skills.search.google.stream.api
```
For best performance, we recommend Claude 3.5 Sonnet, with DeepSeek V3 as a close second. Support for Gemini 2.0 Flash, Qwen 2.5 Coder, and DeepSeek V3 remains experimental as we continue to optimize their prompt templates.
2 changes: 1 addition & 1 deletion docs/tutorials/basics.md
@@ -69,7 +69,7 @@ The Python example above is part of the `freeact` package and can be run with:
python -m freeact.examples.basics
```

For formatted and colored console output, as shown in the [example conversation](#example-conversation), you can use the `freeact` CLI:
For formatted and colored console output, as shown in the [example conversation](#example-conversation), you can use the `freeact` [CLI](../cli.md):

```shell
--8<-- "freeact/examples/commands.txt:cli-basics-claude"
2 changes: 0 additions & 2 deletions evaluation/evaluate.py
@@ -25,7 +25,6 @@
QwenCoder,
execution_environment,
)
from freeact.cli.utils import dotenv_variables

app = typer.Typer()

@@ -224,7 +223,6 @@ async def run_agent(
executor_key="agent-evaluation",
ipybox_tag="ghcr.io/gradion-ai/ipybox:eval",
log_file=Path("logs", "agent-evaluation.log"),
env_vars=dotenv_variables(),
) as env:
skill_sources = await env.executor.get_module_sources(
["google_search.api", "visit_webpage.api"],
31 changes: 0 additions & 31 deletions freeact/cli/utils.py
@@ -1,11 +1,9 @@
import platform
from contextlib import asynccontextmanager
from pathlib import Path
from typing import Dict

import aiofiles
import prompt_toolkit
from dotenv import dotenv_values
from PIL import Image
from prompt_toolkit.key_binding import KeyBindings
from rich.console import Console
@@ -19,36 +17,7 @@
CodeActAgentTurn,
CodeActModelTurn,
CodeExecution,
CodeExecutionContainer,
CodeExecutor,
)
from freeact.logger import Logger


def dotenv_variables() -> dict[str, str]:
return {k: v for k, v in dotenv_values().items() if v is not None}


@asynccontextmanager
async def execution_environment(
executor_key: str = "default",
ipybox_tag: str = "ghcr.io/gradion-ai/ipybox:minimal",
env_vars: dict[str, str] = dotenv_variables(),
workspace_path: Path | str = Path("workspace"),
log_file: Path | str = Path("logs", "agent.log"),
):
async with CodeExecutionContainer(
tag=ipybox_tag,
env=env_vars,
workspace_path=workspace_path,
) as container:
async with CodeExecutor(
key=executor_key,
port=container.port,
workspace=container.workspace,
) as executor:
async with Logger(file=log_file) as logger:
yield executor, logger


async def stream_conversation(agent: CodeActAgent, console: Console, show_token_usage: bool = False, **kwargs):
2 changes: 1 addition & 1 deletion freeact/examples/qwen.py
@@ -23,7 +23,7 @@ async def main():
)

agent = CodeActAgent(model=model, executor=env.executor)
await stream_conversation(agent, console=Console())
await stream_conversation(agent, console=Console()) # (2)!


if __name__ == "__main__":