Fix issue #6361: [Feature]: Document the App Browser Feature in the OpenHands Documentation Page #6362

Draft: wants to merge 2 commits into base: main
28 changes: 28 additions & 0 deletions docs/modules/usage/how-to/app-browser.md
@@ -0,0 +1,28 @@
# App Browser

The App Browser is an OpenHands feature that lets you monitor and verify the AI agent's web interactions in real time. When the agent acts in a web browser (for example, navigating to a URL, clicking a button, or filling in a form), the App Browser displays screenshots of what the agent sees, so you can confirm that the agent is interacting with web pages correctly.

## Features

- **URL Display**: Shows the current URL the agent is visiting
- **Live Screenshots**: Displays real-time screenshots of the web pages the agent is interacting with
- **Visual Verification**: Helps you verify that the agent's web interactions are working as intended

## How It Works

1. When the agent performs web interactions using the `browser` tool, screenshots of the web pages are captured
2. These screenshots are displayed in the App Browser panel in real time
3. You can see exactly what the agent sees, which makes it easier to debug or verify web interactions
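
The steps above can be pictured as a small mapping from the agent's latest browser observation to what the panel shows. This is only a minimal sketch: `BrowserObservation`, `PanelView`, and `toPanelView` are hypothetical names chosen for illustration, and the base64 PNG screenshot format is an assumption, not a description of OpenHands' actual internals.

```typescript
// Hypothetical shape of what the agent reports after a browser action.
interface BrowserObservation {
  url: string;
  screenshotBase64: string; // assumed: PNG image data, base64-encoded
}

// Hypothetical view model for the App Browser panel.
interface PanelView {
  url: string; // shown in the URL display
  imageSrc: string; // data URL that an <img> element can render
}

// Convert an observation into something the panel can display.
function toPanelView(obs: BrowserObservation): PanelView {
  return {
    url: obs.url,
    imageSrc: `data:image/png;base64,${obs.screenshotBase64}`,
  };
}
```

Each new observation replaces the previous view, which is why the panel always reflects the most recent page the agent saw.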

## Use Cases

The App Browser is particularly useful when:

- Debugging web automation tasks
- Verifying that the agent is interacting with the correct elements on a page
- Ensuring web scraping or form filling tasks are working correctly
- Monitoring the agent's progress during web-based tasks

## Location

The App Browser panel is part of the OpenHands UI. It displays "No page loaded" when the agent is not performing any web interactions.
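
The placeholder behavior can be sketched as follows. The `LatestPage` type and `panelText` function are hypothetical and exist only to illustrate the idle state; only the "No page loaded" string comes from the UI description above.

```typescript
// Hypothetical: the most recent page the panel holds, or null when the agent is idle.
type LatestPage = { url: string } | null;

// Return the text the panel would show in its URL area.
function panelText(latest: LatestPage): string {
  // With no browser activity there is nothing to show, so fall back to the placeholder.
  return latest === null ? "No page loaded" : latest.url;
}
```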
28 changes: 28 additions & 0 deletions docs/modules/usage/how-to/gui-mode.md
@@ -109,6 +109,34 @@ The main interface consists of several key components:
- **Settings Button**: A gear icon that opens the settings modal, allowing you to adjust your configuration at any time.
- **Workspace Panel**: Displays the files and folders in your workspace, allowing you to navigate and view files, or the agent's past commands or web browsing history.

### App Browser Feature

The App Browser lets the AI assistant interact with frontend applications:

- **Purpose**: Lets the AI view, navigate, and interact with web-based user interfaces so that it can test and debug frontend applications.
- **Capabilities**:
  - **Navigation**: The AI can browse web pages and move between sections of the application.
  - **Interaction**: The AI can click buttons, fill in forms, and perform other common web interactions.
  - **Visual Feedback**: The AI can see and interpret the application's interface, which helps with UI-related tasks.
  - **Testing**: The AI can simulate user interactions, which supports automated testing of frontend applications.

#### Using the App Browser

1. **Accessing the Browser**:
   - The browser view appears in the workspace panel when the AI is interacting with web applications.
   - You can see the current page and the AI's interactions in real time.

2. **Common Use Cases**:
   - Testing frontend applications
   - Debugging UI issues
   - Automating web-based workflows
   - Validating user interface changes

3. **Browser Controls**:
   - The AI handles navigation and interaction automatically.
   - You can observe the AI's actions in the browser view.
   - You can guide the AI's interactions with the application through the chat interface.

### Interacting with the AI

1. Type your question, request, or task description in the input box.
5 changes: 5 additions & 0 deletions docs/sidebars.ts
@@ -69,6 +69,11 @@ const sidebars: SidebarsConfig = {
        label: 'Github Actions',
        id: 'usage/how-to/github-action',
      },
      {
        type: 'doc',
        label: 'App Browser',
        id: 'usage/how-to/app-browser',
      },
    ],
  },
  {