WebVoyager Baseline Agent & Benchmark #282

Open
wants to merge 54 commits into
base: main
Changes from 49 commits
05173ee
init
alckasoc Jan 17, 2025
4e88641
Merge branch 'main' into webvoy
alckasoc Jan 17, 2025
b688565
Merge branch 'main' into webvoy
alckasoc Jan 17, 2025
44533e8
add json examples
alckasoc Jan 17, 2025
0398c16
some files
alckasoc Jan 17, 2025
38b9dbc
fix sorting error for get_task_ids_by_domain
chuongnguyen26 Jan 17, 2025
66706f2
utils
alckasoc Jan 17, 2025
c940f1a
readme
alckasoc Jan 17, 2025
50472b5
.
alckasoc Jan 17, 2025
0901c92
data manager
alckasoc Jan 17, 2025
21c677f
data manager init
alckasoc Jan 17, 2025
c17f086
ref answer getters
alckasoc Jan 17, 2025
1dd32da
getter
alckasoc Jan 17, 2025
3936e71
gaia data maanger
alckasoc Jan 17, 2025
39dbd68
auto lint
alckasoc Jan 17, 2025
02881aa
rename
alckasoc Jan 17, 2025
1448d05
.
alckasoc Jan 17, 2025
975476b
base strategy for webvoy
chuongnguyen26 Jan 17, 2025
1528259
Merge branch 'webvoy' of https://github.com/alckasoc/agential into we…
alckasoc Jan 17, 2025
07d32fa
add selenium
alckasoc Jan 17, 2025
bebead0
webvoy general
chuongnguyen26 Jan 17, 2025
9d12c3f
.
alckasoc Jan 17, 2025
c22a610
Merge branch 'webvoy' of https://github.com/alckasoc/agential into we…
alckasoc Jan 17, 2025
5900c53
some code
alckasoc Jan 18, 2025
48ec17b
finished general and output and finishing up agent
chuongnguyen26 Jan 18, 2025
e6e1e9b
reformatting
chuongnguyen26 Jan 18, 2025
f8cc747
finishing up webvoyager integration
chuongnguyen26 Jan 19, 2025
0e60364
lotta changes
alckasoc Jan 20, 2025
c8fc7e2
fix base benchmark classes
alckasoc Jan 20, 2025
2e62441
lottac hanges
alckasoc Jan 20, 2025
013ad35
ok
alckasoc Jan 20, 2025
d1492b8
.
alckasoc Jan 20, 2025
68af977
.
alckasoc Jan 20, 2025
0863a93
add close
alckasoc Jan 21, 2025
049e61d
clean up
alckasoc Jan 21, 2025
cd2f10d
ok
alckasoc Jan 21, 2025
f64c7fe
.
alckasoc Jan 21, 2025
95c610c
some changes
alckasoc Jan 21, 2025
8196003
eval done
alckasoc Jan 21, 2025
c6a8e7b
fix import
alckasoc Jan 21, 2025
8605017
auto lint
alckasoc Jan 21, 2025
7f042ae
.
alckasoc Jan 22, 2025
7564262
experiment scripts
alckasoc Jan 26, 2025
03dc444
big changes
alckasoc Jan 27, 2025
958190d
finished agent
chuongnguyen26 Feb 1, 2025
6ff93a6
fix osworldnb
alckasoc Feb 1, 2025
24e27da
Merge branch 'webvoy' of https://github.com/alckasoc/agential into we…
alckasoc Feb 1, 2025
df6c60d
add unittest
chuongnguyen26 Feb 2, 2025
3baa38a
added test for general
chuongnguyen26 Feb 2, 2025
59849d0
fix
alckasoc Feb 2, 2025
b4f787f
Merge branch 'webvoy' of https://github.com/alckasoc/agential into we…
alckasoc Feb 2, 2025
fda09fb
.
alckasoc Feb 2, 2025
b9c15f7
fix
alckasoc Feb 3, 2025
a94ee8c
fix expel sh
alckasoc Feb 3, 2025
174 changes: 174 additions & 0 deletions agential/agents/computer_use/webvoyager_baseline/agent.py
@@ -0,0 +1,174 @@
"""WebVoyagerBaseline Agent.

Original Paper: https://arxiv.org/abs/2401.13919
Paper Repository: https://github.com/MinorJerry/WebVoyager
"""

from typing import Any, Dict, List

from agential.agents.base.agent import BaseAgent
from agential.agents.computer_use.webvoyager_baseline.output import WebVoyagerBaseOutput
from agential.agents.computer_use.webvoyager_baseline.prompts import (
    SYSTEM_PROMPT,
    SYSTEM_PROMPT_TEXT_ONLY,
)
from agential.agents.computer_use.webvoyager_baseline.strategies.base import (
    WebVoyagerBaseStrategy,
)
from agential.agents.computer_use.webvoyager_baseline.strategies.general import (
    WebVoyagerGeneralStrategy,
)
from agential.core.llm import LLM, BaseLLM

WEBVOYAGER_BASELINE_AGENT_STRATEGIES = {"webvoyager": WebVoyagerGeneralStrategy}


class WebVoyagerBaseline(BaseAgent):
    """An agent designed for WebVoyager environments, capable of processing observations and generating actions.

    Attributes:
        seed (int): The random seed forwarded to the strategy.
        max_attached_imgs (int): Maximum number of attached images forwarded to the strategy.
        temperature (float): Temperature parameter for controlling randomness.
        text_only (bool): Whether the agent runs in text-only mode.
        llm (BaseLLM): The language model used for generating responses and processing instructions.
        testing (bool): If the agent is in testing mode.
        benchmark (str): The benchmark name the agent is designed for.
        strategy (WebVoyagerBaseStrategy): The strategy used by the agent.
        thoughts (List): Accumulated thoughts during the agent's operation.
        actions (List): Actions taken by the agent.
        observations (List): Observations received by the agent.
    """

    def __init__(  # TODO: clean up attributes.
        self,
        seed: int = None,
        max_attached_imgs: int = 1,
        temperature: float = 1.0,
        text_only: bool = False,
        llm: BaseLLM = LLM(model="gpt-4o"),
        testing: bool = False,
        benchmark: str = "webvoyager",
        **strategy_kwargs: Any,
    ):
        """Initializes the WebVoyagerBaseline.

        Args:
            seed (int): The random seed forwarded to the strategy. Defaults to None.
            max_attached_imgs (int): Maximum number of attached images. Defaults to 1.
            temperature (float): Sampling temperature. Defaults to 1.0.
            text_only (bool): Whether the agent runs in text-only mode. Defaults to False.
            llm (BaseLLM): The language model instance.
            testing (bool): Whether the agent is in testing mode.
            benchmark (str): The benchmark for this agent. Must be a key of
                WEBVOYAGER_BASELINE_AGENT_STRATEGIES.
            **strategy_kwargs (Any): Additional arguments for the strategy.
        """
        super().__init__(llm=llm, benchmark=benchmark, testing=testing)
        self.seed = seed
        self.max_attached_imgs = max_attached_imgs
        self.temperature = temperature
        self.text_only = text_only
        self.llm = llm
        self.testing = testing
        self.benchmark = benchmark

        self.thoughts: List = []
        self.actions: List = []
        self.observations: List = []

        self.strategy = WebVoyagerBaseline.get_strategy(
            benchmark=self.benchmark,
            llm=self.llm,
            testing=self.testing,
            **strategy_kwargs,
        )

    def get_prompts(self, textonly: bool, benchmark: str = "", **kwargs: Any) -> str:
        """Retrieve the appropriate system prompt based on whether the agent is text-only.

        Args:
            textonly (bool): Whether to return the text-only system prompt.
            benchmark (str): The benchmark name. Unused. Defaults to "".
            **kwargs (Any): Additional arguments. Unused.

        Returns:
            str: The system prompt for the agent.
        """
        if textonly:
            return SYSTEM_PROMPT_TEXT_ONLY
        else:
            return SYSTEM_PROMPT

    @staticmethod
    def get_strategy(benchmark: str, **kwargs: Any) -> WebVoyagerBaseStrategy:
        """Returns the strategy corresponding to the benchmark.

        Args:
            benchmark (str): The benchmark name.
            **kwargs (Any): Additional arguments for the strategy.

        Returns:
            WebVoyagerBaseStrategy: The strategy instance.

        Raises:
            ValueError: If the benchmark is unsupported.
        """
        if benchmark not in WEBVOYAGER_BASELINE_AGENT_STRATEGIES:
            raise ValueError(
                f"Unsupported benchmark: {benchmark} for agent WebVoyagerBaseline"
            )

        strategy = WEBVOYAGER_BASELINE_AGENT_STRATEGIES[benchmark]
        return strategy(**kwargs)

    @staticmethod
    def get_fewshots(
        benchmark: str = "", fewshot_type: str = "", **kwargs: Any
    ) -> Dict[str, str]:
        """Retrieve few-shot examples based on the benchmark.

        Args:
            benchmark (str): The benchmark name.
            fewshot_type (str): The benchmark few-shot type.
            **kwargs (Any): Additional arguments.

        Returns:
            Dict[str, str]: A dictionary of few-shot examples.
        """
        return {"benchmark": benchmark, "fewshot_type": fewshot_type}

    def generate(
        self, obs: Dict[str, Any], task: Dict[str, Any], prompt: str = ""
    ) -> WebVoyagerBaseOutput:
        """Processes the given observations and task to generate a response.

        Args:
            obs (Dict[str, Any]): Observations from the environment.
            task (Dict[str, Any]): Task to generate action for.
            prompt (str, optional): Predefined prompt for the agent. If given, it is
                used in place of both system prompts. Defaults to "".

        Returns:
            WebVoyagerBaseOutput: The structured output produced by the strategy.
        """
        # get_prompts returns a plain string, so it is not subscripted. A non-empty
        # prompt takes precedence, which keeps both variables defined in every case.
        system_prompt_text_only = prompt or self.get_prompts(textonly=True)
        system_prompt = prompt or self.get_prompts(textonly=False)

        webvoyager_base_output: WebVoyagerBaseOutput = self.strategy.generate(
            system_prompt=system_prompt,
            system_prompt_text_only=system_prompt_text_only,
            seed=self.seed,
            max_attached_imgs=self.max_attached_imgs,
            temperature=self.temperature,
            text_only=self.text_only,
            task=task,
            obs=obs,
        )

        return webvoyager_base_output
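The `get_strategy` lookup in `agent.py` is a plain dictionary registry keyed by benchmark name. The snippet below is a minimal, self-contained sketch of that pattern, not the agential implementation: `DemoStrategy`, `DEMO_STRATEGIES`, and `get_demo_strategy` are hypothetical stand-ins so that it runs without the library installed.

```python
from typing import Any, Dict, Type


class DemoStrategy:
    """Hypothetical stand-in for WebVoyagerGeneralStrategy."""

    def __init__(self, **kwargs: Any) -> None:
        # Keep whatever keyword arguments the caller forwards.
        self.kwargs = kwargs


# Registry keyed by benchmark name, mirroring WEBVOYAGER_BASELINE_AGENT_STRATEGIES.
DEMO_STRATEGIES: Dict[str, Type[DemoStrategy]] = {"webvoyager": DemoStrategy}


def get_demo_strategy(benchmark: str, **kwargs: Any) -> DemoStrategy:
    """Look up the strategy class for a benchmark and instantiate it."""
    if benchmark not in DEMO_STRATEGIES:
        raise ValueError(f"Unsupported benchmark: {benchmark}")
    return DEMO_STRATEGIES[benchmark](**kwargs)


# A registered benchmark yields a strategy instance.
strategy = get_demo_strategy("webvoyager", testing=True)
print(type(strategy).__name__, strategy.kwargs)

# An unregistered benchmark is rejected with a ValueError.
try:
    get_demo_strategy("osworld")
except ValueError as err:
    print(err)
```

Because the registry only contains the `webvoyager` key, any other benchmark name, including `osworld`, is rejected as soon as the strategy is requested.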
@@ -5,6 +5,7 @@
 from typing import Any, Dict, TypedDict

 from selenium import webdriver
+from selenium.webdriver.remote.webdriver import WebDriver


 class AccessibilityTreeNode(TypedDict):
@@ -102,7 +103,7 @@ class BrowserInfo(TypedDict):

 def fetch_browser_info(
     # page: Page,
-    browser: webdriver,
+    browser: WebDriver,
 ) -> BrowserInfo:
     """Fetches detailed information about the browser state, including the DOM tree
     and window configuration.
20 changes: 20 additions & 0 deletions agential/agents/computer_use/webvoyager_baseline/output.py
@@ -0,0 +1,20 @@
"""WebVoyagerBaseline structured output module."""

from typing import Any, Dict

from pydantic import Field

from agential.agents.base.output import BaseAgentOutput


class WebVoyagerBaseOutput(BaseAgentOutput):
    """WebVoyagerBaseOutput structured output class.

    Attributes:
        additional_info (Dict[str, Any]): A dictionary of observations, thoughts, and actions of WebVoyagerBaselineAgent Output.
    """

    additional_info: Dict[str, Any] = Field(
        ...,
        description="A dictionary of observations, thoughts, and actions of WebVoyagerBaselineAgent Output.",
    )
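Passing `...` as the first argument to `Field` marks `additional_info` as required, so constructing the output without it fails validation. The following is a rough sketch of that behavior, assuming pydantic is installed. `DemoBaseOutput` and `DemoOutput` are hypothetical stand-ins; the real `BaseAgentOutput` is not shown in this diff and may define further fields.

```python
from typing import Any, Dict

from pydantic import BaseModel, Field, ValidationError


class DemoBaseOutput(BaseModel):
    """Hypothetical stand-in for BaseAgentOutput, with no fields of its own."""


class DemoOutput(DemoBaseOutput):
    """Mirrors the shape of WebVoyagerBaseOutput for illustration only."""

    # The Ellipsis default makes the field required.
    additional_info: Dict[str, Any] = Field(
        ...,
        description="A dictionary of observations, thoughts, and actions.",
    )


output = DemoOutput(
    additional_info={"thoughts": [], "actions": [], "observations": []}
)
print(output.additional_info)

try:
    DemoOutput()  # Missing the required field.
except ValidationError:
    print("additional_info is required")
```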