
feat(workflow): Implement a simplified CoAct workflow #3770

Closed
wants to merge 53 commits

Conversation

@ryanhoangt (Contributor) commented Sep 7, 2024

Short description of the problem this fixes or functionality that this introduces. This may be used for the CHANGELOG

  • This PR implements a simplified multi-agent workflow inspired by the CoAct paper.
  • Currently, in the SWE-bench eval, there are complex instances that OpenHands fails on, especially ones where a single CodeActAgent overlooks the buggy location. If we have a grounding test case for the issue, this workflow seems to help.
  • An overkill-ish successful trajectory with replanning can be found here.
  • A task that CoActPlannerAgent finished but CodeActAgent failed on (I expected both to be able to complete it):

Give a summary of what the PR does, explaining any non-trivial design decisions

  • Modify CodeAct to make it accept delegated tasks.
  • Implement two new agents, a planner and an executor, with the same abilities as CodeAct but different system prompts and additional action parsers (a sketch follows this list).
  • Nit: adjust the delegate message shown in the UI.
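
To make the delegation flow concrete, here is a minimal sketch of how a planner can hand one phase of its plan to the executor. This is illustrative only, not the PR's actual code: the import path may differ between OpenHands versions, and the `CoActExecutorAgent` registration name and `task` input key are assumptions.

```python
# Minimal sketch (not this PR's code): a planner hands one phase of its
# plan to an executor through OpenHands' delegation action. The import
# path, agent name, and input key are assumptions for illustration.
from openhands.events.action import AgentDelegateAction

def delegate_phase(phase_description: str) -> AgentDelegateAction:
    """Wrap one phase of the global plan as a task for the executor agent."""
    return AgentDelegateAction(
        agent='CoActExecutorAgent',          # registered delegate name (assumed)
        inputs={'task': phase_description},  # payload the executor receives
    )
```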

Some next steps to improve this may be:

  • Try eval on some swe-bench-lite instances.
  • Adjust the system/user prompts and few-shot examples to further specialize the two agents. Also define the structure for the plan (e.g., its components, etc.). The two agents can now cooperate to finish a SWE-bench issue.
  • Use a meta prompt to reinforce the agents' actions, to make sure they follow the workflow.
  • Experiment with the ability for the global planner to refuse a replan request from the executor.
  • Implement the ability for the delegated agent (e.g., BrowsingAgent or CoActExecutorAgent) to persist its history through the multi-turn conversation (a sketch follows this list).
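
On the last point, one rough sketch of what persisting a delegate's history could look like: cache delegate agents by name so a re-delegated agent keeps its accumulated state instead of starting fresh each turn. Purely illustrative; this is not how the PR (or OpenHands) manages delegates.

```python
# Rough sketch (assumption, not this PR's code): cache delegates by name
# so a re-delegated agent keeps the history it built up in earlier turns.
from typing import Any, Callable, Dict

class DelegateRegistry:
    def __init__(self) -> None:
        self._delegates: Dict[str, Any] = {}

    def get_or_create(self, name: str, factory: Callable[[], Any]) -> Any:
        """Return the cached delegate for `name`, creating it on first use."""
        if name not in self._delegates:
            self._delegates[name] = factory()
        return self._delegates[name]
```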

Link of any specific issues this addresses

#3077

@tobitege (Collaborator) commented Sep 7, 2024

just fyi, the integration tests seem to fail because of some

"action_suffix": "browse"

in some results.

@ryanhoangt (Author) commented Sep 7, 2024

> just fyi, the integration tests seem to fail because of some
>
> "action_suffix": "browse"
>
> in some results.

Thanks, I'm still waiting for reviews on it; if it's good to go I'll look into the tests.

@neubig self-requested a review September 8, 2024 13:32

@neubig (Contributor) left a comment

Hey, thanks a bunch for this @ryanhoangt !

I browsed through the code, and I think it's implemented quite well. Personally I think the next step could be to test if it gets good benchmark scores.

@ryanhoangt (Author) replied:

> Hey, thanks a bunch for this @ryanhoangt !
>
> I browsed through the code, and I think it's implemented quite well. Personally I think the next step could be to test if it gets good benchmark scores.

Thanks Prof., I'll do that and update you on how it goes soon.

@tobitege (Collaborator) commented:

It might be in the paper(s), but I don't quite like that the prompts now talk of "agent", while everywhere else it is "assistant". 🤔

@ryanhoangt (Author) commented Sep 18, 2024

> I think it's visible when you look at the trajectories linked above. I'm looking now at the first of those 2, and step 9 is like:

Re the JSON in the visualizer, it seems to be because we don't format the finish action yet.

> prompt_039.log - It has an observation using JSON.

Good catch, this seems to be another bug. It might be because this action is not handled properly:

    return AgentFinishAction(thought=thought, outputs={'content': thought})
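
For illustration, the missing handling might look roughly like the sketch below; the function name and the visualizer's calling convention are hypothetical, not the project's actual API.

```python
# Hypothetical sketch: render AgentFinishAction's closing thought instead
# of dumping the raw serialized action as JSON. Names are illustrative.
def render_finish_action(action) -> str:
    content = action.outputs.get('content') if action.outputs else None
    return content or action.thought
```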

> There's something else that looks suspicious to me just after this. The next prompt sent to the LLM is from the Executor, and its prompt includes some text from the Planner-specific prompt.

Yeah, I also noticed this issue. My intention is to make the Planner include the full user message (hence the full problem statement in SWE-bench) to give the executor some more context, but sometimes it included the message from the few-shot examples, or the "Now, let's come up with 2 global plans sequentially." as you saw, which is problematic.

> I thought this section about "let's come up with 2 global plans sequentially" is part of the Planner agent prompt, and "playing the role of a subordinate employee" is the Executor. (Then the phases are written by the Planner for the Executor.) Isn't that the case? Does the above look expected?

"let's come up with 2 global plans sequentially" - this is an extra piece of prompt used only in the SWE-bench evaluation for CoActPlanner. Similar to CodeActSWEAgent below, it can be used to steer the agent a bit to be better at a specific task, but I'm not sure the current "2 global plans" approach is the optimal way to go. With CodeActAgent there are many cases where the agent just fixed the issue without creating any tests.

```python
if agent_class == 'CodeActSWEAgent':
    instruction = (
        'We are currently solving the following issue within our repository. Here is the issue text:\n'
        '--- BEGIN ISSUE ---\n'
        f'{instance.problem_statement}\n'
        '--- END ISSUE ---\n\n'
    )
    if USE_HINT_TEXT and instance.hints_text:
        instruction += (
            f'--- BEGIN HINTS ---\n{instance.hints_text}\n--- END HINTS ---\n'
        )
    instruction += f"""Now, you're going to solve this issue on your own. Your terminal session has started and you're in the repository's root directory. You can use any bash commands or the special interface to help you. Edit all the files you need to and run any checks or tests that you want.
Remember, YOU CAN ONLY ENTER ONE COMMAND AT A TIME. You should always wait for feedback after every command.
When you're satisfied with all of the changes you've made, you can run the following command: <execute_bash> exit </execute_bash>.
Note however that you cannot use any interactive session commands (e.g. vim) in this environment, but you can write scripts and run them. E.g. you can write a python script and then run it with `python <script_name>.py`.
NOTE ABOUT THE EDIT COMMAND: Indentation really matters! When editing a file, make sure to insert appropriate indentation before each line!
IMPORTANT TIPS:
1. Always start by trying to replicate the bug that the issue discusses.
If the issue includes code for reproducing the bug, we recommend that you re-implement that in your environment, and run it to make sure you can reproduce the bug.
Then start trying to fix it.
When you think you've fixed the bug, re-run the bug reproduction script to make sure that the bug has indeed been fixed.
If the bug reproduction script does not print anything when it successfully runs, we recommend adding a print("Script completed successfully, no errors.") command at the end of the file,
so that you can be sure that the script indeed ran fine all the way through.
2. If you run a command and it doesn't work, try running a different command. A command that did not work once will not work the second time unless you modify it!
3. If you open a file and need to get to an area around a specific line that is not in the first 100 lines, say line 583, don't just use the scroll_down command multiple times. Instead, use the goto 583 command. It's much quicker.
4. If the bug reproduction script requires inputting/reading a specific file, such as buggy-input.png, and you'd like to understand how to input that file, conduct a search in the existing repo code, to see whether someone else has already done that. Do this by running the command: find_file("buggy-input.png") If that doesn't work, use the linux 'find' command.
5. Always make sure to look at the currently open file and the current working directory (which appears right after the currently open file). The currently open file might be in a different directory than the working directory! Note that some commands, such as 'create', open files, so they might change the current open file.
6. When editing files, it is easy to accidentally specify a wrong line number or to write code with incorrect indentation. Always check the code after you issue an edit to make sure that it reflects what you wanted to accomplish. If it didn't, issue another command to fix it.
[Current directory: /workspace/{workspace_dir_name}]
"""
else:
    # Testing general agents
    instruction = (
```

@enyst (Collaborator) commented Sep 18, 2024

I wonder if it's better if we include the user message we want in the Executor ourselves, rather than nudge the LLM to include it. We know exactly the snippet we want, after all.
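
For instance, a minimal sketch of deterministic injection, assuming the planner keeps the first user message in `initial_task_str` (as in the diff reviewed below); the method name is an illustrative assumption, not the PR's actual code:

```python
# Sketch: build the delegated task ourselves instead of hoping the LLM
# copies the user message. Assumes the planner stores the first user
# message in self.initial_task_str[0]; the method name is hypothetical.
def build_executor_task(self, phase_description: str) -> str:
    return (
        f'Original request:\n{self.initial_task_str[0]}\n\n'
        f'Current phase to execute:\n{phase_description}'
    )
```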

@ryanhoangt (Author) replied:

Yeah, that makes sense. I can try doing that in the next run.

@ryanhoangt (Author) commented:

Okay, the score is finally converging to what we want. Thanks @enyst for all the improvement suggestions! On the subset of 93 verified instances, CoAct resolved 33/93 while CodeAct resolved 39/93.

Some plots:

Comparing instances resolved in each category, it seems like CoAct doesn't perform very well on easy-level instances:

I'm gonna upload the trajectories to Huggingface shortly.

```diff
@@ -257,6 +265,10 @@ def _get_messages(self, state: State) -> list[Message]:
             else:
                 raise ValueError(f'Unknown event type: {type(event)}')

+            if message and message.role == 'user' and not self.initial_task_str[0]:
+                # first user message
+                self.initial_task_str[0] = message.content[0].text
```
A collaborator left a review comment on these lines:
Just wondering, do we still need this?

@enyst (Collaborator) commented Sep 26, 2024

Cheers! This is great news. ❤️

The reason I suggested we take a look at the default agent changes was just to make sure that it doesn't change its normal behavior. Aside from some details that I'm guessing the integration tests will be unhappy with (we can see and fix them if so), I think it shouldn't be a problem.

@ryanhoangt (Author) replied:

> The reason I suggested we take a look at the default agent changes was just to make sure that it doesn't change its normal behavior. Aside from some details that I'm guessing the integration tests will be unhappy with (we can see and fix them if so), I think it shouldn't be a problem.

The trajectory is uploaded to the visualizer here. I'm going to run the evaluation on all 300 instances with the remote runtime to see how it goes, and also clean up the code a bit and fix the tests.

@mamoodi (Collaborator) commented Nov 1, 2024

Hello @ryanhoangt. Just checking in to see if this is something you will continue working on. There are lots of changes that have gone in recently, and I don't want you to run into too many hard-to-resolve conflicts, as it seems like an involved PR.

@ryanhoangt (Author) replied:

Hey @mamoodi, thanks for checking in. I’m a bit tied up with other tasks at the moment, so I won’t be able to get back to this right away. Maybe we can close the PR for now and I will try to circle back when I have more bandwidth.

@mamoodi (Collaborator) commented Nov 14, 2024

As per Ryan's comment, I'm going to close this PR for now. Whenever Ryan is ready, it will be reopened. Thank you.

@mamoodi closed this Nov 14, 2024
@enyst mentioned this pull request Dec 27, 2024
@enyst mentioned this pull request Jan 10, 2025
Labels
agent framework (Strategies for prompting, agent, etc.) · enhancement (New feature or request)