(feat) Add trajectory replay for headless mode #6215

Merged
merged 9 commits into from
Jan 18, 2025

Conversation

li-boxuan
Collaborator

@li-boxuan li-boxuan commented Jan 13, 2025

End-user friendly description of the problem this fixes or functionality that this introduces

Add trajectory replay feature for headless mode.

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

To test it out, set replay_trajectory_path in config.toml, then run:

poetry run python openhands/core/main.py

TODOs in this PR:

TODOs in next PR:

  • support GUI mode
  • add integration tests with trajectory replay

Link of any specific issues this addresses

#6049


To run this PR locally, use the following command:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:74705a7-nikolaik   --name openhands-app-74705a7   docker.all-hands.dev/all-hands-ai/openhands:74705a7

@li-boxuan
Collaborator Author

li-boxuan commented Jan 14, 2025

Demo:

Step 1. Run OpenHands and generate a trajectory

workspace_base="./workspace"
save_trajectory_path="./traj/demo.json"

With command: poetry run python openhands/core/main.py -t "Please goto GitHub trending and clone the most popular repo" -l claude

Step 2. Run OpenHands with trajectory replay

workspace_base="./workspace2"
replay_trajectory_path="./traj/demo.json"

With command: poetry run python openhands/core/main.py -l wrongkey (An LLM config is not required for replay; the wrong key here shows that an invalid LLM config doesn't matter.)

Step 3. Check the result

[Screenshot of the result, taken 2025-01-13]

FYI: example trajectory
demo.json
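
Before replaying a saved trajectory such as demo.json, it can help to check what it contains. The following is a minimal sketch, not part of this PR. It assumes the trajectory is a JSON list of event objects carrying a "source" key plus either an "action" or an "observation" key; that layout and the helper name `summarize_trajectory` are assumptions for illustration.

```python
import json
from collections import Counter


def summarize_trajectory(events):
    """Count events by (source, kind), where kind is 'action', 'observation' or 'other'.

    Assumed layout: each event is a dict with a "source" key and either an
    "action" or an "observation" key. This is an assumption, not a documented schema.
    """
    counts = Counter()
    for event in events:
        if "action" in event:
            kind = "action"
        elif "observation" in event:
            kind = "observation"
        else:
            kind = "other"
        counts[(event.get("source", "unknown"), kind)] += 1
    return counts


# Hypothetical three-event trajectory, inlined so the sketch is self-contained.
sample = json.loads("""
[
  {"source": "user", "action": "message", "args": {"content": "clone the repo"}},
  {"source": "agent", "action": "run", "args": {"command": "ls"}},
  {"source": "agent", "observation": "run", "content": "README.md"}
]
""")
summary = summarize_trajectory(sample)
```

For a real file, `json.load(open(path))` would replace the inlined sample, provided the file matches the assumed layout.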

@li-boxuan li-boxuan marked this pull request as ready for review January 14, 2025 07:28
@li-boxuan li-boxuan requested review from xingyaoww and enyst January 14, 2025 07:29
openhands/controller/agent_controller.py (two outdated review threads, resolved)
@li-boxuan li-boxuan requested a review from xingyaoww January 16, 2025 03:33
Collaborator

@xingyaoww xingyaoww left a comment

LGTM! Thanks! Only a nit

openhands/core/main.py (two review threads, resolved)
self.replay_events is not None
and self.replay_index < len(self.replay_events)
and isinstance(self.replay_events[self.replay_index], Action)
and self.replay_events[self.replay_index].source != EventSource.USER
Collaborator

Ah I see! This has to work just fine... I do wonder though, what could break it?

  • delegation? Because in that case the controller itself handles AgentDelegateAction, creates a MessageAction for the new guy, and puts it in the stream; the trajectory must have saved the old one too. I don't think it needs to be in scope of this PR, though.
  • MessageActions with source user, which happened after the initial task?
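
The condition under review can be read as a gate: replay only while there is a recorded event left, that event is an Action, and it did not come from the user. A minimal, self-contained sketch of that gate follows. `EventSource`, `Action`, `Observation` and `ReplayCursor` here are simplified stand-ins written for illustration, not the project's real classes, and how the real controller advances past non-replayable events is not shown in the excerpt.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EventSource(Enum):  # stand-in for the project's enum
    AGENT = "agent"
    USER = "user"


@dataclass
class Event:
    source: EventSource


@dataclass
class Action(Event):  # stand-in: something the agent or user does
    name: str = ""


@dataclass
class Observation(Event):  # stand-in: a result produced by the runtime
    content: str = ""


class ReplayCursor:
    """Hypothetical holder for the replay list and the current position."""

    def __init__(self, replay_events: Optional[list]):
        self.replay_events = replay_events
        self.replay_index = 0

    def should_replay(self) -> bool:
        # Same four checks as the condition quoted in the review thread.
        return (
            self.replay_events is not None
            and self.replay_index < len(self.replay_events)
            and isinstance(self.replay_events[self.replay_index], Action)
            and self.replay_events[self.replay_index].source != EventSource.USER
        )

    def next_action(self) -> Action:
        # Return the current recorded action and move the cursor forward.
        if not self.should_replay():
            raise RuntimeError("no replayable action at the current index")
        action = self.replay_events[self.replay_index]
        self.replay_index += 1
        return action


events = [
    Action(EventSource.USER, "message"),
    Action(EventSource.AGENT, "run"),
    Observation(EventSource.AGENT, "ok"),
]
cursor = ReplayCursor(events)
```

In this sketch the gate is false on the user message at index 0 and on the observation at index 2, which is where the two concerns above (delegation and later user messages) would have to be handled.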

Collaborator Author

@li-boxuan li-boxuan Jan 17, 2025

I bet delegation won't work; I am not even sure the trajectory records delegation properly.

> MessageActions with source user, which happened after the initial task?

Yeah, that is just not possible in headless mode, but it would be interesting once we enable this in GUI mode as well. I'd love to have that functionality working, so that people can upload a trajectory, replay some recorded events first, and then start working with agents.

Oh, actually headless mode does allow interactive inputs; that would be interesting to test out as the next step.

Collaborator

> I bet delegation won't work; I am not even sure if trajectory properly records delegation.

I did think of it here! Though I didn't test this use case after the last changes (and for a long time really).

Collaborator Author

@li-boxuan li-boxuan Jan 17, 2025

> MessageActions with source user, which happened after the initial task?

Tested in headless mode and it just worked!

Steps 0-5 were from replay

[Image: screenshot of the headless run]

Collaborator

So it continues normally. But it can't replay the new one, together with that user message? 😅

Collaborator

Yes, I mean, at the end of the run in your image, the event stream has all events: those first 5 steps plus 3 other steps. The agent history has them too. If we save the trajectory now, traj2.json should contain all of them, including the user message in the middle. Can we replay this traj2.json?

Collaborator Author

💭 No, because "wait_for_response": true on the agent message would trigger an AWAITING_USER_INPUT state.

A workaround is to manually edit the trajectory and change wait_for_response to false. And that works!

A hack in the code would be to somehow skip AWAITING_USER_INPUT when there is a next action to replay. (Not implemented)
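
The manual workaround could also be scripted. The sketch below is an illustration only: it assumes each event is a dict, that agent events carry "source": "agent", and that the flag sits under an "args" key. Only the "wait_for_response" name comes from this thread; the rest of the layout and the helper name `disable_wait_for_response` are assumptions.

```python
import copy


def disable_wait_for_response(events):
    """Return a patched copy of the events plus the number of flags flipped.

    Assumed layout: event["source"] == "agent" and event["args"]["wait_for_response"].
    The input list is left untouched.
    """
    patched = copy.deepcopy(events)
    changed = 0
    for event in patched:
        args = event.get("args")
        if (
            event.get("source") == "agent"
            and isinstance(args, dict)
            and args.get("wait_for_response") is True
        ):
            args["wait_for_response"] = False
            changed += 1
    return patched, changed


# Hypothetical events: one agent message that waits, one action without the flag.
recorded = [
    {"source": "agent", "action": "message",
     "args": {"content": "What next?", "wait_for_response": True}},
    {"source": "agent", "action": "run", "args": {"command": "ls"}},
]
patched, changed = disable_wait_for_response(recorded)
```

Working on a copy keeps the original recording intact, so the unpatched trajectory can still be compared against the patched one.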

Collaborator

I agree, I wouldn't hack this. Maybe a next iteration of this feature, when we accept user messages, could take care of it because the user message should (?) change the agent state.

Collaborator Author

Yeah, it needs some design in the next iteration. I am thinking that the agent state space under replay mode should be a subset of the normal one. AWAITING_USER_INPUT is not a valid state during replay; it's only valid after a replay (or equivalently, without replay).
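
One way to picture the subset idea, purely as a sketch of a design that this PR does not implement: the `AgentState` enum below is a reduced, hypothetical stand-in (only AWAITING_USER_INPUT is named in this thread), and `is_valid_state` is an invented helper.

```python
from enum import Enum


class AgentState(Enum):  # hypothetical, reduced stand-in for the real enum
    RUNNING = "running"
    AWAITING_USER_INPUT = "awaiting_user_input"
    FINISHED = "finished"


# States allowed while replaying: everything except waiting on the user.
REPLAY_STATES = frozenset(AgentState) - {AgentState.AWAITING_USER_INPUT}


def is_valid_state(state: AgentState, replaying: bool) -> bool:
    """During replay only the subset is valid; otherwise every state is."""
    return state in REPLAY_STATES if replaying else True
```

A controller following this idea would consult such a check before a state transition and would keep replaying instead of pausing for input.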

Collaborator

That makes sense too!

Collaborator

@enyst enyst left a comment

I love this, let's take it in as soon as Xingyao's concerns are addressed, mine are just brainstorming really. I'm sure we will have other opportunities to play with this kind of thing, it's just a great start!

Collaborator

@xingyaoww xingyaoww left a comment

LGTM now!

@li-boxuan li-boxuan enabled auto-merge (squash) January 18, 2025 05:18
@li-boxuan li-boxuan merged commit 4383be1 into main Jan 18, 2025
15 checks passed
@li-boxuan li-boxuan deleted the boxuanli/trajectory-replay branch January 18, 2025 05:48
4 participants