[Bug]: Failing to run OpenRouter AND Ollama #5310
Comments
The issue in trajectory (1) is from a bug in the …
…ded for `create` command
* Update `OHEditor.__call__` to raise `EditorToolParameterMissingError` with appropriate message
* Add test case to verify the error is raised when `file_text` is missing for `create` command

fixes All-Hands-AI/OpenHands#5310
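For context, a minimal sketch of the kind of validation that fix describes (the class and helper function here are illustrative, not the actual OpenHands implementation):

```python
class EditorToolParameterMissingError(Exception):
    """Raised when a required parameter is missing for an editor command."""

    def __init__(self, command: str, parameter: str):
        super().__init__(f'Parameter `{parameter}` is required for command: {command}.')


# Hypothetical helper: `create` must be given the full text of the new file,
# otherwise the editor should fail loudly instead of silently writing nothing.
def validate_create(command: str, file_text: str | None) -> None:
    if command == 'create' and not file_text:
        raise EditorToolParameterMissingError(command, 'file_text')
```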
@ryanhoangt how can I solve the other two then? What other info should I give regarding the matter to help debug this?
Can you check the logs in the terminal to see what errors caused the state to change? You can also set …
For my own understanding, are we expecting Qwen2.5 Coder to work well enough with OpenHands?
Probably not; from the discussion here it seems like it's not very good at tool use. I also saw that in the trajectories @BradKML posted recently. It would be nice if we had the SWE-bench score for it, though. @BradKML I think in the discussion they also mentioned there were some issues with OpenRouter; could that be the cause of the errors in the 2 later trajectories? Can you try another provider besides Ollama just to see if it works?
@ryanhoangt so currently Ollama tool use is meh, and there are a few pull requests about it. Makes sense.
Extra note: okay now with a proper instruct model, it pretends that it did something when it did not?
https://www.all-hands.dev/share?share_id=c151515f5d38a08393bb2cbf1a9cd1f207fe0dd1c832f238f0ec8ab1e0e15fc6
https://www.all-hands.dev/share?share_id=ec00360072ed24855106caba8749df79fabf93da289edfc76f56a2ec939d7ce3
Addendum: Using OpenRouter with …
- Switching to Llama 3 70B through OpenRouter using the non-advanced settings panel yields the same set of issues: https://www.all-hands.dev/share?share_id=cc3af61bf54669abce07879e0bc810bce8bdb815735d0766b2776beb872afc0c
- Trying another Ollama model to see how it goes: Ollama with OpenCoder, to see if another SOTA SLM can do the job. Welp, it managed to screw with file creation and got stuck in a loop (it would probably be useful to add guard rails on long-form outputs; a rough sketch of such a guard follows below): https://www.all-hands.dev/share?share_id=33f618f4bc251656660f93b809a27c190dd14deeacb8b60725a768e770dbe8ee
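Purely as illustration, here is what such a guard rail on long-form outputs could look like (the character budget and truncation marker are assumptions, not OpenHands behavior):

```python
MAX_OUTPUT_CHARS = 20_000  # assumed budget, not an actual OpenHands setting


def clip_output(text: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Truncate an overly long model/tool output and note the truncation,
    so a rambling or looping model cannot flood the event history."""
    if len(text) <= limit:
        return text
    return text[:limit] + f'\n[... truncated {len(text) - limit} characters ...]'
```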
OpenRouter with Gemma 2 27B (since it is not on the core dashboard), to see if switching models leads to some extra issues when it comes to running the service... #4920 with …
BTW, this has always been about CodeAct Agent as the default, with a WSL/Docker installation. But to @mamoodi: it is better to achieve the two goals simultaneously to make things better.
Reference for the future: #496
Testing 0.15 with Ollama instead, and it is not writing files.
Examining Phi-3, and something similar happens (I expected it to write bad code, but not being able to write files is worse).
BTW, another issue with OpenRouter that might be related: #3435
P.S. Another agentic loop for OpenRouter through the default settings panel: https://www.all-hands.dev/share?share_id=27f4d1c00ac45c6026e54116bc294c7dcb86227941d64fa4d1e284d95c8dd3c0
@ryanhoangt if I use …
For reference @ryanhoangt, it seems like the OpenRouter setup with CodeActAgent as the default will force itself to use Git files that are non-existent. openrouter_fails.txt
Will test Ollama and dump its logs tomorrow to see why there are discrepancies between two providers of the same model.
Ran OpenRouter a second time, but there is (a) clearly no …
Addendum 2: it seems that once the error is hit, it gets stuck, then it automatically resumes in the command line? how_can_this.txt why_is_it_restarting.txt
@BradKML Thank you for the detailed reports and logs! Let me quickly note a couple of things on the latest:
Re:
If you are referring to logging like this:
You can actually ignore this. There is indeed no gitignore; the spam-log is a bug, frankly; it doesn't affect anything, but it's annoying. That's neither the agent nor the LLM; it doesn't affect the actions; it's the execution environment wasting time. In the log, it looks like the LLM installed some packages whose versions don't match, and when it got an exception it started editing the site-packages files.
Re:
I'm not sure what you mean by a state-changing bug? At first sight, what I see in the log is that the LLM didn't include the appropriate indentation when asking for edits, so the edits weren't performed.
Re: function calling
Just to clarify, we introduced function calling relatively recently (a month or two ago), in the sense of using the actual "tool use" or "function call" APIs offered by some providers. Only a few models (and their providers) worked very well with it, and it is enabled only for those. A number of the models tested clearly worked better without it: for them we fall back to defining a sort of function calling via prompting, like we always did prior to this feature (just tell the model what and how to answer if it wants an action done).

More importantly: please see … for details and data on what worked in tests. A lot of models are not capable of following instructions well enough, or when they sort of do, they get confused at some point by their own multi-step history and start trying to ask for actions that don't exist, etc.
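To make the distinction concrete, a rough sketch of the two modes using litellm-style calls (the model IDs, prompt text, and tool schema here are illustrative, not what OpenHands actually ships):

```python
import litellm

# Native function calling: the provider receives a machine-readable tool schema
# and returns structured tool calls. Only models trained for this do it reliably.
native = litellm.completion(
    model='openrouter/qwen/qwen-2.5-coder-32b-instruct',  # example model id
    messages=[{'role': 'user', 'content': 'Create hello.py that prints "hi"'}],
    tools=[{
        'type': 'function',
        'function': {
            'name': 'file_editor',  # hypothetical tool name
            'description': 'Create and edit files in the workspace',
            'parameters': {
                'type': 'object',
                'properties': {
                    'command': {'type': 'string'},
                    'path': {'type': 'string'},
                    'file_text': {'type': 'string'},
                },
                'required': ['command', 'path'],
            },
        },
    }],
)

# Prompting-based fallback: the same capability is described in plain text and
# the model is asked to answer in a fixed format that the agent then parses.
prompted = litellm.completion(
    model='ollama/qwen2.5-coder',  # example model id
    messages=[
        {'role': 'system', 'content': (
            'When you want to edit a file, reply with an <execute_editor> block '
            'containing the command, the path, and the file text.'
        )},
        {'role': 'user', 'content': 'Create hello.py that prints "hi"'},
    ],
)
```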
Re:
That issue was fixed a while ago. I updated the issue to make it clear it was actually fixed. Sorry for the confusion. Re:
Sorry, can you elaborate?
Re: Docker. I'm not sure, but if you suspect there's an issue with Docker (workspace or perms, etc.), please open another issue for that, so that we can look into it on its own. It's too different to deal with here.

Re: stuck in loops.
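(For illustration only, a toy version of the kind of repetition check an agent loop can run; this is not the actual OpenHands stuck detector.)

```python
def looks_stuck(recent_actions: list[str], window: int = 4) -> bool:
    """Heuristic: if the last `window` actions are identical, the agent is
    probably repeating itself and should be stopped or re-prompted."""
    if len(recent_actions) < window:
        return False
    tail = recent_actions[-window:]
    return all(action == tail[0] for action in tail)


# The same failed edit issued four times in a row trips the check.
assert looks_stuck(['edit hello.py'] * 4)
assert not looks_stuck(['edit hello.py', 'run tests', 'edit hello.py', 'run tests'])
```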
Re:
It doesn't show in the trajectories shared via Share feedback. DEBUG=1 shows the debug logging in the console and log file, but Share feedback sends only the actual events, I think, not the actual logs.
@enyst thanks for all the replies
That's one option, I think. If you want to try, a PR is most welcome. I assume we will evaluate the approach.
@enyst there are a few possible solutions around this
Borrowed writing from "guides" (weirdly enough, their top-p and temperature recommendations are correlated, so it is suspicious):
https://promptengineering.org/prompt-engineering-with-temperature-and-top-p
https://dropchat.co/blog/controlling-gpt-4s-creative-thermostat-understanding-temperature-and-top_p-sampling
https://community.openai.com/t/cheat-sheet-mastering-temperature-and-top-p-in-chatgpt-api/172683
Some underlying observations and recommendations:
P.S. About self-repetition in conversation: DRY is seen as an alternative to classical penalty systems. XTC (exclude top choice) also looks interesting but is slightly more forced. Also need to thank @SmartManoj for this: SmartManoj#134 (comment)
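For concreteness, a small sketch of passing these sampling knobs through a litellm-style call (the model ID and values are illustrative starting points, not settings validated with OpenHands; DRY/XTC themselves are backend-specific samplers and not shown here):

```python
import litellm

response = litellm.completion(
    model='openrouter/meta-llama/llama-3-70b-instruct',  # example model id
    messages=[{'role': 'user', 'content': 'Refactor utils.py to remove duplication'}],
    temperature=0.2,        # low temperature: conservative, more deterministic edits
    top_p=0.9,              # nucleus sampling cap; usually tune this or temperature, not both
    frequency_penalty=0.1,  # classical anti-repetition knob
)
print(response.choices[0].message.content)
```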
Ran another round of stress-testing, @SmartManoj (but for some reason I can't load the whole log).
The trajectory is 21 MB, so the browser couldn't handle that. The following script will save it into a file:

```python
import requests
import json

# ID of the shared trajectory to download
json_data = {
    'feedback_id': '4dbb93310608f43026c9843cf184ce93240b31565f6eb913fbcda369d43ec639',
}

# Fetch the full trajectory from the share endpoint
response = requests.post(
    'https://show-od-trajectory-3u9bw9tx.uc.gateway.dev/show-od-trajectory',
    json=json_data,
)

# Save it locally so it can be inspected outside the browser
with open('response.json', 'w') as f:
    json.dump(response.json(), f)
```
@SmartManoj does it look "off" in any way? Like, OpenRouter connection errors are still present.
Could you share the error message? Or could you add these lines and share again, if there is no sensitive data in the error message?
Sorry, just putting up with this other error, @SmartManoj: #6056
I wanted to report this update:
Is there an existing issue for the same bug?
Describe the bug and reproduction steps
Note: both are using Qwen2.5 Coder, and all of them failed halfway.
OpenHands Installation
Docker command in README
OpenHands Version
Latest Docker image with 0.14
Operating System
WSL on Windows
Logs, Errors, Screenshots, and Additional Context
No response