
developer__text_editor tool calls doesn't update files #929

Closed
Kamariw95 opened this issue Jan 30, 2025 · 8 comments
Assignees
Labels
help wanted Great issue for non-Block contributors

Comments

@Kamariw95

Kamariw95 commented Jan 30, 2025

Describe the bug
When using the CLI or the Desktop app, the developer__text_editor tool call doesn't actually update the files as Goose expects.

To Reproduce
Steps to reproduce the behavior:

  1. Start a session with goose session
  2. Conduct a conversation that results in a <tool_call>
  3. Check the repo via git status for file updates.
  4. See no changes in the repository (a minimal shell sketch of these steps follows below).
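For reference, a minimal shell sketch of the steps above (the prompt text is purely illustrative; `goose session` and `git status` are the commands from the steps):

```sh
# 1. start an interactive Goose session from the repo root
cd /path/to/repo
goose session

# 2. inside the session, ask for an edit, e.g.
#      "Add a usage section to README.md"
#    Goose responds as if developer__text_editor applied the change

# 3-4. in another terminal, check whether anything actually changed
git status    # expected: README.md modified; observed: working tree clean
```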

Expected behavior
The file that Goose has been given context on should actually be updated locally after the tool call.

Screenshots
[screenshot attached]

Please provide the following information:

  • OS & Arch: macOS M2 Pro 32GB
  • Interface: primarily CLI, but also happening in UI.
  • Version: v1.0.0
  • Extensions enabled:
    • UI: Developer, Computer Controller, Memory, git, Knowledge Graph Memory, Fetch
    • CLI: Developer
  • Provider & Model: Ollama qwen2.5:7b.

Additional context
When I initially downloaded Goose, I downloaded the Desktop application and it repeatedly asked for the same file permissions. This leads me to believe I'm hitting some wonky permissions error, but I'm also new to running models and this may be user error; forgive me if it is.

@jscott-yps

In my experience so far, Goose is also extremely resistant to actually editing files. I will ask it to edit a file and it will just respond with the changes in the message flow. It takes one or two follow-up attempts to get it to actually make the edit itself.

"EDIT THE FILE, DON'T TELL ME HOW" 😄

@salman1993
Collaborator

salman1993 commented Jan 30, 2025

During our testing, we have found the tool calling capabilities of 7B models (text editing, bash) are much worse compared to the larger models. The gap is wider for tool calling than for general chat completion.

Goose works with Anthropic's Claude 3.5 Sonnet. Among free options, I'd recommend the Gemini free tier for now. AFAIK DeepSeek 70B doesn't have great tool calling yet; hopeful that will change soon!

@jscott-yps

jscott-yps commented Jan 30, 2025

> during our testing, we have found the tool calling capabilities of 7B models (text editing, bash) is much worse compared to the larger models. the gap is wider when it comes to tool calling vs. general chat completion.
>
> goose works with Anthropic's Claude 3.5 Sonnet. if you're looking for a free option, i'd recommend Gemini free tier.

I'm not the reporter so I'll shut up 😄 but just wanted to add some context for my comment: I am using GPT-4o.

@salman1993 salman1993 added the help wanted Great issue for non-Block contributors label Jan 30, 2025
@Kamariw95
Author

Kamariw95 commented Jan 30, 2025

Thank you for your quick response @salman1993.

To be fair, I've swapped to using Claude and this tool is extremely impressive. Is there anything we could do to improve the performance of open-source models (e.g. DeepSeek, Llama, etc.)? Should I just use more CPU/GPU power?

@kamal94

kamal94 commented Jan 30, 2025

Thank you @salman1993. I have had good experience using Goose with Sonnet-3.5, but as @Kamariw95 noted, it is practically unusable with local models (I tried using Qwen, Llama-3.2, deepseek).

Considering Anthropic's API is rather expensive at scale (I burned through a dollar with a few commands setting up a local repo), local models seem like a great use case for Goose given their cost savings (practically just the cost of electricity).

Is there planned/active work on improving the agent's performance (tool calling in this case) with lower parameter models?

P.S. I found it amusing that our usernames are
salman19**93**
kamal**94**
Kamariw**95**

@salman1993
Collaborator

salman1993 commented Jan 30, 2025

Haha, that is funny! Yeah, we are testing out some open models internally right now; for example, watt-tool-70B looks promising.

@Kamariw95 So far, I think Llama 3.3 70B Instruct is not bad since it was finetuned for tool calling, but you can see the difference when you compare it to Claude 3.5 Sonnet. I find Llama 3.3 struggles after 6-8 tool calls in a loop.

We would love to suggest a cheaper and/or open-source model (hopeful about DeepSeek when they support tool calling natively)!
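For anyone who wants to compare models themselves, a rough sketch of switching providers (this assumes the `goose configure` command and the GOOSE_PROVIDER / GOOSE_MODEL environment variables; the provider and model names below are examples, check your version's configuration docs for the exact values):

```sh
# interactive setup: choose a provider, model and API key
goose configure

# or override per invocation via environment variables
GOOSE_PROVIDER=ollama    GOOSE_MODEL=qwen2.5:7b                goose session
GOOSE_PROVIDER=google    GOOSE_MODEL=gemini-1.5-flash          goose session
GOOSE_PROVIDER=anthropic GOOSE_MODEL=claude-3-5-sonnet-latest  goose session
```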

@michaelneale
Collaborator

I have been trying to get deepseek-r1 and others to do tool calling; it isn't great. But the default Qwen 2.5 with the change @wendytang has (and another variant of it here: #1021) may help with some of these local models that can do tool calling.

@baxen
Collaborator

baxen commented Feb 7, 2025

> I'm not the reporter so i'll shut up 😄 but just wanted to add some context for my comment, i am using GPT-4o

@jscott-yps 100%, I really have to nudge GPT-4o into making the edits, but once it attempts them it makes good ones and uses the tools correctly.

We've confirmed that the original issue here was the model using an incorrect format for the tool call - and so the tool call itself didn't actually process. Going to close this for now as in the current state we need larger models to get consistent results. But we'll be working on improvements to try to get this to work better with smaller models!
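To make the "incorrect format" failure concrete, here is a hedged sketch against Ollama's /api/chat tool-calling interface (the `text_editor` tool schema below is illustrative, not Goose's exact one): a well-formed reply puts the call in a structured `tool_calls` field, whereas smaller models often emit the call as plain text inside the message content, which the client can't execute, so no file ever changes.

```sh
# hedged sketch: ask Ollama's /api/chat endpoint for a tool call directly
# (tool name and schema are illustrative placeholders)
curl -s http://localhost:11434/api/chat -d '{
  "model": "qwen2.5:7b",
  "messages": [{"role": "user", "content": "Create hello.txt containing hi"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "text_editor",
      "description": "Create or edit a file",
      "parameters": {
        "type": "object",
        "properties": {
          "path": {"type": "string"},
          "file_text": {"type": "string"}
        },
        "required": ["path", "file_text"]
      }
    }
  }],
  "stream": false
}'

# a well-formed reply carries the call in message.tool_calls, e.g.:
#   "message": {"role": "assistant",
#     "tool_calls": [{"function": {"name": "text_editor",
#       "arguments": {"path": "hello.txt", "file_text": "hi"}}}]}
#
# the failure mode described above is the call landing as plain text instead:
#   "message": {"role": "assistant",
#     "content": "<tool_call>{\"name\": \"text_editor\", ...}</tool_call>"}
# in which case the client never executes it and no file changes.
```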

@baxen baxen closed this as completed Feb 7, 2025