Please help me test tool-use (function calling) #514

Open
karthink opened this issue Dec 22, 2024 · 165 comments
Labels: help wanted (Extra attention is needed)

@karthink
Owner

karthink commented Dec 22, 2024

Note

Current status of tool-use:

This feature is now merged, and available in the master branch.

| Backend           | With streaming | Without streaming | Structured arguments/output | Limitations                   |
|-------------------|----------------|-------------------|-----------------------------|-------------------------------|
| Anthropic         | ✓              | ✓                 | ✓                           |                               |
| OpenAI-compatible | ✓              | ✓                 | ✓                           |                               |
| Gemini            | ✓              | ✓                 | Untested                    | Tools must take arguments     |
| Ollama            | ✗              | ✓                 | Untested                    | Streaming will be turned off  |

I've added tool-use/function calling support to all major backends in gptel -- OpenAI-compatible, Claude, Ollama and Gemini.

Demos

screencast_20241222T075329.mp4
gptel-tool-use-filesystem-demo.mp4

Same demo as the previous one, but with permissions and tool result inclusion turned on:

gptel-tool-use-filesystem-confirm-demo.mp4

Call to action

Please help me test it! It's on the feature-tool-use branch. There are multiple ways in which you can help. Ranked from least to most intensive:

  1. Switch to the feature-tool-use branch and just use gptel as normal -- no messing around with tool use. Adding tool use required a significant amount of reworking in gptel's core, so it will help to catch any regressions first. (Remember to reinstall/re-byte-compile the package after switching branches!)

  2. Switch to the branch, define a tool or two, and try using gptel (instructions below). Let me know if something breaks.

  3. Same as 2, but suggest ways that the feature can be improved, especially in the UI department.


What is "tool use"?

"Tool use" or "function calling" is LLM usage where

  • You include a function specification along with your task/question to the LLM.
  • The LLM optionally decides to call the function, and supplies the function call arguments.
  • You run the function call, and (optionally) feed the results back to the LLM. gptel handles this automatically.
  • The LLM completes the task based on the information received.

You can use this to give the LLM awareness of the world, by providing access to APIs, your filesystem, web search, Emacs etc. You can get it to control your Emacs frame, for instance.

How do I enable it in gptel?

There are three steps:

  1. Use a model that supports tool use. Most of the big OpenAI/Anthropic/Google models do, as do llama3.1 and the newer mistral models if you're using Ollama.

  2. (setq gptel-use-tools t)

  3. Write tool definitions. See the documentation of gptel-make-tool. Here is an example of a tool definition:

Tool definition example
(setq gptel-tools         ;; <-- Holds a list of tools
      (list        
       (gptel-make-tool   ;; <-- This is a tool definition
        :function (lambda (location unit)
                    (url-retrieve-synchronously
                     (format "https://api.weather.com/...?location=%s&unit=%s"
                             location unit)))
        :name "get_weather" ;; <-- Javascript style, snake_case name
        :description "Get the current weather in a given location"
        :args (list '(:name "location"
                      :type "string"
                      :description "The city and state, e.g. San Francisco, CA")
                    '(:name "unit"
                      :type "string"
                      :enum ("celsius" "farenheit")  ;; <-- enum types help reduce hallucinations, optional
                      :description
                      "The unit of temperature, either 'celsius' or 'fahrenheit'"
                      :optional t)))))

And here are a few simple tools for Filesystem/Emacs/Web access. You can copy and evaluate them in your Emacs session:

Code:

Some tool definitions, copy to your Emacs
(gptel-make-tool
 :function (lambda (url)
             (with-current-buffer (url-retrieve-synchronously url)
               (goto-char (point-min)) (forward-paragraph)
               (let ((dom (libxml-parse-html-region (point) (point-max))))
                 (run-at-time 0 nil #'kill-buffer (current-buffer))
                 (with-temp-buffer
                   (shr-insert-document dom)
                   (buffer-substring-no-properties (point-min) (point-max))))))
 :name "read_url"
 :description "Fetch and read the contents of a URL"
 :args (list '(:name "url"
               :type "string"
               :description "The URL to read"))
 :category "web")

(gptel-make-tool
 :function (lambda (buffer text)
             (with-current-buffer (get-buffer-create buffer)
               (save-excursion
                 (goto-char (point-max))
                 (insert text)))
             (format "Appended text to buffer %s" buffer))
 :name "append_to_buffer"
 :description "Append text to the an Emacs buffer.  If the buffer does not exist, it will be created."
 :args (list '(:name "buffer"
               :type "string"
               :description "The name of the buffer to append text to.")
             '(:name "text"
               :type "string"
               :description "The text to append to the buffer."))
 :category "emacs")

;; Message buffer logging tool
(gptel-make-tool
 :function (lambda (text)
             (message "%s" text)
             (format "Message sent: %s" text))
 :name "echo_message"
 :description "Send a message to the *Messages* buffer"
 :args (list '(:name "text"
               :type "string"
               :description "The text to send to the messages buffer"))
 :category "emacs")

;; buffer retrieval tool
(gptel-make-tool
 :function (lambda (buffer)
             (unless (buffer-live-p (get-buffer buffer))
               (error "Error: buffer %s is not live." buffer))
             (with-current-buffer  buffer
               (buffer-substring-no-properties (point-min) (point-max))))
 :name "read_buffer"
 :description "Return the contents of an Emacs buffer"
 :args (list '(:name "buffer"
               :type "string"
               :description "The name of the buffer whose contents are to be retrieved"))
 :category "emacs")


(gptel-make-tool
 :function (lambda (directory)
	     (mapconcat #'identity
                        (directory-files directory)
                        "\n"))
 :name "list_directory"
 :description "List the contents of a given directory"
 :args (list '(:name "directory"
	       :type "string"
	       :description "The path to the directory to list"))
 :category "filesystem")

(gptel-make-tool
 :function (lambda (parent name)
             (condition-case nil
                 (progn
                   (make-directory (expand-file-name name parent) t)
                   (format "Directory %s created/verified in %s" name parent))
               (error (format "Error creating directory %s in %s" name parent))))
 :name "make_directory"
 :description "Create a new directory with the given name in the specified parent directory"
 :args (list '(:name "parent"
	       :type "string"
	       :description "The parent directory where the new directory should be created, e.g. /tmp")
             '(:name "name"
	       :type "string"
	       :description "The name of the new directory to create, e.g. testdir"))
 :category "filesystem")

(gptel-make-tool
 :function (lambda (path filename content)
             (let ((full-path (expand-file-name filename path)))
               (with-temp-buffer
                 (insert content)
                 (write-file full-path))
               (format "Created file %s in %s" filename path)))
 :name "create_file"
 :description "Create a new file with the specified content"
 :args (list '(:name "path"
	       :type "string"
	       :description "The directory where to create the file")
             '(:name "filename"
	       :type "string"
	       :description "The name of the file to create")
             '(:name "content"
	       :type "string"
	       :description "The content to write to the file"))
 :category "filesystem")

(gptel-make-tool
 :function (lambda (filepath)
	     (with-temp-buffer
	       (insert-file-contents (expand-file-name filepath))
	       (buffer-string)))
 :name "read_file"
 :description "Read and display the contents of a file"
 :args (list '(:name "filepath"
	       :type "string"
	       :description "Path to the file to read.  Supports relative paths and ~."))
 :category "filesystem")
An async tool to fetch youtube metadata using yt-dlp
(defun my/gptel-youtube-metadata (callback url)
  (let* ((video-id
          (and (string-match
                (concat
                 "^\\(?:http\\(?:s?://\\)\\)?\\(?:www\\.\\)?\\(?:youtu\\(?:\\(?:\\.be\\|be\\.com\\)/\\)\\)"
                 "\\(?:watch\\?v=\\)?" "\\([^?&]+\\)")
                url)
               (match-string 1 url)))
         (dir (file-name-concat temporary-file-directory "yt-dlp" video-id)))
    (if (file-directory-p dir) (delete-directory dir t))
    (make-directory dir t)
    (let ((default-directory dir) (idx 0)
          (data (list :description nil :transcript nil)))
      (make-process :name "yt-dlp"
                    :command `("yt-dlp" "--write-description" "--skip-download" "--output" "video" ,url)
                    :sentinel (lambda (proc status)
                                (cl-incf idx)
                                (let ((default-directory dir))
                                  (when (file-readable-p "video.description")
                                    (plist-put data :description
                                               (with-temp-buffer
                                                 (insert-file-contents "video.description")
                                                 (buffer-string)))))
                                (when (= idx 2)
                                    (funcall callback (gptel--json-encode data))
                                    (delete-directory dir t))))
      (make-process :name "yt-dlp"
                    :command `("yt-dlp" "--skip-download" "--write-auto-subs" "--sub-langs"
                               "en,-live_chat" "--convert-subs" "srt" "--output" "video" ,url)
                    :sentinel (lambda (proc status)
                                (cl-incf idx)
                                (let ((default-directory dir))
                                  (when (file-readable-p "video.en.srt")
                                    (plist-put data :transcript
                                               (with-temp-buffer
                                                 (insert-file-contents "video.en.srt")
                                                 (buffer-string)))))
                                (when (= idx 2)
                                    (funcall callback (gptel--json-encode data))
                                    (delete-directory dir t)))))))

(gptel-make-tool
 :name "youtube_video_metadata"
 :function #'my/gptel-youtube-metadata
 :description "Find the description and video transcript for a youtube video.  Return a JSON object containing two fields:

\"description\": The video description added by the uploader
\"transcript\": The video transcript in SRT format"
 :args '((:name "url"
          :description "The youtube video URL, for example \"https://www.youtube.com/watch?v=H2qJRnV8ZGA\""
          :type "string"))
 :category "web"
 :async t
 :include t)

As seen in gptel's menu:

screenshot_20241231T062011

See the documentation for gptel-make-tool for details on the keyword arguments.

Tip

@jester7 points out that you can get the LLM to write these tool definitions for you, and eval the Org Babel blocks to use them right away.

Important

Please share tools you write below so I can use them to test for issues.

Returning to the get_weather example above: the LLM may choose to ask for a call to get_weather if your question is related to the weather, as in the demo video above. You can help it along by saying something like:

Use the provided tools to accomplish this task: ...

Notes

  • Tools can be asynchronous, see the documentation of gptel-make-tool for an example (a minimal sketch also follows these notes).
  • You can use the tool definition schema to get the LLM to generate JSON for you. This is one of the ways to get LLM APIs to generate JSON output.
  • Right now tool call results are automatically sent back to the LLM. We'll add a way to make this optional, for when the tool needs to run for side-effects only. For now, you can make such a tool return nil.
  • It is possible to force the LLM to use provided tools -- this switch has not yet been implemented.
  • LLMs may use tools in parallel if multiple tools are specified -- this is fully supported in gptel.
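
A minimal async tool sketch (illustrative, not from the manual): with :async t, gptel calls the tool function with a callback as its first argument, as in the yt-dlp example above, and the tool reports its result by invoking that callback. The tool name, the fortune program and the category argument here are invented for this example:

(gptel-make-tool
 :name "fetch_fortune"
 :async t
 :function (lambda (callback category)
             ;; Start an external process and hand its output to gptel via
             ;; CALLBACK when it exits, instead of blocking Emacs.
             (let ((buf (generate-new-buffer " *fortune*")))
               (make-process
                :name "fortune" :buffer buf
                :command (list "fortune" category)
                :sentinel (lambda (_proc _status)
                            (funcall callback
                                     (with-current-buffer buf (buffer-string)))
                            (kill-buffer buf)))))
 :description "Return a random fortune from a given category"
 :args (list '(:name "category"
               :type "string"
               :description "The fortune category, e.g. science"))
 :category "misc")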
@karthink
Owner Author

@agzam -- in case you're interested.

@agzam

agzam commented Dec 22, 2024

Oh wow, this is very cool. So many interesting ideas to try. I'm excited, will give it a go. Thank you!

@metachip

Excellent. Will give it a go.

@jester7

jester7 commented Dec 23, 2024

Wow, this works great. I just created tools to cat files, run ls, and create new files. I had trouble with OpenAI calling the tools but Claude Sonnet works fine. I even had Claude write a couple of tools, then eval the emacs lisp blocks inside the org mode buffer, and have Claude immediately start using them.

I'll help you test these tools step by step:

  1. First, let's create the subdirectories "1" and "2" inside claude-scripts:

Now let's create hello world files in each directory:

Let's verify the structure using the tree_view tool:

Let's verify the content of both files:

Great! I've tested all the tools:

  1. Created directories "1" and "2"
  2. Created hello.txt files in each directory with different content
  3. Used tree_view to show the structure
  4. Read both files to verify their contents

All tools are working as expected! The tree view shows the correct directory structure, and the file contents are exactly as we wrote them. Would you like to try something else with these tools?

@karthink
Owner Author

I even had Claude write a couple of tools, then eval the emacs lisp blocks inside the org mode buffer, and have Claude immediately start using them.

Holy bootstrap, Batman!

I had trouble with OpenAI calling the tools

Did it throw an error or just ignore the tools? If it failed silently, you can check the *gptel-log* buffer after running (setq gptel-log-level 'info).

@karthink karthink pinned this issue Dec 23, 2024
@jester7

jester7 commented Dec 23, 2024

It seems to be calling the tools but fails at the end. It also failed with gemini, llama, and qwen for me but I have to double check my tools because for simpler use cases I think it was working a while ago. This same prompt works fine with the Claude models sonnet and haiku.

Here is the Messages buffer and attached is the log
gptel-tool-use-log-openai.txt
:
Querying OpenAI...
gptel: moving from INIT to WAIT
gptel: moving from WAIT to TYPE
gptel: moving from TYPE to TOOL
error in process sentinel: let: Wrong type argument: stringp, nil
error in process sentinel: Wrong type argument: stringp, nil

@karthink
Owner Author

karthink commented Dec 23, 2024 via email

@jester7

jester7 commented Dec 23, 2024

Update: it seems Gemini, Llama, and Qwen models work only if I make a request that requires a single tool call. For example I did a request to each to summarize a URL and do a directory listing on my local machine and these types of interactions work.

@karthink
Owner Author

karthink commented Dec 23, 2024

it seems Gemini, Llama, and Qwen models work only if I make a request that requires a single tool call

Could you try it with this OpenAI backend?

(gptel-make-openai "openai-with-parallel-tool-calls"
  :key YOUR_OPENAI_API_KEY
  :stream t
  :models gptel--openai-models
  :request-params '(:parallel_tool_calls t))

Parallel tool calls are supposed to be enabled by default, so I don't expect this to make a difference, but it would be wise to verify.

@karthink
Owner Author

Here is the Messages buffer and attached is the log
gptel-tool-use-log-openai.txt

Could you also share the tool definitions you used in this failed request? I'd like to try reproducing the error here.

@ProjectMoon

For ollama, I see the tool being sent to ollama in the gptel log buffer, but none of the models ever actually seem to use the tools. Have tried with Mistral Nemo, Qwen 2.5, Mistral Small, Llama 3.2 vision.

@karthink
Owner Author

karthink commented Dec 24, 2024

@ProjectMoon Could you share the tools you wrote so I can try to reproduce these issues?

@ProjectMoon

ProjectMoon commented Dec 24, 2024

Just a copy and paste of the example one.

I will try again at some point in the coming days with ollama debugging mode turned on to see what reaches the server.

Edit: also I need to test with a direct ollama connection. This might be (and probably is) a bug in Open WebUI's proxied ollama API.

@jester7

jester7 commented Dec 25, 2024

I get these types of errors:
Querying openai-with-parallel-tool-calls...
error in process sentinel: let: Wrong type argument: stringp, nil
error in process sentinel: Wrong type argument: stringp, nil
Querying Claude... This is a test message
Claude error: ((HTTP/2 400) invalid_request_error) messages.4: Did not find 1 `tool_result` block(s) at the beginning of this message. Messages following `tool_use` blocks must begin with a matching number of `tool_result` blocks.

Attached are my gptel config files, the regular one I use and a minimal version I made for testing tools using your suggestion "openai-with-parallel-tool-calls".
gptel-minimal.el.txt
gptel-config.el.txt

@karthink
Owner Author

@jester7 Thanks for the tool definitions.

I've fixed parallel tool calls for the Claude and OpenAI-compatible APIs. Please update and test both the streaming and non-streaming cases. You can turn off streaming with (setq gptel-stream nil). If something fails please provide the log.

Parallel tool calls with the Gemini and Ollama APIs are still broken. All these APIs validate their inputs differently, and the docs don't contain the validation schema so adding tool calls is a long crapshoot. Still, we truck on.

@karthink
Owner Author

karthink commented Dec 28, 2024

Update: Parallel tool calls with Gemini works too, but only as long as all function calls involve arguments. Zero-arity functions like get_scratch_buffer cause the Gemini API to complain.

@ProjectMoon

ProjectMoon commented Dec 28, 2024

@ProjectMoon Could you share the tools you wrote so I can try to reproduce these issues?

OK, definitely seems to be more a problem with OpenWebUI's proxied Ollama API... although it was supposedly resolved to be able to pass in structured inputs. I will have to dig into the source code to see if it even does anything with the tools parameter.

I was able to make a tool call when connecting directly to the ollama instance using Mistral Nemo.

Edit: Yep doesn't have the tools param in the API, so it's discarded silently.

@karthink
Owner Author

Edit: Yep doesn't have the tools param in the API, so it's discarded silently.

Thanks. When we merge this we should add a note to the README about the tool-use incompatibility with OpenWebUI.

@karthink
Owner Author

Parallel tool calls now work with Ollama too, but you have to disable streaming. Ollama does not support tool calls with streaming responses.

@karthink
Owner Author

I've updated the opening post with a status table I'll keep up to date.

@ProjectMoon

So I added the tools parameter to OpenWebUI (it was just adding a single line to the chat completions form class, it seems). Then I get a response back from the proxied ollama API containing the tool call to use. But unlike when connecting directly, gptel seems to do nothing. Looking at the elisp code, the only thing that makes sense is the content from the OWUI response being non-empty, but both OWUI response and the direct connection response have "content": "" o_O

@karthink
Owner Author

karthink commented Dec 28, 2024 via email

@prdnr

prdnr commented Dec 29, 2024

  1. Same as 2, but suggest ways that the feature can be improved, especially in the UI department.

I’m only recently picking Emacs back up; the last time I regularly used it was before tools like GPT existed. But if I understand correctly: since the tool results aren’t echoed to the chat buffer created by M-x gptel, and gptel-send sends the buffer contents before the point, won’t any added context that results from a tool call get dropped from the conversation in the next message round?

If that is the case, perhaps it would be nice to provide a way to help people capture tool results to the context tooling provided by gptel? Or maybe have the tool results echoed to the chat buffer (perhaps in a folded block?)

@karthink
Owner Author

karthink commented Dec 29, 2024 via email

@karthink
Owner Author

But unlike when connecting directly, gptel seems to do nothing. Looking at the elisp code, the only thing that makes sense is the content from the OWUI response being non-empty, but both OWUI response and the direct connection response have "content": "" o_O

@ProjectMoon This can happen if you have streaming turned on, since Ollama doesn't support streaming + tool use. Can you ensure that gptel-stream is set to nil before testing Ollama + OWUI?

(I will eventually handle this internally, where streaming is automatically turned off if an Ollama request includes tools. Right now it's not clear where in the code to put this check.)

@karthink
Owner Author

I've added tool selection support to gptel's transient interface:

[screenshot]

Pressing t to select a tool opens up:

screenshot_20241231T062011

Selecting a category (like "filesystem" or "emacs" here) will toggle all the tools in that category.

Tool selection can be done globally, buffer-locally or for the next request only using the Scope (=) option.

This makes it much more convenient to select the right set of tools for the task at hand. (LLMs get confused if you include a whole bunch of irrelevant tools.)

I've also updated the opening post above with the tool definitions you see in the above image. You can grab them from there and evaluate them. I'm not sure yet if gptel should include any tools by default.
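
For reference, the same narrowing can be done programmatically. A sketch using gptel-get-tool, with two of the example tools from the opening post (any registered tool names work):

;; Select a small, task-relevant toolset buffer-locally.
(setq-local gptel-tools
            (mapcar #'gptel-get-tool '("read_buffer" "list_directory")))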

@karthink
Owner Author

karthink commented Dec 31, 2024

And here is a demo of using the filesystem toolset to make the LLM do something that's otherwise annoying to do:

gptel-tool-use-filesystem-demo.mp4

@psionic-k
Contributor

On another note, I've been curious to see the effectiveness of a tool leveraging eglot that allows models to pull further context of code via LSP. Has anyone already tried that out?

@jarreds Yes. It's viable and I'm working on more of it.

On the gptel side, this will be a directive/context option, not a tool call, so it can be discussed in the relevant (or new) threads instead of here.

I don't quite understand this answer because to me it is definitely a tool call problem. Maybe it will make more sense after my demo.

@karthink
Owner Author

karthink commented Feb 2, 2025

If properties blocks are removed now, a temp buffer is already available. Is that the case or was that supposing we did context cleaning?

It was supposing we did context cleaning/prompt filtering. Which we will, for sure. It's just going to take a while because I both have to and want to move slow. Have to because of severe time constraints. Want to because I find the decisions are better and the project more sustainable at lower velocities, and while I can still load the whole thing into my head.

Since gptel is creating most :TOOL_CALL: and :END: text when it inserts tool results, that is the perfect time to add a 'gptel-ignore property and avoid the need to parse for that particular element at all.

From what I see, gptel is inserting many property blocks that would need removal. Those can also be marked with 'gptel-ignore at that time. When opening a persisted chat, only one parse and marking is necessary.

The same strategy can be used to avoid any other repeated calls to org-element-* by just "painting" text that should be ignored once and then not touching it again. In the temp buffer, no parsing is needed to remove text already marked with a property. It's a linear scan.

It's going to be at least two parses, not one, but the effort isn't the issue with text properties (unlike org-element).

The problem is that the above is an 80% solution, for several reasons. Chat files are not static records -- properties get added and removed programmatically, as do various other kinds of line noise Org syntax elements. None of these edits will respect existing text properties, causing the actual "painted" state to be a path-dependent mess.

Chat files need to be persisted. If we implement gptel-ignore, the current method of storing response bounds as local variables or Org properties when writing the file will no longer suffice. gptel is already suffering from this 80% persistence solution I implemented in the past, so this will compound the problem. A full redesign is necessary, but I don't have the time for it in the next few months. Simple as it may look, it's going to take a few dozen hours to design and test a more robust persistence mechanism.

That said, I'd like to whip up a quick prototype for you to test, hopefully soon.

In any case, cloning a buffer only required about 1k conses for the *Memory Report* buffer. Since we're already writing the context to a json in some cases, it's on the same order.

Yeah, I'm not particularly worried about the memory, only about Emacs hitching for a second when you send the query. I hate that. I test gptel on a low-end 2012 Thinkpad now and then to ensure that creating the payload and streaming responses remains buttery-smooth.

@psionic-k
Contributor

I hate that. I test gptel on a low-end 2012 Thinkpad now and then to ensure that creating the payload and streaming responses remains buttery-smooth.

I'd bank on the igc branch landing before long, so the upstream fix will happen before you get to a downstream bottleneck.

@karthink
Owner Author

karthink commented Feb 2, 2025

I don't quite understand this answer because to me it is definitely a tool call problem. Maybe it will make more sense after my demo.

It can be useful to do this reactively via tools, but it's much simpler to include a repo map (identifiers, relationships etc) as context with the original query. This will work with every model and doesn't require tool-use capability or an expensive back-and-forth.

That said, at this point you've played with tool-use applications more than I have, as I spent most of my time just getting the thing working. So I look forward to learning from your experience.

@psionic-k
Contributor

That said, I'd like to whip up a quick prototype for you to test, hopefully soon.

My high-value use cases are generally dependent on lots of successive tool calls. Therefore auto-mimicry is a compounding failure rate problem. The model fakes a tool call maybe 2% of the time on the first call. I don't think I've ever seen it. After there are :TOOL_CALL: drawers in context, this goes up to at least 10%. Once there are fake calls in context, I'd say it's at least 30-40% and I'm better off trashing the context. The model is clearly unstable and "attempting" to make a tool call. It will sometimes output several fake calls before finally manifesting a real call, all without user interaction. Given the compounding failure rate, the utility quickly drops to zero where it's quite high on the first series of calls.

I've tried to prompt engineer around this without apparent success. The models are just instantly mesmerized by :TOOL_CALL:. They love it. It's syntax. It's like LLM fentanyl.

@psionic-k
Contributor

Here's a crappy first-pass at filtering tool calls header/footer:
psionic-k@35d27e7

I need to test a bit and see if the LLM continues to confuse itself or not.

I'd like to not guess too hard. I haven't seen any first-hand example of how tool calls usually exist in context. I'd like to be as close as possible to what they train on.

Think I saw another first. Making my prompt request org mode output resulted in a fake tool call using an elisp source block instead of a normal ``` fence.

@link0ff

link0ff commented Feb 3, 2025

Thanks, FSM works great! Do you have any plans to support the programmatic use of gptel-request where :callback is called only for the final state?

@karthink
Owner Author

karthink commented Feb 3, 2025

Here's a crappy first-pass at filtering tool calls header/footer:
psionic-k@35d27e7

This is good enough for your testing, I hope?

I've got a more integrated, single-scan version partially working here. I'll share it soon. It reuses the gptel text property instead of defining another one. Here are the semantics for its possible values, the first two constitute the current behavior:

  • nil or missing: user prompt
  • response: LLM response
  • ignore: Ignored text

Any thoughts on this approach? In the future I plan to change the value format to (response ...) to include more metadata, for provenance tracking etc.
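
To make the ignore semantics concrete, here is a sketch (illustrative only; the command name is made up) of marking text to be skipped:

;; Mark a region so prompt assembly would skip it, per the proposed
;; `ignore' value of the `gptel' text property.
(defun my/gptel-mark-ignored (beg end)
  "Mark region BEG..END to be ignored when gptel builds the prompt."
  (interactive "r")
  (put-text-property beg end 'gptel 'ignore))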

I'd like to not guess too hard. I haven't seen any first-hand example of how tool calls usually exist in context.

If you want to do this properly, you have to populate the messages array with the tool call and response messages when creating the full prompt, and not include it in the response chunk. Then the LLM is guaranteed to "understand" that it called a tool with the provided results in previous conversation turns. This is doable from buffer text but it's going to be messy and fragile.
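
For concreteness, here is roughly what the messages array has to contain for a past tool interaction, written as the plists gptel encodes to JSON. The field names follow the OpenAI chat completions API; the id and values are invented:

'((:role "user" :content "What's the weather in San Francisco?")
  ;; The assistant turn that requested the tool call:
  (:role "assistant"
   :tool_calls [(:id "call_123" :type "function"
                 :function (:name "get_weather"
                            :arguments "{\"location\": \"San Francisco, CA\"}"))])
  ;; The matching result, keyed to the call by its id:
  (:role "tool" :tool_call_id "call_123" :content "18C, partly cloudy")
  ;; The assistant's final answer, informed by the result:
  (:role "assistant" :content "It's 18°C and partly cloudy in SF."))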

It's easier if you store the tool call log in an overlay or another text property, but as I mentioned above I don't want to increase the amount of invisible state in the buffer, notwithstanding the ignore property. It becomes harder for the user to reason about the conversation.

Think I saw another first. Making my prompt request org mode output resulted in a fake tool call using an elisp source block instead of a normal ``` fence.

This is very common. gptel should handle this fine already. If it doesn't please report a bug.

@karthink
Owner Author

karthink commented Feb 3, 2025

Thanks, FSM works great! Do you have any plans to support the programmatic use of gptel-request where :callback is called only for the final state?

I'm not planning to, no. gptel-request takes an :fsm argument, and the state machine handlers are a superset of (and more flexible than) using callbacks/hooks. So what would be the advantage of adding a custom callback for the final state?

@link0ff

link0ff commented Feb 3, 2025

So what would be the advantage of adding a custom callback for the final state?

In the spirit of simplicity of the one-shot prompt example at https://github.com/karthink/gptel/wiki/Defining-custom-gptel-commands such a simple call could transparently handle tools as well. Then :callback could be called at the final state with the accumulated response.

@karthink
Owner Author

karthink commented Feb 3, 2025

In the spirit of simplicity of the one-shot prompt example ...

@link0ff Ah, I understand -- gptel already provides this. The gptel-request callback is called multiple times if required, including just before the "final state". You can dispatch on the type of response. Here's an updated version of the one-shot prompt example that handles tool calls, including interactive confirmation before running tools:

(defvar gptel-lookup--history nil)

(defun gptel-lookup (prompt)
  (interactive (list (read-string "Ask ChatGPT: " nil gptel-lookup--history)))
  (when (string= prompt "") (user-error "A prompt is required."))
  (let ((gptel-tools
         (mapcar #'gptel-get-tool '("search_web" "get_youtube_transcript"))) ;or use global value
        (gptel-include-tool-results t) ;or use global value
        (gptel-confirm-tool-calls t))  ;or use global value
    (gptel-request prompt
      :callback
      (lambda (response info)
        (cond
         ;; ERROR
         ((null response)
          (message "gptel-lookup failed with message: %s"
                   (plist-get info :status)))

         ;; Received a RESPONSE or TOOL CALL RESULT
         ((stringp response)
          (with-current-buffer (get-buffer-create "*gptel-lookup*")
            (let ((inhibit-read-only t))
              ;; (erase-buffer)  ;might be called multiple times, don't erase!
              (insert response))
            (special-mode)
            (display-buffer (current-buffer)
                            `((display-buffer-in-side-window)
                              (side . bottom)
                              (window-height . ,#'fit-window-to-buffer)))))
         
         ;; Received TOOL CALL CONFIRMATION or TOOL CALL RESULT
         ((consp response)
          (gptel--display-tool-calls response info 'use-minibuffer)))))))

If the relevant booleans are enabled or let-bound, tool call confirmations are handled via minibuffer prompts, and tool results are passed back to the callback.

The one-shot example in the wiki needs to be updated to use a stringp check instead of (when response ...), but otherwise continues to work. Tool calls are processed automatically if no confirmation is required and the tool result is just ignored by the callback.

@psionic-k
Contributor

psionic-k commented Feb 3, 2025

you have to populate the messages array with the tool call and response messages when creating the full prompt

If tool calls are orthogonal to the other necessary gptel text properties (seems so), then we can just add tool-call to the possible properties of gptel and then separate them into just one more bucket for populating the messages array amirite?

It completely makes sense why the model auto-mimics to such a high degree now. I'll see if I can hack this up.

@psionic-k
Contributor

psionic-k commented Feb 4, 2025

I think I'm close to implementing messages in context with the tool role. I got to the point that OpenAI was complaining that I can't include a previous tool result in the history without a preceding tool call. I'm not sure if this means it needs the initial call to be in the turns, or if tool call responses are only ever supposed to be sent once. If tool responses are only ever supposed to be sent once, I can roll back that work. Otherwise I need to figure out how to retain the initial tool call in the turns, probably by just printing it as the function name / args and giving it a specific property to parse in a slightly different way.

Seems like the API rules out the idea of sending old tool results as tool messages since the chat completions never include the original tool calls as messages.

This was my API error message from attempting to include tool call results in chat completion requests that weren't directly preceded by tool calls.

ChatGPT error: ((HTTP/2 400) invalid_request_error) Invalid parameter: messages with role 'tool' must be a response to a preceeding message with 'tool_calls'.

I don't see the nonces in the API. How does it match tool responses to calls if it's all stateless? 🤷

(response ...)

Yes. This is a good way. I was using (tool id). I switched to your way of using 'ignore on the existing gptel property.

I did start ignoring all the extraneous stuff, including the function call and args, response separator etc. I moved the part that adds the response property upstream. The callback for inserting tool use results was the wrong place because it was clobbering 'ignore properties.

General feeling is that auto-mimicry goes way down, so there's that.

@psionic-k
Contributor

Aha! Looks like the original tool calls need to be persisted as an assistant message. I just didn't look deep enough into the assistant message spec.

https://platform.openai.com/docs/api-reference/chat/create#chat-create-messages

I can do this by... writing the accepted tool calls to the top of the :TOOL_CALL: block, one per tool result, and then tagging them with 'gptel '(tool-call ...) text property to insert them into the prompts when preparing to send a request. That's quite doable. Then when parsing the buffer, I will just pass them back through json parse to assemble my "assistant" messages.

I'm kind of breaking some things here and there. Only the ChatGPT backend will support this initially.

The biggest thing I don't understand is how context works without a chat buffer. Can you fill me in a bit? Where does context come from for in-place work?

Doing any kind of interposing of responses is looking more difficult though. I think a better approach is to dynamically decide where to insert response headers so that I don't have empty response prefixes. This is for my HK-47 vanity problem.

@psionic-k
Copy link
Contributor

psionic-k commented Feb 4, 2025

master...psionic-k:gptel:filter-tool-call

This is almost completely working on the OpenAI backend. Whenever there is a huge pile of parallel requests, I have an issue I need to look at, but parallel calls on simple tools are working.

[screenshot]

I need to completely re-do the commits and I'm sure fix up a lot of other garbage. See the diff. The commits are noisy.

@link0ff

link0ff commented Feb 4, 2025

The one-shot example in the wiki needs to be updated to use a stringp check

Thanks, with a stringp check it works nicely. A small detail is that a valid response text can be empty, so a check for null could be extended to e.g. (and (null response) (not (equal (plist-get info :http-status) "200"))).

Regarding the "final state", probably it can't be detected with a universal condition since such thing as (equal (plist-get info :stop-reason) "end_turn") is too API-dependent. Also sometimes the call chain never stops, so maybe need to add some limit on the number of tool calls.

@karthink
Owner Author

karthink commented Feb 4, 2025

@psionic-k Are you intending to submit a PR or just creating a proof of concept? In case it's the former -- your approach breaks several aspects of gptel, including chat persistence, internal API boundaries and gptel's "modular" design.

  • If it's the former, I can do a review and point these out before you move further in unacceptable directions and give us both extra work in the process. You'll also have to add the same features to parsing for the Anthropic, Ollama and Gemini backends.
  • If it's a proof of concept, it looks good to me and I like the idea(s). I can refer to it when implementing something along these lines.

("modular" in quotes because I've almost fully detached the gptel-request code from the UI code, but it's not done yet. Your changes move things the other way.)


In general the idea of using text properties for role assignment (prompt, response, tool call etc) is not compatible with chat persistence (to disk), so I try to limit it to just response boundaries. One solution is to go full org-element, but that requires (i) defining new syntax, (ii) performance optimizations and (iii) backwards compatibility guarantees, since org-element is slow and changed significantly between Org 9.6 and 9.8-pre. The few bits of org-element I use in gptel-org have given me plenty of headaches and tech-support nightmares already. In addition, we need to support most of these features in Markdown too, which will require a new set of syntax, TOML front-matter and whatnot.


The biggest thing I don't understand is how context works without a chat buffer. Can you fill me in a bit? Where does context come from for in-place work?

I'm not sure I understand the question. If you're asking how gptel keeps track of tool calls while a back-and-forth multi-turn request is ongoing, it stores intermediate tool calls in the state machine, same as response text. Specifically it modifies the messages array in-place, see uses of gptel--inject-prompt and check

(thread-first
  gptel--fsm-last
  (gptel-fsm-info)
  (plist-get :data)
  (plist-get :messages))

When the interaction ends -- technically, when gptel--fsm-last becomes unreachable and is GC'd -- all this state is thrown away. When running gptel-send, a new state machine instance is created and its state is populated from the buffer text.

Essentially, you are attempting to treat the buffer as a lossless, mutable store of the messages array in between calls to gptel-send, and rehydrate it from the buffer when required. I agree with this ideal, but in practice the "lossless" part is going to be messy.

Doing any kind of interposing of responses is looking more difficult though. I think a better approach is to dynamically decide where to insert response headers so that I don't have empty response prefixes. This is for my HK-47 vanity problem.

I don't know what you mean by "interposing" of responses. I don't plan to support dynamic response header placement because I view the ugliness of your example image (empty response header followed by tool call drawer) as a very niche problem. It requires very specific settings to reproduce, and you can find some way to improve the aesthetics in your personal configuration, possibly using gptel-post-response-functions.

@karthink
Owner Author

karthink commented Feb 4, 2025

Thanks, with a stringp check it works nicely. A small detail is that a valid response text can be empty, so a check for null could be extended to e.g. (and (null response) (not (equal (plist-get info :http-status) "200"))).

@link0ff This is not true unless you are using streaming responses, and even then the callback is simply not called if a streamed chunk is empty. (I haven't tested that it's impossible for the callback to get a nil response when streaming, but I'll fix it if there are exceptions.)

The example callback I provided above also needs an additional cond clause to handle streaming responses.

Do you have examples/logs of valid (HTTP 200) empty string responses from any LLM API when not using streaming?

Regarding the "final state", probably it can't be detected with a universal condition since such thing as (equal (plist-get info :stop-reason) "end_turn") is too API-dependent.

Yes, this is correct.

Also sometimes the call chain never stops, so maybe need to add some limit on the number of tool calls.

I don't plan on doing this in gptel, at least right now. You can handle this from the callback as follows:

  1. Run gptel-request with gptel-confirm-tool-calls bound to true.
  2. Then the callback will be called with a list of tool call data as the response, which includes an extra closure (that we'll get to). You can run the tools yourself in this step. You can keep track of the number of tool calls here, or dynamically decide based on the tool call results if you're done.
  3. If you're done, you don't have to do anything. Otherwise call the closure(s) with the tool call result(s) to continue the request.

To be clear, you don't have to do any of the above unless you want full introspection -- the use of gptel--display-tool-calls in my updated example callback above handles all this in the general case, and prompts for tool call confirmation from the minibuffer.
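
As a sketch of those steps (relying only on behavior described in this thread: a consp response is tool-call data, gptel--display-tool-calls confirms and continues the chain, and doing nothing abandons it), a callback factory that caps the number of tool-call rounds might look like:

(require 'cl-lib)

(defun my/gptel-capped-callback (max-rounds)
  "Return a gptel callback that stops after MAX-ROUNDS tool-call rounds.
Use with gptel-confirm-tool-calls bound to t, as in step 1 above."
  (let ((rounds 0))
    (lambda (response info)
      (cond
       ((null response)                  ; error
        (message "gptel request failed: %s" (plist-get info :status)))
       ((stringp response)               ; response text or tool result
        (message "LLM: %s" response))
       ((consp response)                 ; tool calls awaiting confirmation
        (if (<= (cl-incf rounds) max-rounds)
            (gptel--display-tool-calls response info 'use-minibuffer)
          ;; Doing nothing abandons the chain.
          (message "Tool-call budget reached, stopping.")))))))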

@psionic-k
Contributor

intending to submit a PR

Yes. I first just drove in a straight line hack & slashing until I reached what I wanted. Now we have the motivation and context necessary to reconcile.

In general the idea of using text properties for role assignment (prompt, response, tool call etc) is not compatible with chat persistence (to disk)

It's compatible with serialization or decidable re-hydration.

  • Serialization can be done several ways. The obvious drawback (well, not that obvious) is that if you edit the file without the save hook active or after the save hook fires, the text properties will be invalid. The serialization can be embedded or by reference in another file. For documents to be self-contained, the serialization needs to be in a local variables trailer or else updating it updates text property locations.
  • Decidable re-hydration will lead to a lot of regrets. Re-hydration depends on the front-end. The LLM can spit out data that will break the re-hydration because it is not currently fenced in a way that it can't break out of, and we need parsing to maintain any fence. The LLM cannot break text properties on its own.

Since this is all about persisting and not how the live data is handled, I would choose serialization. Text properties are atomic to the contents of the conversation during edits.

I've almost fully detached the gptel-request code from the UI code

These details are needed. I'll let you do review before I opine blindly.

You'll also have to add the same features to parsing for the Anthropic, Ollama and Gemini backends.

TBD. I have two ideas, one of which is to make a feature branch and allow the general demand for other backends like R1 to drive the feature branch back into master. IMO complex tool uses are broken AF without first-class handling of tools and results.

When the interaction ends -- technically, when gptel--fsm-last becomes unreachable and is GC'd -- all this state is thrown away. When running gptel-send, a new state machine instance is created and its state is populated from the buffer text.

Okay, this gives me a clear expectation. Sounds like requests made against a region lose context unless there is an associated buffer somewhere. In the current version, can there be a buffer for context while making region edits?

I have some ideas on the front-end but will cook some more. For now my goal is to figure out the breakage that sometimes happens with tool-result / tool-call correspondence.

@karthink
Owner Author

karthink commented Feb 5, 2025

In general the idea of using text properties for role assignment (prompt, response, tool call etc) is not compatible with chat persistence (to disk)

It's compatible with serialization or decidable re-hydration.

  • Serialization can be done several ways. The obvious drawback (well, not that obvious) is that if you edit the file without the save hook active or after the save hook fires, the text properties will be invalid. The serialization can be embedded or by reference in another file. For documents to be self-contained, the serialization needs to be in a local variables trailer or else updating it updates text property locations.

Are you familiar with how persistence works in gptel right now?

  • Decidable re-hydration will lead to a lot of regrets. Re-hydration depends on the front-end. The LLM can spit out data that will break the re-hydration because it is not currently fenced in a way that it can't break out of, and we need parsing to maintain any fence. The LLM cannot break text properties on its own.

I don't know what "decidable re-hydration" means, so all of this went over my head.

I've almost fully detached the gptel-request code from the UI code

These details are needed. I'll let you do review before I opine blindly.

I'll add some comments.

You'll also have to add the same features to parsing for the Anthropic, Ollama and Gemini backends.

TBD. I have two ideas, one of which is to make a feature branch and allow the general demand for other backends like R1 to drive the feature branch back into master. IMO complex tool uses are broken AF without first-class handling of tools and results.

Sorry, I don't understand what you mean here either. I'm not sure what you mean by first-class handling of tools -- there are at least two different interpretations.

I think we are talking past each other, I couldn't follow most of the above points.

When the interaction ends -- technically, when gptel--fsm-last becomes unreachable and is GC'd -- all this state is thrown away. When running gptel-send, a new state machine instance is created and its state is populated from the buffer text.

Okay, this gives me a clear expectation. Sounds like requests made against a region lose context unless there is an associated buffer somewhere. In the current version, can there be a buffer for context while making region edits?

The buffer associated with a request on a region is the buffer the region is in. Every gptel-request is associated with the buffer it originates from (or a specified buffer), whether or not the response is inserted there.

I have some ideas on the front-end but will cook some more. For now my goal is to figure out the breakage that sometimes happens with tool-result / tool-call correspondence.

👍

@psionic-k
Contributor

You didn't describe the second idea you had

Intentional. It's speculative until about 48 hours from now?

decidable re-hydration

It means that instead of explicitly storing the data we want in the form we want it, we deduce it from the buffer content, such as :TOOL_CALL: drawers, to re-create the data we want. No other drawers like that exist. Taking advantage of that, we can decide what a tool call is, so it is decidable.

However, in this example, I'm immediately talking about org syntax, which brings up all the other problems of using the text content / structure. I don't think using the buffer structure is even worth considering as an implicit store of turns.

Text properties are orthogonal to buffer contents and so don't depend on the mode or structure. They can get a little bit janky if the user inserts text with other properties, but structure-based approaches can get janky whenever the LLM outputs something that breaks out of the structure. This already happens whenever it decides to emit org mode headings that break branching context.

Are you familiar with how persistence works in gptel right now?

No. You said it needed an overhaul. Since I'm implementing new turns, I couldn't imagine not overhauling it.

Every gptel-request is associated with the buffer it originates from, whether or not the response is inserted there.

Thanks. I could imagine cases where we want a working buffer and those are just hard questions to answer when digging into a code base.

first-class handling of tools

First-class means... giving them an explicit representation. In master, after the first tool response to the LLM, subsequent calls just cram them in with assistant responses, which they are not.

@karthink
Owner Author

karthink commented Feb 5, 2025

You didn't describe the second idea you had

Intentional. It's speculative until about 48 hours from now?

Cool.

Are you familiar with how persistence works in gptel right now?

No. You said it needed an overhaul. Since I'm implementing new turns, I couldn't imagine not overhauling it.

It currently works exactly how you're imagining it -- text property boundaries stored as local variables or Org properties. Try saving your chat buffer to disk.

decidable re-hydration

It means that instead of explicitly storing the data we want in the form we want it, we deduce it from the buffer content, such as :TOOL_CALL: drawers, to re-create the data we want. No other drawers like that exist. Taking advantage of that, we can decide what a tool call is, so it is decidable.

However, in this example, I'm immediately talking about org syntax, which brings up all the other problems of using the text content / structure. I don't think using the buffer structure is even worth considering as an implicit store of turns.

Thanks for the explanation. I agree with you, but for different reasons. I think syntax can work if the user is aware of it. For example, if they know that everything inside a :TOOL_CALL: drawer will be interpreted as a tool call. Otherwise they might end up changing the meanings of elements without realizing it. I also think imposing syntax imposes a cognitive burden on the user, and I don't want them to have to remember how to (say) demarcate a prompt region or a tool call, or remember things like special #+gptel_* Org keywords. (Nobody can remember what comes after #+options: in Org or what the dozens of babel header args do.)

Text properties are orthogonal to buffer contents and so don't depend on the mode or structure. They can get a little bit janky if the user inserts text with other properties,

They can go completely out of sync. I've seen plenty of notebooks from users reporting bugs where the bounds applied from text properties as persisted to the file are completely off. If you forget to turn on gptel-mode after opening the file before editing it -- you're out of sync. If you edit it in another editor or in Emacs without gptel installed, you're out of sync. There are several active issues in this repo from users complaining that the current persistence method is unreliable for exactly this reason, suggesting that I switch to using syntax instead.

My hope is that the UI can be separated out from gptel-request so syntax-y UIs for gptel can be created (not by me) by and for those who prefer it.

but structure-based approaches can get janky whenever the LLM outputs something that breaks out of the structure. This already happens whenever it decides to emit org mode headings that break branching context.

Good point. Your observation about mimicry makes this harder to solve, although I expect strategic use of the ignore property can solve most of it.

Every gptel-request is associated with the buffer it originates from, whether or not the response is inserted there.

Thanks. I could imagine cases where we want a working buffer and those are just hard questions to answer when digging into a code base.

If you have a list of questions the answers to which can drastically speed up your pace of work on gptel we can hop on a voice call + screen share and I can save you some time.

first-class handling of tools

First-class means... giving them an explicit representation. In master, after the first tool response to the LLM, subsequent calls just cram them in with assistant responses, which they are not.

You mean the 1:1 mapping between the messages array and chat buffer that I alluded to previously.

I think you are narrowly focused on a particular style of LLM use which benefits from this faithful mapping, as guided by your interests/experiments with Emacs introspection. There are many other kinds of tasks where this level of fidelity is completely unnecessary, such as when your tool-use is primarily for side-effects, when the LLM edits other buffers by generating a diff, etc.

Even for lookup tasks, quite often the LLM's response as informed by the tool result is all that's needed in the buffer. This is why gptel-include-tool-results defaults to nil.

(The other interpretation I had of "first-class tool call results" is the ability to have structured results, not just strings.)

@psionic-k
Contributor

I think you are narrowly focused on a particular style of LLM use

I'm insisting on the correctness of behavior of tools that build up the context. Whatever needs to change as a consequence and whatever needs to be fixed in the cascade is acceptable.

It's a matter of time before language servers and SLIME etc. get used this way, as a RAG source. The results of calls don't go stale quickly for these use cases and should be left in the turns.

If you forget to turn on gptel-mode after opening the file

A file-local variable to activate the mode (sketch below). Add a gptel hook function that can activate gptel whenever it finds the file-local variable. Lots of ways to make these reliable enough.
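
For example (a sketch using only standard Emacs file-local variable machinery, nothing gptel-specific):

;; Persist an activation cue in the chat file itself.  Emacs prompts once
;; to mark the eval form safe when the file is next visited; the trailer's
;; comment syntax follows the file's major mode.
(add-file-local-variable 'eval '(gptel-mode))
;; which appends a trailer like:
;;   ;; Local Variables:
;;   ;; eval: (gptel-mode)
;;   ;; End: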

If you have a list of questions the answers to which can drastically speed up your pace of work on gptel we can hop on a voice call + screen share and I can save you some time.

Mainly I want to figure out the area where architecture is going one way and my changes appear to be going another.

Favored client / handle etc? I think I can do Google Meet on [email protected] (a Gmail ID). Timezone? I'm on Seoul time.

@karthink
Owner Author

karthink commented Feb 5, 2025

I'm insisting on the correctness of behavior of tools that build up the context. Whatever needs to change as a consequence and whatever needs to be fixed in the cascade is acceptable.

I agree, and thanks for taking the initiative on this.

It's only a matter of time before language servers, SLIME, etc. get used this way, as a RAG source. For these use cases the results of calls don't go stale quickly and should be left in the turns.

I'm not sure about the utility of a chat interface for tasks like this. But creating a more accurate mapping between the buffer and the messages array means we'll at least be prepared if it does work well.

File-local-variable to activate the mode. Add a gptel hook function that can activate gptel whenever it finds the file local variable. Lots of ways to make these reliable enough.

#491

Mainly I want to figure out the area where architecture is going one way and my changes appear to be going another.

Yeah. They're mostly small changes so far, but the vector is pointing the wrong way.

Favored client / handle etc? I think I can do google meet

I've emailed you.

@psionic-k
Contributor

I found my issue, and it has led to some brainstorming and realizations. The breakage in my current changes happens because = characters are inserted without text properties. It's not clear which function does this or what it is attempting to escape. Example cases of insertion, with a bit of context:

  • 'ignore "\n=")))
  • =(tool-result ,
  • global =gptel-include-tool-results'
  • =((?i ,

The callback, or something later, is responsible. I'm clearly propertizing the entire string with gptel (tool-call ID), but these = characters have no properties.
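
While hunting these down, it's handy to inspect the properties of the offending = directly:

(text-properties-at (point))   ; property list of the character at point
;; or, interactively: M-x describe-text-properties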

With or without the problem I found, I think that the markdown conversion to org mode is dangerous, simply because it looks like a regex solution to a parser problem. What happens when we're processing source blocks that are themselves about markdown syntax?

I've started wanting a block for the tool result. It solves another problem: not being able to see the call and its arguments while folded.

#+begin_tool_call (name: blah :args blah)
,*contents
#+end_tool_call

Note the escaped heading. Org can extract the un-escaped string from a block. It can also escape the string when inserting. Drawer and block folding both break if we don't do this.

Folds to

#+begin_tool_call (name: blah :args blah)...
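
Org already ships helpers for exactly this comma-escaping. A sketch of inserting a result this way (the helper functions are real; the block format is my assumption):

(require 'org-src)  ; for org-escape-code-in-string

(defun my/insert-tool-block (name args result)
  "Insert RESULT in a tool_call block, comma-escaping Org syntax."
  (insert (format "#+begin_tool_call (:name %s :args %S)\n" name args)
          (org-escape-code-in-string result)  ; prefixes *-/#+-lines with ","
          "\n#+end_tool_call\n"))

;; Before sending to the backend, the original text would be recovered
;; with (org-unescape-code-in-string body).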

@karthink
Owner Author

karthink commented Feb 5, 2025

I found my issue, and it has led to some brainstorming and realizations. The breakage in my current changes happens because = characters are inserted without text properties. It's not clear which function does this or what it is attempting to escape.

This is happening because you did this a few commits ago:

I moved the part that adds the response property upstream. The callback for inserting tool use results was the wrong place because it was clobbering 'ignore properties.

You are propertizing the response ahead of the :transformer action, i.e. markdown to org conversion. This is one of the changes I mentioned as going in the wrong direction, so I'll talk about it now. The gptel-request machinery is expected to be cut off from UI concerns, and it has many non-gptel-UI consumers. So the propertization should happen within the UI functions, in the insert callback (where it is placed in master). To fix the problem of not being able to propertize the tool call result, you can instead modify the default insert callbacks to take an optional noprop argument:

(defun gptel-curl--stream-insert-response (response info &optional noprop)
  "Insert RESPONSE at the position tracked in INFO.
When NOPROP is non-nil, assume RESPONSE is already propertized."
  ...)

If noprop is true, it should assume that the response is already propertized. Then the tool call result can be propertized and sent to it with noprop.
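
The caller side would then look something like this (the gptel property payload here is my assumption, not gptel's actual format):

(defun my/insert-tool-result (tool-output call-id info)
  "Propertize TOOL-OUTPUT with CALL-ID, then insert it via the callback.
Passing a non-nil NOPROP keeps the callback from re-propertizing it."
  (let ((result (propertize tool-output
                            'gptel `(tool-result ,call-id))))
    (gptel-curl--stream-insert-response result info 'noprop)))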

With or without the problem I found, I think that the markdown conversion to org mode is dangerous, simply because it looks like a regex solution to a parser problem.

It's really not; take a closer look at gptel--stream-convert-markdown->org. There are unique challenges in converting a Markdown stream to Org in real time without slowing down Emacs. The converter does a lot more than regex replacement.

What happens when we're processing source blocks that are themselves about markdown syntax?

Have you tried it? It should work fine unless the source text would also break Markdown, and...

Note the escaped heading. Org can extract the un-escaped string from a block. It can also escape the string when inserting. Drawer and block folding both break if we don't do this.

...except for the leading stars/bullet point issue you mention here. It is easy to fix, but I have deliberately chosen to ignore it for now, along with three other edge cases I'm aware of that no one has ever brought up.

If this is a common occurrence I can fix it in the markdown converter.


EDIT: sorry if I sound a little frustrated. Every now and then I get questions based on the assumption that gptel is doing the simplest/most obvious thing under the hood, because the result looks simple. This is followed by the realization that things are more complicated than assumed: the problems they expect to run into already occurred a while ago, and a more complex solution was developed to handle them. (An example: persisting the response bounds as an Org property in the file involves solving a fixed-point calculation problem. I've received at least four comments informing me that the bounds as calculated must be wrong, because the act of writing the bounds to the buffer changes them.)
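
For the curious, a toy illustration of that fixed-point iteration (nothing like gptel's actual code, and it ignores pathological oscillation): the serialized bounds line shifts the very positions it records, so you iterate until the rendering stabilizes.

(defun my/fixed-point-bounds (bounds format-string)
  "Serialize BOUNDS, accounting for the shift the header itself causes.
BOUNDS is a list of (BEG . END) conses; FORMAT-STRING has one %s slot.
Iterates until the rendered header stops changing."
  (let ((header "") (prev nil))
    (while (not (equal header prev))
      (setq prev header
            header (format format-string
                           (prin1-to-string
                            (mapcar (lambda (b)
                                      (cons (+ (car b) (length prev))
                                            (+ (cdr b) (length prev))))
                                    bounds)))))
    header))

;; (my/fixed-point-bounds '((10 . 42)) ":GPTEL_BOUNDS: %s\n")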

@psionic-k
Contributor

So the propertization should happen within the UI functions

Makes sense.

I'm seeing the need for the separation in other ways. If I escape the data on the front-end, I need to un-escape it before presenting it to the backend. I think I can handle at least my org mode case in gptel-org, but it means the front-end will always copy the buffer.

I think this is viable. Going to hack it together.

I've had another recurring issue with the model absolutely loving to output - *=verbatim and bold=*: item lists. Not sure if this is a ChatGPT thing or what. Given that there is markdown->org conversion, should I write all my prompts as if the model will see and return markdown instead of org mode?

@psionic-k
Contributor

I think I'm done with the hack-and-slash phase.

  • The transformer designed to handle incoming LLM text was responsible for the unwanted modifications to the tool results, which introduced the breaks in the text properties.
  • I implemented org escaping so that drawers will work. On the org side, before sending the buffer to the backend, I copy the buffer and un-escape these results.

What is definitely bad:

  • the extra arguments to the gptel-curl--stream-insert-response callback are super bad and break users' callbacks
  • any break in tool-result text properties is still game over. I don't know why this would realistically be a problem. What is the user smoking if they paste into the tool result? But they might, while trying to munge a conversation.

Blocks for tool calls are definitely better. Seeing the arguments and function name post-call helps UX.

@karthink
Owner Author

karthink commented Feb 5, 2025

the extra arguments to the gptel-curl--stream-insert-response callback are super bad and break users' callbacks

How do they break the user's callbacks?

Also it looks like you could use a single optional argument (raw, say) instead of no-prop and no-transform to avoid both the transformer and the propertization in the response.
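
A hypothetical sketch of that shape, not gptel's implementation:

(defun my/insert-response (response marker &optional raw)
  "Insert RESPONSE at MARKER; when RAW, skip all post-processing.
A toy: a real version would also run the markdown->org transformer
in the non-RAW branch."
  (with-current-buffer (marker-buffer marker)
    (save-excursion
      (goto-char marker)
      (unless raw
        (setq response (propertize response 'gptel 'response)))
      (insert response)
      (set-marker marker (point)))))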

any break in tool-result text properties is still game over. I don't know why this would realistically be a problem. What is the user smoking if they paste into the tool result?

Have you seen the past issues/bug reports in this repo about parsing? Absolutely anything that can happen will happen. It has to be written assuming a two-year-old will be mashing on the keys.

@karthink
Owner Author

karthink commented Feb 7, 2025

@psionic-k I've changed all the buffer parsers (including OpenAI) to use previous-single-property-change, as discussed. (They got shorter and simpler as a result, too.)
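
For anyone following along, this is the general shape of such a parser; the details of the real backend parsers are elided here, so take it as a sketch:

(defun my/gptel-segments ()
  "Collect (PROP . TEXT) segments of the current buffer.
Walks backwards over `gptel' text-property boundaries with
`previous-single-property-change'."
  (let ((end (point-max)) segments)
    (while (> end (point-min))
      (let ((beg (or (previous-single-property-change end 'gptel)
                     (point-min))))
        ;; The region [beg, end) has a uniform `gptel' property value.
        (push (cons (get-text-property beg 'gptel)
                    (buffer-substring-no-properties beg end))
              segments)
        (setq end beg)))
    segments))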

You'll have to rebase and fix a merge conflict with your OpenAI buffer parser, but it should be easier to work off the new version since it's closer to your implementation.
