
struggling to create similar agent tool like your spot demo #30

Open
robbyQY opened this issue Dec 5, 2024 · 1 comment

Comments

@robbyQY

robbyQY commented Dec 5, 2024

Hi! I also saw your Spot demo using ROSA, which is very cool! I wonder if that code will be released soon? It would be really helpful for people who are struggling to create custom robot agents.

For example, I am struggling with the following functions (these seem to be solved in your robot):

  1. Describe what the robot sees. Normally I would create a tool that receives an image from the robot camera topic and returns it to the agent, but I found that the agent (gpt-4o) failed to take the raw image array directly (it even failed when I sent it an image URL string or an encoded base64 string).
  2. Draw an rqt graph. When I sent a "draw rqt_graph" request, it was not working:
    [Image attached]
    Did you write a specific tool for that in the Spot robot?
    Also, is your default ROSA code capable of calling common ROS functions (e.g., asking the frequency of a specific topic, asking what topics are connected to ros node A, etc.)?
@RobRoyce
Collaborator

RobRoyce commented Dec 6, 2024

We use a heavily modified (and proprietary) version of Spot, so we can't release the code. However, I can give you some guidance on how we did it.

The main point is that you won't be able to use the underlying ROSA LLM for vision, since we call bind_tools on it and there's some weirdness with how LangChain handles that. What this means is that you will need to create a new LLM instance within the tool itself.

For example:

from openai import OpenAI


def process_images(base64_imgs: list, prompts=None):
    client = OpenAI(api_key=...)
    messages = prompts or [{"role": "system", "content": "Please describe the scene in the image(s)."}]

    # Attach each image as a base64 data URL, following the OpenAI vision API format
    messages.extend([{
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{img}",
                    "detail": "high"
                }
            }
        ]
    } for img in base64_imgs])

    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    return response.choices[0].message.content

from langchain.agents import tool


@tool
def describe_scene():
    """Describe the robot's current camera view."""
    # get images from camera
    # do any preprocessing
    # convert to base64 (OpenCV is a good choice)
    return process_images(...)

Note that the OpenAI API has some requirements about the format of the images; you'll want to check their docs for that. I found that JPEG encoded in base64 works well.
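
In case it's useful, here is one way that stub might be filled in. This is just a sketch assuming rospy and cv_bridge are available and reusing the process_images helper above; the /camera/image_raw topic name is a placeholder for your robot's actual camera topic:

import base64

import cv2
import rospy
from cv_bridge import CvBridge
from langchain.agents import tool
from sensor_msgs.msg import Image


def image_msg_to_base64_jpeg(msg: Image) -> str:
    """Convert a ROS Image message into a base64-encoded JPEG string."""
    bridge = CvBridge()
    # "bgr8" matches OpenCV's channel order; adjust to your camera's encoding
    cv_image = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
    ok, buffer = cv2.imencode(".jpg", cv_image)
    if not ok:
        raise RuntimeError("JPEG encoding failed")
    return base64.b64encode(buffer.tobytes()).decode("utf-8")


@tool
def describe_scene() -> str:
    """Describe what the robot currently sees through its camera."""
    # /camera/image_raw is a placeholder topic name
    msg = rospy.wait_for_message("/camera/image_raw", Image, timeout=5.0)
    return process_images([image_msg_to_base64_jpeg(msg)])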


As for the rqt_graph part, ROSA doesn't provide a tool for that by default. You can probably create one pretty easily:

import subprocess

from langchain.agents import tool


@tool
def rqt_graph():
    """Open the rqt_graph tool."""
    # Popen launches the GUI without blocking the agent until the window closes
    subprocess.Popen(["rqt_graph"])
    ...

Note: I don't recommend shelling out with subprocess in tools if you can avoid it, but this is the easiest way to accomplish what you're trying to do.
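
For your other question (topic frequency, node connections, etc.): you can often avoid subprocess entirely by querying the ROS master directly. Here's a rough sketch using the ROS 1 rosgraph API; the tool name and caller ID are placeholders I made up:

import rosgraph
from langchain.agents import tool


@tool
def node_connections(node_name: str) -> str:
    """List the topics a given node publishes and subscribes to."""
    # The caller ID ("/rosa_tools") is an arbitrary placeholder
    master = rosgraph.Master("/rosa_tools")
    pubs, subs, _services = master.getSystemState()
    published = [topic for topic, nodes in pubs if node_name in nodes]
    subscribed = [topic for topic, nodes in subs if node_name in nodes]
    return f"{node_name} publishes: {published}; subscribes: {subscribed}"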
