
struggling to create similar agent tool like your spot demo #30

Open
robbyQY opened this issue Dec 5, 2024 · 1 comment

Comments

@robbyQY

robbyQY commented Dec 5, 2024

Hi! I also saw your Spot demo using ROSA, which is very cool! I wonder if that code will be released soon? It would be really helpful for people who are struggling to create custom robot agents.

For example, I am struggling with the following functions (these seem to be solved in your robot):

  1. Describe what the robot sees. Normally I would create a tool that receives an image from the robot camera topic and returns it to the agent, but I found that the agent (gpt-4o) failed to take the raw image array directly (it even failed when I sent it an image URL string or an encoded base64 string).
  2. Draw an rqt graph. When I sent a "draw rqt_graph" request, it was not working:
    [Image attached]
    Did you write a specific tool for that in the Spot robot?
    Also, is your default ROSA code capable of calling common ROS functions (e.g., asking the frequency of a specific topic, asking what topics are connected to ros node A, etc.)?
@RobRoyce
Collaborator

RobRoyce commented Dec 6, 2024

We use a heavily modified (and proprietary) version of Spot, so we can't release the code. However, I can give you some guidance on how we did it.

The main point is that you won't be able to use the underlying ROSA LLM for vision, since we call bind_tools on it and there's some weirdness with how LangChain handles that. What this means is that you will need to create a new LLM instance within the tool itself.

For example:

from openai import OpenAI


def process_images(base64_imgs: list, prompts=None):
    client = OpenAI(api_key=...)
    messages = prompts or [{"role": "system", "content": "Please describe the scene in the image(s)."}]

    # Attach each image as a base64 data URL, following the OpenAI vision API format
    messages.extend([{
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{img}",
                    "detail": "high"
                }
            }
        ]
    } for img in base64_imgs])

    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    return response.choices[0].message.content

from langchain.agents import tool


@tool
def describe_scene():
    """Describe the robot's current camera view."""
    # get images from camera
    # do any preprocessing
    # convert to base64 (OpenCV is a good choice)
    return process_images(...)

Note that the OpenAI API has some requirements about the format of the images; you'll want to check their docs for that. I found that JPEG encoded in base64 works well.
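
In case it's useful, here is one way that stub might be filled in. This is just a sketch assuming rospy and cv_bridge are available and reusing the process_images helper above; the /camera/image_raw topic name is a placeholder for your robot's actual camera topic:

import base64

import cv2
import rospy
from cv_bridge import CvBridge
from langchain.agents import tool
from sensor_msgs.msg import Image


def image_msg_to_base64_jpeg(msg: Image) -> str:
    """Convert a ROS Image message into a base64-encoded JPEG string."""
    bridge = CvBridge()
    # "bgr8" matches OpenCV's channel order; adjust to your camera's encoding
    cv_image = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
    ok, buffer = cv2.imencode(".jpg", cv_image)
    if not ok:
        raise RuntimeError("JPEG encoding failed")
    return base64.b64encode(buffer.tobytes()).decode("utf-8")


@tool
def describe_scene() -> str:
    """Describe what the robot currently sees through its camera."""
    # /camera/image_raw is a placeholder topic name
    msg = rospy.wait_for_message("/camera/image_raw", Image, timeout=5.0)
    return process_images([image_msg_to_base64_jpeg(msg)])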


As for the rqt_graph part, ROSA doesn't provide a tool for that by default. You can probably create one pretty easily:

import subprocess

from langchain.agents import tool


@tool
def rqt_graph():
    """Open the rqt_graph tool."""
    # Popen launches the GUI without blocking the agent until the window closes
    subprocess.Popen(["rqt_graph"])
    ...

Note: I don't recommend shelling out with subprocess in tools if you can avoid it, but this is the easiest way to accomplish what you're trying to do.
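
For your other question (topic frequency, node connections, etc.): you can often avoid subprocess entirely by querying the ROS master directly. Here's a rough sketch using the ROS 1 rosgraph API; the tool name and caller ID are placeholders I made up:

import rosgraph
from langchain.agents import tool


@tool
def node_connections(node_name: str) -> str:
    """List the topics a given node publishes and subscribes to."""
    # The caller ID ("/rosa_tools") is an arbitrary placeholder
    master = rosgraph.Master("/rosa_tools")
    pubs, subs, _services = master.getSystemState()
    published = [topic for topic, nodes in pubs if node_name in nodes]
    subscribed = [topic for topic, nodes in subs if node_name in nodes]
    return f"{node_name} publishes: {published}; subscribes: {subscribed}"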
