Hi! I also saw your Spot demo using ROSA, which is very cool! I wonder if that code will be released soon? It would be really helpful for people who are struggling to create custom robot agents.
For example, I am struggling with the following functions (these seem to be solved in your robot):
- Describing what the robot sees: I tried creating a tool that receives an image from the robot's camera topic and returns it to the agent, but I found that the agent (gpt-4o) failed to handle the raw image array directly (it even failed when I sent it an image URL string or an encoded base64 string).
- Drawing an rqt graph: when I sent a request to draw the rqt_graph, it did not work.
Did you write specific tools for these in the Spot robot?
Also, is your default ROSA code capable of calling common ROS functions (e.g. asking the frequency of a specific topic, or asking which topics are connected to node A)?
We use a heavily modified (and proprietary) version of Spot, so we can't release the code. However, I can give you some guidance on how we did it.
The main point is that you won't be able to use ROSA's underlying LLM for vision, since we call bind_tools on it and there's some weirdness with LangChain there. This means you will need to create a new LLM instance within the tool itself.
For example:
```python
from openai import OpenAI

def process_images(base64_imgs: list, prompts=None):
    # Create a fresh client inside the tool, separate from the ROSA
    # agent's LLM (which has tools bound to it).
    client = OpenAI(api_key=...)
    messages = prompts or [
        {"role": "system", "content": "Please describe the scene in the image(s)."}
    ]
    messages.extend([
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{img}",
                        "detail": "high",
                    },
                }
            ],
        }
        for img in base64_imgs
    ])
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    return response.choices[0].message.content
```
```python
from langchain_core.tools import tool

@tool
def describe_scene():
    # get images from camera
    # do any preprocessing
    # convert to base64 (OpenCV is a good choice)
    return process_images(...)
```
Note that the OpenAI API has some requirements about the format of the images; you'll want to check their docs for that. I found that JPEG images encoded as base64 worked well.
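To make the base64 step concrete, here is a minimal sketch of wrapping already-encoded JPEG bytes (e.g. from `cv2.imencode(".jpg", img)`) into the data URL format shown in `process_images` above. The helper name is my own:

```python
import base64

def jpeg_bytes_to_data_url(jpeg_bytes: bytes) -> str:
    # Base64-encode raw JPEG bytes and wrap them in the
    # "data:image/jpeg;base64,..." URL the vision API accepts.
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return f"data:image/jpeg;base64,{b64}"
```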
As for the rqt_graph part, ROSA doesn't provide a tool for that by default, but you can probably create one pretty easily.
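One possible sketch, since rqt_graph is a GUI program: have the tool launch it as a detached subprocess and report back to the agent. The function name and return strings are my own; decorate it with `@tool` the same way as `describe_scene` when registering it with your agent:

```python
import subprocess

def show_rqt_graph() -> str:
    """Launch rqt_graph so the user can inspect the ROS computation graph."""
    try:
        # rqt_graph opens its own window, so launch it detached
        # instead of blocking the agent while it runs.
        subprocess.Popen(["rqt_graph"])
        return "rqt_graph launched; check your display."
    except FileNotFoundError:
        return "rqt_graph not found; make sure your ROS environment is sourced."
```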