Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it possible to Combined Response Model Structure Output and Function Calling? #709

Closed
minhquan23102000 opened this issue Nov 21, 2024 · 12 comments
Assignees
Labels
enhancement New feature or request mirascope question Further information is needed
Milestone

Comments

@minhquan23102000
Copy link

minhquan23102000 commented Nov 21, 2024

Question

Can we use the response model structure output and function calling at the same times? This would enhance the flexibility and capability of the API by allowing structured outputs alongside the execution of specific functions.

some thing like this:

from mirascope.core import openai
from pydantic import BaseModel


class ReAct(BaseModel):

    thought: str
    action: str

class GetBookAuthor(BaseTool):
    """Returns the author of the book with the given title."""

    title: str = Field(..., description="The title of the book.")

    def call(self) -> str:
        if self.title == "The Name of the Wind":
            return "Patrick Rothfuss"
        elif self.title == "Mistborn: The Final Empire":
            return "Brandon Sanderson"
        else:
            return "Unknown"

@openai.call("gpt-4o-mini", response_model=ReAct, tools=[GetBookAuthor])
def call(text: str) -> str:
    ...

Motivation:

  1. Improved Data Handling: Using a structured response allows for better organization of data, making it easier to parse and utilize in downstream applications.
  2. Dynamic Functionality: The ability to call functions based on the structured output can enhance the interactivity of the API, enabling more complex workflows.

Alternatives:
Currently, users may need to implement separate logic to handle function calls and response structuring, which can lead to increased complexity in code. A built-in mechanism to support this feature would simplify the development process.

@minhquan23102000 minhquan23102000 added the question Further information is needed label Nov 21, 2024
@willbakst
Copy link
Contributor

I'm not quite sure I fully understand the desired behavior. Can you elaborate further on what you expect for the downstream usage of such a function? What does the call method in your example return? How would the user use it?

@minhquan23102000
Copy link
Author

minhquan23102000 commented Nov 22, 2024

The idea is I'd like to impose a specific structure on the model's responses. For instance, I want it to generate a brainstorming phase or a step-by-step reasoning process before providing a final answer. However, when using the response model, I'm unable to leverage the tool_call functionality

For example, I want to archive something like this:

from mirascope.core import (
    openai,
    BaseTool,
    prompt_template,
    Messages,
    BaseMessageParam,
    BaseDynamicConfig,
    litellm,
)
from pydantic import BaseModel, Field
from typing import cast


class ReAct(BaseModel):

    thought: str
    action: str
    have_final_answer: bool


class GetBookAuthor(BaseTool):
    """Returns the author of the book with the given title."""

    title: str = Field(..., description="The title of the book.")

    def call(self) -> str:
        if self.title == "The Name of the Wind":
            return "Patrick Rothfuss"
        elif self.title == "Mistborn: The Final Empire":
            return "Brandon Sanderson"
        else:
            return "Unknown"


class AuthorProfile(BaseTool):
    """Returns the profile of the author with the given name."""

    name: str = Field(..., description="The name of the author.")

    def call(self) -> str:
        return f"Author {self.name} has written many books. He was born in 1977."


@litellm.call(
    "gemini/gemini-1.5-flash-002",
    response_model=ReAct,
    tools=[GetBookAuthor, AuthorProfile],
    json_mode=True,
)
@prompt_template(
    """
    SYSTEM: You are a helpful assistant that can answer questions about books.
    Provide a thought based on the observation.
    Determine the most optimal action to take. 
    If you have the final answer in this step, set have_final_answer to True.
    
    Available tools: 
    - GetBookAuthor: Returns the author of the book with the given title.
    - AuthorProfile: Returns the profile of the author with the given name.
    
    MESSAGES: {history}
    """
)
def reasoning_call(history: list[BaseMessageParam]) -> BaseDynamicConfig:

    return {"computed_fields": {"history": history}}


@litellm.call(
    "gemini/gemini-1.5-flash-002",
)
@prompt_template(
    """
    SYSTEM: You are a helpful assistant that can answer questions about books.
    
    MESSAGES: {history}
    """
)
def final_call(history: list[BaseMessageParam]): ...


text = "Can you tell me more about the author of the book 'The Name of the Wind'?"
history = []

history.append(Messages.User(content=text))
while True:

    ai_response = reasoning_call(history)

    history.append(Messages.Assistant(content=str(ai_response)))

    print(ai_response)

    if tool := ai_response.tool:
        tools_and_outputs = [(tool, tool.call())]
        history += ai_response.tool_message_params(tools_and_outputs)

        print(tools_and_outputs)

    if ai_response.have_final_answer:
        break


# final call
print(final_call(history))

But get error like this:

uv run .\test.py
thought="To find the author of 'The Name of the Wind', I need to use the available tool 'GetBookAuthor'." action="Use the GetBookAuthor tool with the book title 'The Name of the Wind'" have_final_answer=False
Traceback (most recent call last):
  File "D:\Work\personal\personal-agent\test.py", line 82, in <module>
    if tool := ai_response.tool:
               ^^^^^^^^^^^^^^^^
  File "D:\Work\personal\personal-agent\.venv\Lib\site-packages\pydantic\main.py", line 856, in __getattr__
    raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}')
AttributeError: 'ReAct' object has no attribute 'tool'

@willbakst
Copy link
Contributor

When you set response_model=ReAct, the output will be structured into a ReAct instance, which will not have the tool or tools properties because those will only be present on BaseCallResponse.

It's also worth noting that we use tools under the hood (unless json_mode=True) to structure the response into the provided response_model type, so I'm not even sure how we would be able to use both at the same time. The only thing that comes to mind would be to return type ReAct | BaseCallResponse in this case, but I'm not sure that interface makes sense (and I'm honestly not sure it's possible to implement).

IIUC, it seems like what you really want is a structured output that is powered by tool calls? Do you care to be calling the tools yourself? What if we provided a second-layer agent decorator that called the tools under the hood and structured the final response for you?

@minhquan23102000
Copy link
Author

What if we provided a second-layer agent decorator that called the tools under the hood and structured the final response for you?

Yes, I currently implement follow this solution. But the problem is I need to make two call in one reasoning call, and one action call to execute the tool. The cost will be double.

I'm not even sure how we would be able to use both at the same time. The only thing that comes to mind would be to return type ReAct | BaseCallResponse in this case, but I'm not sure that interface makes sense (and I'm honestly not sure it's possible to implement).

It okay, this is just a question to see if possible to improve further. Not a feature request.

Really thank you for take time to support.

@willbakst
Copy link
Contributor

Really thank you for take time to support.

Of course! We're always happy to help and discuss ways to improve the library. We want Mirascope to be the best version of itself it can be.

Yes, I currently implement follow this solution. But the problem is I need to make two call in one reasoning call, and one action call to execute the tool. The cost will be double.

I'm curious what you mean by the cost being double. When calling tools, you'll have to iteratively call the LLM with the updated history (including the tool calls) until the agent returns it's final response. Are you saying it would be double because you would then have to take the final response to make another call to structure the final response?

One solution here would be to provide another tool (e.g. FinalResponse) and tell the agent to call that tool for it's final response. This way you can detect the final response tool and consider that the final response rather than getting the string generation. Ultimately this would be the same thing as using response_model since that uses tools under the hood anyway.

Maybe we could do this as part of the call decorator and have the return type be ResponseModelT | TCallResponse as I mentioned before. Under the hood we would provide response_model as an additional tool and then detect if the response contains the single final response model tool and return that instead of the call response. Would also be worth looking into how this would work with json_mode=True if at all possible (e.g. instructing the model to return it's final response in json matching the response model but to return tools until it's ready to finally respond).

Of course, this would require that we do some additional prompt engineering under the hood to make this work. We would also need to update all of the tool properties to be cached so that the internal tool construction is not duplicated when the user accesses the properties. Might be worth caching those properties anyway.

@minhquan23102000
Copy link
Author

Sorry, I may have misunderstood your previous response in some instances. It would be helpful to provide a simple code snippet for illustration your idea (FinalResponse, and second layer decorator).

Currently, I am implementing the ReAct (Reasoning + Acting) engine as follows. I would appreciate any advice to enhance this flow:

ReAct Engine Implementation Overview

🎯 Purpose

The ReAct (Reasoning + Acting) engine implements a decision-making loop that allows an AI agent to:

  1. Think about the current situation
  2. Take actions
  3. Observe results
  4. Repeat until the task is complete

🔄 Core Flow

graph TD
    A[Start] --> B[Reasoning Step]
    B --> C[Action Step]
    C --> D{Check Conditions}
    D -->|Continue| B
    D -->|Complete/Stuck| E[End]
Loading

💡 Key Components

1. Reasoning Action Model

class ReasoningAction(BaseModel):
    thought: str         # Analysis and observations
    action: str         # Next action to take
    goal_completed: bool # Task completion status
    # ... other control flags

2. Main Loop Structure

async def run(self, agent):
    while not_finished and within_max_attempts:
        # 1. Reasoning Phase
        tools_names = [tool.name for tool in agent.available_tools]

        reasoning_response: ReasoningAction = await self._reasoning_step(
            messages=agent.messages,
            tools_names=tools_names # tool names to infect to the prompt
        )
        
        # 2. Action Phase
       
        result = await self._action_step(agent, reasoning_response)
 # action call to LLM, to execute the action, use the tool and append the result to the history. 
# example code in action step: 
#  msg = f"system-automessage@noreply: INFO: executing action: {reasoning_response.action}"
#        agent.history.append(Messages.User(msg))
# result["break"] = reasoning_response.goal_completed or other flag

        
        
        # 3. Break if conditions met
        if result["break"]:
            break

📝 Example Output Format


Reasoning Call:
* Thought: Analyzing user request for data analysis
* Action: Use pandas_tool to load and examine dataset

Action Call:
* Agent: Use pandas_tool to load and examine dataset
* Function call and append message to history

Reasoning Call:
* Thought: Analyzing the last observation
* Action: I have final answer

BREAK THE REACT

Final Answer:
* I have final answer X Y Z

@minhquan23102000
Copy link
Author

You can see, each question, each turn the agent execute two call:

  1. Reasoning Call
  2. Action Call

that make the cost double

@willbakst
Copy link
Contributor

Ah, I think I understand the issue.

The ReAct agent flow is somewhat outdated by the introduction of tools (function calling) in LLMs. ReAct was published before tools existed, so the idea was that you would have the LLM reason and provide and action, take the action on the LLM's behalf, and then give the LLM the action's output and have it continue in following steps.

With the introduction of tools, you can implement a more modern ReAct agent flow simply by calling tools until the agent no longer calls the tools. The LLM calling a "tool" is the equivalent of the reasoning + action step, and taking the action is equivalent to calling the tool on the LLM's behalf and providing the output as part of the message array.

I recommend reading through our agent docs that cover this more modern "react" agent flow using tools.

If I've misunderstood, let me know!

@minhquan23102000
Copy link
Author

Oh i see, thank you, maybe i make things complicated.

But it still interesting to see that response model can use in the same time as tools.

Because, we can parallel tool calls, so maybe it is possible, to use them at the same time. Or with structure output, json mode perhaps.

This is just an idea and not a feature request or anything, please do not take this too seriously. I don't have a deep understanding of how LLMs generate tokens for tool calls.

However, this could lead to advanced prompt engineering use cases for agents that utilize structured output and tool calling.

Thanks again; I will try the simple ReAct agent flow.

@willbakst
Copy link
Contributor

Sounds good. I'm going to continue thinking on this to see if there is a way to enable setting both response_model and tools at the same time in a way that works well and makes sense.

I'm going to leave this open as I continue to think on this.

We always appreciate the questions, so keep them coming!

@willbakst
Copy link
Contributor

I've gotten a POC for this working for OpenAI :)

I'm hopeful I can get the remaining providers done as well shortly and release this as a feature in the upcoming release!

@willbakst
Copy link
Contributor

This feature has been released in v1.14!

There is not yet explicit documentation for this. I've created #756 to track work for adding the docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request mirascope question Further information is needed
Projects
None yet
Development

No branches or pull requests

2 participants