
# [editor][server endpoints][3/n]: Proof-of-Concept for Streaming #615

Closed · wants to merge 3 commits

## Conversation

@rossdanlm (Contributor) commented Dec 26, 2023


The actual call in the `run` command will be a bit more complex, since we'll have to attach a stream callback and probably create a text queue iterator like what we did for Gradio, but this shows that it's possible and shouldn't be too complicated.

Sorry Jonathan, I didn't use any of your super useful generic functions, but we can do that later.

A big thing I wanted to point out: in the `yield` generator, I need to explicitly return a string (not a JSON object), otherwise this will not work, so the frontend needs to be able to parse that result somehow. We should sync on this, but JSONs are strings anyways, and as long as we follow the same output "formula" (ex: in `list_models` we output a `data` key vs. other commands where we directly return the AIConfig JSON), it should be fine.
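For illustration, here is a minimal, self-contained sketch (not from this diff) of the pattern being described, using Flask's standard streaming API; the route name mirrors this diff's `/api/test_streaming`, but the payload shape is illustrative, not the final protocol:

```python
# Minimal sketch: a Flask streaming endpoint whose generator yields
# strings. Each yielded chunk must already be a str (or bytes), so we
# serialize with json.dumps before yielding.
import json
import time

from flask import Flask, Response, stream_with_context

app = Flask(__name__)

@app.route("/api/test_streaming", methods=["POST"])
def test_streaming():
    def generate():
        for i in range(3):
            time.sleep(1)  # stand-in for waiting on model tokens
            # A JSON string the frontend can parse chunk-by-chunk
            yield json.dumps({"data": f"step {i + 1}"}) + "\n\n"

    return Response(stream_with_context(generate()), mimetype="text/plain")
```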

## Test plan

a9d769e6-d8d6-4aad-b450-6b27a75e509a.mp4

Stack created with Sapling. Best reviewed with ReviewStack.

Rossdan Craig [email protected] added 3 commits December 26, 2023 00:32
To test, follow the readme to run the backend server in a terminal, then go to another terminal and enter:

## Test plan
```
alias aiconfig="python -m 'aiconfig.scripts.aiconfig_cli'"
aiconfig edit --aiconfig-path="/Users/rossdancraig/Projects/aiconfig/cookbooks/Getting-Started/travel.aiconfig.json" --server-port=8080 --server-mode=debug_servers
curl http://localhost:8080/api/run -d '{"prompt_name":"get_activities"}' -X POST -H 'Content-Type: application/json'
```

This results in the `get_activities` prompt being deleted (notice that the second prompt is now the one that appears first):
<img width="969" alt="Screenshot 2023-12-26 at 00 30 23" src="https://github.com/lastmile-ai/aiconfig/assets/151060367/5dd905c8-7cb6-4c1f-a97b-284337196506">
<img width="824" alt="Screenshot 2023-12-26 at 00 30 49" src="https://github.com/lastmile-ai/aiconfig/assets/151060367/8bfc5003-72d0-4e5a-9bf9-14e849acade7">
TSIA, pretty simple; should be able to pass in the `params` field

See previous diff for an example of how to test
```python
from flask_cors import CORS
from aiconfig.schema import ExecuteResult, Prompt
from flask import Flask, Response, request, stream_with_context
from flask_cors import CORS # TODO: add this to requirements.txt
```

Contributor: Add it! :D

```diff
@@ -114,17 +117,83 @@ def create() -> FlaskResponse:
     state.aiconfig = AIConfigRuntime.create() # type: ignore
     return HttpResponseWithAIConfig(message="Created new AIConfig", aiconfig=state.aiconfig).to_flask_format()
```

```python
@app.route('/api/test_streaming', methods=["POST"])
```

Contributor: You may as well make the real endpoint. It's just us using it anyway; we're not committing to anything big here.

Comment on lines +122 to +126
```python
EXCLUDE_OPTIONS = {
    "prompt_index": True,
    "file_path": True,
    "callback_manager": True,
}
```

Contributor: Isn't there a lot of duplication with `HttpResponseWithAIConfig.to_flask_format()`?

Contributor Author: Yea you're right, I was mainly just playing around and trying to get things tested quickly.

```python
def generate(num_stream_steps: int):
    prompt : Prompt = state.aiconfig.get_prompt('get_activities')
    for i in range(num_stream_steps):
        time.sleep(1)
```

Contributor: Why are we sleeping?

Contributor Author: Just to prove that the streaming is going as planned and I can see outputs bit-by-bit.
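For context, a tiny standalone sketch (illustrative only, not from this diff) of why the sleep makes the streaming observable:

```python
# Illustrative only: a delay between yields means a consumer sees chunks
# arrive one at a time instead of all at once.
import time

def slow_chunks():
    for i in range(3):
        time.sleep(1)  # without this, everything may arrive in one burst
        yield f"chunk {i + 1}\n"

for chunk in slow_chunks():
    print(chunk, end="", flush=True)  # prints roughly one line per second
```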

Comment on lines +187 to +188
```python
params : str = request_json.get("params", None)
stream : bool = request_json.get("stream", True)
```

Contributor: Nit: remove the space before `:`. You can autoformat in-place with the linter.

Comment on lines +136 to +142
```python
output = ExecuteResult(
    output_type="execute_result",
    execution_count=0,
    data = "Rossdan" + str(i+1),
    metadata = {},
)
prompt.outputs = [output]
```

Contributor: Should we be explicitly constructing this? This feels like unnecessary duplication.

Contributor Author: You are right lol, it's kind of not the greatest. It would be better to just overwrite `output.data` each time, but we'd still have to initialize it first to an `ExecuteResult` object (not too hard, just check if `output` is an empty array first). But yea, just a proof of concept, nothing fancy in this diff yet.
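A hedged sketch of that alternative (initialize once, then overwrite `data`); it assumes the same `state`, `EXCLUDE_OPTIONS`, and imports as the diff above, and is not part of this PR:

```python
# Hypothetical variant of the generator above: construct the ExecuteResult
# once, then mutate its data field on each step instead of rebuilding it.
def generate(num_stream_steps: int):
    prompt: Prompt = state.aiconfig.get_prompt('get_activities')
    if not prompt.outputs:  # initialize first if outputs is an empty array
        prompt.outputs = [
            ExecuteResult(
                output_type="execute_result",
                execution_count=0,
                data="",
                metadata={},
            )
        ]
    for i in range(num_stream_steps):
        prompt.outputs[0].data = "Rossdan" + str(i + 1)  # just overwrite data
        aiconfig_json = state.aiconfig.model_dump(exclude=EXCLUDE_OPTIONS)
        yield str(aiconfig_json) + "\n\n"
```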

```python
# yield_output = core_utils.JSONObject({"data": aiconfig_json})
# print(f"{yield_output=}")
print(f"{str(aiconfig_json)=}\n")
yield str(aiconfig_json) + "\n\n"
```

Contributor: How does this iterate over the tokens returned? Doesn't it just yield the same fixed contents of `get_activities()` every iteration?

Contributor Author: The `aiconfig_json` is updated with different output each time. Sorry, I forgot to include the test video, but you can see it now :)

@rholinshead (Contributor):

So the response itself would be a literal string instead of an object with `{data: string}`?

> A big thing I wanted to point out: in the `yield` generator, I need to explicitly return a string (not a JSON object)

Can you explain the reasoning for this? Couldn't you alternatively just yield an object instead of the string? I'm not super familiar with generators but I didn't think they were limited to primitive types?

@jonathanlastmileai (Contributor):

> So the response itself would be a literal string instead of an object with `{data: string}`?
>
> > A big thing I wanted to point out: in the `yield` generator, I need to explicitly return a string (not a JSON object)
>
> Can you explain the reasoning for this? Couldn't you alternatively just yield an object instead of the string? I'm not super familiar with generators but I didn't think they were limited to primitive types?

This might be a Flask limitation; @rossdanlm should help clarify. I suggest that if this works, and there's any question whether Flask can do it another way, and it's easy to deal with on the frontend, let's just do that. For non-streaming endpoints I have a pretty good understanding of what's possible, and I have returned JSON objects.

@rossdanlm (Contributor Author):

> Can you explain the reasoning for this? Couldn't you alternatively just yield an object instead of the string? I'm not super familiar with generators but I didn't think they were limited to primitive types?

> This might be a Flask limitation; @rossdanlm should help clarify.

Yea, I wasn't able to get a regular object to be passed, so I had to convert the JSON to a string and it worked.

> I suggest that if this works, and there's any question whether Flask can do it another way, and it's easy to deal with on the frontend, let's just do that.

Yea I agree, unless we're able to get this unblocked in 20-30 mins of investigation
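The failure mode being described here can be reproduced in isolation; a hedged sketch follows (the exact exception text varies by Flask/Werkzeug version):

```python
# Yielding a dict from a streamed Flask response fails, because the WSGI
# layer expects each chunk to be str or bytes; serializing first works.
import json

from flask import Flask, Response

app = Flask(__name__)

@app.route("/broken")
def broken():
    def gen():
        yield {"data": "not allowed"}  # errors at request time when streamed
    return Response(gen())

@app.route("/working")
def working():
    def gen():
        yield json.dumps({"data": "fine"})  # a str, so Flask can stream it
    return Response(gen())
```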

rholinshead added a commit that referenced this pull request Dec 27, 2023
# [editor] Set Up run_prompt Callbacks

Setting up the callbacks for `run_prompt`. The request succeeds but
currently the response is not correct -- the aiconfig returned doesn't
have the outputs. This is on main, so will land this and test with
#615

A subsequent PR will also need to link local typescript aiconfig to our
editor package.json in order to have the updated output types until we
can publish the updated package. With the linking, I'll then add output
rendering.

## Testing:
- Make sure /run request succeeds:


https://github.com/lastmile-ai/aiconfig/assets/5060851/2cacc07c-3bfe-4a63-8e74-f912b64260dd
@rholinshead (Contributor):

Yielding a string is fine (and probably expected), since we can stream the JSON chunks, something like:

```python
def generate(num_stream_steps: int):
    aiconfig_json: dict | None = None
    prompt: Prompt = state.aiconfig.get_prompt(prompt_name)
    for i in range(num_stream_steps):
        time.sleep(1)
        output = ExecuteResult(
            output_type="execute_result",
            execution_count=0,
            data="Rossdan" + str(i + 1),
            metadata={},
        )
        prompt.outputs = [output]
        print(f"Done step {i+1}/{num_stream_steps}...")

        aiconfig_json = state.aiconfig.model_dump(exclude=EXCLUDE_OPTIONS)
        # print(f"{aiconfig_json=}\n")
        # yield_output = core_utils.JSONObject({"data": aiconfig_json})
        # print(f"{yield_output=}")
        print(f"{str(aiconfig_json)=}\n")
        yield json.dumps({"output_chunk": output.model_dump()})
        # yield aiconfig_json

        # HttpResponseWithAIConfig(
        #     message=f"Done step {i+1}/{num_stream_steps}...",
        #     aiconfig=state.aiconfig,
        # ).to_flask_format()

    if aiconfig_json is None:
        aiconfig_json = state.aiconfig.model_dump(exclude=EXCLUDE_OPTIONS)
    yield json.dumps({"aiconfig": aiconfig_json})
```

Where each intermediate chunk is the output chunk with accumulated content and then a final output of the full aiconfig
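For illustration, a hedged sketch of how a client could consume that framing, assuming each chunk is newline-delimited JSON (the URL and payload match the test plan above):

```python
# Hypothetical client for the chunk protocol sketched above: stream the
# response and dispatch on whether a line carries an output chunk or the
# final aiconfig.
import json

import requests

resp = requests.post(
    "http://localhost:8080/api/run",
    json={"prompt_name": "get_activities"},
    stream=True,
)
for line in resp.iter_lines():
    if not line:
        continue
    event = json.loads(line)
    if "output_chunk" in event:
        print("partial output:", event["output_chunk"].get("data"))
    elif "aiconfig" in event:
        print("final aiconfig received")
```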

@jonathanlastmileai (Contributor):

> Yielding a string is fine (and probably expected), since we can stream the JSON chunks, something like: […] Where each intermediate chunk is the output chunk with accumulated content and then a final output of the full aiconfig

Makes sense to me!

@rholinshead (Contributor) commented Dec 28, 2023:

Ok, so for the streaming to work with the client-side API we'll use (oboe), we need to wrap the entire response in an array, so:

- the first chunk should send `[{output_chunk: output}, \n` with the opening `[`
- subsequent chunks send just `{output_chunk: output}, \n`
- the end sends `{aiconfig: aiconfig_json}]` with the closing `]`

See #651 for an example; it just needs some improvements there:

- we shouldn't need to yield `[` or `]` separately (needs some Python string manipulation to add them to the first chunk / end config text)
- can we do this for all run_prompt calls? i.e. even for non-streaming run_prompt, immediately just send `[{output: output}, {aiconfig: aiconfig}]`, and ideally we can use one endpoint/handling for all run prompt calls (a sketch of this framing follows below)
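A hedged server-side sketch of that framing; the function and argument names here are hypothetical, not from #651:

```python
# Hypothetical generator implementing the array-wrapped framing above:
# the full stream parses as one JSON array, which oboe can consume
# incrementally on the client.
import json

def generate_oboe_framed(chunks, final_aiconfig):
    first = True
    for chunk in chunks:
        prefix = "[" if first else ""  # fold the opening "[" into chunk 1
        first = False
        yield prefix + json.dumps({"output_chunk": chunk}) + ",\n"
    if first:
        yield "["  # no chunks at all: still open the array
    yield json.dumps({"aiconfig": final_aiconfig}) + "]"
```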

@rossdanlm (Contributor Author):

Closing because we have an updated streaming PR in #683.

@rossdanlm rossdanlm closed this Jan 2, 2024
@rossdanlm rossdanlm deleted the pr615 branch January 2, 2024 17:44