[editor][server endpoints][3/n]: Proof-of-Concept for Streaming #615
Conversation
## Test plan
To test, follow the readme to run the backend server in a terminal, then go to another terminal and enter:

```
alias aiconfig="python -m 'aiconfig.scripts.aiconfig_cli'"
aiconfig edit --aiconfig-path="/Users/rossdancraig/Projects/aiconfig/cookbooks/Getting-Started/travel.aiconfig.json" --server-port=8080 --server-mode=debug_servers
curl http://localhost:8080/api/run -d '{"prompt_name":"get_activities"}' -X POST -H 'Content-Type: application/json'
```

This results in the `get_activities` prompt being deleted (notice that the second prompt is now the one that appears first):

![Screenshot 2023-12-26 at 00 30 23](https://github.com/lastmile-ai/aiconfig/assets/151060367/5dd905c8-7cb6-4c1f-a97b-284337196506)
![Screenshot 2023-12-26 at 00 30 49](https://github.com/lastmile-ai/aiconfig/assets/151060367/8bfc5003-72d0-4e5a-9bf9-14e849acade7)
TSIA, pretty simple; should be able to pass in the `params` field. See the previous diff for an example of how to test.
The actual call in the `run` command will be a bit more complex since we'll have to attach a stream callback and probably create a text queue iterator like what we did for Gradio, but this shows that it's possible and shouldn't be too complicated. Sorry Jonathan, I didn't use any of your super useful generic functions, but we can do that later.

A big thing I wanted to point out: in the `yield` generator, I need to explicitly return a string (not a JSON object), otherwise this will not work, so the frontend needs to be able to parse that result somehow. We should sync on this, but JSONs are strings anyway, and as long as we follow the same output "formula" (e.g. in `list_models` we output a `data` key, whereas other commands directly return the AIConfig JSON), it should be fine.
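For context, here is a minimal sketch of the pattern this PoC relies on (the endpoint body and chunk contents are illustrative, not the actual diff): Flask streams a response from a generator via `stream_with_context`, and each yielded chunk must be `str` or `bytes`, which is why the JSON gets serialized before yielding.

```
import json
import time

from flask import Flask, Response, stream_with_context

app = Flask(__name__)


@app.route("/api/test_streaming", methods=["POST"])
def test_streaming() -> Response:
    def generate():
        # Flask requires each yielded chunk to be str or bytes, so the
        # JSON is serialized before yielding; the frontend re-parses it.
        for i in range(5):
            time.sleep(1)  # simulate chunk-by-chunk latency
            yield json.dumps({"data": f"partial output {i + 1}"}) + "\n\n"

    return Response(stream_with_context(generate()), mimetype="text/plain")
```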
```
from aiconfig.schema import ExecuteResult, Prompt
from flask import Flask, Response, request, stream_with_context
from flask_cors import CORS  # TODO: add this to requirements.txt
```
Add it! :D
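For reference, the standard `flask-cors` wiring is a one-liner once the package is listed in requirements.txt (sketch, assuming the default allow-all configuration):

```
from flask import Flask
from flask_cors import CORS  # pip install flask-cors, and pin it in requirements.txt

app = Flask(__name__)
CORS(app)  # allow cross-origin requests, e.g. from the editor frontend dev server
```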
```
@@ -114,17 +117,83 @@ def create() -> FlaskResponse:
    state.aiconfig = AIConfigRuntime.create()  # type: ignore
    return HttpResponseWithAIConfig(message="Created new AIConfig", aiconfig=state.aiconfig).to_flask_format()

@app.route('/api/test_streaming', methods=["POST"])
```
You may as well make the real endpoint. It's just us using it anyway; we're not committing to anything big here.
```
EXCLUDE_OPTIONS = {
    "prompt_index": True,
    "file_path": True,
    "callback_manager": True,
}
```
Isn't there a lot of duplication with `HttpResponseWithAIConfig.to_flask_format()`?
Yea you're right, I was mainly just playing around and trying to get things tested quickly
```
def generate(num_stream_steps: int):
    prompt : Prompt = state.aiconfig.get_prompt('get_activities')
    for i in range(num_stream_steps):
        time.sleep(1)
```
why are we sleeping?
Just to prove that the streaming is going as planned and I can see outputs bit-by-bit
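One easy way to watch the chunks arrive one at a time is curl's `-N`/`--no-buffer` flag (the endpoint and empty body here are illustrative):

```
curl -N -X POST http://localhost:8080/api/test_streaming -H 'Content-Type: application/json' -d '{}'
```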
```
params : str = request_json.get("params", None)
stream : bool = request_json.get("stream", True)
```
Nit: remove the space before `:`. You can autoformat in-place with the linter.
```
output = ExecuteResult(
    output_type="execute_result",
    execution_count=0,
    data = "Rossdan" + str(i+1),
    metadata = {},
)
prompt.outputs = [output]
```
Should we be explicitly constructing this? This feels like unnecessary duplication
You're right lol, it's kind of not the greatest. It would be better to just overwrite `output.data` each time, but we'd still have to initialize it to an `ExecuteResult` object first (not too hard, just check if `outputs` is an empty array first). But yeah, this is just a proof of concept, nothing fancy in this diff yet.
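For illustration, the overwrite approach could look something like this (hypothetical helper, not in this diff):

```
def set_streamed_output(prompt: Prompt, accumulated_text: str) -> None:
    # Initialize the output once, then just overwrite .data on later chunks.
    if not prompt.outputs:
        prompt.outputs = [
            ExecuteResult(
                output_type="execute_result",
                execution_count=0,
                data="",
                metadata={},
            )
        ]
    prompt.outputs[0].data = accumulated_text
```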
```
# yield_output = core_utils.JSONObject({"data": aiconfig_json})
# print(f"{yield_output=}")
print(f"{str(aiconfig_json)=}\n")
yield str(aiconfig_json) + "\n\n"
```
How does this iterate over the tokens returned? Doesn't it just yield the same fixed contents of `get_activities()` every iteration?
The `aiconfig_json` is updated with different output each time. Sorry, I forgot to include the test video, but you can see it now :)
So the response itself would be a literal string instead of an object with `{data: string}`?

Can you explain the reasoning for this? Couldn't you alternatively just yield an object instead of a string? I'm not super familiar with generators, but I didn't think they were limited to primitive types?
This might be a Flask limitation; @rossdanlm should help clarify. I suggest that if this works, and there's any question whether Flask can do it another way, and it's easy to deal with on the frontend, let's just do that. For non-streaming endpoints I have a pretty good understanding of what's possible, and have returned JSON objects.
Yeah, I wasn't able to get a regular object passed through, so I had to convert the JSON to a string and it worked.
Yeah, I agree, unless we're able to get this unblocked in 20-30 mins of investigation.
# [editor] Set Up run_prompt Callbacks

Setting up the callbacks for `run_prompt`. The request succeeds, but currently the response is not correct -- the aiconfig returned doesn't have the outputs. This is on main, so will land this and test with #615.

A subsequent PR will also need to link the local TypeScript aiconfig to our editor package.json in order to have the updated output types until we can publish the updated package. With the linking, I'll then add output rendering.

## Testing
- Make sure the /run request succeeds: https://github.com/lastmile-ai/aiconfig/assets/5060851/2cacc07c-3bfe-4a63-8e74-f912b64260dd
Yielding a string is fine (and probably expected) since we can stream the JSON chunks, something like:
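(Sketch; the exact chunk keys are illustrative, and `aiconfig_json` is the serialized config from the generator above.)

```
# Intermediate chunks carry the accumulated output so far...
yield json.dumps({"output_chunk": {"data": "Rossdan"}}) + "\n"
yield json.dumps({"output_chunk": {"data": "Rossdan 1"}}) + "\n"
yield json.dumps({"output_chunk": {"data": "Rossdan 1 2"}}) + "\n"
# ...and the final chunk is the full updated AIConfig.
yield json.dumps({"aiconfig": aiconfig_json}) + "\n"
```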
Where each intermediate chunk is the output chunk with accumulated content, and then a final output of the full aiconfig.
Makes sense to me!
OK, so for the streaming to work with the client-side API we'll use (oboe), we need to wrap the entire response in an array, so:
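(Sketch of the server side; `generate_chunks` is a hypothetical source of JSON-serializable chunks.)

```
def generate_wrapped():
    # oboe incrementally parses one growing JSON document, so emit a single
    # top-level array: "[", comma-separated chunks, then "]".
    yield "["
    first = True
    for chunk in generate_chunks():  # hypothetical chunk source
        if not first:
            yield ","
        first = False
        yield json.dumps(chunk)
    yield "]"
```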
See #651 for an example; just need some improvements there.
Closing because we have an updated streaming PR in #683.
## Test plan
a9d769e6-d8d6-4aad-b450-6b27a75e509a.mp4
Stack created with Sapling. Best reviewed with ReviewStack.