Stable Diffusion 1.4 #70
Conversation
I'm going to ask you to rebase this to
@@ -0,0 +1,6 @@
TT_METAL_DOCKERFILE_VERSION=v0.53.0-rc34
TT_METAL_COMMIT_SHA_OR_TAG=4da4a5e79a13ece7ff5096c30cef79cb0c504f0e
Is there a reference to this commit in tt-metal anywhere? Curious how you selected this one.
I copy-pasted this from the YOLOv4 server, which required a specific commit because the metal YOLOv4 improvements were reverted after failing CI.
Should I just use the latest release? That would be release v0.55.0?
Updated to use release v0.55.0 in 4582863
# SPDX-FileCopyrightText: © 2024 Tenstorrent AI ULC

from locust import HttpUser, task
from utils import sample_file
I think this should be get_sample_prompt?
Removed locust tests in 016af35
Is a locustfile needed for this demo?
No, this isn't needed. The locust test here won't really measure performance; I want it to enqueue 5 prompts into the input queue and process them. I will add this to test_inference_api.py.
Removed locust tests in 016af35
app = Flask(__name__)

# var to indicate ready state
ready = False
Global variable ready: this flag is accessed and modified by multiple threads (warmup and request handlers). This may lead to race conditions. Consider using thread-safe mechanisms like threading.Lock or Event to manage state changes.
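A minimal sketch of the Event-based approach, assuming the Flask app from the excerpt above; the /health route name and the warmup body are illustrative:

```python
import threading

from flask import Flask

app = Flask(__name__)
ready_event = threading.Event()  # unset until warmup completes

def warmup():
    # ... the real model warmup work would run here ...
    ready_event.set()  # becomes visible to all handler threads at once

@app.route("/health")  # illustrative route name
def health():
    # handlers only read the event, so no explicit lock is needed
    if ready_event.is_set():
        return "OK", 200
    return "warming up", 503
```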
@app.route("/clean_up", methods=["POST"]) | ||
def clean_up(): | ||
with open(json_file_path, "r") as f: |
Repeated code blocks: the code repeatedly reads and writes json_file_path. This can be refactored into utility functions for cleaner logic:

import json
import os

def read_json_file(file_path):
    if not os.path.isfile(file_path):
        return {"prompts": []}
    with open(file_path, "r") as f:
        return json.load(f)

def write_json_file(file_path, data):
    with open(file_path, "w") as f:
        json.dump(data, f, indent=4)
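A handler that currently opens json_file_path directly could then shrink to something like this (a sketch; the actual handler logic is an assumption):

```python
@app.route("/clean_up", methods=["POST"])
def clean_up():
    data = read_json_file(json_file_path)
    # ... prune or reset the prompt list as the handler requires ...
    write_json_file(json_file_path, data)
    return {"status": "ok"}
```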
Refactored in a75d0e6
os.remove(
    "models/demos/wormhole/stable_diffusion/demo/web_demo/input_prompts.json"
)
print("Deleted json")
Use logging instead of print statements for better production diagnostics.
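A minimal sketch of that change applied to the deletion above (the logger setup is illustrative):

```python
import logging
import os

logger = logging.getLogger(__name__)

os.remove(
    "models/demos/wormhole/stable_diffusion/demo/web_demo/input_prompts.json"
)
logger.info("Deleted input_prompts.json")
```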
Used logging instead of print in f5650c5.
Some feedback added.
For the license headers, the year should be set to 2025.
Is this .env file needed? I'd advocate for using the setup.sh script to generate the .env file for SD1.4. Also, it can use the format that integrates with tt-studio.

The environment variables defined in it are Docker build environment variables. Are you sourcing the .env file for Docker build, or does docker compose --env-file on first run pass the env vars through to the correct ARGs in the Dockerfile? We have been using the .env for setup runtime and dependencies, and keeping the Docker build variables in documentation. I'm not against this, but it's different from other model implementations and not necessary if the user is running Docker build.
Do you use this for development often? If you are finding it more useful than having docker run commands in documentation, we should consider this for other model implementations.
# default base image, override with --build-arg TT_METAL_DOCKERFILE_VERSION=<version>
ARG TT_METAL_DOCKERFILE_VERSION

FROM ghcr.io/tenstorrent/tt-metal/tt-metalium-ubuntu-20.04-amd64-release:$TT_METAL_DOCKERFILE_VERSION
The tt-metal release image naming convention (FROM ghcr.io/tenstorrent/tt-metal/tt-metalium-ubuntu-20.04-amd64-release:$TT_METAL_DOCKERFILE_VERSION) recently changed. Further to this, in the vLLM models Dockerfile I added the base container as an ARG TT_METAL_DOCKERFILE_URL so that the same Dockerfile can also support local builds, for example to use Ubuntu 22.04, which does not have a published tt-metal GHCR image.
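A sketch of that pattern, using the image name above as an assumed default (the exact tag should track the current GHCR naming):

```dockerfile
# Full base-image reference as a build arg so local builds can substitute
# their own image, e.g. a locally built Ubuntu 22.04 tt-metalium image.
ARG TT_METAL_DOCKERFILE_URL=ghcr.io/tenstorrent/tt-metal/tt-metalium-ubuntu-20.04-amd64-release:v0.55.0
FROM ${TT_METAL_DOCKERFILE_URL}
```

Local builds would then pass docker build --build-arg TT_METAL_DOCKERFILE_URL=<local-image> without touching the Dockerfile.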
&& cd ${TT_METAL_HOME} \ | ||
&& git checkout ${TT_METAL_COMMIT_SHA_OR_TAG} \ | ||
&& git submodule update --init --recursive \ | ||
&& git submodule foreach 'git lfs fetch --all && git lfs pull' \ |
git submodule foreach 'git lfs fetch --all && git lfs pull' can now be removed; LFS is no longer required for the tt-metal build.
&& chown -R user:user ${HOME_DIR} \ | ||
&& chown -R user:user ${TT_METAL_HOME} | ||
|
||
USER user |
The name of the user being user has caused some confusion when the host also has a user named user. The username for the other models was changed to container_app_user to disambiguate. UID=1000 should still be used to retain compatibility with the internal cloud k8s deployment system. This is passed in as an ARG CONTAINER_APP_UID in the vLLM model Dockerfile.
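A sketch of the corresponding Dockerfile stanza; the defaults are assumptions based on the description above:

```dockerfile
# UID defaults to 1000 for compatibility with the internal k8s deployment;
# the username avoids colliding with a host user named "user".
ARG CONTAINER_APP_UID=1000
RUN useradd -u ${CONTAINER_APP_UID} -m -s /bin/bash container_app_user
USER container_app_user
```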
Leaving it to you to decide whether the code should be moved, but I don't love having the inference server backend implementation living in the tt-metal repo.
The interaction with that server via a combination of HTTP API and file system needs some improvement to be thread-safe. I'd recommend a thread-safe lock on a global dict, for example. Django has some facilities for this as well.
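A minimal sketch of that suggestion; the job keys and fields are illustrative assumptions:

```python
import threading

_jobs_lock = threading.Lock()
_jobs = {}  # e.g. prompt_id -> {"status": ..., "image_path": ...}

def set_job(prompt_id, **fields):
    # every writer takes the lock, so concurrent handlers cannot interleave
    with _jobs_lock:
        _jobs.setdefault(prompt_id, {}).update(fields)

def get_job(prompt_id):
    with _jobs_lock:
        # return a copy so callers never mutate shared state unlocked
        return dict(_jobs.get(prompt_id, {}))
```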
The current solution needs an example client script; it's not clear how to call the image API. Does the client need to call:
- /submit
- /update_status
- loop over get_latest_time or image_exists
- /get_image

(A sketch of such a client follows below.)
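A hypothetical client along those lines; the payloads, polling behavior, and port are assumptions, not a confirmed API:

```python
import time

import requests

BASE = "http://localhost:7000"  # assumed server address

# 1. Submit a prompt.
requests.post(f"{BASE}/submit", json={"prompt": "a painting of a lighthouse"})

# 2. Poll until the server reports an image, giving up after ~5 minutes.
for _ in range(60):
    if requests.get(f"{BASE}/image_exists").json():  # assumed truthy response
        break
    time.sleep(5)

# 3. Fetch the result.
resp = requests.get(f"{BASE}/get_image")
resp.raise_for_status()
with open("output.png", "wb") as f:
    f.write(resp.content)
```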
That said, I'd recommend message passing using an inter-process queue instead of the filesystem and additional REST API: either Python multiprocessing or a more robust solution using e.g. zmq (https://github.com/zeromq/pyzmq), which is used in the vLLM backend.
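A minimal sketch of the multiprocessing variant, with a stub standing in for the actual SD 1.4 pipeline call:

```python
from multiprocessing import Process, Queue

def generate_image(prompt: str) -> None:
    # stub: the real pipeline call would go here
    print(f"generating image for: {prompt}")

def worker(prompt_queue: Queue) -> None:
    while True:
        prompt = prompt_queue.get()  # blocks until a prompt arrives
        if prompt is None:  # sentinel: shut down cleanly
            break
        generate_image(prompt)

if __name__ == "__main__":
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    q.put("a photo of an astronaut riding a horse")
    q.put(None)  # tell the worker to exit
    p.join()
```

This keeps prompts in one process-safe queue rather than spread across JSON files and extra REST endpoints.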
I believe you have already overhauled this work for SD3.5, but adding these comments for tracking.
bind = f"0.0.0.0:{7000}" | ||
reload = False | ||
worker_class = "gthread" | ||
threads = 16 |
Running with 16 threads could have issues with thread safety given the current server intercommunication.
I've added the comments I sent you over Slack for tracking. It sounded like you have overhauled this implementation and have a thread-safe scheme in place. Let me know if you want a review on that.
This PR is superseded by #113
This PR adds a Flask server implementation of Stable Diffusion 1.4 and dockerizes it. It creates a new Flask server that: