Stable Diffusion 1.4 #70

Closed
wants to merge 55 commits into from

Conversation

bgoelTT
Contributor

@bgoelTT bgoelTT commented Jan 20, 2025

This PR adds a Flask server implementation of Stable Diffusion 1.4 and dockerizes it.

It creates a new Flask server that:

  • implements a warmup process whose completion gates the activation of the main inference and healthcheck endpoints (sketched below)
  • implements the main inference endpoint, which submits prompts to the backend generation thread
  • implements a healthcheck endpoint that returns a 200 status code only once the server is warmed up and ready to generate images
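
Below is a minimal sketch of this warmup-gating pattern, not the PR's actual code: the /health path, the queue, and the warmup stub are placeholders (only /submit appears later in the review discussion).

import threading
from queue import Queue

from flask import Flask, jsonify, request

app = Flask(__name__)
prompt_queue = Queue()  # consumed by the backend generation thread
ready = False           # flipped once model warmup finishes

def warmup():
    global ready
    ...  # placeholder: load the SD 1.4 model, run a warmup inference
    ready = True

@app.route("/health")
def health():
    # 200 only after warmup; 503 tells orchestrators to keep polling
    if not ready:
        return jsonify(status="warming up"), 503
    return jsonify(status="ok"), 200

@app.route("/submit", methods=["POST"])
def submit():
    if not ready:
        return jsonify(error="server not ready"), 503
    prompt_queue.put(request.json["prompt"])
    return jsonify(status="enqueued"), 200

threading.Thread(target=warmup, daemon=True).start()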

@bgoelTT bgoelTT self-assigned this Jan 20, 2025
@bgoelTT bgoelTT requested review from tstescoTT and milank94 January 31, 2025 15:35
@tstescoTT
Contributor

tstescoTT commented Jan 31, 2025

I'm going to ask you to rebase this onto dev and re-target the PR to dev, pending acceptance of the newly proposed git flow :)

@@ -0,0 +1,6 @@
TT_METAL_DOCKERFILE_VERSION=v0.53.0-rc34
TT_METAL_COMMIT_SHA_OR_TAG=4da4a5e79a13ece7ff5096c30cef79cb0c504f0e
Contributor

Is there a reference to this commit in tt-metal anywhere? Curious how you selected this one.

Contributor Author

@bgoelTT bgoelTT Feb 1, 2025

I copy+pasted this from the YOLOv4 server, which required a specific commit because the tt-metal YOLOv4 improvements were reverted after failing CI.

Should I just use the latest release? That would be release v0.55.0?

Contributor Author

Updated to use release v0.55.0 in 4582863

# SPDX-FileCopyrightText: © 2024 Tenstorrent AI ULC

from locust import HttpUser, task
from utils import sample_file
Contributor

I think this should be get_sample_prompt?

Contributor Author

Removed locust tests in 016af35

Contributor

Is a locustfile needed for this demo?

Contributor Author

No, this isn't needed. The locust test here won't really measure performance; I want it to enqueue 5 prompts into the input queue and process them. I will add this to test_inference_api.py (sketched below).
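
For reference, a minimal sketch of what that test_inference_api.py addition could look like; the base URL, endpoint path, and payload shape are assumptions, not the PR's actual test.

import requests

BASE_URL = "http://localhost:7000"  # assumed server address

def test_enqueue_five_prompts():
    # enqueue 5 prompts into the input queue via the inference endpoint
    for i in range(5):
        resp = requests.post(
            f"{BASE_URL}/submit",
            json={"prompt": f"test prompt {i}"},
        )
        assert resp.status_code == 200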

Contributor Author

Removed locust tests in 016af35

app = Flask(__name__)

# var to indicate ready state
ready = False
Contributor

Global variable ready: this flag is accessed and modified by multiple threads (warmup and request handlers), which may lead to race conditions. Consider using thread-safe mechanisms like threading.Lock or threading.Event to manage state changes.
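
For illustration, one way to apply the threading.Event suggestion (a sketch, not the fix that actually landed in 835fc91/63d973c):

import threading

ready = threading.Event()  # replaces the bare boolean

def warmup():
    ...  # model load / warmup inference
    ready.set()  # atomic and visible to all threads

# request handlers then check readiness with:
# if not ready.is_set(): return "warming up", 503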

Contributor Author

@bgoelTT bgoelTT Feb 3, 2025

Enforced mutual exclusion of the ready flag in 835fc91, then fixed a bug in that implementation in 63d973c.


@app.route("/clean_up", methods=["POST"])
def clean_up():
with open(json_file_path, "r") as f:
Contributor

Repeated code blocks: the code repeatedly reads and writes json_file_path. This can be refactored into utility functions for cleaner logic:

import json
import os

def read_json_file(file_path):
    if not os.path.isfile(file_path):
        return {"prompts": []}
    with open(file_path, "r") as f:
        return json.load(f)

def write_json_file(file_path, data):
    with open(file_path, "w") as f:
        json.dump(data, f, indent=4)

Contributor Author

Refactored in a75d0e6

os.remove(
"models/demos/wormhole/stable_diffusion/demo/web_demo/input_prompts.json"
)
print("Deleted json")
Contributor

Use logging instead of print statements for better production diagnostics.
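
For example (a sketch; the logger configuration and message are illustrative):

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

logger.info("Deleted %s", json_file_path)  # instead of print("Deleted json")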

Contributor Author

Used logging instead of print in f5650c5

Contributor

@milank94 milank94 left a comment

Some feedback added.

For the license headers, the year should be set to 2025.

@bgoelTT bgoelTT requested a review from milank94 February 3, 2025 14:38
@bgoelTT bgoelTT changed the base branch from main to dev February 4, 2025 20:44
Contributor

Is this .env file needed?

I'd advocate for using the setup.sh script to generate the .env file for SD1.4. Also, it can use the format that integrates with tt-studio.

The environment variables defined in it are Docker build environment variables. Are you sourcing the .env file for the Docker build, or does docker compose --env-file on first run pass the env vars through to the correct ARGs in the Dockerfile? We have been using the .env file for runtime setup and dependencies, and keeping the Docker build variables in documentation. I'm not against this, but it's different from the other model implementations and not necessary if the user is running docker build.

Contributor

Do you use this for development often? If you are finding it more useful than having docker run commands in documentation, we should consider this for the other model implementations.

# default base image, override with --build-arg TT_METAL_DOCKERFILE_VERSION=<version>
ARG TT_METAL_DOCKERFILE_VERSION

FROM ghcr.io/tenstorrent/tt-metal/tt-metalium-ubuntu-20.04-amd64-release:$TT_METAL_DOCKERFILE_VERSION
Contributor

The naming convention for the tt-metal release images
FROM ghcr.io/tenstorrent/tt-metal/tt-metalium-ubuntu-20.04-amd64-release:$TT_METAL_DOCKERFILE_VERSION
recently changed. Further to this, in the vLLM models Dockerfile I added the base container as an ARG TT_METAL_DOCKERFILE_URL so that the same Dockerfile can also support local builds, for example using Ubuntu 22.04, which does not have a published tt-metal GHCR image.

&& cd ${TT_METAL_HOME} \
&& git checkout ${TT_METAL_COMMIT_SHA_OR_TAG} \
&& git submodule update --init --recursive \
&& git submodule foreach 'git lfs fetch --all && git lfs pull' \
Contributor

git submodule foreach 'git lfs fetch --all && git lfs pull' can now be removed; LFS is no longer required for the tt-metal build.

&& chown -R user:user ${HOME_DIR} \
&& chown -R user:user ${TT_METAL_HOME}

USER user
Contributor

Naming the user user has caused some confusion when the host also has a user named user. The username for the other models was changed to container_app_user to disambiguate. UID=1000 should still be used to retain compatibility with the internal cloud k8s deployment system. This is passed in as an ARG CONTAINER_APP_UID in the vLLM model Dockerfile.

Contributor

Leaving it to you to decide whether the code should be moved, but I don't love having the inference server backend implementation living in the tt-metal repo.

The interaction with that server via a combination of HTTP API and file system needs some improvement to be threadsafe. I'd recommend a thread-safe lock on a global dict, for example; Django has some facilities for this as well.

The current solution needs an example client script; it's not clear how to call the image API.
Does the client need to call:

  1. /submit
  2. /update_status
  3. loop over get_latest_time or image_exists
  4. /get_image

That said, I'd recommend message passing using an inter-process queue instead of using the filesystem and additional REST API calls: either Python multiprocessing or a more robust solution using e.g. zmq (https://github.com/zeromq/pyzmq), which is used in the vLLM backend.

I believe you have already overhauled this work for SD3.5, but adding these comments for tracking.
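
A minimal sketch of the queue-based message passing suggested above, using Python multiprocessing; the job payload, run_inference stub, and queue names are illustrative, and a zmq version would follow the same shape.

import multiprocessing as mp

def run_inference(prompt: str) -> bytes:
    # placeholder for the actual SD 1.4 pipeline
    return f"image for: {prompt}".encode()

def generation_worker(request_q, result_q):
    # backend process: pull prompts, run inference, push results
    while True:
        job = request_q.get()
        if job is None:  # sentinel for shutdown
            break
        image_bytes = run_inference(job["prompt"])
        result_q.put({"id": job["id"], "image": image_bytes})

if __name__ == "__main__":
    request_q = mp.Queue()
    result_q = mp.Queue()
    worker = mp.Process(target=generation_worker, args=(request_q, result_q))
    worker.start()
    # Flask handlers would put jobs on request_q and read from result_q,
    # replacing the JSON files and extra REST endpoints
    request_q.put({"id": 1, "prompt": "a photo of a cat"})
    print(result_q.get())
    request_q.put(None)  # shut the worker down
    worker.join()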

bind = f"0.0.0.0:{7000}"
reload = False
worker_class = "gthread"
threads = 16
Contributor

Running with 16 threads could have thread-safety issues given the current inter-server communication.
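
With gthread workers, any state shared across handler threads needs explicit locking. A minimal sketch of the lock-around-a-global-dict approach suggested earlier; job_status and its helpers are hypothetical names.

import threading

job_status = {}  # shared across the 16 handler threads
job_status_lock = threading.Lock()

def set_status(job_id, status):
    with job_status_lock:
        job_status[job_id] = status

def get_status(job_id):
    with job_status_lock:
        return job_status.get(job_id)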

@tstescoTT
Contributor

I've added the comments I sent you over Slack for tracking. It sounded like you have overhauled this implementation and have a thread-safe scheme in place.

Let me know if you want a review on that.

@bgoelTT
Contributor Author

bgoelTT commented Feb 24, 2025

This PR is superseded by #113.

@bgoelTT bgoelTT closed this Feb 24, 2025