Stable Diffusion 1.4 #70

Closed
wants to merge 55 commits into from

Conversation

bgoelTT
Contributor

@bgoelTT bgoelTT commented Jan 20, 2025

This PR adds a Flask server implementation of Stable Diffusion 1.4 and dockerizes it.

It creates a new Flask server that:

  • implements a warmup process whose completion gates the activation of the main inference and healthcheck endpoints (sketched below)
  • implements the main inference endpoint, which submits prompts to the backend generation thread
  • implements a healthcheck endpoint that returns a 200 status code only once the server is warmed up and ready to generate images
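
Below is a minimal sketch of this warmup-gating pattern, not the PR's actual code: the /health path, the queue, and the warmup stub are placeholders (only /submit appears later in the review discussion).

import threading
from queue import Queue

from flask import Flask, jsonify, request

app = Flask(__name__)
prompt_queue = Queue()  # consumed by the backend generation thread
ready = False           # flipped once model warmup finishes

def warmup():
    global ready
    ...  # placeholder: load the SD 1.4 model, run a warmup inference
    ready = True

@app.route("/health")
def health():
    # 200 only after warmup; 503 tells orchestrators to keep polling
    if not ready:
        return jsonify(status="warming up"), 503
    return jsonify(status="ok"), 200

@app.route("/submit", methods=["POST"])
def submit():
    if not ready:
        return jsonify(error="server not ready"), 503
    prompt_queue.put(request.json["prompt"])
    return jsonify(status="enqueued"), 200

threading.Thread(target=warmup, daemon=True).start()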

@bgoelTT bgoelTT self-assigned this Jan 20, 2025
@bgoelTT bgoelTT requested review from tstescoTT and milank94 January 31, 2025 15:35
@tstescoTT
Contributor

tstescoTT commented Jan 31, 2025

I'm going to ask you to rebase this onto dev and re-target the PR to dev, pending acceptance of the newly proposed git flow :)

@@ -0,0 +1,6 @@
TT_METAL_DOCKERFILE_VERSION=v0.53.0-rc34
TT_METAL_COMMIT_SHA_OR_TAG=4da4a5e79a13ece7ff5096c30cef79cb0c504f0e
Contributor

Is there a reference to this commit in tt-metal anywhere? Curious how you selected this one.

Contributor Author

@bgoelTT bgoelTT Feb 1, 2025

I copy+pasted this from the YOLOv4 server, which required a specific commit because the tt-metal YOLOv4 improvements were reverted after failing CI.

Should I just use the latest release? That would be release v0.55.0?

Contributor Author

Updated to use release v0.55.0 in 4582863

# SPDX-FileCopyrightText: © 2024 Tenstorrent AI ULC

from locust import HttpUser, task
from utils import sample_file
Contributor

I think this should be get_sample_prompt?

Contributor Author

Removed locust tests in 016af35

Contributor

Is a locustfile needed for this demo?

Contributor Author

No, this isn't needed. The locust test here won't really measure performance; I want it to enqueue 5 prompts into the input queue and process them. I will add this to test_inference_api.py (sketched below).
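
For reference, a minimal sketch of what that test_inference_api.py addition could look like; the base URL, endpoint path, and payload shape are assumptions, not the PR's actual test.

import requests

BASE_URL = "http://localhost:7000"  # assumed server address

def test_enqueue_five_prompts():
    # enqueue 5 prompts into the input queue via the inference endpoint
    for i in range(5):
        resp = requests.post(
            f"{BASE_URL}/submit",
            json={"prompt": f"test prompt {i}"},
        )
        assert resp.status_code == 200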

Contributor Author

Removed locust tests in 016af35

app = Flask(__name__)

# var to indicate ready state
ready = False
Contributor

Global variable ready: this flag is accessed and modified by multiple threads (warmup and request handlers), which may lead to race conditions. Consider using thread-safe mechanisms like threading.Lock or threading.Event to manage state changes.
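
For illustration, one way to apply the threading.Event suggestion (a sketch, not the fix that actually landed in 835fc91/63d973c):

import threading

ready = threading.Event()  # replaces the bare boolean

def warmup():
    ...  # model load / warmup inference
    ready.set()  # atomic and visible to all threads

# request handlers then check readiness with:
# if not ready.is_set(): return "warming up", 503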

Contributor Author

@bgoelTT bgoelTT Feb 3, 2025

Enforced mutual exclusion of the ready flag in 835fc91, then fixed a bug in that implementation in 63d973c.


@app.route("/clean_up", methods=["POST"])
def clean_up():
with open(json_file_path, "r") as f:
Contributor

Repeated code blocks: the code repeatedly reads and writes json_file_path. This can be refactored into utility functions for cleaner logic:

import json
import os

def read_json_file(file_path):
    if not os.path.isfile(file_path):
        return {"prompts": []}
    with open(file_path, "r") as f:
        return json.load(f)

def write_json_file(file_path, data):
    with open(file_path, "w") as f:
        json.dump(data, f, indent=4)

Contributor Author

Refactored in a75d0e6

os.remove(
"models/demos/wormhole/stable_diffusion/demo/web_demo/input_prompts.json"
)
print("Deleted json")
Contributor

Use logging instead of print statements for better production diagnostics.
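
For example (a sketch; the logger configuration and message are illustrative):

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

logger.info("Deleted %s", json_file_path)  # instead of print("Deleted json")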

Contributor Author

Used logging instead of print in f5650c5

Contributor

@milank94 milank94 left a comment

Some feedback added.

For the license headers, the year should be set to 2025.

@bgoelTT bgoelTT requested a review from milank94 February 3, 2025 14:38
@bgoelTT bgoelTT changed the base branch from main to dev February 4, 2025 20:44
Contributor

Is this .env file needed?

I'd advocate for using the setup.sh script to generate the .env file for SD1.4. Also, it can use the format that integrates with tt-studio.

The environment variables defined in it are Docker build environment variables. Are you sourcing the .env file for the Docker build, or does docker compose --env-file on first run pass the env vars through to the correct ARGs in the Dockerfile? We have been using the .env file for runtime setup and dependencies, and keeping the Docker build variables in documentation. I'm not against this, but it's different from the other model implementations and not necessary if the user is running docker build.

Contributor

Do you use this for development often? If you are finding it more useful than having docker run commands in documentation, we should consider this for the other model implementations.

# default base image, override with --build-arg TT_METAL_DOCKERFILE_VERSION=<version>
ARG TT_METAL_DOCKERFILE_VERSION

FROM ghcr.io/tenstorrent/tt-metal/tt-metalium-ubuntu-20.04-amd64-release:$TT_METAL_DOCKERFILE_VERSION
Contributor

The naming convention for the tt-metal release images
FROM ghcr.io/tenstorrent/tt-metal/tt-metalium-ubuntu-20.04-amd64-release:$TT_METAL_DOCKERFILE_VERSION
recently changed. Further to this, in the vLLM models Dockerfile I added the base container as an ARG TT_METAL_DOCKERFILE_URL so that the same Dockerfile can also support local builds, for example using Ubuntu 22.04, which does not have a published tt-metal GHCR image.

&& cd ${TT_METAL_HOME} \
&& git checkout ${TT_METAL_COMMIT_SHA_OR_TAG} \
&& git submodule update --init --recursive \
&& git submodule foreach 'git lfs fetch --all && git lfs pull' \
Contributor

git submodule foreach 'git lfs fetch --all && git lfs pull' can now be removed; LFS is no longer required for the tt-metal build.

&& chown -R user:user ${HOME_DIR} \
&& chown -R user:user ${TT_METAL_HOME}

USER user
Contributor

Naming the user user has caused some confusion when the host also has a user named user. The username for the other models was changed to container_app_user to disambiguate. UID=1000 should still be used to retain compatibility with the internal cloud k8s deployment system. This is passed in as an ARG CONTAINER_APP_UID in the vLLM model Dockerfile.

Contributor

Leaving it to you to decide whether the code should be moved, but I don't love having the inference server backend implementation living in the tt-metal repo.

The interaction with that server via a combination of HTTP API and file system needs some improvement to be threadsafe. I'd recommend a thread-safe lock on a global dict, for example; Django has some facilities for this as well.

The current solution needs an example client script; it's not clear how to call the image API.
Does the client need to call:

  1. /submit
  2. /update_status
  3. loop over get_latest_time or image_exists
  4. /get_image

That said, I'd recommend message passing using an inter-process queue instead of using the filesystem and additional REST API calls: either Python multiprocessing or a more robust solution using e.g. zmq (https://github.com/zeromq/pyzmq), which is used in the vLLM backend.

I believe you have already overhauled this work for SD3.5, but adding these comments for tracking.
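
A minimal sketch of the queue-based message passing suggested above, using Python multiprocessing; the job payload, run_inference stub, and queue names are illustrative, and a zmq version would follow the same shape.

import multiprocessing as mp

def run_inference(prompt: str) -> bytes:
    # placeholder for the actual SD 1.4 pipeline
    return f"image for: {prompt}".encode()

def generation_worker(request_q, result_q):
    # backend process: pull prompts, run inference, push results
    while True:
        job = request_q.get()
        if job is None:  # sentinel for shutdown
            break
        image_bytes = run_inference(job["prompt"])
        result_q.put({"id": job["id"], "image": image_bytes})

if __name__ == "__main__":
    request_q = mp.Queue()
    result_q = mp.Queue()
    worker = mp.Process(target=generation_worker, args=(request_q, result_q))
    worker.start()
    # Flask handlers would put jobs on request_q and read from result_q,
    # replacing the JSON files and extra REST endpoints
    request_q.put({"id": 1, "prompt": "a photo of a cat"})
    print(result_q.get())
    request_q.put(None)  # shut the worker down
    worker.join()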

bind = f"0.0.0.0:{7000}"
reload = False
worker_class = "gthread"
threads = 16
Contributor

Running with 16 threads could have thread-safety issues given the current inter-server communication.
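
With gthread workers, any state shared across handler threads needs explicit locking. A minimal sketch of the lock-around-a-global-dict approach suggested earlier; job_status and its helpers are hypothetical names.

import threading

job_status = {}  # shared across the 16 handler threads
job_status_lock = threading.Lock()

def set_status(job_id, status):
    with job_status_lock:
        job_status[job_id] = status

def get_status(job_id):
    with job_status_lock:
        return job_status.get(job_id)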

@tstescoTT
Contributor

I've added the comments I sent you over Slack for tracking. It sounded like you have overhauled this implementation and have a thread-safe scheme in place.

Let me know if you want a review on that.

@bgoelTT
Contributor Author

bgoelTT commented Feb 24, 2025

This PR is superseded by #113.

@bgoelTT bgoelTT closed this Feb 24, 2025