
Release Candidate v0.0.1 #88

Merged
merged 2 commits into main on Feb 5, 2025
Conversation

tstescoTT (Contributor):

change log

  • Added default handling of MESH_DEVICE for Llama 3.x models
  • Setup script improvements:
    • Improved environment variable handling and persistent storage integration
    • Added IMPL_ID field (set to "tt-metal" for all current models)
    • Introduced MODEL_VERSION and MODEL_ID variables for better versioning
  • Added image input support for image-text-to-text models in client scripts and tools
    • Added support for image input in trace capturing
    • Added new parameters for image width and height
    • Implemented handling of both text-only and image+text trace captures
  • Renamed the client-side scripts' batch_size option to max_concurrent to indicate the client-side concurrent request limit
  • Fixed the vLLM model registration logic: added the missing ModelRegistry.register_model call for TTLlamaForCausalLM in the legacy implementation models (see the sketch after this list)
  • Updated benchmark path handling to use the $HOME environment variable instead of the hardcoded /home/user path
  • Added benchmark summary handling for the vLLM benchmark script, with a documentation example
  • Added support for the new model "DeepSeek-R1-Distill-Llama-70B" in the model setup configurations
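
The model registration fix mentioned above comes down to making sure vLLM knows about the legacy tt-metal Llama class before the engine is built. A minimal sketch of that pattern follows; the import path of TTLlamaForCausalLM is a placeholder, not the repo's actual module layout.

# Sketch only: registering an out-of-tree model class with vLLM's ModelRegistry.
# The import path below is hypothetical; substitute the repo's real location of
# the legacy TTLlamaForCausalLM implementation.
from vllm import ModelRegistry

def register_tt_models() -> None:
    from tt_legacy_models.llama import TTLlamaForCausalLM  # hypothetical path
    # Map the architecture name reported by the model config to the TT class.
    ModelRegistry.register_model("TTLlamaForCausalLM", TTLlamaForCausalLM)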

--shm-size 32G \
--publish 7000:7000 \
ghcr.io/tenstorrent/tt-inference-server/tt-metal-llama3-70b-src-base-vllm:v0.0.1-tt-metal-v0.54.0-rc2-953161188c50
ghcr.io/tenstorrent/tt-inference-server/vllm-llama3-src-dev-ubuntu-20.04-amd64:v0.0.1-47fb1a2fb6e0-2f33504bad49
Contributor:

Is this commit of tt-metal (47fb1a2fb6e0) the latest supported, but missing a release tag? i.e. from the LLM support table, v0.55.0-rc12 is listed as the latest release.

Contributor Author (tstescoTT):

I'd recommend additional testing before we move to the tt-metal RC release tags. If RC tags are available when we want to start testing for the next drop, we can use them from the start. I believe the RC tags are nightly cuts from main, so the likelihood of breaking changes being introduced is low. I agree we should transition to using tt-metal RCs and releases.

"WH_ARCH_YAML": "wormhole_b0_80_arch_eth_dispatch.yaml",
}
env_var_map = {
"meta-llama/Llama-3.1-70B-Instruct": {
Contributor:

How come only the 70B models require this env_var_map?

Contributor Author (tstescoTT):

Only the legacy implementations need those additional environment variables currently.
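
For context, the pattern being discussed is a per-model map of extra environment variables that only the legacy implementations populate. A rough sketch of how such a map might be applied is below; the helper name apply_model_env_vars and the setdefault behavior are assumptions, not the actual script.

import os

# Per-model extra environment variables; only legacy implementations have entries.
env_var_map = {
    "meta-llama/Llama-3.1-70B-Instruct": {
        "WH_ARCH_YAML": "wormhole_b0_80_arch_eth_dispatch.yaml",
    },
}

def apply_model_env_vars(hf_model_repo_id: str) -> None:
    # Models without an entry simply get no extra variables.
    for key, value in env_var_map.get(hf_model_repo_id, {}).items():
        os.environ.setdefault(key, value)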

@@ -214,7 +214,7 @@ async def generate_tokens_async(
parser.add_argument(
"--prompts_json",
type=str,
default="/home/user/vllm/tt_metal/prompts.json",
default="/home/container_app_user/vllm/tt_metal/prompts.json",
Collaborator:

Should we use vllm_dir here as well? I'm worried about hardcoding container_app_user.

Contributor Author (tstescoTT):

added in bac48be
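
As a rough illustration of the suggestion above, the default could be derived from a vllm_dir setting instead of hardcoding a particular container user's home directory. This is a sketch under assumptions: the environment variable name and the home-directory fallback are not taken from the repo.

import argparse
import os

# Resolve the vLLM checkout location from the environment (name is an assumption).
vllm_dir = os.environ.get("vllm_dir", os.path.join(os.path.expanduser("~"), "vllm"))

parser = argparse.ArgumentParser()
parser.add_argument(
    "--prompts_json",
    type=str,
    default=os.path.join(vllm_dir, "tt_metal", "prompts.json"),
)
args = parser.parse_args()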

@@ -108,9 +111,9 @@ CMD ["/bin/bash", "-c", "source ${PYTHON_ENV_DIR}/bin/activate && python mock_vl

# Default environment variables for the Llama-3.1-70b-instruct inference server
# Note: LLAMA3_CKPT_DIR and similar variables get set by mock_vllm_api_server.py
ENV CACHE_ROOT=/home/user/cache_root
ENV HF_HOME=/home/user/cache_root/huggingface
ENV CACHE_ROOT=/home/container_app_user/cache_root
Collaborator:

Should we use the "CONTAINER_APP_USERNAME" variable from above? That way, if we change it, it propagates automatically.

Contributor Author (tstescoTT):

added in bac48be

@@ -111,84 +113,104 @@ setup_model_environment() {
# Set environment variables based on the model selection
# note: MODEL_NAME is the directory name for the model weights
case "$1" in
"llama-3.3-70b-instruct")
"DeepSeek-R1-Distill-Llama-70B")
IMPL_ID="tt-metal"
Collaborator:

What is IMPL? I would assume it's the implementation of the model, but since it's "tt-metal", I'm not sure.

Contributor Author (tstescoTT):

Yes, that's the intent: these are tt-metal implementations (ttnn, to be more precise). This distinguishes them from tt-forge implementations or others in the future.
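
To make the distinction concrete, IMPL_ID is presumably combined with MODEL_NAME and the new MODEL_VERSION field into an implementation-qualified model identifier. The composition below is purely illustrative; the exact MODEL_ID format produced by the setup script is an assumption.

# Illustrative only: composing a model identifier from the fields named in this PR.
# The real format used by the setup script may differ.
def make_model_id(impl_id: str, model_name: str, model_version: str) -> str:
    return f"id_{impl_id}-{model_name}-v{model_version}"

print(make_model_id("tt-metal", "DeepSeek-R1-Distill-Llama-70B", "0.0.1"))
# e.g. id_tt-metal-DeepSeek-R1-Distill-Llama-70B-v0.0.1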

@@ -45,7 +45,7 @@ lm_eval \
--gen_kwargs model=${HF_MODEL_REPO_ID},stop="<|eot_id|>",stream=False \
--tasks meta_gpqa \
--batch_size auto \
--output_path /home/user/cache_root/eval_output \
--output_path /home/container_app_user/cache_root/eval_output \
Collaborator:

I think we should use the CACHE_ROOT variable here; same below (in several places).

Contributor Author (tstescoTT):

added in bac48be

run_benchmark(
benchmark_script="/home/user/vllm/benchmarks/benchmark_serving.py",
benchmark_script=f"{user_home}/vllm/benchmarks/benchmark_serving.py",
Collaborator:

Should we use vllm_dir here as well?

Contributor Author (tstescoTT):

added in bac48be

tstescoTT mentioned this pull request on Feb 5, 2025
tstescoTT merged commit 6061606 into main on Feb 5, 2025
1 check passed