
Release Candidate v0.0.1 #88

Merged
merged 2 commits into main on Feb 5, 2025
Conversation

tstescoTT (Contributor):

change log

  • Added default handling of MESH_DEVICE for Llama 3.x models
  • Setup script improvements:
    • Improved environment variable handling and persistent storage integration
    • Added IMPL_ID field (set to "tt-metal" for all current models)
    • Introduced MODEL_VERSION and MODEL_ID variables for better versioning
  • Added image input support for image-text-to-text models in client scripts and tools
    • Added support for image input in trace capturing
    • Added new parameters for image width and height
    • Implemented handling of both text-only and image+text trace captures
  • Renamed the client-side scripts' batch_size option to max_concurrent to indicate the client-side concurrent request limit
  • Fixed the vLLM model registration logic: added the missing ModelRegistry.register_model call for TTLlamaForCausalLM in the legacy implementation models (see the sketch after this list)
  • Updated benchmark path handling to use the $HOME environment variable instead of the hardcoded /home/user path
  • Added benchmark summary handling for the vLLM benchmark script, with a documentation example
  • Added support for the new model "DeepSeek-R1-Distill-Llama-70B" in the model setup configurations
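
The model registration fix mentioned above comes down to making sure vLLM knows about the legacy tt-metal Llama class before the engine is built. A minimal sketch of that pattern follows; the import path of TTLlamaForCausalLM is a placeholder, not the repo's actual module layout.

# Sketch only: registering an out-of-tree model class with vLLM's ModelRegistry.
# The import path below is hypothetical; substitute the repo's real location of
# the legacy TTLlamaForCausalLM implementation.
from vllm import ModelRegistry

def register_tt_models() -> None:
    from tt_legacy_models.llama import TTLlamaForCausalLM  # hypothetical path
    # Map the architecture name reported by the model config to the TT class.
    ModelRegistry.register_model("TTLlamaForCausalLM", TTLlamaForCausalLM)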

--shm-size 32G \
--publish 7000:7000 \
ghcr.io/tenstorrent/tt-inference-server/tt-metal-llama3-70b-src-base-vllm:v0.0.1-tt-metal-v0.54.0-rc2-953161188c50
ghcr.io/tenstorrent/tt-inference-server/vllm-llama3-src-dev-ubuntu-20.04-amd64:v0.0.1-47fb1a2fb6e0-2f33504bad49
Contributor:

Is this commit of tt-metal (47fb1a2fb6e0) the latest supported, but missing a release tag? i.e. from the LLM support table, v0.55.0-rc12 is listed as the latest release.

Contributor Author (tstescoTT):

I'd recommend additional testing before we move to the tt-metal RC release tags. If RC tags are available when we want to start testing for the next drop, we can use them from the start. I believe the RC tags are nightly cuts from main, so the likelihood of breaking changes being introduced is low. I agree we should transition to using tt-metal RCs and releases.

"WH_ARCH_YAML": "wormhole_b0_80_arch_eth_dispatch.yaml",
}
env_var_map = {
"meta-llama/Llama-3.1-70B-Instruct": {
Contributor:

How come only the 70B models require this env_var_map?

Contributor Author (tstescoTT):

Only the legacy implementations need those additional environment variables currently.
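
For context, the pattern being discussed is a per-model map of extra environment variables that only the legacy implementations populate. A rough sketch of how such a map might be applied is below; the helper name apply_model_env_vars and the setdefault behavior are assumptions, not the actual script.

import os

# Per-model extra environment variables; only legacy implementations have entries.
env_var_map = {
    "meta-llama/Llama-3.1-70B-Instruct": {
        "WH_ARCH_YAML": "wormhole_b0_80_arch_eth_dispatch.yaml",
    },
}

def apply_model_env_vars(hf_model_repo_id: str) -> None:
    # Models without an entry simply get no extra variables.
    for key, value in env_var_map.get(hf_model_repo_id, {}).items():
        os.environ.setdefault(key, value)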

@@ -214,7 +214,7 @@ async def generate_tokens_async(
parser.add_argument(
"--prompts_json",
type=str,
default="/home/user/vllm/tt_metal/prompts.json",
default="/home/container_app_user/vllm/tt_metal/prompts.json",
Collaborator:

Should we use vllm_dir here as well? I'm worried about hardcoding container_app_user.

Contributor Author (tstescoTT):

added in bac48be
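
As a rough illustration of the suggestion above, the default could be derived from a vllm_dir setting instead of hardcoding a particular container user's home directory. This is a sketch under assumptions: the environment variable name and the home-directory fallback are not taken from the repo.

import argparse
import os

# Resolve the vLLM checkout location from the environment (name is an assumption).
vllm_dir = os.environ.get("vllm_dir", os.path.join(os.path.expanduser("~"), "vllm"))

parser = argparse.ArgumentParser()
parser.add_argument(
    "--prompts_json",
    type=str,
    default=os.path.join(vllm_dir, "tt_metal", "prompts.json"),
)
args = parser.parse_args()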

@@ -108,9 +111,9 @@ CMD ["/bin/bash", "-c", "source ${PYTHON_ENV_DIR}/bin/activate && python mock_vl

# Default environment variables for the Llama-3.1-70b-instruct inference server
# Note: LLAMA3_CKPT_DIR and similar variables get set by mock_vllm_api_server.py
ENV CACHE_ROOT=/home/user/cache_root
ENV HF_HOME=/home/user/cache_root/huggingface
ENV CACHE_ROOT=/home/container_app_user/cache_root
Collaborator:

Should we use the "CONTAINER_APP_USERNAME" variable from above? That way, if we change it, it propagates automatically.

Contributor Author (tstescoTT):

added in bac48be

@@ -111,84 +113,104 @@ setup_model_environment() {
# Set environment variables based on the model selection
# note: MODEL_NAME is the directory name for the model weights
case "$1" in
"llama-3.3-70b-instruct")
"DeepSeek-R1-Distill-Llama-70B")
IMPL_ID="tt-metal"
Collaborator:

What is IMPL? I would assume it's the implementation of the model, but since it's "tt-metal", I'm not sure.

Contributor Author (tstescoTT):

Yes, that's the intent: these are tt-metal implementations (ttnn, to be more precise). This distinguishes them from tt-forge implementations or others in the future.
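
To make the distinction concrete, IMPL_ID is presumably combined with MODEL_NAME and the new MODEL_VERSION field into an implementation-qualified model identifier. The composition below is purely illustrative; the exact MODEL_ID format produced by the setup script is an assumption.

# Illustrative only: composing a model identifier from the fields named in this PR.
# The real format used by the setup script may differ.
def make_model_id(impl_id: str, model_name: str, model_version: str) -> str:
    return f"id_{impl_id}-{model_name}-v{model_version}"

print(make_model_id("tt-metal", "DeepSeek-R1-Distill-Llama-70B", "0.0.1"))
# e.g. id_tt-metal-DeepSeek-R1-Distill-Llama-70B-v0.0.1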

@@ -45,7 +45,7 @@ lm_eval \
--gen_kwargs model=${HF_MODEL_REPO_ID},stop="<|eot_id|>",stream=False \
--tasks meta_gpqa \
--batch_size auto \
--output_path /home/user/cache_root/eval_output \
--output_path /home/container_app_user/cache_root/eval_output \
Collaborator:

I think we should use the CACHE_ROOT variable here; same below (in several places).

Contributor Author (tstescoTT):

added in bac48be

run_benchmark(
benchmark_script="/home/user/vllm/benchmarks/benchmark_serving.py",
benchmark_script=f"{user_home}/vllm/benchmarks/benchmark_serving.py",
Collaborator:

Should we use vllm_dir here as well?

Contributor Author (tstescoTT):

added in bac48be

tstescoTT mentioned this pull request on Feb 5, 2025
tstescoTT merged commit 6061606 into main on Feb 5, 2025
1 check passed