Fix genai-perf command line for LLM model type #959
base: main
Conversation
Model Analyzer no longer supports LLMs (as you have noted, the interface has changed). I would encourage you to use GenAI-Perf directly, as the ability to both checkpoint and sweep through stimulus parameters has recently been added.
Thank you for your response! I have a few follow-up questions:
Looking forward to your insights!
No, GenAI-Perf does not support automatic Triton configuration tuning. Can you share what parameters you are interested in tuning?
Thank you for the clarification! I have a few more questions regarding configuration tuning for LLMs:
Looking forward to your insights!
Fix for the issue reported in #935.
If we run model-analyzer from the nvcr.io/nvidia/tritonserver:24.08-py3-sdk Docker container for a model with the LLM model type, it fails with the following error:
Command:
genai-perf -m my_model -- -b 1 -u server:8001 -i grpc -f my_model-results.csv --verbose-csv --concurrency-range 64 --measurement-mode count_windows --collect-metrics --metrics-url http://server:8002 --metrics-interval 1000
Error:
2024-10-01 10:42 [INFO] genai_perf.parser:803 - Detected passthrough args: ['-b', '1', '-u', 'server:8001', '-i', 'grpc', '-f', 'my_model-results.csv', '--verbose-csv', '--concurrency-range', '64', '--measurement-mode', 'count_windows', '--collect-metrics', '--metrics-url', 'http://server:8002', '--metrics-interval', '1000']
usage: genai-perf [-h] [--version] {compare,profile} ...
genai-perf: error: argument subcommand: invalid choice: 'my_model' (choose from 'compare', 'profile')
It looks like the genai-perf command line created by model_analyzer is missing the required subcommand (genai-perf profile ...).
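With the missing subcommand added, the same invocation would presumably look like this (all other arguments unchanged):
genai-perf profile -m my_model -- -b 1 -u server:8001 -i grpc -f my_model-results.csv --verbose-csv --concurrency-range 64 --measurement-mode count_windows --collect-metrics --metrics-url http://server:8002 --metrics-interval 1000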
It seems that genai-perf has changed its CLI and now requires the profile subcommand. We could fix this by adding profile to line 328 of perf_analyzer.py, which should then look like:
cmd = ["genai-perf", "profile -m", self._config.models_name()]