diff --git a/src/c++/perf_analyzer/genai-perf/README.md b/src/c++/perf_analyzer/genai-perf/README.md index 87d7c525a..87bb8675a 100644 --- a/src/c++/perf_analyzer/genai-perf/README.md +++ b/src/c++/perf_analyzer/genai-perf/README.md @@ -194,6 +194,83 @@ Request throughput (per sec): 4.44 See [Tutorial](docs/tutorial.md) for additional examples. +
+ +# Visualization + +GenAI-Perf can also generate various plots that visualize the performance of the +current profile run. This is disabled by default, but users can easily enable it +by passing the `--generate-plots` option when running the benchmark: + +```bash +genai-perf \ + -m gpt2 \ + --service-kind triton \ + --backend tensorrtllm \ + --streaming \ + --concurrency 1 \ + --generate-plots +``` + +This will generate a [set of default plots](docs/compare.md#example-plots) such as: + Time to first token (TTFT) analysis + Request latency analysis + TTFT vs Number of input tokens + Inter token latencies vs Token positions + Number of input tokens vs Number of output tokens + + +## Using the `compare` Subcommand to Visualize Multiple Runs + +The `compare` subcommand in GenAI-Perf helps users compare multiple +profile runs and visualize the differences through plots. + +### Usage +Given two profile export JSON files, +say `profile1.json` and `profile2.json`, +the user can run the `compare` subcommand with the `--files` option: + +```bash +genai-perf compare --files profile1.json profile2.json +``` + +Running the above command will perform the following actions under the +`compare` directory: +1. Generate a YAML configuration file (e.g. `config.yaml`) containing the +metadata for each plot generated during the comparison process. +2. Automatically generate the [default set of plots](docs/compare.md#example-plots) +(e.g. TTFT vs. Number of Input Tokens) that compare the two profile runs. + +``` +compare +├── config.yaml +├── distribution_of_input_tokens_to_generated_tokens.jpeg +├── request_latency.jpeg +├── time_to_first_token.jpeg +├── time_to_first_token_vs_number_of_input_tokens.jpeg +├── token-to-token_latency_vs_output_token_position.jpeg +└── ... +``` + +### Customization +Users can iteratively modify the generated YAML configuration +file to suit their specific requirements. 
+They can adjust the plots to their preferences and run +the command with the `--config` option followed by the path to the modified +configuration file: + +```bash +genai-perf compare --config compare/config.yaml +``` + +This command regenerates the plots based on the updated configuration settings, +letting users refine the visualization of the comparison results. + +See [Compare documentation](docs/compare.md) for more details. +
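The edit-and-rerun loop above can also be scripted. Below is a minimal, hedged Python sketch (not a GenAI-Perf utility; the flat `key: value` layout it assumes mirrors the generated `config.yaml` shown in the compare docs) that bulk-edits the plot dimensions with plain text substitution, so no YAML library is needed:

```python
# Sketch: resize every plot block in a generated compare config.
# Assumes flat "width:"/"height:" fields as in the default config; hypothetical helper.
import re

def resize_plots(config_text: str, width: int, height: int) -> str:
    """Replace every 'width:' and 'height:' value in the YAML text."""
    text = re.sub(r"^(\s*width:)\s*\d+", rf"\g<1> {width}", config_text, flags=re.M)
    return re.sub(r"^(\s*height:)\s*\d+", rf"\g<1> {height}", text, flags=re.M)

sample = """plot1:
  title: Time to First Token
  width: 1200
  height: 700
"""
print(resize_plots(sample, 800, 500))
```

After rewriting `compare/config.yaml` this way, re-running `genai-perf compare --config compare/config.yaml` regenerates the plots at the new size.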
+ + # Model Inputs GenAI-Perf supports model input prompts from either synthetically generated @@ -203,8 +280,7 @@ inputs, or from the HuggingFace specified using the `--input-dataset` CLI option. When the dataset is synthetic, you can specify the following options: -* `--num-prompts <int>`: The number of unique prompts to generate as stimulus, - >= 1. +* `--num-prompts <int>`: The number of unique prompts to generate as stimulus, >= 1. * `--synthetic-input-tokens-mean <int>`: The mean of number of tokens in the generated prompts when using synthetic data, >= 1. * `--synthetic-input-tokens-stddev <int>`: The standard deviation of number of @@ -215,8 +291,7 @@ When the dataset is coming from HuggingFace, you can specify the following options: * `--input-dataset {openorca,cnn_dailymail}`: HuggingFace dataset to use for benchmarking. -* `--num-prompts <int>`: The number of unique prompts to generate as stimulus, - >= 1. +* `--num-prompts <int>`: The number of unique prompts to generate as stimulus, >= 1. When the dataset is coming from a file, you can specify the following options: @@ -240,6 +315,8 @@ You can optionally set additional model inputs with the following option: model with a singular value, such as `stream:true` or `max_tokens:5`. This flag can be repeated to supply multiple extra inputs. +
+ # Metrics GenAI-Perf collects a diverse set of metrics that captures the performance of @@ -254,6 +331,8 @@ the inference server. | Output Token Throughput | Total number of output tokens from benchmark divided by benchmark duration | None–one value per benchmark | | Request Throughput | Number of final responses from benchmark divided by benchmark duration | None–one value per benchmark | +
+ # Command Line Options ##### `-h` diff --git a/src/c++/perf_analyzer/genai-perf/docs/assets/distribution_of_input_tokens_to_generated_tokens.jpeg b/src/c++/perf_analyzer/genai-perf/docs/assets/distribution_of_input_tokens_to_generated_tokens.jpeg new file mode 100644 index 000000000..e51f5f49f Binary files /dev/null and b/src/c++/perf_analyzer/genai-perf/docs/assets/distribution_of_input_tokens_to_generated_tokens.jpeg differ diff --git a/src/c++/perf_analyzer/genai-perf/docs/assets/request_latency.jpeg b/src/c++/perf_analyzer/genai-perf/docs/assets/request_latency.jpeg new file mode 100644 index 000000000..d681195ff Binary files /dev/null and b/src/c++/perf_analyzer/genai-perf/docs/assets/request_latency.jpeg differ diff --git a/src/c++/perf_analyzer/genai-perf/docs/assets/time_to_first_token.jpeg b/src/c++/perf_analyzer/genai-perf/docs/assets/time_to_first_token.jpeg new file mode 100644 index 000000000..99ca06ee0 Binary files /dev/null and b/src/c++/perf_analyzer/genai-perf/docs/assets/time_to_first_token.jpeg differ diff --git a/src/c++/perf_analyzer/genai-perf/docs/assets/time_to_first_token_vs_number_of_input_tokens.jpeg b/src/c++/perf_analyzer/genai-perf/docs/assets/time_to_first_token_vs_number_of_input_tokens.jpeg new file mode 100644 index 000000000..f3097064a Binary files /dev/null and b/src/c++/perf_analyzer/genai-perf/docs/assets/time_to_first_token_vs_number_of_input_tokens.jpeg differ diff --git a/src/c++/perf_analyzer/genai-perf/docs/assets/token-to-token_latency_vs_output_token_position.jpeg b/src/c++/perf_analyzer/genai-perf/docs/assets/token-to-token_latency_vs_output_token_position.jpeg new file mode 100644 index 000000000..4a179ef8d Binary files /dev/null and b/src/c++/perf_analyzer/genai-perf/docs/assets/token-to-token_latency_vs_output_token_position.jpeg differ diff --git a/src/c++/perf_analyzer/genai-perf/docs/compare.md b/src/c++/perf_analyzer/genai-perf/docs/compare.md new file mode 100644 index 000000000..a7234a035 --- /dev/null +++ 
b/src/c++/perf_analyzer/genai-perf/docs/compare.md @@ -0,0 +1,251 @@ + + +# GenAI-Perf Compare Subcommand + +There are two ways to use the `compare` subcommand to create +plots across multiple runs. The first is to pass the profile export files +directly with the `--files` option; the second is to run with an existing YAML +configuration file through the `--config` option. + +## Running initially with `--files` option + +If the user does not have a YAML configuration file, +they can run the `compare` subcommand with the `--files` option to generate a +set of default plots as well as a pre-filled YAML config file for the plots. + +```bash +genai-perf compare --files profile1.json profile2.json profile3.json +``` + +This will generate the default plots, comparing across the three runs. +GenAI-Perf will also generate an initial YAML configuration file `config.yaml` +that is pre-filled with plot configurations as follows: + +```yaml +plot1: + title: Time to First Token + x_metric: '' + y_metric: time_to_first_tokens + x_label: Time to First Token (ms) + y_label: '' + width: 1200 + height: 700 + type: box + paths: + - profile1.json + - profile2.json + - profile3.json + output: compare +plot2: + title: Request Latency + x_metric: '' + y_metric: request_latencies + x_label: Request Latency (ms) + y_label: '' + width: 1200 + height: 700 + type: box + paths: + - profile1.json + - profile2.json + - profile3.json + output: compare +plot3: + title: Distribution of Input Tokens to Generated Tokens + x_metric: num_input_tokens + y_metric: num_output_tokens + x_label: Number of Input Tokens Per Request + y_label: Number of Generated Tokens Per Request + width: 1200 + height: 450 + type: heatmap + paths: + - profile1.json + - profile2.json + - profile3.json + output: compare +plot4: + title: Time to First Token vs Number of Input Tokens + x_metric: num_input_tokens + y_metric: time_to_first_tokens + x_label: Number of Input Tokens + y_label: Time to First Token (ms) + width: 1200 + height: 700 + type: scatter + paths: + - profile1.json + - profile2.json + - 
profile3.json + output: compare +plot5: + title: Token-to-Token Latency vs Output Token Position + x_metric: token_positions + y_metric: inter_token_latencies + x_label: Output Token Position + y_label: Token-to-Token Latency (ms) + width: 1200 + height: 700 + type: scatter + paths: + - profile1.json + - profile2.json + - profile3.json + output: compare +``` + +Once the user has the YAML configuration file, +they can iterate by editing the config file and re-running with the +`--config` option to re-generate the plots. + +```bash +# edit +vi config.yaml + +# re-generate the plots +genai-perf compare --config config.yaml +``` + +## Running directly with `--config` option + +If the user would like to create a custom plot (other than the default ones provided), +they can build their own YAML configuration file that contains the information +about the plots they would like to generate. +For instance, if the user would like to see how the inter-token latencies change +with the number of output tokens, which is not part of the default plots, +they could add the following YAML block to the file: + +```yaml +plot1: + title: Inter Token Latency vs Output Tokens + x_metric: num_output_tokens + y_metric: inter_token_latencies + x_label: Num Output Tokens + y_label: Avg ITL (ms) + width: 1200 + height: 450 + type: scatter + paths: + - + - + output: compare +``` + +After adding the lines, the user can run the following command to generate the +plots specified in the configuration file (in this case, `config.yaml`): + +```bash +genai-perf compare --config config.yaml +``` + +The user can check the generated plots under the output directory: +``` +compare/ +├── inter_token_latency_vs_output_tokens.jpeg +└── ... +``` + +## YAML Schema + +Here are more details about the YAML configuration file and its structure. 
+The general YAML schema for the plot configuration looks as follows: + +```yaml +plot1: + title: [str] + x_metric: [str] + y_metric: [str] + x_label: [str] + y_label: [str] + width: [int] + height: [int] + type: [scatter,box,heatmap] + paths: + - [str] + - ... + output: [str] + +plot2: + title: [str] + x_metric: [str] + y_metric: [str] + x_label: [str] + y_label: [str] + width: [int] + height: [int] + type: [scatter,box,heatmap] + paths: + - [str] + - ... + output: [str] + +# add more plots +``` + +The user can add as many plots as they would like by adding plot +blocks to the configuration file (the blocks above use a key pattern of `plot<#>`, +but that is not required; the user can set the keys to any arbitrary string). +For each plot block, the user can specify the following configurations: +- `title`: The title of the plot. +- `x_metric`: The name of the metric to be used on the x-axis. +- `y_metric`: The name of the metric to be used on the y-axis. +- `x_label`: The x-axis label (or description). +- `y_label`: The y-axis label (or description). +- `width`: The width of the entire plot. +- `height`: The height of the entire plot. +- `type`: The type of the plot. It must be one of the three: `scatter`, `box`, +or `heatmap`. +- `paths`: List of paths to the profile export files to compare. +- `output`: The path to the output directory to store all the plots and the YAML +configuration file. + +> [!Note] +> The user *MUST* provide at least one valid path to a profile export file. + + + +## Example Plots + +Here is the list of sample plots that are created by default when running the +`compare` subcommand: + +### Distribution of Input Tokens to Generated Tokens + + +### Request Latency Analysis + + +### Time to First Token Analysis + + +### Time to First Token vs. Number of Input Tokens + + +### Token-to-Token Latency vs. Output Token Position + 
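A config following the schema above can also be generated programmatically. Below is a minimal, hedged Python sketch (not shipped with GenAI-Perf; the `plot_block` helper and its defaults are assumptions for illustration) that renders one plot block matching the field list above, using hand-rolled formatting so no YAML library is required:

```python
# Sketch: render one plot<#> block per the compare config schema above.
# Hypothetical helper, not part of the genai-perf package.
def plot_block(name, title, x_metric, y_metric, x_label, y_label,
               plot_type, paths, width=1200, height=700, output="compare"):
    """Return one YAML plot block as text."""
    lines = [
        f"{name}:",
        f"  title: {title}",
        f"  x_metric: {x_metric}",
        f"  y_metric: {y_metric}",
        f"  x_label: {x_label}",
        f"  y_label: {y_label}",
        f"  width: {width}",
        f"  height: {height}",
        f"  type: {plot_type}",
        "  paths:",
    ]
    lines += [f"  - {p}" for p in paths]
    lines.append(f"  output: {output}")
    return "\n".join(lines)

config = plot_block(
    "plot1", "Time to First Token", "''", "time_to_first_tokens",
    "Time to First Token (ms)", "''", "box",
    ["profile1.json", "profile2.json"],
)
print(config)
```

Writing the result to a file and passing it via `genai-perf compare --config` would follow the same workflow described earlier in this document.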