Add compare subcommand to README (#664)
* Add compare subcommand to README

* Fix rendering issue

* Add compare.md

* Add breakpoints

* Clean up

* Add sample plots

* Adjust size

* Add height to plot

* Add a section about default visualization

* Address feedback
nv-hwoo authored May 21, 2024
1 parent 76ac69a commit f93f012
Showing 7 changed files with 334 additions and 4 deletions.
87 changes: 83 additions & 4 deletions src/c++/perf_analyzer/genai-perf/README.md
@@ -194,6 +194,83 @@ Request throughput (per sec): 4.44

See [Tutorial](docs/tutorial.md) for additional examples.

</br>

# Visualization

GenAI-Perf can also generate various plots that visualize the performance of the
current profile run. This is disabled by default but users can easily enable it
by passing the `--generate-plots` option when running the benchmark:

```bash
genai-perf \
-m gpt2 \
--service-kind triton \
--backend tensorrtllm \
--streaming \
--concurrency 1 \
--generate-plots
```

This will generate a [set of default plots](docs/compare.md#example-plots) such as:
- Time to first token (TTFT) analysis
- Request latency analysis
- TTFT vs Number of input tokens
- Inter token latencies vs Token positions
- Number of input tokens vs Number of output tokens


## Using `compare` Subcommand to Visualize Multiple Runs

The `compare` subcommand in GenAI-Perf lets users compare multiple profile runs
and visualize the differences through plots.

### Usage
Given two profile export JSON files, for example `profile1.json` and
`profile2.json`, the user can run the `compare` subcommand with the `--files`
option:

```bash
genai-perf compare --files profile1.json profile2.json
```

Executing the above command will perform the following actions under the
`compare` directory:
1. Generate a YAML configuration file (e.g. `config.yaml`) containing the
metadata for each plot generated during the comparison process.
2. Automatically generate the [default set of plots](docs/compare.md#example-plots)
(e.g. TTFT vs. Number of Input Tokens) that compare the two profile runs.

```
compare
├── config.yaml
├── distribution_of_input_tokens_to_generated_tokens.jpeg
├── request_latency.jpeg
├── time_to_first_token.jpeg
├── time_to_first_token_vs_number_of_input_tokens.jpeg
├── token-to-token_latency_vs_output_token_position.jpeg
└── ...
```

### Customization
Users can iteratively edit the generated YAML configuration file to adjust the
plots, then rerun the command with the `--config` option followed by the path to
the modified configuration file:

```bash
genai-perf compare --config compare/config.yaml
```

This command regenerates the plots from the updated configuration, so users can
refine how the comparison results are presented.

See [Compare documentation](docs/compare.md) for more details.

</br>

# Model Inputs

GenAI-Perf supports model input prompts from either synthetically generated
@@ -203,8 +280,7 @@ inputs, or from the HuggingFace
specified using the `--input-dataset` CLI option.

When the dataset is synthetic, you can specify the following options:
* `--num-prompts <int>`: The number of unique prompts to generate as stimulus,
>= 1.
* `--num-prompts <int>`: The number of unique prompts to generate as stimulus, >= 1.
* `--synthetic-input-tokens-mean <int>`: The mean number of tokens in the
generated prompts when using synthetic data, >= 1.
* `--synthetic-input-tokens-stddev <int>`: The standard deviation of the number of
@@ -215,8 +291,7 @@ When the dataset is coming from HuggingFace, you can specify the following
options:
* `--input-dataset {openorca,cnn_dailymail}`: HuggingFace dataset to use for
benchmarking.
* `--num-prompts <int>`: The number of unique prompts to generate as stimulus,
>= 1.
* `--num-prompts <int>`: The number of unique prompts to generate as stimulus, >= 1.

When the dataset is coming from a file, you can specify the following
options:
@@ -240,6 +315,8 @@ You can optionally set additional model inputs with the following option:
model with a singular value, such as `stream:true` or `max_tokens:5`. This
flag can be repeated to supply multiple extra inputs.

</br>

# Metrics

GenAI-Perf collects a diverse set of metrics that captures the performance of
@@ -254,6 +331,8 @@ the inference server.
| <span id="output_token_throughput_metric">Output Token Throughput</span> | Total number of output tokens from benchmark divided by benchmark duration | None–one value per benchmark |
| <span id="request_throughput_metric">Request Throughput</span> | Number of final responses from benchmark divided by benchmark duration | None–one value per benchmark |

</br>

# Command Line Options

##### `-h`
251 changes: 251 additions & 0 deletions src/c++/perf_analyzer/genai-perf/docs/compare.md
@@ -0,0 +1,251 @@
<!--
Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of NVIDIA CORPORATION nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# GenAI-Perf Compare Subcommand

There are two ways to use the `compare` subcommand to create plots across
multiple runs. The first is to pass the profile export files directly with the
`--files` option. The second is to pass a YAML configuration file with the
`--config` option.

## Running initially with `--files` option

If the user does not have a YAML configuration file,
they can run the `compare` subcommand with the `--files` option to generate a
set of default plots as well as a pre-filled YAML config file for the plots.

```bash
genai-perf compare --files profile1.json profile2.json profile3.json
```

This will generate the default plots, which compare the three runs.
GenAI-Perf will also generate an initial YAML configuration file, `config.yaml`,
that is pre-filled with plot configurations as follows:

```yaml
plot1:
  title: Time to First Token
  x_metric: ''
  y_metric: time_to_first_tokens
  x_label: Time to First Token (ms)
  y_label: ''
  width: 1200
  height: 700
  type: box
  paths:
  - profile1.json
  - profile2.json
  - profile3.json
  output: compare
plot2:
  title: Request Latency
  x_metric: ''
  y_metric: request_latencies
  x_label: Request Latency (ms)
  y_label: ''
  width: 1200
  height: 700
  type: box
  paths:
  - profile1.json
  - profile2.json
  - profile3.json
  output: compare
plot3:
  title: Distribution of Input Tokens to Generated Tokens
  x_metric: num_input_tokens
  y_metric: num_output_tokens
  x_label: Number of Input Tokens Per Request
  y_label: Number of Generated Tokens Per Request
  width: 1200
  height: 450
  type: heatmap
  paths:
  - profile1.json
  - profile2.json
  - profile3.json
  output: compare
plot4:
  title: Time to First Token vs Number of Input Tokens
  x_metric: num_input_tokens
  y_metric: time_to_first_tokens
  x_label: Number of Input Tokens
  y_label: Time to First Token (ms)
  width: 1200
  height: 700
  type: scatter
  paths:
  - profile1.json
  - profile2.json
  - profile3.json
  output: compare
plot5:
  title: Token-to-Token Latency vs Output Token Position
  x_metric: token_positions
  y_metric: inter_token_latencies
  x_label: Output Token Position
  y_label: Token-to-Token Latency (ms)
  width: 1200
  height: 700
  type: scatter
  paths:
  - profile1.json
  - profile2.json
  - profile3.json
  output: compare
```
Once the user has the YAML configuration file,
they can repeatedly edit it and rerun with the
`--config` option to regenerate the plots.

```bash
# edit
vi config.yaml
# re-generate the plots
genai-perf compare --config config.yaml
```
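For users who would rather script this loop than edit the file by hand, the sketch below renders one plot block as YAML text using only the Python standard library. `render_plot_block` is a hypothetical helper, not part of GenAI-Perf. The key names are copied from the generated `config.yaml` shown earlier. The two-space nesting and the unquoted values are assumptions that hold only for simple values like the ones in this document.

```python
# Hypothetical helper, not part of GenAI-Perf: it only builds YAML text
# that mirrors the plot blocks of the generated config.yaml.


def _scalar(value):
    """Render a simple value; empty strings are written as '' like the generated file."""
    if value == "":
        return "''"
    return str(value)


def render_plot_block(name, title, x_metric, y_metric, x_label, y_label,
                      plot_type, paths, width=1200, height=700,
                      output="compare"):
    """Return one plot block as YAML text (assumes plain, colon-free values)."""
    scalars = {
        "title": title,
        "x_metric": x_metric,
        "y_metric": y_metric,
        "x_label": x_label,
        "y_label": y_label,
        "width": width,
        "height": height,
        "type": plot_type,
    }
    lines = [f"{name}:"]
    lines += [f"  {key}: {_scalar(value)}" for key, value in scalars.items()]
    lines.append("  paths:")
    lines += [f"  - {path}" for path in paths]
    lines.append(f"  output: {output}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_plot_block(
        "plot1",
        title="Time to First Token",
        x_metric="",
        y_metric="time_to_first_tokens",
        x_label="Time to First Token (ms)",
        y_label="",
        plot_type="box",
        paths=["profile1.json", "profile2.json", "profile3.json"],
    ))
```

The resulting text could be written to `config.yaml` and passed to `genai-perf compare --config config.yaml`.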

## Running directly with `--config` option

If the user would like to create a custom plot (other than the default ones provided),
they can build their own YAML configuration file that describes the plots they
would like to generate.
For instance, if the user would like to see how the inter-token latencies change
with the number of output tokens, which is not part of the default plots,
they could add the following YAML block to the file:

```yaml
plot1:
  title: Inter Token Latency vs Output Tokens
  x_metric: num_output_tokens
  y_metric: inter_token_latencies
  x_label: Num Output Tokens
  y_label: Avg ITL (ms)
  width: 1200
  height: 450
  type: scatter
  paths:
  - <path-to-profile-export-file>
  - <path-to-profile-export-file>
  output: compare
```

After adding the lines, the user can run the following command to generate the
plots specified in the configuration file (in this case, `config.yaml`):

```bash
genai-perf compare --config config.yaml
```

The user can check the generated plots under the output directory:
```
compare/
├── inter_token_latency_vs_output_tokens.jpeg
└── ...
```
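The file name in the example above appears to be derived from the plot's `title`: lowercased, with spaces replaced by underscores, plus a `.jpeg` extension. That rule is inferred from the examples in this document, not from documented GenAI-Perf behavior, so the sketch below is only a convenience for locating outputs. `expected_plot_filename` and `missing_plots` are hypothetical names.

```python
from pathlib import Path


def expected_plot_filename(title, extension="jpeg"):
    """Guess the output file name for a plot title (inferred rule, not confirmed)."""
    return f"{title.lower().replace(' ', '_')}.{extension}"


def missing_plots(output_dir, titles):
    """Return the guessed file names that are not present in output_dir."""
    out = Path(output_dir)
    names = [expected_plot_filename(title) for title in titles]
    return [name for name in names if not (out / name).is_file()]


if __name__ == "__main__":
    # Title taken from the custom plot block in this section.
    print(expected_plot_filename("Inter Token Latency vs Output Tokens"))
    # List guessed outputs that do not exist yet under ./compare.
    print(missing_plots("compare", ["Inter Token Latency vs Output Tokens"]))
```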

## YAML Schema

Here are more details about the YAML configuration file and its structure.
The general YAML schema for the plot configuration is as follows:

```yaml
plot1:
  title: [str]
  x_metric: [str]
  y_metric: [str]
  x_label: [str]
  y_label: [str]
  width: [int]
  height: [int]
  type: [scatter,box,heatmap]
  paths:
  - [str]
  - ...
  output: [str]
plot2:
  title: [str]
  x_metric: [str]
  y_metric: [str]
  x_label: [str]
  y_label: [str]
  width: [int]
  height: [int]
  type: [scatter,box,heatmap]
  paths:
  - [str]
  - ...
  output: [str]
# add more plots
```

The user can generate as many plots as they would like by adding plot
blocks to the configuration file. The generated keys follow the pattern `plot<#>`,
but that is not required, and the user can set each key to any arbitrary string.
For each plot block, the user can specify the following configurations:
- `title`: The title of the plot.
- `x_metric`: The name of the metric to be used on the x-axis.
- `y_metric`: The name of the metric to be used on the y-axis.
- `x_label`: The x-axis label (or description).
- `y_label`: The y-axis label (or description).
- `width`: The width of the entire plot.
- `height`: The height of the entire plot.
- `type`: The type of the plot. It must be one of the three: `scatter`, `box`,
or `heatmap`.
- `paths`: List of paths to the profile export files to compare.
- `output`: The path to the output directory to store all the plots and YAML
configuration file.

> [!Note]
> The user *MUST* provide at least one valid path to a profile export file.

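The schema and the note above can be turned into a small pre-flight check before running `compare`. The sketch below assumes the YAML file has already been parsed into plain Python dictionaries (the parsing step is not shown). `validate_plot_block` is a hypothetical helper, and treating every key as required is an assumption: GenAI-Perf itself may apply defaults or different checks. It verifies only that `paths` is non-empty, not that the files exist.

```python
# Hypothetical pre-flight check based on the schema described in this document.
ALLOWED_TYPES = {"scatter", "box", "heatmap"}
EXPECTED_KEYS = (
    "title", "x_metric", "y_metric", "x_label", "y_label",
    "width", "height", "type", "paths", "output",
)


def validate_plot_block(name, block):
    """Return a list of human-readable problems found in one plot block."""
    errors = []
    for key in EXPECTED_KEYS:
        if key not in block:
            errors.append(f"{name}: missing key '{key}'")
    if "type" in block and block["type"] not in ALLOWED_TYPES:
        errors.append(f"{name}: 'type' must be one of {sorted(ALLOWED_TYPES)}")
    for key in ("width", "height"):
        value = block.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if key in block and (isinstance(value, bool) or not isinstance(value, int)):
            errors.append(f"{name}: '{key}' must be an integer")
    paths = block.get("paths")
    if "paths" in block and (not isinstance(paths, list) or not paths):
        errors.append(f"{name}: 'paths' must list at least one profile export file")
    return errors


if __name__ == "__main__":
    config = {
        "plot1": {
            "title": "Request Latency", "x_metric": "",
            "y_metric": "request_latencies",
            "x_label": "Request Latency (ms)", "y_label": "",
            "width": 1200, "height": 700, "type": "box",
            "paths": ["profile1.json"], "output": "compare",
        },
    }
    for plot_name, plot_block in config.items():
        for problem in validate_plot_block(plot_name, plot_block):
            print(problem)
```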


## Example Plots

Here is the list of sample plots that are created by default when running the
`compare` subcommand:

### Distribution of Input Tokens to Generated Tokens
<img src="assets/distribution_of_input_tokens_to_generated_tokens.jpeg" width="800" height="300" />

### Request Latency Analysis
<img src="assets/request_latency.jpeg" width="800" height="300" />

### Time to First Token Analysis
<img src="assets/time_to_first_token.jpeg" width="800" height="300" />

### Time to First Token vs. Number of Input Tokens
<img src="assets/time_to_first_token_vs_number_of_input_tokens.jpeg" width="800" height="300" />

### Token-to-Token Latency vs. Output Token Position
<img src="assets/token-to-token_latency_vs_output_token_position.jpeg" width="800" height="300" />
