Remove $ (#343)
krishung5 authored Mar 7, 2024
1 parent 34a4db5 commit 0413e46
Showing 6 changed files with 97 additions and 69 deletions.
10 changes: 5 additions & 5 deletions examples/auto_complete/README.md
@@ -1,5 +1,5 @@
<!--
# Copyright 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -59,12 +59,12 @@ respectively.
1. Create the model repository:

```console
$ mkdir -p models/nobatch_auto_complete/1/
$ mkdir -p models/batch_auto_complete/1/
mkdir -p models/nobatch_auto_complete/1/
mkdir -p models/batch_auto_complete/1/

# Copy the Python models
$ cp examples/auto_complete/nobatch_model.py models/nobatch_auto_complete/1/model.py
$ cp examples/auto_complete/batch_model.py models/batch_auto_complete/1/model.py
cp examples/auto_complete/nobatch_model.py models/nobatch_auto_complete/1/model.py
cp examples/auto_complete/batch_model.py models/batch_auto_complete/1/model.py
```
**Note that we don't need a model configuration file since Triton will use the
auto-complete model configuration provided in the Python model.**
38 changes: 19 additions & 19 deletions examples/bls/README.md
@@ -1,5 +1,5 @@
<!--
# Copyright 2021-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -51,17 +51,17 @@ final outputs.
1. Create the model repository:

```console
$ mkdir -p models/add_sub/1
$ mkdir -p models/bls_sync/1
$ mkdir -p models/pytorch/1
mkdir -p models/add_sub/1
mkdir -p models/bls_sync/1
mkdir -p models/pytorch/1

# Copy the Python models
$ cp examples/add_sub/model.py models/add_sub/1/
$ cp examples/add_sub/config.pbtxt models/add_sub/config.pbtxt
$ cp examples/bls/sync_model.py models/bls_sync/1/model.py
$ cp examples/bls/sync_config.pbtxt models/bls_sync/config.pbtxt
$ cp examples/pytorch/model.py models/pytorch/1/
$ cp examples/pytorch/config.pbtxt models/pytorch/
cp examples/add_sub/model.py models/add_sub/1/
cp examples/add_sub/config.pbtxt models/add_sub/config.pbtxt
cp examples/bls/sync_model.py models/bls_sync/1/model.py
cp examples/bls/sync_config.pbtxt models/bls_sync/config.pbtxt
cp examples/pytorch/model.py models/pytorch/1/
cp examples/pytorch/config.pbtxt models/pytorch/
```
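The layout built by the commands above can be sanity-checked before starting the server. The following is only a sketch: `check_model_dir` and `demo_repo` are hypothetical names, and the demonstration runs against a throwaway directory with empty placeholder files rather than the real `models/` folder.

```shell
# check_model_dir REPO MODEL (hypothetical helper):
# succeeds when MODEL has both 1/model.py and config.pbtxt under REPO.
check_model_dir() {
  repo="$1"
  model="$2"
  [ -f "$repo/$model/1/model.py" ] && [ -f "$repo/$model/config.pbtxt" ]
}

# Throwaway repository with placeholder files, mirroring the layout above.
demo_repo="$(mktemp -d)"
for m in add_sub bls_sync pytorch; do
  mkdir -p "$demo_repo/$m/1"
  : > "$demo_repo/$m/1/model.py"
  : > "$demo_repo/$m/config.pbtxt"
done

# Count the models whose directory is incomplete.
missing=0
for m in add_sub bls_sync pytorch; do
  if check_model_dir "$demo_repo" "$m"; then
    echo "ok: $m"
  else
    echo "incomplete: $m"
    missing=$((missing + 1))
  fi
done
```

To check the real repository, pass `models` as the first argument instead of `"$demo_repo"`.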

2. Start the tritonserver:
@@ -124,17 +124,17 @@ to construct the final inference response object using these tensors.
1. Create the model repository:

```console
$ mkdir -p models/add_sub/1
$ mkdir -p models/bls_async/1
$ mkdir -p models/pytorch/1
mkdir -p models/add_sub/1
mkdir -p models/bls_async/1
mkdir -p models/pytorch/1

# Copy the Python models
$ cp examples/add_sub/model.py models/add_sub/1/
$ cp examples/add_sub/config.pbtxt models/add_sub/
$ cp examples/bls/async_model.py models/bls_async/1/model.py
$ cp examples/bls/async_config.pbtxt models/bls_async/config.pbtxt
$ cp examples/pytorch/model.py models/pytorch/1/
$ cp examples/pytorch/config.pbtxt models/pytorch/
cp examples/add_sub/model.py models/add_sub/1/
cp examples/add_sub/config.pbtxt models/add_sub/
cp examples/bls/async_model.py models/bls_async/1/model.py
cp examples/bls/async_config.pbtxt models/bls_async/config.pbtxt
cp examples/pytorch/model.py models/pytorch/1/
cp examples/pytorch/config.pbtxt models/pytorch/
```

2. Start the tritonserver:
14 changes: 7 additions & 7 deletions examples/decoupled/README.md
@@ -1,5 +1,5 @@
<!--
# Copyright 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -47,14 +47,14 @@ real deployment, the model should not allow the caller thread to return from
1. Create the model repository:

```console
$ mkdir -p models/repeat_int32/1
$ mkdir -p models/square_int32/1
mkdir -p models/repeat_int32/1
mkdir -p models/square_int32/1

# Copy the Python models
$ cp examples/decoupled/repeat_model.py models/repeat_int32/1/model.py
$ cp examples/decoupled/repeat_config.pbtxt models/repeat_int32/config.pbtxt
$ cp examples/decoupled/square_model.py models/square_int32/1/model.py
$ cp examples/decoupled/square_config.pbtxt models/square_int32/config.pbtxt
cp examples/decoupled/repeat_model.py models/repeat_int32/1/model.py
cp examples/decoupled/repeat_config.pbtxt models/repeat_int32/config.pbtxt
cp examples/decoupled/square_model.py models/square_int32/1/model.py
cp examples/decoupled/square_config.pbtxt models/square_int32/config.pbtxt
```

2. Start the tritonserver:
24 changes: 12 additions & 12 deletions examples/jax/README.md
@@ -1,5 +1,5 @@
<!--
# Copyright 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
@@ -42,9 +42,9 @@ First, download the [client.py](client.py), [config.pbtxt](config.pbtxt) and
Next, in the directory where the three files are located, create the model
repository with the following commands:
```
$ mkdir -p models/jax/1
$ mv model.py models/jax/1
$ mv config.pbtxt models/jax
mkdir -p models/jax/1
mv model.py models/jax/1
mv config.pbtxt models/jax
```

## Pull the Triton Docker images
@@ -55,16 +55,16 @@ to the

To pull the latest containers, run the following commands:
```
$ docker pull nvcr.io/nvidia/tritonserver:<yy.mm>-py3
$ docker pull nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk
docker pull nvcr.io/nvidia/tritonserver:<yy.mm>-py3
docker pull nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk
```
See the installation steps above for the `<yy.mm>` version.

At the time of writing, the latest version is `23.04`, which translates to the
following commands:
```
$ docker pull nvcr.io/nvidia/tritonserver:23.04-py3
$ docker pull nvcr.io/nvidia/tritonserver:23.04-py3-sdk
docker pull nvcr.io/nvidia/tritonserver:23.04-py3
docker pull nvcr.io/nvidia/tritonserver:23.04-py3-sdk
```
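Since the same `<yy.mm>` value recurs in every later command, it can help to keep it in one shell variable. This is a sketch only: `TRITON_VERSION`, `SERVER_IMAGE` and `SDK_IMAGE` are hypothetical names, and `23.04` is simply the version quoted above.

```shell
# Hypothetical variable names; 23.04 is the version quoted in the text above.
TRITON_VERSION="23.04"
SERVER_IMAGE="nvcr.io/nvidia/tritonserver:${TRITON_VERSION}-py3"
SDK_IMAGE="nvcr.io/nvidia/tritonserver:${TRITON_VERSION}-py3-sdk"

# Print the resolved image names; pass them to `docker pull` and
# `docker run` in place of the <yy.mm> placeholders.
echo "$SERVER_IMAGE"
echo "$SDK_IMAGE"
```

Changing the version then means editing a single line rather than every command.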

Be sure to replace the `<yy.mm>` with the version pulled for all the remaining
@@ -75,7 +75,7 @@ parts of this example.
At the directory where we created the JAX models (at where the "models" folder
is located), run the following command:
```
$ docker run --gpus all -it --rm -p 8000:8000 -v `pwd`:/jax nvcr.io/nvidia/tritonserver:<yy.mm>-py3 /bin/bash
docker run --gpus all -it --rm -p 8000:8000 -v `pwd`:/jax nvcr.io/nvidia/tritonserver:<yy.mm>-py3 /bin/bash
```

Inside the container, we need to install JAX to run this example.
@@ -87,12 +87,12 @@ dependencies.

To install for this example, run the following command:
```
$ pip3 install --upgrade "jax[cuda12_local]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
pip3 install --upgrade "jax[cuda12_local]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```

Finally, to start the Triton Server, run the following command:
```
$ tritonserver --model-repository=/jax/models
tritonserver --model-repository=/jax/models
```

To leave the container for the next step, press: `CTRL + P + Q`.
@@ -101,7 +101,7 @@ To leave the container for the next step, press: `CTRL + P + Q`.

At the directory where the client.py is located, run the following command:
```
$ docker run --rm --net=host -v `pwd`:/jax nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk python3 /jax/client.py
docker run --rm --net=host -v `pwd`:/jax nvcr.io/nvidia/tritonserver:<yy.mm>-py3-sdk python3 /jax/client.py
```

A successful inference will print the following at the end:
56 changes: 42 additions & 14 deletions examples/preprocessing/README.md
@@ -1,43 +1,71 @@
<!--
# Copyright 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of NVIDIA CORPORATION nor the names of its
# contributors may be used to endorse or promote products derived
# from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-->

# **Preprocessing Using Python Backend Example**
This example shows how to preprocess your inputs using the Python backend before they are passed to the TensorRT model for inference. This ensemble model includes an image preprocessing model (preprocess) and a TensorRT model (resnet50_trt) to do inference.

**1. Converting PyTorch Model to ONNX format:**

Run onnx_exporter.py to convert the ResNet50 PyTorch model to ONNX format. The width and height dims are fixed at 224, but dynamic axes arguments are used for dynamic batching. Run the commands in steps 2 and 3 inside this Docker container.

$ docker run -it --gpus=all -v $(pwd):/workspace nvcr.io/nvidia/pytorch:xx.yy-py3 bash
$ pip install numpy pillow torchvision
$ python onnx_exporter.py --save model.onnx
docker run -it --gpus=all -v $(pwd):/workspace nvcr.io/nvidia/pytorch:xx.yy-py3 bash
pip install numpy pillow torchvision
python onnx_exporter.py --save model.onnx

**2. Create the model repository:**

$ mkdir -p model_repository/ensemble_python_resnet50/1
$ mkdir -p model_repository/preprocess/1
$ mkdir -p model_repository/resnet50_trt/1
mkdir -p model_repository/ensemble_python_resnet50/1
mkdir -p model_repository/preprocess/1
mkdir -p model_repository/resnet50_trt/1

# Copy the Python model
$ cp model.py model_repository/preprocess/1
cp model.py model_repository/preprocess/1

**3. Build a TensorRT engine for the ONNX model**

Use --fp16 to enable fp16 precision. To enable dynamic shapes, use --minShapes, --optShapes, and --maxShapes with --explicitBatch:

$ trtexec --onnx=model.onnx --saveEngine=./model_repository/resnet50_trt/1/model.plan --explicitBatch --minShapes=input:1x3x224x224 --optShapes=input:1x3x224x224 --maxShapes=input:256x3x224x224 --fp16
trtexec --onnx=model.onnx --saveEngine=./model_repository/resnet50_trt/1/model.plan --explicitBatch --minShapes=input:1x3x224x224 --optShapes=input:1x3x224x224 --maxShapes=input:256x3x224x224 --fp16
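
The three shape flags differ only in the batch dimension, so they can be composed from a few variables. The following is a sketch under that assumption: all variable names are hypothetical, the values mirror the command above, and the final line only prints the command instead of running it.

```shell
# Hypothetical variable names; values mirror the trtexec command above.
INPUT_NAME="input"
CHW="3x224x224"     # channels x height x width, fixed for this model
MIN_BATCH=1
OPT_BATCH=1
MAX_BATCH=256

MIN_SHAPES="--minShapes=${INPUT_NAME}:${MIN_BATCH}x${CHW}"
OPT_SHAPES="--optShapes=${INPUT_NAME}:${OPT_BATCH}x${CHW}"
MAX_SHAPES="--maxShapes=${INPUT_NAME}:${MAX_BATCH}x${CHW}"

# Print the resulting command; drop the leading `echo` to execute it.
echo trtexec --onnx=model.onnx \
  --saveEngine=./model_repository/resnet50_trt/1/model.plan \
  --explicitBatch "$MIN_SHAPES" "$OPT_SHAPES" "$MAX_SHAPES" --fp16
```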

**4. Run the command below to start the server container:**

Under python_backend/examples/preprocessing, run this command to start the server docker container:

$ docker run --gpus=all -it --rm -p8000:8000 -p8001:8001 -p8002:8002 -v$(pwd):/workspace/ -v/$(pwd)/model_repository:/models nvcr.io/nvidia/tritonserver:xx.yy-py3 bash
$ pip install numpy pillow torchvision
$ tritonserver --model-repository=/models
docker run --gpus=all -it --rm -p8000:8000 -p8001:8001 -p8002:8002 -v$(pwd):/workspace/ -v/$(pwd)/model_repository:/models nvcr.io/nvidia/tritonserver:xx.yy-py3 bash
pip install numpy pillow torchvision
tritonserver --model-repository=/models

**5. Start the client to test:**

Under python_backend/examples/preprocessing, run the commands below to start the client Docker container:

$ wget https://raw.githubusercontent.com/triton-inference-server/server/main/qa/images/mug.jpg -O "mug.jpg"
$ docker run --rm --net=host -v $(pwd):/workspace/ nvcr.io/nvidia/tritonserver:xx.yy-py3-sdk python client.py --image mug.jpg
$ The result of classification is:COFFEE MUG
wget https://raw.githubusercontent.com/triton-inference-server/server/main/qa/images/mug.jpg -O "mug.jpg"
docker run --rm --net=host -v $(pwd):/workspace/ nvcr.io/nvidia/tritonserver:xx.yy-py3-sdk python client.py --image mug.jpg
The result of classification is:COFFEE MUG

Here, we input an image of a mug and the inference result is "COFFEE MUG", which is correct.
24 changes: 12 additions & 12 deletions inferentia/README.md
@@ -60,18 +60,18 @@ or simply clone with https.
Clone this repo from GitHub into the home directory `/home/ubuntu`.

```
$chmod 777 /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh
$sudo /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh
chmod 777 /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh
sudo /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh
```

Then, start the Triton instance with:
```
$docker run --device /dev/neuron0 <more neuron devices> -v /home/ubuntu/python_backend:/home/ubuntu/python_backend -v /lib/udev:/mylib/udev --shm-size=1g --ulimit memlock=-1 -p 8000:8000 -p 8001:8001 -p 8002:8002 --ulimit stack=67108864 -ti nvcr.io/nvidia/tritonserver:<xx.yy>-py3
docker run --device /dev/neuron0 <more neuron devices> -v /home/ubuntu/python_backend:/home/ubuntu/python_backend -v /lib/udev:/mylib/udev --shm-size=1g --ulimit memlock=-1 -p 8000:8000 -p 8001:8001 -p 8002:8002 --ulimit stack=67108864 -ti nvcr.io/nvidia/tritonserver:<xx.yy>-py3
```
Note 1: The user would need to list any neuron device to run during container initialization.
For example, to use 4 neuron devices on an instance, the user would need to run with:
```
$docker run --device /dev/neuron0 --device /dev/neuron1 --device /dev/neuron2 --device /dev/neuron3 ...`
docker run --device /dev/neuron0 --device /dev/neuron1 --device /dev/neuron2 --device /dev/neuron3 ...`
```
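For more than a handful of devices, the repeated `--device` flags can be generated in a loop. This is a minimal sketch, assuming the devices are numbered consecutively from 0 as in the example; `NUM_DEVICES` and `DEVICE_FLAGS` are hypothetical names.

```shell
# Hypothetical variable names; assumes devices /dev/neuron0 .. /dev/neuron(N-1).
NUM_DEVICES=4
DEVICE_FLAGS=""
i=0
while [ "$i" -lt "$NUM_DEVICES" ]; do
  # Prepend a separating space only when the string is already non-empty.
  DEVICE_FLAGS="${DEVICE_FLAGS}${DEVICE_FLAGS:+ }--device /dev/neuron${i}"
  i=$((i + 1))
done

echo "$DEVICE_FLAGS"
# prints: --device /dev/neuron0 --device /dev/neuron1 --device /dev/neuron2 --device /dev/neuron3
```

When passing the result to `docker run`, leave `$DEVICE_FLAGS` unquoted so that the shell splits it into separate arguments.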
Note 2: `/mylib/udev` is used for Neuron parameter passing.

@@ -81,7 +81,7 @@ Note 3: For Triton container version xx.yy, please refer to

After starting the Triton container, go into the `python_backend` folder and run the setup script.
```
$source /home/ubuntu/python_backend/inferentia/scripts/setup.sh
source /home/ubuntu/python_backend/inferentia/scripts/setup.sh
```
This script will:
1. Install necessary dependencies
@@ -118,7 +118,7 @@ triton python model directory.
An example invocation for the `gen_triton_model.py` for PyTorch model can look like:

```
$python3 inferentia/scripts/gen_triton_model.py --model_type pytorch --triton_input INPUT__0,INT64,4x384 INPUT__1,INT64,4x384 INPUT__2,INT64,4x384 --triton_output OUTPUT__0,INT64,4x384 OUTPUT__1,INT64,4x384 --compiled_model /home/ubuntu/bert_large_mlperf_neuron_hack_bs1_dynamic.pt --neuron_core_range 0:3 --triton_model_dir bert-large-mlperf-bs1x4
python3 inferentia/scripts/gen_triton_model.py --model_type pytorch --triton_input INPUT__0,INT64,4x384 INPUT__1,INT64,4x384 INPUT__2,INT64,4x384 --triton_output OUTPUT__0,INT64,4x384 OUTPUT__1,INT64,4x384 --compiled_model /home/ubuntu/bert_large_mlperf_neuron_hack_bs1_dynamic.pt --neuron_core_range 0:3 --triton_model_dir bert-large-mlperf-bs1x4
```

In order for the script to treat the compiled model as TorchScript
@@ -161,7 +161,7 @@ script to generate triton python model directory.
An example invocation for the `gen_triton_model.py` for TensorFlow model can look like:

```
$python3 gen_triton_model.py --model_type tensorflow --compiled_model /home/ubuntu/inferentia-poc-2.0/scripts-rn50-tf-native/resnet50_mlperf_opt_fp16_compiled_b5_nc1/1 --neuron_core_range 0:3 --triton_model_dir rn50-1neuroncores-bs1x1
python3 gen_triton_model.py --model_type tensorflow --compiled_model /home/ubuntu/inferentia-poc-2.0/scripts-rn50-tf-native/resnet50_mlperf_opt_fp16_compiled_b5_nc1/1 --neuron_core_range 0:3 --triton_model_dir rn50-1neuroncores-bs1x1
```

NOTE: Unlike TorchScript model, TensorFlow SavedModel stores sufficient
@@ -215,7 +215,7 @@ a valid torchscript file or tensorflow savedmodel.
Now, the server can be launched with the model as below:

```
$tritonserver --model-repository <path_to_model_repository>
tritonserver --model-repository <path_to_model_repository>
```

Note:
@@ -255,7 +255,7 @@ contains the necessary files to set up testing with a simple add_sub model. The
requires an instance with more than 8 inferentia cores to run, e.g. `inf1.6xlarge`.
To start the test, run
```
$source <triton path>/python_backend/inferentia/qa/setup_test_enviroment_and_test.sh
source <triton path>/python_backend/inferentia/qa/setup_test_enviroment_and_test.sh
```
where `<triton path>` is usually `/home/ubuntu/`.
This script will pull the [server repo](https://github.com/triton-inference-server/server)
@@ -265,16 +265,16 @@ Triton Server and Triton SDK.
Note: If you need to change some of the tests in the server repo,
run
```
$export TRITON_SERVER_REPO_TAG=<your branch name>
export TRITON_SERVER_REPO_TAG=<your branch name>
```
before running the script.

# Using Triton with Inferentia 2, or Trn1
## pytorch-neuronx and tensorflow-neuronx
1. Similar to the steps for inf1, change the argument to the pre-container and on-container setup scripts to include the `-inf2` or `-trn1` flags, e.g.,
```
$chmod 777 /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh
$sudo /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh -inf2
chmod 777 /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh
sudo /home/ubuntu/python_backend/inferentia/scripts/setup-pre-container.sh -inf2
```
2. On the container, after the `docker run` command, you can pass a similar argument to the setup.sh script
For Pytorch:
