[ Back to index ]
We expect that you have completed the 1st part of this tutorial and managed to run the MLPerf inference benchmark for object detection with RetinaNet FP32, Open Images and ONNX runtime on a CPU target.
This tutorial shows you how to customize the MLPerf inference benchmark and run it with the reference Python implementation of image classification, ImageNet, ONNX runtime and TVM on CPU.
Note that this tutorial is under preparation and is gradually extended by the MLCommons taskforce on automation and reproducibility.
Note that the CM automation meta-framework and the repository with automation scripts are being continuously updated by the community to improve the portability and interoperability of all reusable components for MLOps and DevOps.
You can get the latest version of the CM framework and automation repository as follows (though be careful since CM CLI and APIs may change):
python3 -m pip install cmind -U
cm pull repo mlcommons@ck --checkout=master
cm run script "get sys-utils-cm" --quiet
We suggest that you use a Python virtual environment to avoid mixing up your native installation with MLPerf dependencies. You can use your own virtual environment or install one using CM automation as follows:
cm run script "install python-venv" --version=3.10.8 --name=mlperf
You can run MLPerf image classification by customizing the CLI of our universal CM wrapper for MLPerf inference.
You just need to set the flag --model=resnet50:
cm run script --tags=run,mlperf,inference,generate-run-cmds \
--adr.python.name=mlperf \
--adr.python.version_min=3.8 \
--submitter="Community" \
--implementation=python \
--hw_name=default \
--model=resnet50 \
--backend=onnxruntime \
--device=cpu \
--scenario=Offline \
--mode=accuracy \
--test_query_count=5 \
--quiet \
--clean
In case of a successful run, you should see the following output:
...
accuracy=80.000%, good=4, total=5
...
This CM script will install a small subset of the ImageNet dataset (500 images) for testing and will automatically preprocess it into the NCHW layout:
cm show cache --tags=get,dataset,imagenet
* cache::242d289d79f54978
Tags: ['ILSVRC', 'dataset', 'get', 'image-classification', 'imagenet', 'original', 'script-artifact-7afd58d287fe4f11', '_2012-500']
Path: /home/fursin/CM/repos/local/cache/242d289d79f54978
* cache::9e1013fd58724e2f
Tags: ['ILSVRC', 'dataset', 'get', 'image-classification', 'imagenet', 'preprocessed', 'script-artifact-f259d490bbaf45f5', '_NCHW']
Path: /home/fursin/CM/repos/local/cache/9e1013fd58724e2f
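Based on the tags shown in the cache listing above, you should be able to invoke the same dataset scripts directly. This is a minimal sketch and the exact tag combinations are an assumption taken from that listing; they may differ in newer versions of the automation repository:
# Download the small (500-image) ImageNet subset used for testing
cm run script --tags=get,dataset,imagenet,image-classification,original,_2012-500 --quiet
# Preprocess it into the NCHW layout expected by the ResNet-50 ONNX model
cm run script --tags=get,dataset,imagenet,image-classification,preprocessed,_NCHW --quiet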
Check this CM script if you want to detect a full ImageNet validation dataset on your system and use it for a submission.
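For example, if you already have the full ImageNet 2012 validation set on your machine, a hypothetical invocation could look as follows (the _full variation and the --input flag are assumptions based on the conventions above, so please check the script documentation for the exact options):
# Detect and register an existing ImageNet 2012 validation set (path is an example)
cm run script --tags=get,dataset,imagenet,original,_full --input=${HOME}/datasets/imagenet-2012-val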
CM will also install a ResNet-50 model (FP32, ONNX) using this CM script:
cm show cache --tags=get,ml-model,resnet50
* cache::eccee9fed2194558
Tags: ['get', 'image-classification', 'ml-model', 'ml-model-resnet50', 'resnet50', 'script-artifact-56203e4e998b4bc0', '_fp32', '_onnx', '_onnx-1.5-opset-11', '_onnx_']
Path: /home/fursin/CM/repos/local/cache/eccee9fed2194558
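If you want to pre-fetch this model outside of a benchmark run, you should be able to call the same CM script directly. This is a hedged sketch; the _fp32 and _onnx variations are taken from the cache listing above:
# Fetch the ResNet-50 FP32 ONNX model via CM
cm run script --tags=get,ml-model,resnet50,_fp32,_onnx --quiet
# Locate it in the CM cache
cm show cache --tags=get,ml-model,resnet50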
You can run MLPerf with ResNet50 in performance mode as follows:
cm run script --tags=run,mlperf,inference,generate-run-cmds \
--adr.python.name=mlperf \
--adr.python.version_min=3.8 \
--submitter="Community" \
--implementation=python \
--hw_name=default \
--model=resnet50 \
--backend=onnxruntime \
--device=cpu \
--scenario=Offline \
--mode=performance \
--test_query_count=5 \
--quiet \
--clean
In case of a successful run, you should see the following output:
...
INFO:main:starting TestScenario.Offline
TestScenario.Offline qps=4.98, mean=0.1756, time=0.201, queries=1, tiles=50.0:0.1756,80.0:0.1756,90.0:0.1756,95.0:0.1756,99.0:0.1756,99.9:0.1756
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Offline
Mode : PerformanceOnly
Samples per second: 28.3498
Result is : VALID
Min duration satisfied : Yes
Min queries satisfied : Yes
Early stopping satisfied: Yes
================================================
Additional Stats
================================================
Min latency (ns) : 176368211
Max latency (ns) : 176368211
Mean latency (ns) : 176368211
50.00 percentile latency (ns) : 176368211
90.00 percentile latency (ns) : 176368211
95.00 percentile latency (ns) : 176368211
97.00 percentile latency (ns) : 176368211
99.00 percentile latency (ns) : 176368211
99.90 percentile latency (ns) : 176368211
================================================
Test Parameters Used
================================================
samples_per_query : 5
target_qps : 1
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 0
max_duration (ms): 0
min_query_count : 1
max_query_count : 5
qsl_rng_seed : 14284205019438841327
sample_index_rng_seed : 4163916728725999944
schedule_rng_seed : 299063814864929621
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 1024
No warnings encountered during test.
No errors encountered during test.
...
You can now run MLPerf with ResNet50 in submission mode:
cm run script --tags=run,mlperf,inference,generate-run-cmds,_submission,_dashboard \
--adr.python.name=mlperf \
--adr.python.version_min=3.8 \
--submitter="Community" \
--implementation=python \
--hw_name=default \
--model=resnet50 \
--backend=onnxruntime \
--device=cpu \
--scenario=Offline \
--test_query_count=500 \
--quiet \
--clean
In case of a successful run, you should see your crowd-testing results on this live W&B dashboard.
You can now run MLPerf inference with the Apache TVM backend that OctoML has recently added to the MLPerf inference vision benchmark:
cm run script --tags=run,mlperf,inference,generate-run-cmds \
--adr.python.name=mlperf \
--adr.python.version_min=3.8 \
--submitter="Community" \
--implementation=python \
--hw_name=default \
--model=resnet50 \
--backend=tvm-onnx \
--device=cpu \
--scenario=Offline \
--mode=accuracy \
--test_query_count=5 \
--quiet \
--clean
This workflow will use other CM scripts to install CMake and LLVM 14+, install ONNX to load ONNX models into TVM, and then build Apache TVM from source.
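After the run completes, you can inspect the dependencies that CM installed for the TVM backend in the cache. The tags below are assumptions based on the naming convention used elsewhere in this tutorial:
# Inspect the toolchain components installed by CM for the TVM backend
cm show cache --tags=get,cmake
cm show cache --tags=get,llvm
cm show cache --tags=get,tvm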
In case of a successful run, you should see the following output:
...
accuracy=80.000%, good=4, total=5
...
If you want to install TVM via pip instead of building it from source, add --adr.tvm.tags=_pip-install:
cm run script --tags=run,mlperf,inference,generate-run-cmds \
--adr.python.name=mlperf \
--adr.python.version_min=3.8 \
--adr.tvm.tags=_pip-install \
--submitter="Community" \
--implementation=python \
--hw_name=default \
--model=resnet50 \
--backend=tvm-onnx \
--device=cpu \
--scenario=Offline \
--mode=accuracy \
--test_query_count=5 \
--quiet \
--clean
Note that this benchmark may occasionally hang with TVM. If that happens, stop it and restart it, and it should then work fine. We expect the TVM community to fix this problem at some point.
In case of a successful run, you should see the following output:
...
accuracy=80.000%, good=4, total=5
...
cm run script --tags=run,mlperf,inference,generate-run-cmds,_submission,_dashboard \
--adr.python.name=mlperf \
--adr.python.version_min=3.8 \
--submitter="Community" \
--implementation=python \
--hw_name=default \
--model=resnet50 \
--backend=tvm-onnx \
--device=cpu \
--scenario=Offline \
--test_query_count=500 \
--quiet \
--clean
In case of a successful run, you should see your crowd-testing results on this live W&B dashboard.
Please check other parts of this tutorial to learn how to customize and optimize the MLPerf inference benchmark using MLCommons CM (under preparation):
- 1st part: customize MLPerf inference (Python ref implementation, Open Images, ONNX, CPU)
- 2nd part: customize MLPerf inference (C++ implementation, CUDA, PyTorch)
- To be continued
You are welcome to join the open MLCommons taskforce on automation and reproducibility to contribute to this project, continue optimizing this benchmark, and prepare an official submission for MLPerf inference v3.0 (March 2023) with the help of the community.
See the development roadmap here.
- Grigori Fursin (cTuning foundation, cKnowledge Ltd, MLCommons)
- Arjun Suresh (cTuning foundation, cKnowledge Ltd, MLCommons)
We thank Hai Ah Nam, Steve Leak, Vijay Janapa Reddi, Tom Jablin, Ramesh N Chukka, Peter Mattson, David Kanter, Pablo Gonzalez Mesa, Thomas Zhu, Thomas Schmid and Gaurav Verma for their suggestions and contributions.