diff --git a/README.md b/README.md index 4748a0c9..8fe63c53 100644 --- a/README.md +++ b/README.md @@ -38,7 +38,8 @@ Supported precision Supported chipsets * [Snapdragon 845](https://www.qualcomm.com/products/mobile/snapdragon/smartphones/snapdragon-8-series-mobile-platforms/snapdragon-845-mobile-platform), [Snapdragon 855/855+](https://www.qualcomm.com/products/mobile/snapdragon/smartphones/snapdragon-8-series-mobile-platforms/snapdragon-855-mobile-platform), [Snapdragon 865/865+](https://www.qualcomm.com/products/mobile/snapdragon/smartphones/snapdragon-8-series-mobile-platforms/snapdragon-865-plus-5g-mobile-platform), [Snapdragon 888/888+](https://www.qualcomm.com/products/mobile/snapdragon/smartphones/snapdragon-8-series-mobile-platforms/snapdragon-888-5g-mobile-platform) -* [Snapdragon 8 Gen 1](https://www.qualcomm.com/products/mobile/snapdragon/smartphones/snapdragon-8-series-mobile-platforms/snapdragon-8-gen-1-mobile-platform), [Snapdragon 8 Gen 2](https://www.qualcomm.com/products/mobile/snapdragon/smartphones/snapdragon-8-series-mobile-platforms/snapdragon-8-gen-2-mobile-platform), [Snapdragon 8 Gen 3](https://www.qualcomm.com/products/mobile/snapdragon/smartphones/snapdragon-8-series-mobile-platforms/snapdragon-8-gen-3-mobile-platform), [Snapdragon X Elite](https://www.qualcomm.com/products/mobile/snapdragon/pcs-and-tablets/snapdragon-x-elite) +* [Snapdragon 8 Elite](https://www.qualcomm.com/products/mobile/snapdragon/smartphones/snapdragon-8-series-mobile-platforms/snapdragon-8-elite-mobile-platform), [Snapdragon 8 Gen 3](https://www.qualcomm.com/products/mobile/snapdragon/smartphones/snapdragon-8-series-mobile-platforms/snapdragon-8-gen-3-mobile-platform), [Snapdragon 8 Gen 2](https://www.qualcomm.com/products/mobile/snapdragon/smartphones/snapdragon-8-series-mobile-platforms/snapdragon-8-gen-2-mobile-platform), [Snapdragon 8 Gen 1](https://www.qualcomm.com/products/mobile/snapdragon/smartphones/snapdragon-8-series-mobile-platforms/snapdragon-8-gen-1-mobile-platform) +* [Snapdragon X Elite](https://www.qualcomm.com/products/mobile/snapdragon/pcs-and-tablets/snapdragon-x-elite) Select supported devices * Samsung Galaxy S21 Series, Galaxy S22 Series, Galaxy S23 Series, Galaxy S24 Series @@ -275,6 +276,7 @@ Qualcomm® AI Hub Models is licensed under BSD-3. 
See the [LICENSE file](../LICE | [ConvNext-Tiny-w8a16-Quantized](https://aihub.qualcomm.com/models/convnext_tiny_w8a16_quantized) | [qai_hub_models.models.convnext_tiny_w8a16_quantized](qai_hub_models/models/convnext_tiny_w8a16_quantized/README.md) | ✔️ | ✔️ | ✔️ | [ConvNext-Tiny-w8a8-Quantized](https://aihub.qualcomm.com/models/convnext_tiny_w8a8_quantized) | [qai_hub_models.models.convnext_tiny_w8a8_quantized](qai_hub_models/models/convnext_tiny_w8a8_quantized/README.md) | ✔️ | ✔️ | ✔️ | [DenseNet-121](https://aihub.qualcomm.com/models/densenet121) | [qai_hub_models.models.densenet121](qai_hub_models/models/densenet121/README.md) | ✔️ | ✔️ | ✔️ +| [DenseNet-121-Quantized](https://aihub.qualcomm.com/models/densenet121_quantized) | [qai_hub_models.models.densenet121_quantized](qai_hub_models/models/densenet121_quantized/README.md) | ✔️ | ✔️ | ✔️ | [EfficientNet-B0](https://aihub.qualcomm.com/models/efficientnet_b0) | [qai_hub_models.models.efficientnet_b0](qai_hub_models/models/efficientnet_b0/README.md) | ✔️ | ✔️ | ✔️ | [GoogLeNet](https://aihub.qualcomm.com/models/googlenet) | [qai_hub_models.models.googlenet](qai_hub_models/models/googlenet/README.md) | ✔️ | ✔️ | ✔️ | [GoogLeNetQuantized](https://aihub.qualcomm.com/models/googlenet_quantized) | [qai_hub_models.models.googlenet_quantized](qai_hub_models/models/googlenet_quantized/README.md) | ✔️ | ✔️ | ✔️ @@ -306,6 +308,7 @@ Qualcomm® AI Hub Models is licensed under BSD-3. See the [LICENSE file](../LICE | [Swin-Small](https://aihub.qualcomm.com/models/swin_small) | [qai_hub_models.models.swin_small](qai_hub_models/models/swin_small/README.md) | ✔️ | ✔️ | ✔️ | [Swin-Tiny](https://aihub.qualcomm.com/models/swin_tiny) | [qai_hub_models.models.swin_tiny](qai_hub_models/models/swin_tiny/README.md) | ✔️ | ✔️ | ✔️ | [VIT](https://aihub.qualcomm.com/models/vit) | [qai_hub_models.models.vit](qai_hub_models/models/vit/README.md) | ✔️ | ✔️ | ✔️ +| [VITQuantized](https://aihub.qualcomm.com/models/vit_quantized) | [qai_hub_models.models.vit_quantized](qai_hub_models/models/vit_quantized/README.md) | ✔️ | ✔️ | ✔️ | [WideResNet50](https://aihub.qualcomm.com/models/wideresnet50) | [qai_hub_models.models.wideresnet50](qai_hub_models/models/wideresnet50/README.md) | ✔️ | ✔️ | ✔️ | [WideResNet50-Quantized](https://aihub.qualcomm.com/models/wideresnet50_quantized) | [qai_hub_models.models.wideresnet50_quantized](qai_hub_models/models/wideresnet50_quantized/README.md) | ✔️ | ✔️ | ✔️ | | | | | @@ -359,7 +362,9 @@ Qualcomm® AI Hub Models is licensed under BSD-3. 
See the [LICENSE file](../LICE | [MediaPipe-Face-Detection](https://aihub.qualcomm.com/models/mediapipe_face) | [qai_hub_models.models.mediapipe_face](qai_hub_models/models/mediapipe_face/README.md) | ✔️ | ✔️ | ✔️ | [MediaPipe-Face-Detection-Quantized](https://aihub.qualcomm.com/models/mediapipe_face_quantized) | [qai_hub_models.models.mediapipe_face_quantized](qai_hub_models/models/mediapipe_face_quantized/README.md) | ✔️ | ✔️ | ✔️ | [MediaPipe-Hand-Detection](https://aihub.qualcomm.com/models/mediapipe_hand) | [qai_hub_models.models.mediapipe_hand](qai_hub_models/models/mediapipe_hand/README.md) | ✔️ | ✔️ | ✔️ -| [YOLOv11-Detection](qai_hub_models/models/yolov11_det/README.md) | [qai_hub_models.models.yolov11_det](qai_hub_models/models/yolov11_det/README.md) | ✔️ | ✔️ | ✔️ +| [PPE-Detection](https://aihub.qualcomm.com/models/gear_guard_net) | [qai_hub_models.models.gear_guard_net](qai_hub_models/models/gear_guard_net/README.md) | ✔️ | ✔️ | ✔️ +| [Person-Foot-Detection](https://aihub.qualcomm.com/models/foot_track_net) | [qai_hub_models.models.foot_track_net](qai_hub_models/models/foot_track_net/README.md) | ✔️ | ✔️ | ✔️ +| [YOLOv11-Detection](https://aihub.qualcomm.com/models/yolov11_det) | [qai_hub_models.models.yolov11_det](qai_hub_models/models/yolov11_det/README.md) | ✔️ | ✔️ | ✔️ | [YOLOv8-Detection](https://aihub.qualcomm.com/models/yolov8_det) | [qai_hub_models.models.yolov8_det](qai_hub_models/models/yolov8_det/README.md) | ✔️ | ✔️ | ✔️ | [YOLOv8-Detection-Quantized](https://aihub.qualcomm.com/models/yolov8_det_quantized) | [qai_hub_models.models.yolov8_det_quantized](qai_hub_models/models/yolov8_det_quantized/README.md) | ✔️ | ✔️ | ✔️ | [Yolo-NAS](https://aihub.qualcomm.com/models/yolonas) | [qai_hub_models.models.yolonas](qai_hub_models/models/yolonas/README.md) | ✔️ | ✔️ | ✔️ @@ -369,7 +374,7 @@ Qualcomm® AI Hub Models is licensed under BSD-3. See the [LICENSE file](../LICE | [Yolo-v7-Quantized](https://aihub.qualcomm.com/models/yolov7_quantized) | [qai_hub_models.models.yolov7_quantized](qai_hub_models/models/yolov7_quantized/README.md) | ✔️ | ✔️ | ✔️ | | | | | | **Pose Estimation** -| [FaceMap_3DMM](qai_hub_models/models/facemap_3dmm/README.md) | [qai_hub_models.models.facemap_3dmm](qai_hub_models/models/facemap_3dmm/README.md) | ✔️ | ✔️ | ✔️ +| [Facial-Landmark-Detection](https://aihub.qualcomm.com/models/facemap_3dmm) | [qai_hub_models.models.facemap_3dmm](qai_hub_models/models/facemap_3dmm/README.md) | ✔️ | ✔️ | ✔️ | [HRNetPose](https://aihub.qualcomm.com/models/hrnet_pose) | [qai_hub_models.models.hrnet_pose](qai_hub_models/models/hrnet_pose/README.md) | ✔️ | ✔️ | ✔️ | [HRNetPoseQuantized](https://aihub.qualcomm.com/models/hrnet_pose_quantized) | [qai_hub_models.models.hrnet_pose_quantized](qai_hub_models/models/hrnet_pose_quantized/README.md) | ✔️ | ✔️ | ✔️ | [LiteHRNet](https://aihub.qualcomm.com/models/litehrnet) | [qai_hub_models.models.litehrnet](qai_hub_models/models/litehrnet/README.md) | ✔️ | ✔️ | ✔️ @@ -413,6 +418,15 @@ Qualcomm® AI Hub Models is licensed under BSD-3. 
See the [LICENSE file](../LICE | [Stable-Diffusion-v2.1](https://aihub.qualcomm.com/models/stable_diffusion_v2_1_quantized) | [qai_hub_models.models.stable_diffusion_v2_1_quantized](qai_hub_models/models/stable_diffusion_v2_1_quantized/README.md) | ✔️ | ✔️ | ✔️ | | | | | | **Text Generation** -| [Baichuan-7B](https://aihub.qualcomm.com/models/baichuan_7b_quantized) | [qai_hub_models.models.baichuan_7b_quantized](qai_hub_models/models/baichuan_7b_quantized/README.md) | ✔️ | ✔️ | ✔️ +| [Baichuan2-7B](https://aihub.qualcomm.com/models/baichuan2_7b_quantized) | [qai_hub_models.models.baichuan2_7b_quantized](qai_hub_models/models/baichuan2_7b_quantized/README.md) | ✔️ | ✔️ | ✔️ +| [IBM-Granite-3B-Code-Instruct](https://aihub.qualcomm.com/models/ibm_granite_3b_code_instruct) | [qai_hub_models.models.ibm_granite_3b_code_instruct](qai_hub_models/models/ibm_granite_3b_code_instruct/README.md) | ✔️ | ✔️ | ✔️ +| [IndusQ-1.1B](https://aihub.qualcomm.com/models/indus_1b_quantized) | [qai_hub_models.models.indus_1b_quantized](qai_hub_models/models/indus_1b_quantized/README.md) | ✔️ | ✔️ | ✔️ +| [JAIS-6p7b-Chat](https://aihub.qualcomm.com/models/jais_6p7b_chat_quantized) | [qai_hub_models.models.jais_6p7b_chat_quantized](qai_hub_models/models/jais_6p7b_chat_quantized/README.md) | ✔️ | ✔️ | ✔️ | [Llama-v2-7B-Chat](https://aihub.qualcomm.com/models/llama_v2_7b_chat_quantized) | [qai_hub_models.models.llama_v2_7b_chat_quantized](qai_hub_models/models/llama_v2_7b_chat_quantized/README.md) | ✔️ | ✔️ | ✔️ | [Llama-v3-8B-Chat](https://aihub.qualcomm.com/models/llama_v3_8b_chat_quantized) | [qai_hub_models.models.llama_v3_8b_chat_quantized](qai_hub_models/models/llama_v3_8b_chat_quantized/README.md) | ✔️ | ✔️ | ✔️ +| [Llama-v3.1-8B-Chat](https://aihub.qualcomm.com/models/llama_v3_1_8b_chat_quantized) | [qai_hub_models.models.llama_v3_1_8b_chat_quantized](qai_hub_models/models/llama_v3_1_8b_chat_quantized/README.md) | ✔️ | ✔️ | ✔️ +| [Llama-v3.2-3B-Chat](https://aihub.qualcomm.com/models/llama_v3_2_3b_chat_quantized) | [qai_hub_models.models.llama_v3_2_3b_chat_quantized](qai_hub_models/models/llama_v3_2_3b_chat_quantized/README.md) | ✔️ | ✔️ | ✔️ +| [Mistral-3B](https://aihub.qualcomm.com/models/mistral_3b_quantized) | [qai_hub_models.models.mistral_3b_quantized](qai_hub_models/models/mistral_3b_quantized/README.md) | ✔️ | ✔️ | ✔️ +| [Mistral-7B-Instruct-v0.3](https://aihub.qualcomm.com/models/mistral_7b_instruct_v0_3_quantized) | [qai_hub_models.models.mistral_7b_instruct_v0_3_quantized](qai_hub_models/models/mistral_7b_instruct_v0_3_quantized/README.md) | ✔️ | ✔️ | ✔️ +| [PLaMo-1B](https://aihub.qualcomm.com/models/plamo_1b_quantized) | [qai_hub_models.models.plamo_1b_quantized](qai_hub_models/models/plamo_1b_quantized/README.md) | ✔️ | ✔️ | ✔️ +| [Qwen2-7B-Instruct](https://aihub.qualcomm.com/models/qwen2_7b_instruct_quantized) | [qai_hub_models.models.qwen2_7b_instruct_quantized](qai_hub_models/models/qwen2_7b_instruct_quantized/README.md) | ✔️ | ✔️ | ✔️ diff --git a/qai_hub_models/_version.py b/qai_hub_models/_version.py index 572c45a4..978ed91f 100644 --- a/qai_hub_models/_version.py +++ b/qai_hub_models/_version.py @@ -2,4 +2,4 @@ # Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. 
# SPDX-License-Identifier: BSD-3-Clause # --------------------------------------------------------------------- -__version__ = "0.15.0" +__version__ = "0.16.2" diff --git a/qai_hub_models/asset_bases.yaml b/qai_hub_models/asset_bases.yaml index 852e96cd..7fe290f8 100644 --- a/qai_hub_models/asset_bases.yaml +++ b/qai_hub_models/asset_bases.yaml @@ -12,3 +12,4 @@ huggingface_path: qualcomm/{model_name} models_website_url: https://aihub.qualcomm.com models_website_relative_path: models/{model_id} email_template: qai_hub_models/scripts/templates/email_template.txt +genie_url: https://github.com/quic/ai-hub-apps/tree/main/tutorials/llm_on_genie diff --git a/qai_hub_models/conftest.py b/qai_hub_models/conftest.py index 9dd11824..f63d7819 100644 --- a/qai_hub_models/conftest.py +++ b/qai_hub_models/conftest.py @@ -4,6 +4,7 @@ # --------------------------------------------------------------------- def pytest_configure(config): config.addinivalue_line("markers", "compile: Run compile tests.") + config.addinivalue_line("markers", "quantize: Run quantize tests.") config.addinivalue_line("markers", "profile: Run profile tests.") config.addinivalue_line("markers", "inference: Run inference tests.") config.addinivalue_line("markers", "trace: Run trace accuracy tests.") diff --git a/qai_hub_models/models/_shared/body_detection/app.py b/qai_hub_models/models/_shared/body_detection/app.py new file mode 100644 index 00000000..9a39326d --- /dev/null +++ b/qai_hub_models/models/_shared/body_detection/app.py @@ -0,0 +1,171 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. +# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +from typing import Callable, List + +import numpy as np +import torch + +from qai_hub_models.utils.asset_loaders import load_image +from qai_hub_models.utils.bounding_box_processing import batched_nms, box_xywh_to_xyxy +from qai_hub_models.utils.image_processing import resize_pad + + +def preprocess(img: np.ndarray, height: int, width: int): + """ + Preprocess model input. + + Inputs: + img: np.ndarray + Input image of shape [H, W, C] + height: int + Model input height. + width: int + Model input width + Outputs: + input: torch.Tensor + Preprocessed model input. Shape is (1, C, H, W) + scale: float + Scaling factor of input image and network input image. + pad: List[float] + Top and left padding size. + """ + img = torch.from_numpy(img).permute(2, 0, 1).unsqueeze_(0) / 255.0 + input, scale, pad = resize_pad(img, (height, width)) + return input, scale, pad + + +def decode(output: List[torch.Tensor], thr: float) -> np.ndarray: + """ + Decode model output to bounding boxes, class indices and scores. + + Inputs: + output: List[torch.Tensor] + Model output. + thr: float + Detection threshold. Predictions lower than the thresholds will be discarded. + Outputs: np.ndarray + Detection results. Shape is (N, 6). N is the number of detected objects. 
Each object is + represented by (class, x1, y1, x2, y2, score) + """ + anchors = [ + [[10, 13], [16, 30], [33, 23]], + [[30, 61], [62, 45], [59, 119]], + [[116, 90], [156, 198], [373, 326]], + ] + strides = (8, 16, 32) + result = [] + for s, out in enumerate(output): + b, h, w, c = out.shape + out = out.reshape(b, h, w, 3, -1) + _, ny, nx, na = out.shape[:-1] + for y in np.arange(ny): + for x in np.arange(nx): + for a in np.arange(na): + pred = out[0, y, x, a] + obj_score = pred[4].sigmoid() + cls_score = pred[5:].max().sigmoid() + score = obj_score * cls_score + if score < thr: + continue + c = np.argmax(pred[5:]) + bx = (pred[0].sigmoid() * 2 - 0.5 + x) * strides[s] + by = (pred[1].sigmoid() * 2 - 0.5 + y) * strides[s] + bw = 4 * pred[2].sigmoid() ** 2 * anchors[s][a][0] + bh = 4 * pred[3].sigmoid() ** 2 * anchors[s][a][1] + + boxes = box_xywh_to_xyxy( + torch.from_numpy(np.array([[[bx, by], [bw, bh]]])) + ) + x1 = boxes[0][0][0].round() + y1 = boxes[0][0][1].round() + x2 = boxes[0][1][0].round() + y2 = boxes[0][1][1].round() + result.append([c, x1, y1, x2, y2, score]) + return np.array(result, dtype=np.float32) + + +def postprocess( + output: List[torch.Tensor], + scale: float, + pad: List[int], + conf_thr: float, + iou_thr: float, +) -> np.ndarray: + """ + Post process model output. + Inputs: + output: List[torch.Tensor] + Multi-scale model output. + scale: float + Scaling factor from input image and model input. + pad: List[int] + Padding sizes from input image and model input. + conf_thr: float + Confidence threshold of detections. + iou_thr: float + IoU threshold for non maximum suppression. + Outputs: np.ndarray + Detected object. Shape is (N, 6). N is the number of detected objects. Each object is + represented by (class, x1, y1, x2, y2, score) + """ + result = decode(output, conf_thr) + + result_final = [] + for c in [0, 1]: + idx = result[:, 0] == c + boxes, scores = batched_nms( + iou_thr, + 0, + torch.from_numpy(result[idx, 1:5]).unsqueeze_(0), + torch.from_numpy(result[idx, -1]).unsqueeze_(0), + ) + scores[0].unsqueeze_(-1) + result_final.append( + torch.concat([torch.zeros_like(scores[0]) + c, boxes[0], scores[0]], 1) + ) + result_final = torch.concat(result_final).numpy() + result_final[:, 1:5] = ( + (result_final[:, 1:5] - np.array([pad[0], pad[1], pad[0], pad[1]])) / scale + ).round() + return result_final + + +class BodyDetectionApp: + """Body detection application""" + + def __init__(self, model: Callable[[torch.Tensor], torch.Tensor]) -> None: + """ + Initialize BodyDetectionApp. + + Inputs: + model: Callable[[torch.Tensor], torch.Tensor] + Detection model. + """ + self.model = model + + def detect(self, imgfile: str, height: int, width: int, conf: float) -> np.ndarray: + """ + Detect objects from input images. + + Inputs: + imgfile: str + Input image file + height: int + Model input height. + width: int + Model input width. + conf: float + Detection threshold. + Outputs: np.ndarray + Detection result. Shape is (N, 6). N is the number of detected objects. 
Each object is represented by + (cls_id, x1, y1, x2, y2, score) + """ + img = np.array(load_image(imgfile)) + input, scale, pad = preprocess(img, height, width) + output = self.model(input) + for t, o in enumerate(output): + output[t] = o.permute(0, 2, 3, 1).detach() + result = postprocess(output, scale, pad, conf, 0.5) + return result diff --git a/qai_hub_models/models/_shared/body_detection/demo.py b/qai_hub_models/models/_shared/body_detection/demo.py new file mode 100644 index 00000000..4b06fa9a --- /dev/null +++ b/qai_hub_models/models/_shared/body_detection/demo.py @@ -0,0 +1,93 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. +# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +from copy import deepcopy + +import numpy as np +import PIL.Image as Image +import torch.nn as nn + +from qai_hub_models.models._shared.body_detection.app import BodyDetectionApp +from qai_hub_models.utils.args import ( + demo_model_from_cli_args, + get_model_cli_parser, + get_on_device_demo_parser, + validate_on_device_demo_args, +) +from qai_hub_models.utils.asset_loaders import load_image +from qai_hub_models.utils.display import display_or_save_image +from qai_hub_models.utils.draw import draw_box_from_corners + + +def plot_result(img: np.ndarray, result: np.ndarray): + """ + Plot detection result. + + Inputs: + img: np.ndarray + Input image. + result: np.ndarray + Detection result. + """ + box_color = ((255, 0, 0), (0, 255, 0)) + for r in result: + corners = np.array( + [[r[1], r[2]], [r[1], r[4]], [r[3], r[2]], [r[3], r[4]]] + ).astype(int) + draw_box_from_corners(img, corners, box_color[int(r[0])]) + return img + + +def BodyDetectionDemo( + is_test: bool, + model_name: nn.Module, + model_id: str, + app_name: BodyDetectionApp, + imgfile: str, + height: int, + width: int, + conf: float, +) -> None: + """ + Object detection demo. + + Input: + is_test: bool. + Is test + model_name: nn.Module + Object detection model. + model_id: str. + Model ID + app_name: BodyDetectionApp + Object detection app. + imgfile: str: + Image file path. + height: int + Input image height. + width: int + Input image width. + conf: float + Detection confidence. + """ + parser = get_model_cli_parser(model_name) + parser = get_on_device_demo_parser(parser, add_output_dir=True) + parser.add_argument( + "--image", + type=str, + default=imgfile, + help="image file path or URL", + ) + args = parser.parse_args([] if is_test else None) + model = demo_model_from_cli_args(model_name, model_id, args) + validate_on_device_demo_args(args, model_id) + + app = app_name(model) + result = app.detect(args.image, height, width, conf) + + if not is_test: + img = np.array(load_image(args.image)) + image_annotated = plot_result(deepcopy(img), result) + display_or_save_image( + Image.fromarray(image_annotated), args.output_dir, "result.jpg" + ) diff --git a/qai_hub_models/models/_shared/body_detection/model.py b/qai_hub_models/models/_shared/body_detection/model.py new file mode 100644 index 00000000..4ef9583d --- /dev/null +++ b/qai_hub_models/models/_shared/body_detection/model.py @@ -0,0 +1,524 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. 
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+import math
+from copy import deepcopy
+from typing import List
+
+import torch
+import torch.nn as nn
+
+
+def make_divisible(x: int, divisor: int) -> int:
+    """
+    Compute the closest number that is larger than or equal to X and is divisible by DIVISOR.
+
+    Inputs:
+        x: int
+            Input integer.
+        divisor: int
+            Divisor for the input number.
+    Outputs: int
+        Closest number that is larger than or equal to X and is divisible by DIVISOR.
+    """
+    return math.ceil(x / divisor) * divisor
+
+
+class Concat(nn.Module):
+    """Tensor concatenation module"""
+
+    def __init__(self, dimension: int = 1) -> None:
+        """
+        Inputs:
+            dimension: int
+                Dimension along which to concatenate tensors.
+        """
+        super().__init__()
+        self.d = dimension
+
+    def forward(self, x: List[torch.Tensor]) -> torch.Tensor:
+        """
+        Inputs:
+            x: List[torch.Tensor]
+                List of tensors to be concatenated.
+        Output: torch.Tensor
+            Concatenated tensor.
+        """
+        return torch.cat(x, self.d)
+
+
+def autopad(kernel_size: int, p=None) -> int:
+    """
+    Compute padding size from kernel size.
+
+    Inputs:
+        kernel_size: int
+            Kernel size.
+        p: int | List[int] | None
+            Explicit padding size; computed from the kernel size when None.
+    Outputs: int
+        Padding size.
+    """
+    if p is None:
+        p = (
+            kernel_size // 2
+            if isinstance(kernel_size, int)
+            else [x // 2 for x in kernel_size]
+        )
+    return p
+
+
+class FusedConvBatchNorm(nn.Module):
+    """Module of convolution, batch normalization and activation."""
+
+    def __init__(
+        self,
+        in_channels: int,
+        out_channels: int,
+        kernel_size: int = 1,
+        stride: int = 1,
+        padding=None,
+        groups: int = 1,
+        act: bool = True,
+    ) -> None:
+        """
+        Initialize FusedConvBatchNorm.
+
+        Inputs:
+            in_channels: int
+                Input channels.
+            out_channels: int
+                Output channels.
+            kernel_size: int
+                Kernel size.
+            stride: int
+                Convolution stride.
+            padding: int | None
+                Padding size; computed from the kernel size when None.
+            groups: int
+                Groups of channels for convolution.
+            act: bool
+                Whether to enable ReLU activation.
+        """
+        super().__init__()
+        self.conv = nn.Conv2d(
+            in_channels,
+            out_channels,
+            kernel_size,
+            stride,
+            autopad(kernel_size, padding),
+            groups=groups,
+            bias=False,
+        )
+        self.bn = nn.BatchNorm2d(out_channels)
+        self.act = (
+            nn.ReLU(True)
+            if act is True
+            else (act if isinstance(act, nn.Module) else nn.Identity())
+        )
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        """
+        Forward computation of FusedConvBatchNorm module.
+
+        Inputs:
+            x: torch.Tensor
+                Input tensor.
+        Output: torch.Tensor
+            Output tensor.
+        """
+        return self.act(self.bn(self.conv(x)))
+
+
+class Bottleneck(nn.Module):
+    """Bottleneck block"""
+
+    def __init__(
+        self,
+        in_channels: int,
+        out_channels: int,
+        shortcut: bool = True,
+        groups: int = 1,
+        expand_ratio: float = 0.5,
+    ) -> None:
+        """
+        Initialize Bottleneck module.
+
+        Inputs:
+            in_channels: int
+                Input channels.
+            out_channels: int
+                Output channels.
+            shortcut: bool
+                Whether to enable shortcut connection.
+            groups: int
+                Groups of channels for convolution.
+            expand_ratio: float
+                Expand ratio of input channels to hidden channels.
+        """
+        super().__init__()
+        hidden_channels = int(out_channels * expand_ratio)
+        self.cv1 = FusedConvBatchNorm(in_channels, hidden_channels, 1, 1)
+        self.cv2 = FusedConvBatchNorm(
+            hidden_channels, out_channels, 3, 1, groups=groups
+        )
+        self.add = shortcut and in_channels == out_channels
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        """
+        Forward computation of Bottleneck module.
+
+        Inputs:
+            x: torch.Tensor
+                Input tensor.
+        Outputs: torch.Tensor
+            Output tensor.
+        """
+        y = self.cv2(self.cv1(x))
+        if self.add:
+            y += x
+        return y
+
+
+class C3(nn.Module):
+    """C3 block"""
+
+    def __init__(
+        self,
+        in_channels: int,
+        out_channels: int,
+        num_blocks: int = 1,
+        shortcut: bool = True,
+        group: int = 1,
+        expand_ratio: float = 0.5,
+    ) -> None:
+        """
+        Initialize C3 module.
+
+        Inputs:
+            in_channels: int
+                Input channels.
+            out_channels: int
+                Output channels.
+            num_blocks: int
+                Number of Bottleneck blocks.
+            group: int
+                Groups of channels for convolution.
+            shortcut: bool
+                Whether to enable shortcut connection.
+            expand_ratio: float
+                Expand ratio of input channels to hidden channels.
+        """
+        super().__init__()
+        hidden_channels = int(out_channels * expand_ratio)
+        self.cv1 = FusedConvBatchNorm(in_channels, hidden_channels, 1, 1)
+        self.cv2 = FusedConvBatchNorm(in_channels, hidden_channels, 1, 1)
+        self.cv3 = FusedConvBatchNorm(2 * hidden_channels, out_channels, 1)
+        self.m = nn.Sequential(
+            *[
+                Bottleneck(
+                    hidden_channels, hidden_channels, shortcut, group, expand_ratio=1.0
+                )
+                for _ in range(num_blocks)
+            ]
+        )
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        """
+        Forward computation of C3 module.
+
+        Inputs:
+            x: torch.Tensor
+                Input tensor.
+        Outputs: torch.Tensor
+            Output tensor.
+        """
+        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
+
+
+class SPPF(nn.Module):
+    """Spatial Pyramid Pooling - Fast (SPPF) layer"""
+
+    def __init__(
+        self, in_channels: int, out_channels: int, kernel_size: int = 5
+    ) -> None:
+        """
+        Initialize SPPF module.
+
+        Inputs:
+            in_channels: int
+                Input channels.
+            out_channels: int
+                Output channels.
+            kernel_size: int
+                Kernel size.
+        """
+        super().__init__()
+        hidden_channels = in_channels // 2
+        self.cv1 = FusedConvBatchNorm(in_channels, hidden_channels, 1, 1)
+        self.cv2 = FusedConvBatchNorm(hidden_channels * 4, out_channels, 1, 1)
+        self.m = nn.MaxPool2d(
+            kernel_size=kernel_size, stride=1, padding=kernel_size // 2
+        )
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        """
+        Forward computation of SPPF module.
+
+        Inputs:
+            x: torch.Tensor
+                Input tensor.
+        Outputs: torch.Tensor
+            Output tensor.
+        """
+        x = self.cv1(x)
+        y1 = self.m(x)
+        y2 = self.m(y1)
+        return self.cv2(torch.cat([x, y1, y2, self.m(y2)], 1))
+
+
+class Detect(nn.Module):
+    """Detector head module"""
+
+    def __init__(
+        self, num_classes: int = 80, anchors: tuple = (), ch: tuple = ()
+    ) -> None:
+        """
+        Initialize Detector module.
+
+        Inputs:
+            num_classes: int
+                Number of object classes.
+            anchors: tuple
+                Tuple of anchor sizes.
+            ch: tuple
+                Input channels for each scale.
+        """
+        super().__init__()
+        self.num_classes = num_classes
+        self.num_output = num_classes + 5
+        self.num_layers = len(anchors)
+        self.num_anchors = len(anchors[0]) // 2
+        self.grid = [torch.zeros(1)] * self.num_layers
+        self.anchor_grid = [torch.zeros(1)] * self.num_layers
+        self.register_buffer(
+            "anchors", torch.tensor(anchors).float().view(self.num_layers, -1, 2)
+        )
+        self.m = nn.ModuleList(
+            nn.Conv2d(x, self.num_output * self.num_anchors, 1) for x in ch
+        )
+
+    def forward(self, x: List[torch.Tensor]) -> List[torch.Tensor]:
+        """
+        Forward computation of Detect module.
+
+        Inputs:
+            x: List[torch.Tensor]
+                Input list of tensors.
+        Outputs: List[torch.Tensor]
+            Output list of tensors.
+        """
+        for i in range(self.num_layers):
+            x[i] = self.m[i](x[i])
+        return x
+
+
+def parse_model(cfg: dict, ch: List[int]):
+    """
+    Generate model module from model configuration.
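+
+    The configuration follows a YOLOv5-style layout: each entry of
+    cfg["backbone"] and cfg["head"] is a (from, number, module, args) tuple,
+    with "nc", "depth_multiple", "width_multiple" and "anchors" as top-level
+    keys. A minimal illustrative config (hypothetical values, not a shipped
+    model definition) might look like:
+
+        {"nc": 2, "depth_multiple": 0.33, "width_multiple": 0.50,
+         "anchors": [[10, 13, 16, 30, 33, 23]],
+         "backbone": [[-1, 1, "FusedConvBatchNorm", [64, 6, 2, 2]],
+                      [-1, 3, "C3", [128]]],
+         "head": [[[-1], 1, "Detect", ["nc", "anchors"]]]}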
+ + Inputs: + cfg: dict + Model configurations. + ch: list + Input channels. + Output: + model: nn.Sequential + Model layers. + save: list + List of layer indices that needs to be saved. + """ + anchors, nc, gd, gw = ( + cfg["anchors"], + cfg["nc"], + cfg["depth_multiple"], + cfg["width_multiple"], + ) + num_anchors = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors + num_outputs = num_anchors * (nc + 5) + layers, save, c2 = [], [], ch[-1] + for i, (f, n, m, args) in enumerate(cfg["backbone"] + cfg["head"]): + m = eval(m) if isinstance(m, str) else m + for j, a in enumerate(args): + try: + args[j] = eval(a) if isinstance(a, str) else a + except NameError: + pass + + n = max(round(n * gd), 1) if n > 1 else n + if m in [FusedConvBatchNorm, Bottleneck, SPPF, C3, DoubleBlazeBlock]: + c1, c2 = ch[f], args[0] + if c2 != num_outputs: + c2 = make_divisible(c2 * gw, 8) + + args = [c1, c2, *args[1:]] + if m in [C3]: + args.insert(2, n) + n = 1 + elif m is nn.BatchNorm2d: + args = [ch[f]] + elif m is Concat: + c2 = sum([ch[x] for x in f]) + elif m is Detect: + args.append([ch[x] for x in f]) + if isinstance(args[1], int): + args[1] = [list(range(args[1] * 2))] * len(f) + else: + c2 = ch[f] + + m_ = nn.Sequential(*[m(*args) for _ in range(n)]) if n > 1 else m(*args) + t = str(m)[8:-2].replace("__main__.", "") + np = sum([x.numel() for x in m_.parameters()]) + m_.i, m_.f, m_.type, m_.np = (i, f, t, np) + save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1) + layers.append(m_) + if i == 0: + ch = [] + ch.append(c2) + return nn.Sequential(*layers), sorted(save) + + +class DoubleBlazeBlock(nn.Module): + """ + DoubleBlaze block + """ + + def __init__( + self, + in_channels: int, + out_channels: int, + stride: int = 1, + kernel_size: int = 5, + bias: bool = False, + ) -> None: + """ + Initialize DoubleBlaze block. + + Inputs: + in_channels: int + Number of input channels. + out_channels: int + Number of output channels. + stride: int + Convolution stride. + kernel_size: int + Kernel size. + bias: bool. + Enable bias in convolution. + """ + super(DoubleBlazeBlock, self).__init__() + self.stride = stride + assert stride in [1, 2] + self.use_pooling = self.stride != 1 + self.channel_pad = out_channels - in_channels + if self.channel_pad != 0: + self.pad = nn.Conv2d(in_channels, out_channels, kernel_size=1) + padding = (kernel_size - 1) // 2 + hidden_channels = max(out_channels, in_channels) // 2 + + self.conv1 = nn.Sequential( + # dw + nn.Conv2d( + in_channels, + in_channels, + kernel_size=kernel_size, + stride=stride, + padding=padding, + groups=in_channels, + bias=bias, + ), + nn.BatchNorm2d(in_channels), + # pw-linear + nn.Conv2d(in_channels, hidden_channels, 1, 1, 0, bias=bias), + nn.BatchNorm2d(hidden_channels), + ) + self.act = nn.ReLU(inplace=True) + + self.conv2 = nn.Sequential( + nn.ReLU(inplace=True), + # dw + nn.Conv2d( + hidden_channels, + hidden_channels, + kernel_size=kernel_size, + stride=1, + padding=padding, + groups=hidden_channels, + bias=bias, + ), + nn.BatchNorm2d(hidden_channels), + # pw-linear + nn.Conv2d(hidden_channels, out_channels, 1, 1, 0, bias=bias), + nn.BatchNorm2d(out_channels), + ) + + if self.use_pooling: + self.mp = nn.MaxPool2d(kernel_size=self.stride, stride=self.stride) + + def forward(self, x: torch.Tensor) -> torch.Tensor: + """ + Forward computation of DoubleBlaze block. + + Input: + x: torch.Tensor. + Input tensor + Output: torch.Tensor + Output tensor. 
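+
+        Note (an illustrative shape walk-through, not from the source): with
+        in_channels=24, out_channels=48, stride=2, a (1, 24, 64, 64) input
+        gives hidden_channels = max(48, 24) // 2 = 24; conv1/conv2 produce h
+        of shape (1, 48, 32, 32), while the shortcut is max-pooled to
+        (1, 24, 32, 32) and channel-padded by the 1x1 conv to (1, 48, 32, 32),
+        so the two can be summed before the final ReLU.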
+ """ + h = self.conv1(x) + h = self.conv2(h) + + if self.use_pooling: + x = self.mp(x) + if self.channel_pad != 0: + x = self.pad(x) + return self.act(h + x) + + +class Model(nn.Module): + """Person/face detection model""" + + def __init__(self, model_cfg: dict, ch: int = 3) -> None: + """ + Initialize person/face detection model. + + Inputs: + ch: int + Input channels. + model_cfg: dict + Model configuration + """ + super().__init__() + self.model, self.save = parse_model(deepcopy(model_cfg), ch=[ch]) + + def forward(self, x: torch.Tensor) -> List[torch.Tensor]: + """ + Forward computation of Model. + + Inputs: + x: torch.Tensor. + Input image. + Outputs: List[torch.Tensor] + Multi-scale object detection output. + """ + y = [] + for m in self.model: + if m.f != -1: + x = ( + y[m.f] + if isinstance(m.f, int) + else [x if j == -1 else y[j] for j in m.f] + ) + x = m(x) + y.append(x if m.i in self.save else None) + return x diff --git a/qai_hub_models/models/_shared/imagenet_classifier/model.py b/qai_hub_models/models/_shared/imagenet_classifier/model.py index 21e70f39..e8dd34be 100644 --- a/qai_hub_models/models/_shared/imagenet_classifier/model.py +++ b/qai_hub_models/models/_shared/imagenet_classifier/model.py @@ -11,17 +11,22 @@ from qai_hub_models.evaluators.base_evaluators import BaseEvaluator from qai_hub_models.evaluators.classification_evaluator import ClassificationEvaluator +from qai_hub_models.utils.asset_loaders import CachedWebModelAsset, load_image from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.image_processing import ( IMAGENET_DIM, + IMAGENET_TRANSFORM, normalize_image_torchvision, ) from qai_hub_models.utils.input_spec import InputSpec -from qai_hub_models.utils.quantization import get_image_quantization_samples MODEL_ASSET_VERSION = 1 MODEL_ID = __name__.split(".")[-2] +TEST_IMAGENET_IMAGE = CachedWebModelAsset.from_asset_store( + MODEL_ID, MODEL_ASSET_VERSION, "dog.jpg" +) + class ImagenetClassifier(BaseModel): """ @@ -118,8 +123,9 @@ def from_pretrained( def _sample_inputs_impl( self, input_spec: InputSpec | None = None ) -> Dict[str, List[np.ndarray]]: - samples = get_image_quantization_samples() - return dict(image_tensor=[samples[:1].numpy()]) + image = load_image(TEST_IMAGENET_IMAGE) + tensor = IMAGENET_TRANSFORM(image).unsqueeze(0) + return dict(image_tensor=[tensor.numpy()]) @staticmethod def get_channel_last_inputs() -> List[str]: diff --git a/qai_hub_models/models/_shared/llama/model.py b/qai_hub_models/models/_shared/llama/model.py index 25f44dfa..da44c135 100644 --- a/qai_hub_models/models/_shared/llama/model.py +++ b/qai_hub_models/models/_shared/llama/model.py @@ -270,12 +270,17 @@ def __init__(self, model, encoding_path, is_token_generator=False): self.split_part = 1 self.is_token_generator = is_token_generator + def get_qnn_graph_name(self) -> Optional[str]: + model_name = "token" if self.is_token_generator else "prompt" + return f"{model_name}_part{self.split_part}" + def get_hub_compile_options( self, target_runtime: TargetRuntime, other_compile_options: str = "", device: Optional[Device] = None, ) -> str: + graph_name = self.get_qnn_graph_name() if ( target_runtime != TargetRuntime.QNN and target_runtime != TargetRuntime.PRECOMPILED_QNN_ONNX @@ -284,12 +289,26 @@ def get_hub_compile_options( f"Unsupported target_runtime provided: {target_runtime}." " Only Precompile ONN ONNX or QNN runtime is supported for Llama for now." 
            )
-        target_runtime_options = (
+        options = (
             " --target_runtime qnn_context_binary"
             if target_runtime == TargetRuntime.QNN
             else " --target_runtime precompiled_qnn_onnx"
         )
-        return target_runtime_options + " --quantize_full_type w8a16 --quantize_io"
+        options += " --quantize_full_type w8a16 --quantize_io"
+        if graph_name is not None:
+            options += f" --qnn_graph_name {graph_name}"
+        return options
+
+    def get_hub_profile_options(
+        self,
+        target_runtime: TargetRuntime,
+        other_profile_options: str = "",
+    ) -> str:
+        options = "--max_profiler_iterations 50"
+        graph_name = self.get_qnn_graph_name()
+        if graph_name is not None:
+            options += f" --qnn_options context_enable_graphs={graph_name}"
+        return options
 
     @staticmethod
     def get_output_names(
diff --git a/qai_hub_models/models/_shared/llama3/__init__.py b/qai_hub_models/models/_shared/llama3/__init__.py
new file mode 100644
index 00000000..21a22b31
--- /dev/null
+++ b/qai_hub_models/models/_shared/llama3/__init__.py
@@ -0,0 +1,4 @@
+# ---------------------------------------------------------------------
+# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
diff --git a/qai_hub_models/models/_shared/llama3/app.py b/qai_hub_models/models/_shared/llama3/app.py
new file mode 100644
index 00000000..20872c74
--- /dev/null
+++ b/qai_hub_models/models/_shared/llama3/app.py
@@ -0,0 +1,209 @@
+# ---------------------------------------------------------------------
+# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+import gc
+import math
+from typing import Any, Callable, Set, Type
+
+import torch
+
+from qai_hub_models.models._shared.llama3.model import (
+    Llama3Base_Quantized,
+    get_past_keyval_with_shift,
+)
+from qai_hub_models.models._shared.llama.model import RopeEmbedding
+
+
+def _get_tokens_from_logits(output: torch.Tensor):
+    probs = torch.nn.functional.softmax(output[0][0], dim=-1)
+    return torch.multinomial(probs, num_samples=1).squeeze(1)
+
+
+class ChatApp:
+    """
+    This class demonstrates how to use a Llama model to build a basic chat app.
+    This app uses two model instantiations:
+        * Prompt Processor
+            - Instantiated with sequence length 128. Used to process the user
+              prompt.
+        * Token Generator
+            - Instantiated with sequence length 1. Used to predict the
+              auto-regressive response.
+    """
+
+    def __init__(
+        self,
+        model_cls: Type[Llama3Base_Quantized],
+        get_input_prompt_with_tags: Callable,
+        prepare_combined_attention_mask: Callable,
+        tokenizer: Any,
+        end_tokens: Set[str],
+    ):
+        """
+        Base ChatApp that generates one response for a given input prompt.
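+
+        A typical invocation looks like the following sketch (argument values
+        are illustrative; the concrete model class comes from the specific
+        Llama 3 model package):
+
+            app = ChatApp(
+                model_cls=SomeLlama3_Quantized,  # hypothetical Llama3Base_Quantized subclass
+                get_input_prompt_with_tags=get_input_prompt_with_tags,
+                prepare_combined_attention_mask=prepare_combined_attention_mask,
+                tokenizer=tokenizer,
+                end_tokens={"<|eot_id|>", "<|end_of_text|>"},
+            )
+            app.generate_output_prompt(
+                "What do llamas eat?",
+                prompt_sequence_length=128,
+                context_length=4096,
+                max_output_tokens=20,
+            )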
+ + model_cls: Llama Model class that will be used to instantiate model + get_input_prompt_with_tags: Function to wrap input prompt with appropriate tags + prepare_combined_attention_mask: Function to combine and build attention mask, + tokenizer: Tokenizer to use, + end_tokens: Set of end tokens to convey end of token generation, + """ + self.model_cls = model_cls + self.get_input_prompt_with_tags = get_input_prompt_with_tags + self.prepare_combined_attention_mask = prepare_combined_attention_mask + self.tokenizer = tokenizer + self.end_tokens = end_tokens + + def generate_output_prompt( + self, + input_prompt: str, + prompt_sequence_length: int, + context_length: int, + max_output_tokens: int, + bundled_kvcache: bool = True, + ): + input_prompt_processed = self.get_input_prompt_with_tags( + user_input_prompt=input_prompt + ) + + input_tokens = self.tokenizer( + input_prompt_processed, + return_tensors="pt", + padding="max_length", + max_length=context_length, + ) + orig_input_ids = input_tokens["input_ids"].type(torch.long) + + num_tokens = torch.sum(input_tokens["attention_mask"]).item() + num_prompt_iterations = math.ceil(num_tokens / prompt_sequence_length) + rope_embedding = RopeEmbedding(max_length=context_length) + + print( + f"Will run prompt processor {num_prompt_iterations} time(s) and then token generator." + ) + + # Collect output prompt to summarize later + output_token = None + hub_tokens = None + + model = self.model_cls.from_pretrained(sequence_length=128) + llm_config = model.llm_config + is_prompt = True + + # Process input prompt + input_specs = self.model_cls.get_input_spec( + input_seq_length=prompt_sequence_length, + num_hidden_layers=llm_config.num_hidden_layers, + context_length=model.context_length, + hidden_size=llm_config.hidden_size, + num_attention_heads=llm_config.num_attention_heads, + num_key_value_heads=llm_config.num_key_value_heads, + ) + + # Initialization of KV cache + past_key_values = [ + torch.zeros(shape) + for k, (shape, _) in input_specs.items() + if k.startswith("past_") + ] + + for i in range(num_prompt_iterations + max_output_tokens - 1): + if i < num_prompt_iterations: + seq_len = prompt_sequence_length + next_seq_len = seq_len if i + 1 < num_prompt_iterations else 1 + else: + if is_prompt: + # switch to token processor + model = self.model_cls.from_pretrained(sequence_length=1) + is_prompt = False + + seq_len = 1 + next_seq_len = 1 + + if is_prompt: + input_ids = orig_input_ids[ + :, + context_length + - (num_prompt_iterations - i) * seq_len : context_length + - (num_prompt_iterations - i - 1) * seq_len, + ] + + # non-padded tokens in first prompt + first_prompt = (num_tokens - 1) % seq_len + 1 + padding_size0 = seq_len - first_prompt + padding_size = padding_size0 if i == 0 else 0 + offset = 0 if i == 0 else first_prompt + (i - 1) * seq_len + position_ids = [0] * (padding_size) + list( + range(offset, offset + seq_len - padding_size) + ) + position_ids = ( + torch.Tensor(position_ids).type(torch.long).reshape(1, seq_len) + ) + position_ids = ( + torch.Tensor(position_ids).type(torch.long).reshape(1, seq_len) + ) + position_ids_cos, position_ids_sin = rope_embedding.get_embedding( + position_ids + ) + attention_mask = torch.zeros((1, context_length)) + attention_mask[:, context_length - (first_prompt + i * seq_len) :] = 1.0 + else: + input_ids = output_token.reshape(-1, 1).type(torch.int32) + + # Shift attention_mask and position_ids + attention_mask = torch.cat( + (attention_mask[:, seq_len:], torch.zeros((1, seq_len))), dim=-1 + ) + 
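+                # Sliding-window bookkeeping: the concatenation above drops the
+                # left-most seq_len (= 1) mask entries and appends zeros on the
+                # right, e.g. [0, 0, 0, 1, 1, 1] -> [0, 0, 1, 1, 1, 0] for a
+                # context length of 6. get_past_keyval_with_shift applies the
+                # matching shift to the KV cache after the forward pass, keeping
+                # the cache at context_length - seq_len entries.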
position_ids = (position_ids[:, -1] + 1).reshape(-1, 1) + + position_ids = torch.Tensor(position_ids).type(torch.long).reshape(1, 1) + position_ids_cos, position_ids_sin = rope_embedding.get_embedding( + position_ids + ) + + cm_attention_masks = self.prepare_combined_attention_mask( + attention_mask=attention_mask, + input_shape=(1, seq_len), + past_key_values_length=context_length - seq_len, + ) + + # Generate output token + output = model( + input_ids, + cm_attention_masks, + position_ids_cos, + position_ids_sin, + *past_key_values, + ) + + del cm_attention_masks + del input_ids + past_key_values = get_past_keyval_with_shift( + past_key_values, + output[1:], + length=context_length - next_seq_len, + ) + output_token = _get_tokens_from_logits(output) + output_token = output_token[-next_seq_len:] + output_prompt = self.tokenizer.decode(output_token) + is_prediction = next_seq_len == 1 + + # Assistant generating end of token + if is_prediction and output_prompt in self.end_tokens: + break + + if is_prompt: + hub_tokens = output_token + else: + hub_tokens = torch.cat((hub_tokens, output_token), dim=-1) + + if is_prediction: + print() + print(f"Text generated so far: {self.tokenizer.decode(hub_tokens)}") + print() + gc.collect() + + print("-------- Response Summary --------") + print(f"Prompt: {input_prompt}") + print(f"Response: {self.tokenizer.decode(hub_tokens)}") diff --git a/qai_hub_models/models/_shared/llama3/demo.py b/qai_hub_models/models/_shared/llama3/demo.py new file mode 100644 index 00000000..bd38f8b9 --- /dev/null +++ b/qai_hub_models/models/_shared/llama3/demo.py @@ -0,0 +1,121 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. +# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +from __future__ import annotations + +from typing import Any, Callable, List, Set, Type + +from qai_hub_models.models._shared.llama3.app import ChatApp as App +from qai_hub_models.utils.args import get_model_cli_parser +from qai_hub_models.utils.base_model import BaseModel, TargetRuntime +from qai_hub_models.utils.huggingface import has_model_access + +# Max output tokens to generate +# You can override this with cli argument. +# Keeping this short as on-device demo takes time to converge. 
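+# For example, passing `--max-output-tokens 50` on the command line yields a
+# longer response; each extra token adds one more token-generator iteration.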
+MAX_OUTPUT_TOKENS = 20
+DEFAULT_DEVICE = "Samsung Galaxy S24 (Family)"
+
+
+def llama_chat_demo(
+    model_cls: Type[BaseModel],
+    model_id: str,
+    get_input_prompt_with_tags: Callable,
+    prepare_combined_attention_mask: Callable,
+    tokenizer: Any,
+    end_tokens: Set[str],
+    hf_repo_name: str,
+    hf_repo_url: str,
+    default_prompt: str,
+    is_test: bool = False,
+    available_target_runtimes: List[TargetRuntime] = [TargetRuntime.QNN],
+    bundled_kvcache: bool = True,
+):
+    """
+    Shared chat demo app to generate output for a provided input prompt.
+        model_cls: Model base class (either Prompt Processor or Token Generator),
+        model_id: Model ID from hub,
+        get_input_prompt_with_tags: Function to wrap the input prompt with appropriate tags,
+        prepare_combined_attention_mask: Function to combine and build the attention mask,
+        tokenizer: Tokenizer to encode/decode the prompt,
+        end_tokens: Set of end tokens marking the end of output generation,
+        hf_repo_name: HF repo name,
+        hf_repo_url: HF repo URL,
+        default_prompt: Default prompt to set,
+        is_test: If test, no options required,
+        available_target_runtimes: Default available runtimes in options,
+    """
+    # Demo parameters
+    parser = get_model_cli_parser(model_cls)
+    parser.add_argument(
+        "--prompt",
+        type=str,
+        default=default_prompt,
+        help="input prompt.",
+    )
+    parser.add_argument(
+        "--prompt-processor-input-seq-len",
+        type=int,
+        default=128,
+        help="input sequence length for the prompt processor. This must be less than the `context_length` set for the model.",
+    )
+    parser.add_argument(
+        "--max-output-tokens",
+        type=int,
+        default=MAX_OUTPUT_TOKENS,
+        help="max output tokens to generate.",
+    )
+    args = parser.parse_args([] if is_test else None)
+
+    if not is_test:
+        print(f"\n{'-' * 85}")
+        print(f"** Generating response via {model_id} **")
+        print()
+        print("Prompt:", args.prompt)
+        print("Max number of output tokens to generate:", args.max_output_tokens)
+        print("Please pass `--max-output-tokens <N>` to generate longer responses.")
+        print()
+        print(
+            """NOTE: Each token generation takes around 15 mins on-device:
+    1. The model is divided into multiple parts to fit into device constraints
+    2. Each model part requires separate execution on-device via AI Hub
+    3. Due to the autoregressive nature, step 2 cannot run in parallel
+    4. Device procurement is subject to device availability and might take longer to run the demo on-device
+
+Alternative:
+    1. Run the demo on host (with PyTorch) to verify the e2e result for longer responses
+    2. Run the demo on-device for shorter responses (--max-output-tokens 10 or 20)
+    3. [Optional] Run the demo on-device to generate a longer response (takes longer)
+
+We are actively working to improve UX and reduce turn-around time for these models.
+""" + ) + print(f"{'-' * 85}\n") + + has_model_access(hf_repo_name, hf_repo_url) + + """ + llama_ar128 = model_cls.from_pretrained( + sequence_length=args.prompt_processor_input_seq_len + ) + llama_ar1 = model_cls.from_pretrained(sequence_length=1) + context_length = llama_ar128.context_length + """ + + app = App( + model_cls, + get_input_prompt_with_tags=get_input_prompt_with_tags, + prepare_combined_attention_mask=prepare_combined_attention_mask, + tokenizer=tokenizer, + end_tokens=end_tokens, + ) + context_length = 4096 + app.generate_output_prompt( + args.prompt, + prompt_sequence_length=args.prompt_processor_input_seq_len, + context_length=context_length, + max_output_tokens=args.max_output_tokens, + bundled_kvcache=bundled_kvcache, + ) diff --git a/qai_hub_models/models/_shared/llama3/export.py b/qai_hub_models/models/_shared/llama3/export.py new file mode 100644 index 00000000..0574d432 --- /dev/null +++ b/qai_hub_models/models/_shared/llama3/export.py @@ -0,0 +1,357 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. +# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- + +from __future__ import annotations + +import glob +import os +import tempfile +from pathlib import Path +from typing import Any, Dict, List, Mapping, Optional, Tuple, Type, cast + +import numpy as np +import qai_hub as hub + +from qai_hub_models.models._shared.llama3.model import Llama3Base_Quantized +from qai_hub_models.models._shared.llama3.split_onnx_utils import utils +from qai_hub_models.utils.args import get_input_spec_kwargs, get_model_kwargs +from qai_hub_models.utils.asset_loaders import zip_model +from qai_hub_models.utils.base_model import TargetRuntime +from qai_hub_models.utils.compare import torch_inference +from qai_hub_models.utils.printing import ( + print_inference_metrics, + print_profile_metrics_from_job, +) + + +def export_model( + model_cls: Type[Llama3Base_Quantized], + model_name: str, + components: List[str], + sub_components: Dict[str, List[str]], + num_layers_per_split: int, + device: str, + skip_profiling: bool = False, + skip_inferencing: bool = False, + skip_downloading: bool = False, + skip_summary: bool = False, + output_dir: Optional[str] = None, + target_runtime: TargetRuntime = TargetRuntime.QNN, + compile_options: str = "", + profile_options: str = "", + synchronous: bool = False, + **additional_model_kwargs, +) -> Mapping[ + str, Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] +] | List[str]: + """ + In this workflow, two instantiations of the Llama model are exported (AR-1, + AR-128). AR- refers to a model with input sequence length . + We produce two models: + AR-128: Used to process prompts. + AR-1: Used to process response. + Both instantiations have context length 4096 (with KV cache input of + 4096 minus ). + + This function accomplishes several tasks: + + 1. Performs the following steps for both AR-1 and AR-128: + a. Instantiates a PyTorch model and exports it to ONNX. + b. Converts source AIMET Pro encodings to be compatible with this ONNX model. + c. Splits the ONNX into multiple parts (due to runtime size limitation). + d. For each part: Compile the model to a QNN context binary. + 2. For each part (across both AR-1 and AR-128): + a. Link AR-1 part and AR-128 part together using link jobs. + 3. Profiles the model performance on real devices. + 4. 
Inferences the model on sample inputs (stringing together the parts). + 5. Downloads the model asset to the local directory. + 6. Summarizes the results from profiling and inference. + + Each of the last four steps can be optionally skipped using the input options. + + Parameters: + model_cls: Llama class. + model_name: Model name. + components: List of sub-components of the model that will be exported. + Each component is compiled and profiled separately. + Defaults to ALL_COMPONENTS if not specified. + sub_components: Dictionary of strings pointing to lists of strings, + where each sub-component will be grouped using weight sharing with + other sub-components to form a component. + num_layers_per_split: How many layers to include in each model part. + device: Device for which to export the model. + Full list of available devices can be found by running `hub.get_devices()`. + Defaults to DEFAULT_DEVICE if not specified. + skip_profiling: If set, skips profiling of compiled model on real devices. + skip_inferencing: If set, skips computing on-device outputs from sample data. + skip_downloading: If set, skips downloading of compiled model. + skip_summary: If set, skips waiting for and summarizing results + from profiling and inference. + output_dir: Directory to store generated assets (e.g. compiled model). + Defaults to `/build/`. + target_runtime: Which on-device runtime to target. Default is TFLite. + compile_options: Additional options to pass when submitting the compile job. + profile_options: Additional options to pass when submitting the profile job. + synchronous: Let each job finish before submitting the next. + **additional_model_kwargs: Additional optional kwargs used to customize + `model_cls.from_pretrained` + + Returns: + A Mapping from sub-component name to a 3-tuple of: + * A LinkJob object containing metadata about the link job submitted to hub. + * A ProfileJob containing metadata about the profile job (None if profiling skipped). + * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + """ + num_splits = len(components) + output_path = Path(output_dir or Path.cwd() / "build" / model_name) + hub_device = hub.Device(name=device) + + # Instantiation names and input sequence length + + # 1. 
Initialize PyTorch model + model_params = get_model_kwargs(model_cls, additional_model_kwargs) + + prompt_sequence_length = 128 + if "sequence_length" in model_params: + if isinstance(model_params["sequence_length"], int): + prompt_sequence_length = model_params["sequence_length"] + del model_params["sequence_length"] + + # If user specifies sequence length, it will define the prompt + # generator's sequence length only + instantiations = [ + ("prompt", prompt_sequence_length), + ("token", 1), + ] + + compile_jobs_to_link: Dict[str, List[hub.client.CompileJob]] = {} + compile_jobs: Dict[str, hub.client.CompileJob] = {} + link_jobs: Dict[str, hub.client.LinkJob] = {} + profile_options_per_instantiation: Dict[str, str] = {} + + sub_component_names = {} + component_from_sub_component_names = {} + + for instantiation_name, seq_len in instantiations: + full_name = f"{model_name}_{instantiation_name}" + model = model_cls.from_pretrained(sequence_length=seq_len, **model_params) + llm_config = model.llm_config + + sub_component_names[instantiation_name] = [] + + profile_options_per_instantiation[ + instantiation_name + ] = model.get_hub_profile_options(target_runtime, profile_options) + + input_spec = model.get_input_spec( + **{ + **get_input_spec_kwargs(model, additional_model_kwargs), + "input_seq_length": seq_len, + "num_hidden_layers": llm_config.num_hidden_layers, + "context_length": model.context_length, + "hidden_size": llm_config.hidden_size, + "num_attention_heads": llm_config.num_attention_heads, + "num_key_value_heads": llm_config.num_key_value_heads, + }, + ) + + # Export the full model to ONNX model + sub_output_path = output_path / instantiation_name + source_model = model.convert_to_hub_source_model( + target_runtime, + sub_output_path, + input_spec, + external_onnx_weights=True, + output_names=model.get_output_names(llm_config.num_hidden_layers), + ) + source_model_path = Path(source_model) + + input_onnx_path = glob.glob((source_model_path / "*.onnx").as_posix())[0] + input_encodings_path = glob.glob( + (source_model_path / "*.encodings").as_posix() + )[0] + + # Split encodings + model_artifact = Path(output_dir or Path.cwd()) / instantiation_name + os.makedirs(model_artifact, exist_ok=True) + + utils.split_onnx( + onnxfile=input_onnx_path, + modelname=full_name, + pickle_filedir=None, + num_splits=num_splits, + num_layers_per_split=num_layers_per_split, + output_dir=model_artifact, + split_embedding=True, + encoding_file=input_encodings_path, + using_qairt_workflow=True, + ) + + # Submit the parts for compilation + for i in range(num_splits): + sub_component_name = f"{instantiation_name}_{i + 1}_of_{num_splits}" + component_name = f"part_{i + 1}_of_{num_splits}" + sub_component_names[instantiation_name].append(sub_component_name) + full_name = f"{model_name}_{sub_component_name}" + aimet_path = Path(model_artifact) / (full_name + ".aimet") + + model_compile_options = ( + model.get_hub_compile_options(target_runtime, compile_options) + + f" --qnn_graph_name {sub_component_name}" + ) + + # TODO (#12708): Remove this zipping and let the client do it. 
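+            # Each part is an .aimet directory (the exported ONNX graph plus
+            # its quantization encodings); it is zipped inside a temporary
+            # directory so it can be uploaded to AI Hub as a single artifact.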
+ with tempfile.TemporaryDirectory() as tmpdir: + aimet_tmpdir = os.path.join(tmpdir, os.path.basename(aimet_path)) + os.makedirs(aimet_tmpdir) + zipped_model_path = zip_model(aimet_tmpdir, aimet_path) + submitted_compile_job = hub.submit_compile_job( + model=zipped_model_path, + device=hub_device, + name=full_name, + options=model_compile_options, + ) + if synchronous: + submitted_compile_job.wait() + if component_name not in compile_jobs_to_link: + compile_jobs_to_link[component_name] = [] + + compile_jobs_to_link[component_name].append( + cast(hub.client.CompileJob, submitted_compile_job) + ) + compile_jobs[sub_component_name] = cast( + hub.client.CompileJob, submitted_compile_job + ) + component_from_sub_component_names[sub_component_name] = component_name + + # 2. Link jobs + for component_name, cjobs in compile_jobs_to_link.items(): + models = [cjob.get_target_model() for cjob in cjobs] + + full_name = f"{model_name}_{component_name}" + link_job = hub.submit_link_job(models, name=full_name) + if synchronous: + link_job.wait() + link_jobs[component_name] = link_job + + # 3. Profile the model assets on real devices + profile_jobs: Dict[str, hub.client.ProfileJob] = {} + if not skip_profiling: + for instantiation_name, _ in instantiations: + for sub_component_name in sub_component_names[instantiation_name]: + component_name = component_from_sub_component_names[sub_component_name] + profile_options = ( + profile_options_per_instantiation[instantiation_name] + + f" --qnn_options context_enable_graphs={sub_component_name}" + ) + print( + f"Profiling model {instantiation_name} {sub_component_name} on a hosted device." + ) + full_name = f"{model_name}_{sub_component_name}" + submitted_profile_job = hub.submit_profile_job( + model=link_jobs[component_name].get_target_model(), + device=hub_device, + name=full_name, + options=profile_options, + ) + if synchronous: + submitted_profile_job.wait() + profile_jobs[sub_component_name] = cast( + hub.client.ProfileJob, submitted_profile_job + ) + + # 4. Run inference on-device with sample inputs + inference_jobs: Dict[str, hub.client.InferenceJob] = {} + final_device_output_data: Dict[str, Dict[str, np.ndarray]] = {} + final_ref_output_data: Dict[str, Dict[str, np.ndarray]] = {} + if not skip_inferencing: + for instantiation_name, seq_len in instantiations: + model = model_cls.from_pretrained(sequence_length=seq_len, **model_params) + full_model_sample_inputs = model.sample_inputs() + output_data = {} + for sub_component_name in sub_component_names[instantiation_name]: + component_name = component_from_sub_component_names[sub_component_name] + print( + f"Running inference for {sub_component_name} on a hosted device with example inputs." 
+ ) + + compile_job = compile_jobs[sub_component_name] + target_shapes = compile_job.target_shapes + + # Source inputs from full inputs and previous part's outputs + sample_inputs = {} + for key in target_shapes: + if key in output_data: + sample_inputs[key] = output_data[key] + elif key in full_model_sample_inputs: + sample_inputs[key] = full_model_sample_inputs[key] + + # Load model with no-AIMET mode + inference_options = ( + profile_options_per_instantiation[instantiation_name] + + f" --qnn_options context_enable_graphs={sub_component_name}" + ) + # Load individual model part + full_name = f"{model_name}_{sub_component_name}" + submitted_inference_job = hub.submit_inference_job( + model=link_jobs[component_name].get_target_model(), + inputs=sample_inputs, + device=hub_device, + name=full_name, + options=inference_options, + ) + if synchronous: + submitted_inference_job.wait() + output_data = submitted_inference_job.download_output_data() + inference_jobs[sub_component_name] = cast( + hub.client.InferenceJob, submitted_inference_job + ) + + # Store the final output data + final_device_output_data[instantiation_name] = output_data + + if not skip_summary: + # Compute reference (PyTorch) output data + ref_output_data_list = torch_inference(model, full_model_sample_inputs) + final_ref_output_data[instantiation_name] = ref_output_data_list + + # 5. Download the model assets to a local file + if not skip_downloading: + os.makedirs(output_path, exist_ok=True) + for component_name, link_job in link_jobs.items(): + target_model: hub.Model = link_job.get_target_model() # type: ignore + target_model.download( + str(output_path / f"{model_name}_{component_name}.bin") + ) + + # 6. Summarize the results from profiling and inference + if not skip_summary and not skip_profiling: + for instantiation_name, _ in instantiations: + for sub_component_name in sub_component_names[instantiation_name]: + profile_job = profile_jobs[sub_component_name] + assert profile_job is not None and profile_job.wait().success + profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore + print_profile_metrics_from_job(profile_job, profile_data) + + if not skip_summary and not skip_inferencing: + for instantiation_name, _ in instantiations: + # Get ordered model output names + torch_out = final_ref_output_data[instantiation_name] + inference_result = final_device_output_data[instantiation_name] + print_inference_metrics( + None, + inference_result, + torch_out, + ) + + return { + sub_component_name: ( + link_jobs[component_name], + profile_jobs.get(sub_component_name), + inference_jobs.get(sub_component_name), + ) + for component_name in components + for sub_component_name in sub_components[component_name] + } diff --git a/qai_hub_models/models/_shared/llama3/model.py b/qai_hub_models/models/_shared/llama3/model.py new file mode 100644 index 00000000..4ba271b0 --- /dev/null +++ b/qai_hub_models/models/_shared/llama3/model.py @@ -0,0 +1,1001 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. 
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+from __future__ import annotations
+
+import json
+import os
+from abc import ABC, abstractmethod
+from copy import deepcopy
+from typing import List, Optional
+
+import numpy as np
+import torch
+from qai_hub.public_rest_api import DatasetEntries
+from transformers.models.llama import modeling_llama
+
+from qai_hub_models.models._shared.llama.model import (
+    Llama_QuantizedMixin,
+    RopeEmbedding,
+)
+from qai_hub_models.models.common import (
+    SampleInputsType,
+    SourceModelFormat,
+    TargetRuntime,
+)
+from qai_hub_models.utils.aimet.encodings import map_encodings
+from qai_hub_models.utils.huggingface import (
+    ensure_has_required_transformer,
+    has_model_access,
+)
+from qai_hub_models.utils.input_spec import InputSpec
+from qai_hub_models.utils.system_info import has_recommended_memory
+
+from .model_adaptations import (
+    QcLlama_apply_rotary_pos_emb,
+    SHADynamicCacheNewValueOnly,
+    SHALlamaAttention,
+)
+
+MIN_TRANSFORMER_VERSION = "4.45.0"
+
+# isort: off
+
+# TODO: 10761 remove transformer version check once AIMET
+# transformer restriction is uplifted.
+ensure_has_required_transformer(MIN_TRANSFORMER_VERSION)
+from transformers import AutoConfig, AutoTokenizer  # noqa: E402
+
+MODEL_ID = __name__.split(".")[-2]
+MODEL_ASSET_VERSION = 1
+
+# Configs
+AIMET_ENCODINGS_PREFIX = "config"
+AIMET_CONFIG = "default_config_llama"
+
+DEFAULT_CONTEXT_LENGTH = 4096
+
+DATA_DIR = "data"
+USE_CACHED_DATA = True
+
+## Ref: https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1
+BEGIN_TEXT = "<|begin_of_text|>"
+END_TEXT = "<|end_of_text|>"
+START_HEADER = "<|start_header_id|>"
+END_HEADER = "<|end_header_id|>"
+SYSTEM_ID = "system"
+ASSISTANT_ID = "assistant"
+USER_ID = "user"
+EOT_ID = "<|eot_id|>"
+END_TOKENS = {"<|eot_id|>", "<|end_of_text|>"}
+
+DEFAULT_PROMPT_CONTEXT = "You are a helpful AI assistant"
+DEFAULT_USER_PROMPT = "What do llamas eat? Keep the answer under ten words."
+
+
+def get_input_prompt_with_tags(
+    previous_history: str = "",
+    system_context_prompt: str = DEFAULT_PROMPT_CONTEXT,
+    user_input_prompt: str = DEFAULT_USER_PROMPT,
+):
+    """
+    Get the prompt that sets the context and initializes the prompt processor.
+    """
+    prompt = previous_history
+    prompt += f"""{BEGIN_TEXT}{START_HEADER}{SYSTEM_ID}{END_HEADER}
+
+{system_context_prompt}
+{START_HEADER}{USER_ID}{END_HEADER}
+
+{user_input_prompt}{EOT_ID}{START_HEADER}{ASSISTANT_ID}{END_HEADER}
+
+
+"""
+    return prompt
+
+
+def onnx_counting(i):
+    # Softmax, Softmax_1, Softmax_2, ...
+    if i == 0:
+        return ""
+    else:
+        return f"_{i}"
+
+
+def get_tokenizer(hf_repo_name):
+    """
+    Tokenizer to use for Llama3
+    """
+    tokenizer = AutoTokenizer.from_pretrained(hf_repo_name, use_fast=False)
+    tokenizer.padding_side = "left"
+    tokenizer.pad_token = tokenizer.eos_token
+    tokenizer.pad_token_id = tokenizer.eos_token_id
+    tokenizer.truncation_side = "left"
+    return tokenizer
+
+
+def prepare_decoder_attention_mask(
+    attention_mask, input_shape, inputs_embeds, past_key_values_length, mask_neg=-50.0
+):
+    # Copied from transformers.models.bart.modeling_bart._make_causal_mask
+    def _make_causal_mask(
+        input_ids_shape: torch.Size,
+        dtype: torch.dtype,
+        device: torch.device,
+        past_key_values_length: int = 0,
+        mask_neg: float = -50.0,
+    ):
+        """
+        Make causal mask used for bi-directional self-attention.
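+
+        For example, with tgt_len=3, past_key_values_length=2, and
+        mask_neg=-50, each query position attends to every past position
+        and to new positions up to and including itself:
+
+            [[0, 0,   0, -50, -50],
+             [0, 0,   0,   0, -50],
+             [0, 0,   0,   0,   0]]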
+        """
+        bsz, tgt_len = input_ids_shape[0], input_ids_shape[1]
+        # mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min, device=device), device=device)
+        mask = torch.full(
+            (tgt_len, tgt_len), torch.tensor(mask_neg, device=device), device=device
+        )
+        mask_cond = torch.arange(mask.size(-1), device=device)
+        mask.masked_fill_(mask_cond < (mask_cond + 1).view(mask.size(-1), 1), 0)
+        mask = mask.to(dtype)
+
+        if past_key_values_length > 0:
+            mask = torch.cat(
+                [
+                    torch.zeros(
+                        tgt_len, past_key_values_length, dtype=dtype, device=device
+                    ),
+                    mask,
+                ],
+                dim=-1,
+            )
+        return mask[None, None, :, :].expand(
+            bsz, 1, tgt_len, tgt_len + past_key_values_length
+        )
+
+    # Copied from transformers.models.bart.modeling_bart._expand_mask
+    def _expand_mask(
+        mask: torch.Tensor,
+        dtype: torch.dtype,
+        mask_neg: float = -50.0,
+        tgt_len: Optional[int] = None,
+    ):
+        """
+        Expands attention_mask from `[bsz, seq_len]` to `[bsz, 1, tgt_seq_len, src_seq_len]`.
+        """
+        bsz, src_len = mask.size()
+        tgt_len = tgt_len if tgt_len is not None else src_len
+
+        expanded_mask = (
+            mask[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)
+        )
+
+        inverted_mask = 1.0 - expanded_mask
+
+        # return inverted_mask.masked_fill(inverted_mask.to(torch.bool), torch.finfo(dtype).min)
+        return inverted_mask.masked_fill(inverted_mask.to(torch.bool), mask_neg)
+
+    # create causal mask
+    # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
+    combined_attention_mask = None
+    if input_shape[-1] > 1:
+        combined_attention_mask = _make_causal_mask(
+            input_shape,
+            inputs_embeds.dtype,
+            device=inputs_embeds.device,
+            past_key_values_length=past_key_values_length,
+            mask_neg=mask_neg,
+        )
+
+    if attention_mask is not None:
+        # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
+
+        expanded_attn_mask = _expand_mask(
+            attention_mask,
+            inputs_embeds.dtype,
+            tgt_len=input_shape[1],
+            mask_neg=mask_neg,
+        ).to(inputs_embeds.device)
+
+        combined_attention_mask = (
+            expanded_attn_mask
+            if combined_attention_mask is None
+            else expanded_attn_mask + combined_attention_mask
+        )
+
+    return combined_attention_mask
+
+
+def prepare_combined_attention_mask(
+    attention_mask,
+    input_shape,
+    past_key_values_length,
+    mask_neg=-50.0,
+    dtype=torch.float32,
+):
+    dummy_embedding = torch.tensor((1.0,)).to(torch.float32)
+    new_mask = prepare_decoder_attention_mask(
+        attention_mask, input_shape, dummy_embedding, past_key_values_length, mask_neg
+    )
+    return new_mask.clamp_min(mask_neg).to(dtype)
+
+
+def get_past_keyval_with_shift(
+    past_key_vals: List[torch.Tensor],
+    new_key_vals: List[torch.Tensor],
+    length: int,
+) -> List[torch.Tensor]:
+    """
+    Clip the past key values so that, together with the new entries, they fit
+    within `length` positions for the next iteration.
+    """
+    ret = []
+    # Keys and values alternate in the flat list; the heads of each tensor are
+    # stacked on the batch dimension. Keys are concatenated on dim 3 (they are
+    # stored transposed) and values on dim 2.
+    for i in range(0, len(past_key_vals), 2):
+        n = new_key_vals[i].shape[3]
+        m = past_key_vals[i].shape[3]
+        remove = n + m - length
+        key_cache = torch.cat(
+            [past_key_vals[i][:, :, :, remove:], new_key_vals[i]], dim=3
+        )
+        val_cache = torch.cat(
+            [past_key_vals[i + 1][:, :, remove:], new_key_vals[i + 1]], dim=2
+        )
+
+        ret.append(key_cache)
+        ret.append(val_cache)
+    return ret
+
+
+def monkey_patch_huggingface_llama_modeling():
+    modeling_llama.LLAMA_ATTENTION_CLASSES["eager"] = SHALlamaAttention
+
+    def bypass_RotaryEmbedding(self, x, position_ids, *args, **kwargs):
+        return position_ids
+
+    # Bypass the rotary_emb module; rotary embeddings are applied explicitly
+    # via QcLlama_apply_rotary_pos_emb instead.
+    modeling_llama.LlamaRotaryEmbedding.forward = bypass_RotaryEmbedding
+    modeling_llama.apply_rotary_pos_emb = QcLlama_apply_rotary_pos_emb
+
+    def LlamaRMSNorm_forward(self, hidden_states):
+        # Raise to rank 4
+        hidden_states = hidden_states.unsqueeze(0)
+        variance = hidden_states.pow(2).mean(-1, keepdim=True)
+        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
+        return (hidden_states * self.weight).squeeze(0)
+
+    modeling_llama.LlamaRMSNorm.forward = LlamaRMSNorm_forward
+
+
+class Llama3Base_Quantized(Llama_QuantizedMixin, ABC):
+    def __init__(
+        self,
+        huggingface_model_name: str,
+        min_memory_recommended: int,
+        aimet_encodings: str,
+        sequence_length: int,
+        context_length: int,
+        load_pretrained: bool = True,
+        _make_small_for_debugging: bool = False,  # construct a small and incorrect network
+    ):
+        """
+        This is an abstract base class for all Llama 3 models.
+
+        Parameters
+        ----------
+
+        huggingface_model_name:
+            Name of the HuggingFace model. Subclasses should provide a default
+            for this.
+        min_memory_recommended:
+            Minimum recommended memory in GB for running export.
+        aimet_encodings:
+            AIMET encodings file.
+        sequence_length:
+            Input sequence length (in tokens).
+        context_length:
+            Total context length (in tokens).
+        load_pretrained:
+            Load a pre-trained model as opposed to a randomly initialized one.
+        """
+
+        # from transformers.models.llama import modeling_llama
+        self.huggingface_model_name = huggingface_model_name
+
+        # Ensure the user has access to the model; otherwise, point to
+        # instructions for getting access and error out.
+        has_model_access(self.huggingface_model_name)
+
+        # Ensure the user has the recommended amount of memory; otherwise,
+        # warn and recommend increasing swap space as a work-around.
+        has_recommended_memory(min_memory_recommended)
+
+        self.llm_config = self._llm_config(
+            _make_small_for_debugging=_make_small_for_debugging
+        )
+
+        # TODO: Make this into a context manager
+        monkey_patch_huggingface_llama_modeling()
+
+        if load_pretrained:
+            model = modeling_llama.LlamaForCausalLM.from_pretrained(
+                self.huggingface_model_name,
+                config=self.llm_config,
+                ignore_mismatched_sizes=_make_small_for_debugging,
+            )
+        else:
+            model = modeling_llama.LlamaForCausalLM(self.llm_config)
+        model.eval()
+
+        os.environ["TOKENIZERS_PARALLELISM"] = "0"
+
+        for name, module in model.named_modules():
+            if hasattr(module, "prepare_conv"):
+                module.prepare_conv()
+            if hasattr(module, "prepare_sha"):
+                module.prepare_sha()
+
+        super().__init__(model, aimet_encodings)
+
+        self.sequence_length = sequence_length
+        self.context_length = context_length
+        self.tokenizer = get_tokenizer(self.huggingface_model_name)
+
+    def _llm_config(self, _make_small_for_debugging: bool = False):
+        """
+        Construct and return a HuggingFace LLM config.
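+
+        When `_make_small_for_debugging` is set, the config is shrunk to a
+        few layers and heads so that the export pipeline can be exercised
+        quickly; the resulting network is deliberately small and incorrect.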
+ """ + llm_config = AutoConfig.from_pretrained( + self.huggingface_model_name, trust_remote_code=True + ) + if _make_small_for_debugging: + llm_config.num_hidden_layers = 8 + llm_config.num_attention_heads = 4 + llm_config.num_key_value_heads = 2 + llm_config.vocab_size = 13 + embed_dim = 8 + llm_config.head_dim = embed_dim * 2 + llm_config.hidden_size = llm_config.num_attention_heads * embed_dim * 2 + llm_config._attn_implementation = "eager" + llm_config._attn_implementation_internal = "eager" + + return llm_config + + @abstractmethod + def from_pretrained( + cls, + sequence_length: int, + context_length: int = DEFAULT_CONTEXT_LENGTH, + aimet_encodings: str | None = "DEFAULT", + ) -> "Llama3Base_Quantized": + pass + + @staticmethod + def get_output_names(num_hidden_layers: int): + output_names = ["logits"] + for layer in range(num_hidden_layers): + output_names.append(f"past_key_{layer}_out") + output_names.append(f"past_value_{layer}_out") + return output_names + + def forward( + self, + input_ids, + attention_mask, + position_ids_cos, + position_ids_sin, + *past_key_values, + ): + kv_cache = SHADynamicCacheNewValueOnly() + for layer_idx, (k, v) in enumerate( + zip(past_key_values[::2], past_key_values[1::2]) + ): + k_split = [k[i : i + 1] for i in range(self.llm_config.num_key_value_heads)] + v_split = [v[i : i + 1] for i in range(self.llm_config.num_key_value_heads)] + kv_cache.update(k_split, v_split, layer_idx, {}) + + out = self.model( + input_ids=input_ids, + attention_mask=attention_mask, + position_ids=[position_ids_cos, position_ids_sin], + past_key_values=kv_cache, + ) + + out_cache = out["past_key_values"] + flat_output_past_key_values = [] + for layer in range(len(out_cache)): + k = torch.cat(out_cache.key_cache[layer], dim=0) + v = torch.cat(out_cache.value_cache[layer], dim=0) + flat_output_past_key_values += [k, v] + + return [out["logits"]] + flat_output_past_key_values + + def get_qnn_graph_name(self) -> Optional[str]: + # Graph name of splits is determined by export script + return None + + @staticmethod + def get_input_spec( + num_hidden_layers: int, + input_seq_length: int, + context_length: int, + hidden_size: int, + num_key_value_heads: int, + num_attention_heads: int, + ) -> InputSpec: + embed_dim = hidden_size // num_attention_heads // 2 + input_spec = { + "input_ids": ((1, input_seq_length), "int32"), + "attention_mask": ( + (1, 1, input_seq_length, context_length), + "float32", + ), + # These are half the length of the hidden size per head because + # each cos/sin are applied to a half-sliced copy of the hidden size + # and then concatenated. + "position_ids_cos": ( + (1, 1, input_seq_length, embed_dim), + "float32", + ), + "position_ids_sin": ( + (1, 1, input_seq_length, embed_dim), + "float32", + ), + } + + # TODO: We could support input_seq_length == CONTEXT_LENGTH, but the + # KV cache input needs to be removed. + assert ( + input_seq_length < context_length + ), "It is currently not supported to set input sequence length to the same as or longer than context length. There should be no KV cache input at all in such case." 
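+        # Each KV-cache input holds context_length - input_seq_length past
+        # entries. Keys are stored transposed, (heads, 1, head_dim, positions),
+        # while values are (heads, 1, positions, head_dim), matching the
+        # concatenation axes used in get_past_keyval_with_shift.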
+ + for layer in range(num_hidden_layers): + past_k_name = f"past_key_{layer}_in" + input_spec[past_k_name] = ( + ( + num_key_value_heads, + 1, + embed_dim * 2, + context_length - input_seq_length, + ), + "float32", + ) + + past_v_name = f"past_value_{layer}_in" + input_spec[past_v_name] = ( + ( + num_key_value_heads, + 1, + context_length - input_seq_length, + embed_dim * 2, + ), + "float32", + ) + return input_spec + + def _use_zip_file(self) -> bool: + """ + Should the return of convert_to_hub_source_model be zipped. + """ + return False + + def preferred_hub_source_model_format( + self, target_runtime: TargetRuntime + ) -> SourceModelFormat: + """ + Source model format preferred for conversion on AI Hub. + """ + return SourceModelFormat.ONNX + + def get_calibration_data( + self, + target_runtime: TargetRuntime | None = None, + input_spec: InputSpec | None = None, + ) -> DatasetEntries | None: + # No calibration data needed + return None + + def _adapt_aimet_encodings( + self, src_encodings_path, dst_encodings_path, onnx_model_path + ): + """ + Adapt encodings from AIMET Pro to vanilla onnx export. + + Works for the new 3.0 and 3.1 encodings. + """ + import onnx + + with open(src_encodings_path) as f: + encodings = json.load(f) + + model = onnx.load(onnx_model_path) + + model_input_names = {} + for node in model.graph.node: + model_input_names[node.name] = node.input + + model_names = ( + set([o for x in model.graph.node for o in x.output]) + | set([x.name for x in model.graph.input]) + | set([x.name for x in model.graph.output]) + ) + model_param_names = set([x.name for x in model.graph.initializer]) + + uses_lists = isinstance(encodings["activation_encodings"], list) + if uses_lists: + # Convert encodings to dictionaries for faster look-ups + encodings["activation_encodings"] = { + v["name"]: v for v in encodings["activation_encodings"] + } + encodings["param_encodings"] = { + v["name"]: v for v in encodings["param_encodings"] + } + + enc_names = set(encodings["activation_encodings"].keys()) + enc_param_names = set(encodings["param_encodings"].keys()) + + new_encodings = { + "activation_encodings": {}, + "excluded_layers": [], + "param_encodings": {}, + "quantizer_args": encodings["quantizer_args"], + "version": encodings["version"], + } + + all_names = model_param_names | model_names + num_attention_heads = self.llm_config.num_attention_heads + num_key_value_heads = self.llm_config.num_key_value_heads + mapping, rev_mapping, known_unused = map_encodings( + [ + ( + r"/model_layers_(\d+)_input_layernorm_Mul_1/Mul_output_0", + "/model/model/layers.{0}/input_layernorm/Mul_1_output_0", + ), + ( + r"/model_layers_(\d+)_self_attn_q_proj_conv_Conv/Conv_output_0", + [ + f"/model/model/layers.{{0}}/self_attn/q_proj_sha.{i}/Conv_output_0" + for i in range(num_attention_heads) + ], + ), + ( + r"/model_layers_(\d+)_self_attn_Mul/Mul_output_0", + [ + f"/model/model/layers.{{0}}/self_attn/Mul{onnx_counting(i * 4)}_output_0" + for i in range(num_attention_heads) + ], + ), + ( + r"/model_layers_(\d+)_self_attn_Mul_2/Mul_output_0", + [ + f"/model/model/layers.{{0}}/self_attn/Mul{onnx_counting(2 + i * 4)}_output_0" + for i in range(num_attention_heads) + ], + ), + ( + r"/model_layers_(\d+)_self_attn_Mul_1/Mul_output_0", + [ + f"/model/model/layers.{{0}}/self_attn/Mul{onnx_counting(1 + i * 4)}_output_0" + for i in range(num_attention_heads) + ], + ), + ( + r"/model_layers_(\d+)_self_attn_Mul_3/Mul_output_0", + [ + f"/model/model/layers.{{0}}/self_attn/Mul{onnx_counting(3 + i * 4)}_output_0" + for i in 
range(num_attention_heads) + ], + ), + ( + r"/model_layers_(\d+)_self_attn_Sub/Sub_output_0", + [ + f"/model/model/layers.{{0}}/self_attn/Sub{onnx_counting(i)}_output_0" + for i in range(num_attention_heads) + ], + ), + ( + r"/model_layers_(\d+)_self_attn_Add/Add_output_0", + [ + f"/model/model/layers.{{0}}/self_attn/Add{onnx_counting(i)}_output_0" + for i in range(num_attention_heads) + ], + ), + ( + r"/model_layers_(\d+)_self_attn_k_proj_conv_Conv/Conv_output_0", + [ + f"/model/model/layers.{{0}}/self_attn/k_proj_sha.{i}/Conv_output_0" + for i in range(num_key_value_heads) + ], + ), + ( + r"/model_layers_(\d+)_self_attn_Mul_4/Mul_output_0", + [ + f"/model/model/layers.{{0}}/self_attn/Mul{onnx_counting(num_attention_heads * 4 + i * 4)}_output_0" + for i in range(num_attention_heads) + ], + ), + ( + r"/model_layers_(\d+)_self_attn_Mul_6/Mul_output_0", + [ + f"/model/model/layers.{{0}}/self_attn/Mul{onnx_counting(num_attention_heads * 4 + 2 + i * 4)}_output_0" + for i in range(num_attention_heads) + ], + ), + ( + r"/model_layers_(\d+)_self_attn_Mul_5/Mul_output_0", + [ + f"/model/model/layers.{{0}}/self_attn/Mul{onnx_counting(num_attention_heads * 4 + 1 + i * 4)}_output_0" + for i in range(num_attention_heads) + ], + ), + ( + r"/model_layers_(\d+)_self_attn_Mul_7/Mul_output_0", + [ + f"/model/model/layers.{{0}}/self_attn/Mul{onnx_counting(num_attention_heads * 4 + 3 + i * 4)}_output_0" + for i in range(num_attention_heads) + ], + ), + ( + r"/model_layers_(\d+)_self_attn_Sub_1/Sub_output_0", + [ + f"/model/model/layers.{{0}}/self_attn/Sub{onnx_counting(num_attention_heads + i)}_output_0" + for i in range(num_attention_heads) + ], + ), + ( + r"/model_layers_(\d+)_self_attn_Add_1/Add_output_0", + [ + f"/model/model/layers.{{0}}/self_attn/Add{onnx_counting(num_attention_heads + i)}_output_0" + for i in range(num_key_value_heads) + ], + ), + ( + r"/model_layers_(\d+)_self_attn_v_proj_conv_Conv/Conv_output_0", + [ + f"/model/model/layers.{{0}}/self_attn/v_proj_sha.{i}/Conv_output_0" + for i in range(num_key_value_heads) + ], + ), + ( + r"/model_layers_(\d+)_self_attn_MatMul/MatMul_output_0", + [ + f"/model/model/layers.{{0}}/self_attn/MatMul{onnx_counting(i)}_output_0" + for i in range(num_attention_heads) + ], + ), + ( + r"/model_layers_(\d+)_self_attn_Div/Div_output_0", + [ + f"/model/model/layers.{{0}}/self_attn/Div{onnx_counting(i)}_output_0" + for i in range(num_attention_heads) + ], + ), + ( + r"/model_layers_(\d+)_self_attn_Add_2/Add_output_0", + [ + f"/model/model/layers.{{0}}/self_attn/Add{onnx_counting(num_attention_heads + num_key_value_heads + i)}_output_0" + for i in range(num_attention_heads) + ], + ), + ( + r"/model_layers_(\d+)_self_attn_Softmax/Softmax_output_0", + [ + f"/model/model/layers.{{0}}/self_attn/Softmax{onnx_counting(i)}_output_0" + for i in range(num_attention_heads) + ], + ), + ( + r"/model_layers_(\d+)_self_attn_MatMul_1/MatMul_output_0", + [ + f"/model/model/layers.{{0}}/self_attn/MatMul{onnx_counting(num_attention_heads + i)}_output_0" + for i in range(num_attention_heads) + ], + ), + ( + r"/model_layers_(\d+)_self_attn_o_proj_conv_Conv/Conv_output_0", + "/model/model/layers.{0}/self_attn/o_proj_conv/Conv_output_0", + ), + ( + r"/model_layers_(\d+)_Add/Add_output_0", + "/model/model/layers.{0}/Add_output_0", + ), + ( + r"/model_layers_(\d+)_post_attention_layernorm_Mul_1/Mul_output_0", + "/model/model/layers.{0}/post_attention_layernorm/Mul_1_output_0", + ), + ( + r"/model_layers_(\d+)_mlp_gate_proj_conv_Conv/Conv_output_0", + 
"/model/model/layers.{0}/mlp/gate_proj/MatMul_output_0", + ), + ( + r"/model_layers_(\d+)_mlp_act_fn_Sigmoid/Sigmoid_output_0", + "/model/model/layers.{0}/mlp/act_fn/Sigmoid_output_0", + ), + ( + r"/model_layers_(\d+)_mlp_act_fn_Mul/Mul_output_0", + "/model/model/layers.{0}/mlp/act_fn/Mul_output_0", + ), + ( + r"/model_layers_(\d+)_mlp_up_proj_conv_Conv/Conv_output_0", + "/model/model/layers.{0}/mlp/up_proj/MatMul_output_0", + ), + ( + r"/model_layers_(\d+)_mlp_Mul/Mul_output_0", + "/model/model/layers.{0}/mlp/Mul_output_0", + ), + ( + r"/model_layers_(\d+)_mlp_down_proj_conv_Conv/Conv_output_0", + "/model/model/layers.{0}/mlp/down_proj/MatMul_output_0", + ), + ( + r"/model_layers_(\d+)_Add_1/Add_output_0", + "/model/model/layers.{0}/Add_1_output_0", + ), + ("/model_norm_Mul_1/Mul_output_0", "/model/model/norm/Mul_1_output_0"), + ("/lm_head_conv_Conv/Conv_output_0", "/model/lm_head/MatMul_output_0"), + (r"(.*)", "{0}"), + ], + enc_names, + all_names, + src_encodings=encodings["activation_encodings"], + dst_encodings=new_encodings["activation_encodings"], + ) + + def split_weights( + src_encodings, + dst_encodings, + src_name, + dst_name, + dst_pattern_index, + num_patterns, + groups, + ): + if src_name in src_encodings: + src_entry = src_encodings[src_name] + dst_entry = deepcopy(src_entry) + # Slice it! + if isinstance(dst_entry, dict): + dst_entry["name"] = dst_name + for key in ["scale", "offset", "per_block_int_scale"]: + n = len(dst_entry[key]) // num_patterns + dst_entry[key] = dst_entry[key][ + dst_pattern_index * n : (dst_pattern_index + 1) * n + ] + + # dst_encodings.append(dst_entry) + dst_encodings[dst_name] = dst_entry + else: + n = len(dst_entry) // num_patterns + dst_entry = dst_entry[ + dst_pattern_index * n : (dst_pattern_index + 1) * n + ] + dst_encodings[dst_name] = dst_entry + + # These parameters are stored as activations + param_mapping, rev_param_mapping, param_known_unused = map_encodings( + [ + ( + r"model_layers_(\d+)_(input|post_attention)_layernorm_weight", + "model.model.layers.{0}.{1}_layernorm.weight", + ), + (r"model_norm_weight", "model.model.norm.weight"), + ], + enc_names, + all_names, + src_encodings=encodings["activation_encodings"], + dst_encodings=new_encodings["param_encodings"], + ) + + # Process weight mappings + param_mapping, rev_param_mapping, param_known_unused = map_encodings( + [ + ("model_embed_tokens_Gather.weight", "model.model.embed_tokens.weight"), + ( + r"model_layers_(\d+)_self_attn_(k|v)_proj_conv_Conv.weight", + ( + ( + [ + f"model.model.layers.{{0}}.self_attn.{{1}}_proj_sha.{i}.weight" + for i in range(num_key_value_heads) + ] + ), + split_weights, + ), + ), + ( + r"model_layers_(\d+)_self_attn_q_proj_conv_Conv.weight", + ( + ( + [ + f"model.model.layers.{{0}}.self_attn.q_proj_sha.{i}.weight" + for i in range(num_attention_heads) + ] + ), + split_weights, + ), + ), + ( + r"model_layers_(\d+)_self_attn_o_proj_conv_Conv.weight", + "model.model.layers.{0}.self_attn.o_proj_conv.weight", + ), + ( + r"model_layers_(\d+)_mlp_(gate|up|down)_proj_conv_Conv.weight", + ("/model/model/layers.{0}/mlp/{1}_proj/MatMul", 1), + ), + (r"lm_head_conv_Conv.weight", ("/model/lm_head/MatMul", 1)), + ], + enc_param_names, + all_names, + model_input_names, + src_encodings=encodings["param_encodings"], + dst_encodings=new_encodings["param_encodings"], + ) + + # This is needed for subtle reasons. + # Gather ops require weights and output range to be the same, so that + # it can be implemented as a memory look-up. 
Therefore, AIMET does not
+        # store the output activation. However, since we may split the model
+        # right after this op, the input to the second part could end up
+        # without activation encodings.
+        embed_a_name = "/model/model/embed_tokens/Gather_output_0"
+        embed_w_name = "model.model.embed_tokens.weight"
+        new_encodings["activation_encodings"][embed_a_name] = new_encodings[
+            "param_encodings"
+        ][embed_w_name]
+        if uses_lists:
+            new_encodings["activation_encodings"][embed_a_name]["name"] = embed_a_name
+
+        # Fill in "zero" encodings for RMSNorm internals. If these are not
+        # collapsed before runtime, this will result in catastrophic numerical
+        # results (which is good, since it is better to catch this bug than to
+        # get a slightly worse model, which can be hard to detect).
+        zero_keys = []
+        for layer in range(self.llm_config.num_hidden_layers):
+            for sec in ["input", "post_attention"]:
+                zero_keys += [
+                    f"/model/model/layers.{layer}/{sec}_layernorm/Pow_output_0",
+                    f"/model/model/layers.{layer}/{sec}_layernorm/ReduceMean_output_0",
+                    f"/model/model/layers.{layer}/{sec}_layernorm/Add_output_0",
+                    f"/model/model/layers.{layer}/{sec}_layernorm/Sqrt_output_0",
+                    f"/model/model/layers.{layer}/{sec}_layernorm/Div_output_0",
+                    f"/model/model/layers.{layer}/{sec}_layernorm/Mul_output_0",
+                ]
+
+        zero_keys += [
+            "/model/model/norm/Pow_output_0",
+            "/model/model/norm/ReduceMean_output_0",
+            "/model/model/norm/Add_output_0",
+            "/model/model/norm/Sqrt_output_0",
+            "/model/model/norm/Div_output_0",
+            "/model/model/norm/Mul_output_0",
+        ]
+
+        for key in zero_keys:
+            if uses_lists:
+                # aimet format 1.0
+                zero_entry = {
+                    "bw": 16,
+                    "dtype": "INT",
+                    "enc_type": "PER_TENSOR",
+                    "is_sym": False,
+                    "name": key,
+                    "offset": [0],
+                    "scale": [1e-20],
+                }
+            else:
+                # aimet format 0.x
+                zero_entry = [
+                    {
+                        "bitwidth": 16,
+                        "dtype": "int",
+                        "is_symmetric": "False",
+                        "max": 0.0,
+                        "min": 0.0,
+                        "offset": 0,
+                        "scale": 1e-20,
+                    }
+                ]
+            new_encodings["activation_encodings"][key] = zero_entry
+
+        # Propagate encodings through shape-preserving ops that would
+        # otherwise be left without an encoding.
+        changes = True
+        while changes:
+            changes = False
+            for node in model.graph.node:
+                if node.output[0] in new_encodings["activation_encodings"]:
+                    continue
+
+                if node.op_type in {
+                    "Concat",
+                    "Split",
+                    "Transpose",
+                    "Cast",
+                    "Reshape",
+                    "Slice",
+                }:
+                    if node.input[0] in new_encodings["activation_encodings"]:
+                        for output_name in node.output:
+                            dst_entry = deepcopy(
+                                new_encodings["activation_encodings"][node.input[0]]
+                            )
+                            if isinstance(dst_entry, dict):
+                                dst_entry["name"] = output_name
+                            new_encodings["activation_encodings"][
+                                output_name
+                            ] = dst_entry
+                            enc_names.add(output_name)
+                            changes = True
+
+        if uses_lists:
+            # convert back
+            new_encodings["activation_encodings"] = list(
+                new_encodings["activation_encodings"].values()
+            )
+            new_encodings["param_encodings"] = list(
+                new_encodings["param_encodings"].values()
+            )
+
+        with open(dst_encodings_path, "w") as write_file:
+            json.dump(new_encodings, write_file, indent=4, sort_keys=True)
+
+    def _sample_inputs_impl(
+        self, input_spec: InputSpec | None = None
+    ) -> SampleInputsType:
+        if not input_spec:
+            input_spec = self.get_input_spec(
+                input_seq_length=self.sequence_length,
+                num_hidden_layers=self.llm_config.num_hidden_layers,
+                context_length=self.context_length,
+                hidden_size=self.llm_config.hidden_size,
+                num_attention_heads=self.llm_config.num_attention_heads,
+                num_key_value_heads=self.llm_config.num_key_value_heads,
+            )
+        input_prompt = DEFAULT_USER_PROMPT
+        input_prompt_processed = get_input_prompt_with_tags(
+            user_input_prompt=input_prompt
+        )
+        input_tokens = self.tokenizer(
+            input_prompt_processed,
+            return_tensors="pt",
+            padding="max_length",
+            max_length=self.context_length,
+        )
+        num_tokens = min(
+            torch.sum(input_tokens["attention_mask"]).item(), self.sequence_length
+        )
+        input_ids = input_tokens["input_ids"].type(torch.int32)[
+            :, -self.sequence_length :
+        ]
+
+        padding_size = self.sequence_length - num_tokens
+        position_ids = [0] * padding_size + list(
+            range(0, self.sequence_length - padding_size)
+        )
+        position_ids = (
+            torch.Tensor(position_ids).type(torch.long).reshape(1, self.sequence_length)
+        )
+        rope_embedding = RopeEmbedding(max_length=self.context_length)
+        position_ids_cos, position_ids_sin = rope_embedding.get_embedding(position_ids)
+        attention_mask = torch.zeros((1, self.context_length))
+        attention_mask[:, -num_tokens:] = 1.0
+        cm_attention_masks = prepare_combined_attention_mask(
+            attention_mask=attention_mask,
+            input_shape=(1, self.sequence_length),
+            past_key_values_length=self.context_length - self.sequence_length,
+        )
+
+        input_dict = {
+            "input_ids": [input_ids.detach().numpy()],
+            "attention_mask": [cm_attention_masks.detach().numpy()],
+            "position_ids_cos": [position_ids_cos.detach().numpy()],
+            "position_ids_sin": [position_ids_sin.detach().numpy()],
+        }
+
+        # Populate the rest with zeros (KV cache input)
+        for k, (shape, _) in input_spec.items():
+            if k.startswith("past_"):
+                input_dict[k] = [np.zeros(shape, dtype=np.float32)]
+
+        return input_dict
diff --git a/qai_hub_models/models/_shared/llama3/model_adaptations.py b/qai_hub_models/models/_shared/llama3/model_adaptations.py
new file mode 100644
index 00000000..113508f8
--- /dev/null
+++ b/qai_hub_models/models/_shared/llama3/model_adaptations.py
@@ -0,0 +1,289 @@
+# ---------------------------------------------------------------------
+# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+import math
+from typing import Any, Dict, List, Optional, Tuple
+
+import torch
+from torch import nn
+from transformers.cache_utils import DynamicCache
+from transformers.models.llama.modeling_llama import LlamaAttention
+
+
+# Copied from transformers.models.llama.modeling_llama.repeat_kv
+def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
+    """
+    This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep).
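+    When hidden_states is a list of per-head tensors (the split-head layout
+    used in this module), each head is simply repeated n_rep times.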
+    The hidden states go from (batch, num_key_value_heads, seqlen, head_dim)
+    to (batch, num_attention_heads, seqlen, head_dim)
+    """
+    if isinstance(hidden_states, list):
+        return [head for head in hidden_states for _ in range(n_rep)]
+
+    batch, num_key_value_heads, slen, head_dim = hidden_states.shape
+    if n_rep == 1:
+        return hidden_states
+    hidden_states = hidden_states[:, :, None, :, :].expand(
+        batch, num_key_value_heads, n_rep, slen, head_dim
+    )
+    return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)
+
+
+def _apply_rope_single(x, rope_vals: Tuple[torch.Tensor, torch.Tensor]):
+    """
+    Based on FacebookResearch's llama, provided by Carl
+    """
+    rope_real = rope_vals[0]  # shape should be 1, 1, seqlen, head_dim/2
+    rope_im = rope_vals[1]  # shape should be 1, 1, seqlen, head_dim/2
+
+    # TODO: Why does HF use different coordinates from the paper?
+    x_real = x[:, :, :, : x.shape[-1] // 2]  # extract first half elements
+    x_im = x[:, :, :, x.shape[-1] // 2 :]  # extract second half elements
+
+    x_prod_real = x_real * rope_real - x_im * rope_im
+    x_prod_im = x_real * rope_im + x_im * rope_real
+
+    # TODO: HF needs to use different interleaving
+    x = torch.cat((x_prod_real, x_prod_im), dim=3).view(*x.shape)
+    return x
+
+
+def QcLlama_apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
+    query_states = _apply_rope_single(q, [cos, sin])
+    key_states = _apply_rope_single(k, [cos, sin])
+    return query_states, key_states
+
+
+class SHADynamicCacheNewValueOnly(DynamicCache):
+    """
+    Version of DynamicCache that stores the cache as lists of separate heads
+    (so as to avoid concats/splits for SHA) and returns only the new values
+    without accumulation.
+    """
+
+    def update(
+        self,
+        key_states: List[torch.Tensor],
+        value_states: List[torch.Tensor],
+        layer_idx: int,
+        cache_kwargs: Optional[Dict[str, Any]] = None,
+    ) -> Tuple[torch.Tensor, torch.Tensor]:
+        # Update the number of seen tokens
+        if layer_idx == 0:
+            # self._seen_tokens += key_states.shape[-2]
+            # This line is updated
+            self._seen_tokens += key_states[0].shape[-2]
+
+        # Update the cache
+        if len(self.key_cache) <= layer_idx:
+            self.key_cache.append(key_states)
+            self.value_cache.append(value_states)
+        else:
+            # Do not concatenate the cache, we only need the latest entry
+            self.key_cache[layer_idx] = key_states
+            self.value_cache[layer_idx] = value_states
+
+        return self.key_cache[layer_idx], self.value_cache[layer_idx]
+
+    def get_seq_length(self, layer_idx: Optional[int] = 0) -> int:
+        """Returns the sequence length of the cached states.
A layer index can be optionally passed.""" + if len(self.key_cache) <= layer_idx: + return 0 + # [0] added to get shape since the outermost is list + return self.key_cache[layer_idx][0].shape[-2] + + +class SHALlamaAttention(LlamaAttention): + """ + Split-Head Attention version of LlamaAttention (with Convs) + """ + + def prepare_conv(self): + if not hasattr(self, "forward_no_conv"): + self.q_proj_conv = nn.Conv2d( + self.hidden_size, self.num_heads * self.head_dim, 1, bias=False + ) + self.k_proj_conv = nn.Conv2d( + self.hidden_size, + self.num_key_value_heads * self.head_dim, + 1, + bias=False, + ) + self.v_proj_conv = nn.Conv2d( + self.hidden_size, + self.num_key_value_heads * self.head_dim, + 1, + bias=False, + ) + self.o_proj_conv = nn.Conv2d( + self.num_heads * self.head_dim, self.hidden_size, 1, bias=False + ) + + self.q_proj_conv.weight.data.copy_(self.q_proj.weight[:, :, None, None]) + self.k_proj_conv.weight.data.copy_(self.k_proj.weight[:, :, None, None]) + self.v_proj_conv.weight.data.copy_(self.v_proj.weight[:, :, None, None]) + self.o_proj_conv.weight.data.copy_(self.o_proj.weight[:, :, None, None]) + + del self.q_proj + del self.k_proj + del self.v_proj + del self.o_proj + + def prepare_sha(self): + if not hasattr(self, "forward_mha"): + self.q_proj_sha = nn.ModuleList( + [ + nn.Conv2d(self.hidden_size, self.head_dim, 1, bias=False) + for _ in range(self.num_heads) + ] + ) + self.k_proj_sha = nn.ModuleList( + [ + nn.Conv2d(self.hidden_size, self.head_dim, 1, bias=False) + for _ in range(self.num_key_value_heads) + ] + ) + self.v_proj_sha = nn.ModuleList( + [ + nn.Conv2d(self.hidden_size, self.head_dim, 1, bias=False) + for _ in range(self.num_key_value_heads) + ] + ) + if not hasattr(self, "o_proj_conv"): + self.o_proj_conv = nn.Conv2d( + self.num_heads * self.head_dim, self.hidden_size, 1, bias=False + ) + self.o_proj_conv.weight.data.copy_(self.o_proj.weight[:, :, None, None]) + del self.o_proj + + self.forward_mha = self.forward + self.forward = self.forward_sha + + for i in range(self.num_heads): + self.q_proj_sha[i].weight.data.copy_( + self.q_proj_conv.weight[i * self.head_dim : (i + 1) * self.head_dim, :] + ) + + for i in range(self.num_key_value_heads): + self.k_proj_sha[i].weight.data.copy_( + self.k_proj_conv.weight[i * self.head_dim : (i + 1) * self.head_dim, :] + ) + self.v_proj_sha[i].weight.data.copy_( + self.v_proj_conv.weight[i * self.head_dim : (i + 1) * self.head_dim, :] + ) + + del self.q_proj_conv + del self.k_proj_conv + del self.v_proj_conv + + def forward_sha( + self, + hidden_states: torch.Tensor, + attention_mask: Optional[torch.Tensor] = None, + position_ids: Optional[torch.LongTensor] = None, + past_key_value: Optional[Tuple[torch.Tensor]] = None, + output_attentions: bool = False, + use_cache: bool = False, + cache_position: Optional[torch.LongTensor] = None, + position_embeddings: Optional[ + Tuple[torch.Tensor, torch.Tensor] + ] = None, # will become mandatory in v4.45 + ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]: + + bsz, q_len, _ = hidden_states.size() + + hidden_states = torch.reshape(hidden_states, (bsz, -1, 1, self.hidden_size)) + hidden_states = hidden_states.transpose(1, 3) + + query_states = [ + q_proj(hidden_states).permute(0, 2, 3, 1) for q_proj in self.q_proj_sha + ] + key_states = [ + k_proj(hidden_states).permute(0, 2, 3, 1) for k_proj in self.k_proj_sha + ] + value_states = [ + v_proj(hidden_states).permute(0, 2, 3, 1) for v_proj in self.v_proj_sha + ] + + kv_seq_len = value_states[0].shape[-2] 
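+        # query/key/value_states are per-head lists of (bsz, 1, seq, head_dim)
+        # tensors; kv_seq_len below additionally counts the cached positions.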
+
+        if past_key_value is not None:
+            kv_seq_len += past_key_value.value_cache[self.layer_idx][0].shape[-2]
+
+        assert position_embeddings is not None
+        query_states = [
+            _apply_rope_single(q, position_embeddings) for q in query_states
+        ]
+        key_states = [_apply_rope_single(k, position_embeddings) for k in key_states]
+
+        if position_embeddings is None:
+            cos, sin = self.rotary_emb(value_states, position_ids)
+        else:
+            cos, sin = position_embeddings
+
+        if past_key_value is not None:
+            # reuse k, v, self_attention
+            past_key = past_key_value.key_cache[self.layer_idx]
+            past_value = past_key_value.value_cache[self.layer_idx]
+
+            cache_kwargs = {"sin": sin, "cos": cos, "cache_position": cache_position}
+            transposed_key_states = [
+                key_state.transpose(2, 3) for key_state in key_states
+            ]
+            past_key_value.update(
+                transposed_key_states, value_states, self.layer_idx, cache_kwargs
+            )
+
+            # Now concatenate the key/value states
+            key_states = [
+                torch.cat([pk, k.transpose(2, 3)], dim=3)
+                for pk, k in zip(past_key, key_states)
+            ]
+            value_states = [
+                torch.cat([pv, v], dim=2) for pv, v in zip(past_value, value_states)
+            ]
+
+        key_states = repeat_kv(key_states, self.num_key_value_groups)
+        value_states = repeat_kv(value_states, self.num_key_value_groups)
+
+        attn_weights = [
+            torch.matmul(q, k) / math.sqrt(self.head_dim)
+            for q, k in zip(query_states, key_states)
+        ]
+        if attn_weights[0].size() != (bsz, 1, q_len, kv_seq_len):
+            raise ValueError(
+                f"Attention weights should be of size {(bsz, 1, q_len, kv_seq_len)}, but is"
+                f" {attn_weights[0].size()}"
+            )
+
+        if attention_mask is not None:
+            if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
+                raise ValueError(
+                    f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.size()}"
+                )
+            attn_weights = [aw + attention_mask for aw in attn_weights]
+
+        # upcast attention to fp32
+        attn_weights = [
+            nn.functional.softmax(aw, dim=-1, dtype=torch.float32).to(
+                query_states[0].dtype
+            )
+            for aw in attn_weights
+        ]
+        attn_output = [torch.matmul(aw, v) for aw, v in zip(attn_weights, value_states)]
+
+        if attn_output[0].size() != (bsz, 1, q_len, self.head_dim):
+            raise ValueError(
+                f"`attn_output` should be of size {(bsz, 1, q_len, self.head_dim)}, but is"
+                f" {attn_output[0].size()}"
+            )
+
+        attn_output = torch.cat(attn_output, dim=3)
+        attn_output = attn_output.permute(0, 3, 1, 2)
+        attn_output = self.o_proj_conv(attn_output)
+        attn_output = attn_output.transpose(1, 3)
+        attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
+
+        if not output_attentions:
+            attn_weights = None
+
+        return attn_output, attn_weights, past_key_value
diff --git a/qai_hub_models/models/_shared/llama3/split_onnx_utils/__init__.py b/qai_hub_models/models/_shared/llama3/split_onnx_utils/__init__.py
new file mode 100644
index 00000000..21a22b31
--- /dev/null
+++ b/qai_hub_models/models/_shared/llama3/split_onnx_utils/__init__.py
@@ -0,0 +1,4 @@
+# ---------------------------------------------------------------------
+# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
diff --git a/qai_hub_models/models/_shared/llama3/split_onnx_utils/split_onnx.py b/qai_hub_models/models/_shared/llama3/split_onnx_utils/split_onnx.py
new file mode 100644
index 00000000..45607848
--- /dev/null
+++ b/qai_hub_models/models/_shared/llama3/split_onnx_utils/split_onnx.py
@@ -0,0 +1,230 @@
+# ---------------------------------------------------------------------
+# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+# Implementation of a class that splits a larger onnx graph into smaller subgraphs
+
+import collections
+import os
+
+import onnx
+from onnx.external_data_helper import uses_external_data
+
+
+class OnnxSplitter:
+    def __init__(self, onnxmodel, verbose=False):
+        self.model = onnxmodel
+        self.verbose = verbose
+        self.graph_inputs = {i.name for i in self.model.graph.input}
+        self.graph_outputs = {i.name for i in self.model.graph.output}
+        # nodeid: Onnx Node
+        self.node = {id(node): node for node in self.model.graph.node}
+        # tensorname: nodeid
+        self.producer = {
+            output: id(node) for node in self.model.graph.node for output in node.output
+        }
+
+    def partition_subgraph(
+        self,
+        name,  # name of the ONNX graph
+        output_tensors,  # list of new output tensors to include
+        additional_input_tensors=None,
+    ):
+        """
+        Partition a graph with input and output tensors
+        - Captures all nodes that are required to compute the given output_tensors
+        """
+
+        def upstream(nodeid):
+            return [
+                self.producer[i]
+                for i in self.node[nodeid].input
+                if i not in leaf_tensors
+            ]
+
+        # Check prerequisite
+        value_info = {i.name: i for i in self.model.graph.value_info}
+        assert all(
+            [
+                (name in value_info) or (name in self.graph_outputs)
+                for name in output_tensors
+            ]
+        ), "ValueInfoProto of output_tensors should be given"
+
+        # prepare the 'leaf' tensors, which can be model input or parameter tensors
+        leaf_tensors = set(self.graph_inputs)
+        leaf_tensors.update({i.name for i in self.model.graph.initializer})
+        if additional_input_tensors is not None:
+            leaf_tensors.update(additional_input_tensors)
+            self.graph_inputs.update(additional_input_tensors)
+
+        visited_output_tensors, visited_input_tensors = set(output_tensors), set()
+
+        # Traverse from output_tensors to input or 'leaf' nodes
+        q = collections.deque([self.producer[i] for i in output_tensors])
+        visited = set()
+        while q:
+            nodeid = q.popleft()
+            if nodeid in visited:
+                continue
+            visited.add(nodeid)
+            visited_output_tensors.update(
+                [i for i in self.node[nodeid].output if i in self.graph_outputs]
+            )
+            visited_input_tensors.update(
+                [i for i in self.node[nodeid].input if i in self.graph_inputs]
+            )
+            for producerid in upstream(nodeid):
+                if producerid not in visited:
+                    q.append(producerid)
+
+        use = set()
+        for nodeid in visited:
+            use.update(self.node[nodeid].input)
+            use.update(self.node[nodeid].output)
+
+        # Include in-use items and preserve the original order
+        new_node = [i for i in self.model.graph.node if id(i) in visited]
+        new_initializer = [i for i in self.model.graph.initializer if i.name in use]
+        new_value_info = [i for i in self.model.graph.value_info if i.name in use]
+        new_sparse_initializer = [
+            i for i in self.model.graph.sparse_initializer if i.name in use
+        ]
+
+        value_info_dict = {i.name: i for i in new_value_info}
+        value_info_dict.update({i.name: i for i in self.model.graph.output})
+        if additional_input_tensors is not None:
+            new_inputs = [
+                value_info_dict[i]
+                for i in additional_input_tensors
+                if i in value_info_dict and i in use
+            ]
+        else:
+            new_inputs = []
+        new_inputs += [i for i in self.model.graph.input if i.name in use]
+
+        new_outputs = [value_info_dict[i] for i in output_tensors]
+        new_outputs += [
+            value_info_dict[i.name]
+            for i in self.model.graph.output
+            if i.name in visited_output_tensors and i.name not in output_tensors
+        ]
+
+        if self.verbose:
+            print("new_inputs", [i.name for i in new_inputs])
+        if self.verbose:
+            print("new_outputs", [i.name for i in new_outputs])
+        new_graph = onnx.helper.make_graph(
+            nodes=new_node,
+            name=name,
+            inputs=new_inputs,
+            outputs=new_outputs,
+            initializer=new_initializer,
+            value_info=new_value_info,
+            sparse_initializer=new_sparse_initializer,
+        )
+        return new_graph
+
+    def split(self, list_of_intermediate_output_tensors):
+        count = 0
+        additional_input_tensors, covered_output_tensors = [], set()
+        for i, output_tensors in enumerate(list_of_intermediate_output_tensors):
+            count += 1
+            graphname = f"{self.model.graph.name}_split{count}"
+            if self.verbose:
+                print(f"Partition new graph: {graphname} for outputs[{output_tensors}]")
+            subgraph = self.partition_subgraph(
+                graphname, output_tensors, additional_input_tensors
+            )
+            additional_input_tensors += [
+                i for i in output_tensors if i not in self.graph_outputs
+            ]
+            covered_output_tensors.update([i.name for i in subgraph.output])
+            yield subgraph
+
+        graphname = f"{self.model.graph.name}_split{count+1}"
+        last_output_tensors = [
+            i.name
+            for i in self.model.graph.output
+            if i.name not in covered_output_tensors
+        ]
+        lastgraph = self.partition_subgraph(
+            graphname, last_output_tensors, additional_input_tensors
+        )
+        yield lastgraph
+
+    @classmethod
+    def get_all_tensors(cls, graph):
+        yield from graph.initializer
+        for node in graph.node:
+            for attribute in node.attribute:
+                if attribute.type == onnx.AttributeProto.GRAPH:
+                    yield from cls.get_all_tensors(attribute.g)
+                if attribute.type == onnx.AttributeProto.GRAPHS:
+                    for graph in attribute.graphs:
+                        yield from cls.get_all_tensors(graph)
+                if attribute.HasField("t"):
+                    yield attribute.t
+                yield from attribute.tensors
+
+    @classmethod
+    def is_using_external_data(cls, onnxmodel):
+        for tensor in cls.get_all_tensors(onnxmodel.graph):
+            if uses_external_data(tensor):
+                return True
+        return False
+
+
+def save_model(model, newonnxfile, using_external_data=False):
+    kwargs = {}
+    if using_external_data or model.ByteSize() > onnx.checker.MAXIMUM_PROTOBUF:
+        dirname = os.path.dirname(newonnxfile)
+        location = os.path.basename(newonnxfile).replace(".onnx", ".data")
+        kwargs["save_as_external_data"] = True
+        kwargs["all_tensors_to_one_file"] = True
+        kwargs["location"] = location
+        if os.path.exists(os.path.join(dirname, kwargs["location"])):
+            os.unlink(os.path.join(dirname, kwargs["location"]))
+
+    onnx.save(model, newonnxfile, **kwargs)
+
+
+def split_onnx_by_names(
+    onnxfile, list_of_output_tensors, output_dir=".", verbose=False
+):
+    if verbose:
+        print(f"Loading {onnxfile}")
+    onnxmodel = onnx.load(onnxfile, load_external_data=False)
+    splitter = OnnxSplitter(onnxmodel, verbose=verbose)
+    using_external_data = OnnxSplitter.is_using_external_data(onnxmodel)
+
+    list_of_output_tensors = [i.split(",") for i in list_of_output_tensors]
+    num_splits = len(list_of_output_tensors) + 1
+
+    # 1.
split model + new_model_info = [] + for i, subgraph in enumerate(splitter.split(list_of_output_tensors)): + new_basename = f"{os.path.basename(onnxfile)}_{i+1}_of_{num_splits}" + input_tensors = [i.name for i in subgraph.input] + new_model_info.append([new_basename, input_tensors]) + + submodel = onnx.helper.make_model( + subgraph, opset_imports=onnxmodel.opset_import + ) + if ( + not using_external_data + and submodel.ByteSize() < onnx.checker.MAXIMUM_PROTOBUF + ): + onnx.checker.check_model(submodel) + + if using_external_data: + if verbose: + print(f"Loading external data from {os.path.dirname(onnxfile)}") + onnx.load_external_data_for_model( + submodel, base_dir=os.path.dirname(onnxfile) + ) + + newonnxfile = f"{output_dir}/{new_basename}.onnx" + if verbose: + print(f"Saving {newonnxfile}") + save_model(submodel, newonnxfile, using_external_data) diff --git a/qai_hub_models/models/_shared/llama3/split_onnx_utils/utils.py b/qai_hub_models/models/_shared/llama3/split_onnx_utils/utils.py new file mode 100644 index 00000000..13e357a9 --- /dev/null +++ b/qai_hub_models/models/_shared/llama3/split_onnx_utils/utils.py @@ -0,0 +1,420 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. +# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +import collections +import json +import os +import re +import shutil +from copy import deepcopy +from pathlib import Path +from typing import Optional + +import numpy as np +import onnx +from onnx.numpy_helper import from_array, to_array + +from .split_onnx import OnnxSplitter, save_model + + +def _target_name(name, deco_digit=True, using_qairt_workflow=False): + name = f"_{name}" if deco_digit and name.isdigit() else name + # name = name.replace('.', '_') + if not using_qairt_workflow: + name = name.replace("/", "-") + return name + + +def get_onnx_input_output_names( + onnxfile, onnxmodel=None, deco_digit=True, using_qairt_workflow=False +): + onnxmodel = _load_model(onnxfile) if onnxmodel is None else onnxmodel + input_names = [ + _target_name( + i.name, deco_digit=deco_digit, using_qairt_workflow=using_qairt_workflow + ) + for i in onnxmodel.graph.input + ] + output_names = [ + _target_name( + i.name, deco_digit=deco_digit, using_qairt_workflow=using_qairt_workflow + ) + for i in onnxmodel.graph.output + ] + return input_names, output_names + + +def get_split_tensors(onnxfile, onnxmodel=None, include_first_input=True): + """ + Model topology + │ ←───────── layers[0] ────────────→ │ │ ←───────── layers[-1] ─────────────→ │ + │ │ │ │ + embed ────┬──────────── add0 ─┬─────────── add1 ── ┄┄┄ ─┬─────────────── add ─┬───────────── add ─── lmhead + ↑ └─ norm ─ attn ─┘ └─ norm ─ ffn ─┘ ↑ ↑ └─ norm ─ attn ─┘ └─ norm ─ ffn ─┘ ↑ + │ │ │ │ + │ │ │ │ + valid splitting points + """ + + def get_nodes(): + model = _load_model(onnxfile) if onnxmodel is None else onnxmodel + nodes = {i.name: i for i in model.graph.node} + seq = {i.name: idx for idx, i in enumerate(model.graph.node)} + producers = collections.defaultdict(lambda: None) + producers.update({i.output[0]: i.name for i in model.graph.node}) + return nodes, seq, producers + + nodes, seq, producers = get_nodes() + + def maybe_skip_cast(a): + if nodes[a].op_type == "Cast": + return producers[nodes[a].input[0]] + else: + return a + + def can_visit(src, dst): + if seq[src] < seq[dst]: + return False + stack, visited = collections.deque([src]), set() + while 
stack: + cur = stack.pop() + if cur == dst: + return True + visited.add(cur) + next_nodes = [ + producers[tensor] + for tensor in nodes[cur].input + if producers[tensor] is not None + ] + for name in next_nodes: + if name not in visited and seq[name] >= seq[dst]: + stack.append(name) + return False + + def is_residual_add(nodename, strict): + if nodes[nodename].op_type != "Add": + return False + a, b = [producers[tensor] for tensor in nodes[nodename].input] + if a is None or b is None: + return False + a = maybe_skip_cast(a) + b = maybe_skip_cast(b) + begin, end = (a, b) if seq[a] < seq[b] else (b, a) + if strict and nodes[begin].op_type != "Add": + return False + return can_visit(end, begin) + + def get_add0(add1): + a, b = [producers[tensor] for tensor in nodes[add1].input] + a = maybe_skip_cast(a) + b = maybe_skip_cast(b) + add0 = a if seq[a] < seq[b] else b + assert is_residual_add(add0, strict=False) + return add0 + + def get_layer0_input(add0): + a, b = [producers[tensor] for tensor in nodes[add0].input] + return a if seq[a] < seq[b] else b + + residual_add_names = [ + name for name in nodes.keys() if is_residual_add(name, strict=True) + ] + if len(residual_add_names) % 2 == 1: + # 'add0' is missing in residual_adds + add0 = get_add0(residual_add_names[0]) + residual_add_names.insert(0, add0) + + output_tensors = [] + if include_first_input: + layer0_input = get_layer0_input(residual_add_names[0]) + output_tensors.append(nodes[layer0_input].output[0]) + output_tensors += [ + nodes[node].output[0] for i, node in enumerate(residual_add_names) if i % 2 == 1 + ] + + return output_tensors + + +def _load_model(onnxfile, load_external_data=False, model_cache={}): + if onnxfile not in model_cache: + model_cache[onnxfile] = onnx.load( + onnxfile, load_external_data=load_external_data + ) + return model_cache[onnxfile] + + +def _load_encoding(encodingfile, no_merge=False): + all = {} + if encodingfile is not None: + with open(encodingfile) as json_file: + quant_encoding_dict = json.load(json_file) + if no_merge: + return quant_encoding_dict + all.update(quant_encoding_dict["activation_encodings"]) + all.update(quant_encoding_dict["param_encodings"]) + return all + + +def _save_encoding(encodings, encodingfile): + with open(encodingfile, "wt") as json_file: + json.dump(encodings, json_file, indent=4, sort_keys=True) + + +def embed_forecast_token_embeddings(onnxmodel, forecast_token_embeddings, base_dir): + + (embedding_table_name,) = [ + node.input[0] for node in onnxmodel.graph.node if node.op_type == "Gather" + ] + (embedding_table_proto,) = [ + i for i in onnxmodel.graph.initializer if i.name == embedding_table_name + ] + embedding_table = to_array(embedding_table_proto, base_dir=base_dir) + + assert ( + embedding_table.shape[1] == forecast_token_embeddings.shape[1] + ), "Mismatching token embedding size" + new_embedding_table = np.concatenate( + (embedding_table, forecast_token_embeddings), axis=0 + ) + onnxmodel.graph.initializer.remove(embedding_table_proto) + onnxmodel.graph.initializer.append( + from_array(new_embedding_table, embedding_table_proto.name) + ) + + +def split_onnx_by_names( + onnxfile, + modelname, + pickle_filedir, + *list_of_output_tensors, + output_dir=".", + onnxmodel=None, + encoding_file=None, + using_qairt_workflow=False, +): + encodings = None + uses_lists = None + if encoding_file is not None: + with open(encoding_file) as f: + encodings = json.load(f) + uses_lists = isinstance(encodings["activation_encodings"], list) + if uses_lists: + # Convert encodings to 
dictionary + encodings["activation_encodings"] = { + v["name"]: v for v in encodings["activation_encodings"] + } + encodings["param_encodings"] = { + v["name"]: v for v in encodings["param_encodings"] + } + + onnx_to_artifacts_map = dict() + onnxmodel = ( + _load_model(onnxfile, load_external_data=False) + if onnxmodel is None + else onnxmodel + ) + splitter = OnnxSplitter(onnxmodel, verbose=False) + base_dir = os.path.dirname(onnxfile) + using_external_data = OnnxSplitter.is_using_external_data(onnxmodel) + + list_of_output_tensors = [i.split(",") for i in list_of_output_tensors] + num_splits = len(list_of_output_tensors) + 1 + + # 1. split model + new_model_info = [] + for i, subgraph in enumerate(splitter.split(list_of_output_tensors)): + new_basename = f"{modelname}_{i+1}_of_{num_splits}" + input_tensor_names = [i.name for i in subgraph.input] + output_tensor_names = [i.name for i in subgraph.output] + new_model_info.append([new_basename, input_tensor_names, output_tensor_names]) + + submodel = onnx.helper.make_model( + subgraph, opset_imports=onnxmodel.opset_import + ) + if ( + not using_external_data + and submodel.ByteSize() < onnx.checker.MAXIMUM_PROTOBUF + ): + onnx.checker.check_model(submodel) + + if using_external_data: + onnx.load_external_data_for_model(submodel, base_dir=base_dir) + + part_root_path = Path(output_dir) / (new_basename + ".aimet") + part_root_path.mkdir(parents=True, exist_ok=True) + + newonnxfile = part_root_path / (new_basename + ".onnx") + save_model(submodel, newonnxfile, using_external_data) + + # Save subset of encodings + if encodings is not None: + new_encodings = deepcopy(encodings) + + activation_names = ( + set(o for x in submodel.graph.node for o in x.output) + | set(x.name for x in submodel.graph.input) + | set(x.name for x in submodel.graph.output) + ) + param_names = set(x.name for x in submodel.graph.initializer) + + for k in encodings["activation_encodings"]: + if k not in activation_names: + del new_encodings["activation_encodings"][k] + + for k in encodings["param_encodings"]: + if k not in param_names: + del new_encodings["param_encodings"][k] + + if uses_lists: + # convert back + new_encodings["activation_encodings"] = list( + new_encodings["activation_encodings"].values() + ) + new_encodings["param_encodings"] = list( + new_encodings["param_encodings"].values() + ) + + new_encodings_path = part_root_path / (new_basename + ".encodings") + with open(new_encodings_path, "w") as write_file: + json.dump(new_encodings, write_file, indent=4, sort_keys=True) + + return onnx_to_artifacts_map + + +def _get_lm_head_sizes(onnxmodel): + "Get dimensions of the LM head : embedding_size, vocab_size" + lm_head_weight_name = next( + node.input[1] + for node in reversed(onnxmodel.graph.node) + if node.op_type in ("Conv", "MatMul", "Gemm") + ) + (lm_head_weight,) = [ + i for i in onnxmodel.graph.initializer if lm_head_weight_name == i.name + ] + if len(lm_head_weight.dims) == 2: + embedding_size, vocab_size = lm_head_weight.dims + else: + (lm_head,) = [i for i in onnxmodel.graph.node if lm_head_weight.name in i.input] + if lm_head.op_type == "Conv": + attr_group = [i.i for i in lm_head.attribute if i.name == "group"] + group = attr_group[0] if len(attr_group) == 1 else 1 + grouped_vocab, group_size, _, _ = lm_head_weight.dims + vocab_size, embedding_size = grouped_vocab // group, group * group_size + elif lm_head.op_type == "MatMul": + group, group_size, vocab_size = lm_head_weight.dims + embedding_size = group * group_size + else: + raise 
RuntimeError(f"Unexpected lm_head op_type:{lm_head}") + + return embedding_size, vocab_size + + +def fill_input_encodings_of_split(onnxmodel, encodingfile, output_tensor_list): + + changed = False + encodings = _load_encoding(encodingfile, no_merge=True) + enc_act, enc_param = encodings["activation_encodings"], encodings["param_encodings"] + producer = {tensor: node for node in onnxmodel.graph.node for tensor in node.output} + for split_tensor in output_tensor_list: + if split_tensor not in enc_act: + assert split_tensor in producer + input_tensor = producer[split_tensor].input[0] # use only 1st input + if input_tensor in producer: + while input_tensor not in enc_act and input_tensor not in enc_param: + input_tensor = producer[input_tensor].input[0] + input_encoding = ( + enc_act[input_tensor] + if input_tensor in enc_act + else enc_param[input_tensor] + ) + enc_act[split_tensor] = input_encoding + changed = True + + if changed: + backup = f"{encodingfile}.bak" + if not os.path.exists(backup): + shutil.move(encodingfile, backup) + _save_encoding(encodings, encodingfile) + + +def split_onnx( + onnxfile, + modelname, + pickle_filedir, + num_splits, + num_layers_per_split: Optional[int] = None, + output_dir="./", + split_embedding=False, + encoding_file=None, + using_qairt_workflow=False, +): + def _is_cache(layer, name): + return re.search(f"past_(key|value)_{layer}_", name) is not None + + num_splits = int(num_splits) + + onnxmodel = _load_model(onnxfile, load_external_data=False) + input_names, output_names = get_onnx_input_output_names( + onnxfile, + onnxmodel=onnxmodel, + deco_digit=False, + using_qairt_workflow=using_qairt_workflow, + ) + output_tensor_list = get_split_tensors( + onnxfile, onnxmodel=onnxmodel, include_first_input=split_embedding + ) + + # Infer the shape of per-layer tensors + (input_ids,) = [i for i in onnxmodel.graph.input if i.name == "input_ids"] + batch_size, seq_length = [i.dim_value for i in input_ids.type.tensor_type.shape.dim] + + embedding_size, vocab_size = _get_lm_head_sizes(onnxmodel) + + per_layer_output_value_info = [ + onnx.helper.make_tensor_value_info( + name, onnx.TensorProto.FLOAT, [batch_size, seq_length, embedding_size] + ) + for name in output_tensor_list + ] + onnxmodel.graph.value_info.extend(per_layer_output_value_info) + + names_to_split = [] + if split_embedding: + first_output_tensors = output_tensor_list[0].split(",") + fill_input_encodings_of_split(onnxmodel, encoding_file, first_output_tensors) + names_to_split.append(output_tensor_list[0]) + output_tensor_list.pop(0) + + num_layers = len(output_tensor_list) + if num_layers_per_split is None: + num_layers_per_split = ( + ((num_layers - 1) // num_splits) + if split_embedding + else (num_layers // num_splits) + ) + past_key_values = { + layer: [output for output in output_names if _is_cache(layer, output)] + for layer in range(num_layers) + } + + for layer_end in range(num_layers_per_split, num_layers, num_layers_per_split): + outputs = [output_tensor_list[layer_end - 1]] + for layer in range(layer_end - num_layers_per_split, layer_end): + outputs += past_key_values[layer] + names_to_split.append(",".join(outputs)) + + names_to_split = names_to_split[: num_splits - 1] + assert ( + num_splits == len(names_to_split) + 1 + ), f"Failed to split into {num_splits} pieces!" 
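+    # At this point names_to_split holds exactly (num_splits - 1) cut points.
+    # Each entry is a comma-joined group of tensor names: the per-layer output
+    # that ends a chunk plus the past key/value outputs of every layer in that
+    # chunk (and, when split_embedding is set, the embedding outputs as the
+    # very first cut point).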
+ return split_onnx_by_names( + onnxfile, + modelname, + pickle_filedir, + *names_to_split, + output_dir=output_dir, + onnxmodel=onnxmodel, + encoding_file=encoding_file, + using_qairt_workflow=using_qairt_workflow, + ) diff --git a/qai_hub_models/models/aotgan/README.md b/qai_hub_models/models/aotgan/README.md index 709849c3..c3b8a37f 100644 --- a/qai_hub_models/models/aotgan/README.md +++ b/qai_hub_models/models/aotgan/README.md @@ -6,7 +6,7 @@ AOT-GAN is a machine learning model that allows to erase and in-paint part of given input image. This is based on the implementation of AOT-GAN found -[here](https://github.com/researchmm/AOT-GAN-for-Inpainting). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/aotgan). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.aotgan.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of AOT-GAN can be found +* The license for the original implementation of AOT-GAN can be found [here](https://github.com/taki0112/AttnGAN-Tensorflow/blob/master/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Aggregated Contextual Transformations for High-Resolution Image Inpainting](https://arxiv.org/abs/2104.01431) * [Source Model Implementation](https://github.com/researchmm/AOT-GAN-for-Inpainting) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/aotgan/export.py b/qai_hub_models/models/aotgan/export.py index 35f236ac..b0e272e4 100644 --- a/qai_hub_models/models/aotgan/export.py +++ b/qai_hub_models/models/aotgan/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch from qai_hub_models.models.aotgan import Model +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. 
Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "aotgan" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/aotgan/perf.yaml b/qai_hub_models/models/aotgan/perf.yaml index 488934ec..95593bfe 100644 --- a/qai_hub_models/models/aotgan/perf.yaml +++ b/qai_hub_models/models/aotgan/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: AOT-GAN performance_metrics: - torchscript_onnx_tflite: - inference_time: 153234.0 - throughput: 6.525966821984677 + inference_time: 152996.0 + throughput: 6.536118591335721 estimated_peak_memory_range: - min: 3284992 - max: 5465888 + min: 4313088 + max: 6987208 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 235 - job_id: joprk1705 + job_id: jg9lno9qg job_status: Passed torchscript_onnx_qnn: - inference_time: 153843.0 - throughput: 6.500133252731681 + inference_time: 153279.0 + throughput: 6.524050913693331 estimated_peak_memory_range: - min: 4227072 - max: 23282152 + min: 4317184 + max: 24792560 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,7 +71,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 274 - job_id: j1glneqjp + job_id: jp2kyojxp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -82,13 +80,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T13:00:39Z' + timestamp: '2024-10-15T01:04:42Z' - torchscript_onnx_tflite: - inference_time: 120494.0 - throughput: 8.299168423323984 + inference_time: 120324.0 + throughput: 8.310893919750008 estimated_peak_memory_range: - min: 75563008 - max: 268910816 + min: 3362816 + max: 225851856 primary_compute_unit: NPU precision: fp16 layer_info: @@ -96,14 +94,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 235 - job_id: jep283zrp + job_id: jp14zoqkp job_status: Passed torchscript_onnx_qnn: - inference_time: 139206.0 - throughput: 7.183598408114593 + inference_time: 139029.0 + throughput: 7.19274395989326 estimated_peak_memory_range: - min: 4284416 - max: 51562816 + min: 4214784 + max: 63555488 primary_compute_unit: NPU precision: 
fp16 layer_info: @@ -111,7 +109,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 274 - job_id: jw566q065 + job_id: jpy138nrp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -120,13 +118,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T13:00:40Z' + timestamp: '2024-10-15T01:04:43Z' - torchscript_onnx_tflite: - inference_time: 153299.0 - throughput: 6.523199759946249 + inference_time: 152722.0 + throughput: 6.547845104176216 estimated_peak_memory_range: min: 3293184 - max: 5408344 + max: 5871152 primary_compute_unit: NPU precision: fp16 layer_info: @@ -134,14 +132,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 235 - job_id: jqpyevy8g + job_id: jgdx167kp job_status: Passed torchscript_onnx_qnn: - inference_time: 92901.0 - throughput: 10.764146779905492 + inference_time: 92370.0 + throughput: 10.826025765941322 estimated_peak_memory_range: - min: 4395008 - max: 5750104 + min: 4444160 + max: 5674816 primary_compute_unit: NPU precision: fp16 layer_info: @@ -149,7 +147,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 274 - job_id: jwgoye9q5 + job_id: jp8qyj8zp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -157,14 +155,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T13:00:43Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T01:04:45Z' - torchscript_onnx_tflite: - inference_time: 196094.0 - throughput: 5.099595092149683 + inference_time: 153035.0 + throughput: 6.534452902930702 estimated_peak_memory_range: - min: 3325952 - max: 172737088 + min: 3321856 + max: 5871472 primary_compute_unit: NPU precision: fp16 layer_info: @@ -172,14 +170,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 235 - job_id: j2p0yex9g + job_id: j5mnx9vyp job_status: Passed torchscript_onnx_qnn: - inference_time: 194222.0 - throughput: 5.14874730977953 + inference_time: 92574.0 + throughput: 10.80216907555037 estimated_peak_memory_range: - min: 4255744 - max: 45352048 + min: 4509696 + max: 5770936 primary_compute_unit: NPU precision: fp16 layer_info: @@ -187,22 +185,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 274 - job_id: jygzevqog + job_id: jglvmw7e5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T13:00:46Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T01:04:49Z' - torchscript_onnx_tflite: - inference_time: 150359.0 - throughput: 6.6507492068981575 + inference_time: 152757.0 + throughput: 6.546344848353922 estimated_peak_memory_range: - min: 3219456 - max: 5131872 + min: 3317760 + max: 5471768 primary_compute_unit: NPU precision: fp16 layer_info: @@ -210,14 +208,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 235 - job_id: j1p8owkkg + job_id: jpxko0ej5 job_status: Passed torchscript_onnx_qnn: - inference_time: 93120.0 - throughput: 10.738831615120274 + inference_time: 93610.0 + throughput: 10.682619378271552 estimated_peak_memory_range: - min: 4444160 - max: 5980048 + min: 4489216 + max: 5770440 primary_compute_unit: NPU precision: fp16 layer_info: @@ -225,22 +223,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 274 - job_id: j1pv3zyk5 + job_id: j5q6q4w7p job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: 
Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T13:00:43Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T01:04:47Z' - torchscript_onnx_tflite: - inference_time: 153344.0 - throughput: 6.521285475792988 + inference_time: 152642.0 + throughput: 6.551276843856868 estimated_peak_memory_range: - min: 3289088 - max: 5317984 + min: 3223552 + max: 5517544 primary_compute_unit: NPU precision: fp16 layer_info: @@ -248,14 +246,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 235 - job_id: jogkzrkwg + job_id: jp4lrejq5 job_status: Passed torchscript_onnx_qnn: - inference_time: 92988.0 - throughput: 10.754075794726202 + inference_time: 92421.0 + throughput: 10.820051719847221 estimated_peak_memory_range: - min: 4382720 - max: 6036920 + min: 4460544 + max: 5743688 primary_compute_unit: NPU precision: fp16 layer_info: @@ -263,22 +261,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 274 - job_id: j7gjxk6vp + job_id: jgkex6dyg job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T13:00:44Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T01:04:46Z' - torchscript_onnx_tflite: - inference_time: 153465.0 - throughput: 6.51614374613104 + inference_time: 193915.0 + throughput: 5.156898641157208 estimated_peak_memory_range: - min: 3321856 - max: 5293976 + min: 3379200 + max: 195570816 primary_compute_unit: NPU precision: fp16 layer_info: @@ -286,14 +284,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 235 - job_id: jn5q89dn5 + job_id: j57yrovq5 job_status: Passed torchscript_onnx_qnn: - inference_time: 93130.0 - throughput: 10.737678513905294 + inference_time: 195480.0 + throughput: 5.11561285041948 estimated_peak_memory_range: - min: 4509696 - max: 5646168 + min: 864256 + max: 48880976 primary_compute_unit: NPU precision: fp16 layer_info: @@ -301,19 +299,57 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 274 - job_id: jlpe940og + job_id: jp3j0o8xg job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T01:04:51Z' + - torchscript_onnx_tflite: + inference_time: 118959.0 + throughput: 8.406257618170967 + estimated_peak_memory_range: + min: 3121152 + max: 89988256 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 235 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 235 + job_id: jprv3x9vg + job_status: Passed + torchscript_onnx_qnn: + inference_time: 118560.0 + throughput: 8.434547908232119 + estimated_peak_memory_range: + min: 3158016 + max: 68309728 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 274 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 274 + job_id: jgo26dm4p + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T13:00:45Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T01:04:52Z' - torchscript_onnx_qnn: - inference_time: 96405.0 - throughput: 10.372905969607386 + inference_time: 96258.0 + throughput: 10.388746909347795 estimated_peak_memory_range: min: 4202496 max: 4202496 @@ -324,7 +360,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 274 - job_id: j1p3kqr35 + 
job_id: jp0z0ok25 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -333,4 +369,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T13:00:41Z' + timestamp: '2024-10-15T01:04:44Z' diff --git a/qai_hub_models/models/baichuan_7b_quantized/README.md b/qai_hub_models/models/baichuan2_7b_quantized/README.md similarity index 63% rename from qai_hub_models/models/baichuan_7b_quantized/README.md rename to qai_hub_models/models/baichuan2_7b_quantized/README.md index 6e9ee724..41a9b966 100644 --- a/qai_hub_models/models/baichuan_7b_quantized/README.md +++ b/qai_hub_models/models/baichuan2_7b_quantized/README.md @@ -1,30 +1,37 @@ [![Qualcomm® AI Hub Models](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/quic-logo.jpg)](../../README.md) -# [Baichuan-7B: Large language model achieving state-of-the-art performance on Chinese and English language benchmarks](https://aihub.qualcomm.com/models/baichuan_7b_quantized) +# [Baichuan2-7B: State-of-the-art large language model useful on a variety of language understanding and generation tasks](https://aihub.qualcomm.com/models/baichuan2_7b_quantized) -Baichuan-7B is a family of LLMs. It achieves the state-of-the-art performance of its size on standard Chinese and English authoritative benchmarks (C-EVAL/MMLU). 4-bit weights and 16-bit activations making it suitable for on-device The model is quantized to deployment. For Prompt and output length specified below, the time to first token is Llama-PromptProcessor-Quantized's latency and average time per addition token is Llama-TokenGenerator-KVCache-Quantized's latency. +Baichuan2-7B is a family of LLMs. It achieves the state-of-the-art performance of its size on standard Chinese and English authoritative benchmarks (C-EVAL/MMLU). The model is quantized to 4-bit weights and 16-bit activations, making it suitable for on-device deployment. For the prompt and output lengths specified below, the time to first token is Baichuan2-PromptProcessor-Quantized's latency and the average time per additional token is Baichuan2-TokenGenerator-Quantized's latency. -This is based on the implementation of Baichuan2-7B found -[here](https://github.com/baichuan-inc/Baichuan-7B/). This repository contains scripts for optimized on-device +This is based on the implementation of Baichuan2-7B found +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance -accross various devices, can be found [here](https://aihub.qualcomm.com/models/baichuan_7b_quantized). +across various devices can be found [here](https://aihub.qualcomm.com/models/baichuan2_7b_quantized). [Sign up](https://myaccount.qualcomm.com/signup) to start using Qualcomm AI Hub and run these models on a hosted Qualcomm® device. +## Deploying Baichuan2-7B on-device + +Please follow the [LLM on-device deployment](https://github.com/quic/ai-hub-apps/tree/main/tutorials/llm_on_genie) tutorial. + ## License -- The license for the original implementation of Baichuan2-7B can be found +* The license for the original implementation of Baichuan2-7B can be found [here](https://github.com/baichuan-inc/Baichuan-7B/blob/main/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://github.com/baichuan-inc/Baichuan-7B/blob/main/LICENSE) +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/baichuan-inc/Baichuan-7B/blob/main/LICENSE) + ## References * [Baichuan 2: Open Large-scale Language Models](https://arxiv.org/abs/2309.10305) * [Source Model Implementation](https://github.com/baichuan-inc/Baichuan-7B/) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/baichuan2_7b_quantized/info.yaml b/qai_hub_models/models/baichuan2_7b_quantized/info.yaml new file mode 100644 index 00000000..b258510d --- /dev/null +++ b/qai_hub_models/models/baichuan2_7b_quantized/info.yaml @@ -0,0 +1,59 @@ +name: Baichuan2-7B +id: baichuan2_7b_quantized +status: public +headline: State-of-the-art large language model useful on a variety of language + understanding and generation tasks. +domain: Generative AI +description: Baichuan2-7B is a family of LLMs. It achieves the state-of-the-art performance of its size on standard Chinese and English authoritative benchmarks (C-EVAL/MMLU). 4-bit weights and 16-bit activations making it suitable for on-device deployment. For Prompt and output length specified below, the time to first token is Baichuan2-PromptProcessor-Quantized's latency and average time per addition token is Baichuan2-TokenGenerator-Quantized's latency. +use_case: Text Generation +tags: + - llm + - generative-ai + - quantized +research_paper: https://arxiv.org/abs/2309.10305 +research_paper_title: "Baichuan 2: Open Large-scale Language Models" +license: https://github.com/baichuan-inc/Baichuan-7B/blob/main/LICENSE +deploy_license: https://github.com/baichuan-inc/Baichuan-7B/blob/main/LICENSE +source_repo: https://github.com/baichuan-inc/Baichuan-7B/ +technical_details: + Input sequence length for Prompt Processor: 128 + Context length: 4096 + Number of parameters: 7.07B + Precision: w4a16 + w8a16 (few layers) + Num of key-value heads: 8 + Information about the model parts: Prompt Processor and Token Generator are split into 5 parts each. Each corresponding Prompt Processor and Token Generator part share weights. + Prompt processor model size: 5.06 GB + Prompt processor input (part1): 128 tokens + Prompt processor output (part1): Embeddings output + Prompt processor input (other parts): 128 tokens + KVCache initialized with pad token + Prompt processor output (other parts): 128 output tokens + KVCache for token generator + Token generator model size: 5.06 GB + Token generator input (part1): 128 tokens + Token generator output (part1): Embeddings output + Token generator input (other parts): 1 input token + past KVCache + Token generator output (other parts): 1 output token + KVCache for next iteration + Use: Initiate conversation with prompt-processor and then token generator for subsequent iterations. + Supported languages: Chinese and English. + Minimum QNN SDK version required: 2.27.7 + TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. 
The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens). + Response Rate: Rate of response generation after the first response token. +applicable_scenarios: + - Dialogue + - Content Generation + - Customer Support +related_models: [] +form_factors: + - Phone + - Tablet +has_static_banner: true +has_animated_banner: true +license_type: apache-2.0 +deploy_license_type: apache-2.0 +dataset: [] +model_type_llm: true +llm_details: + call_to_action: 'download' + Snapdragon 8 Elite QRD: + torchscript_onnx_qnn: + model_download_url: v2/snapdragon_8_elite/models.zip + genie_compatible: true diff --git a/qai_hub_models/models/baichuan2_7b_quantized/perf.yaml b/qai_hub_models/models/baichuan2_7b_quantized/perf.yaml new file mode 100644 index 00000000..b16bc822 --- /dev/null +++ b/qai_hub_models/models/baichuan2_7b_quantized/perf.yaml @@ -0,0 +1,25 @@ +aggregated: + supported_oses: + - Android + supported_devices: + - Snapdragon 8 Elite QRD + supported_chipsets: + - Snapdragon® 8 Elite +models: + name: 'Baichuan2-7B' + performance_metrics: + - torchscript_onnx_qnn: + llm_metrics: + time_to_first_token_range: + min: 208048 + max: 6657536 + tokens_per_second: 7.72 + evaluation_metrics: null + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-16T00:32:42.210701Z' diff --git a/qai_hub_models/models/baichuan_7b_quantized/info.yaml b/qai_hub_models/models/baichuan_7b_quantized/info.yaml deleted file mode 100644 index cee1c0d1..00000000 --- a/qai_hub_models/models/baichuan_7b_quantized/info.yaml +++ /dev/null @@ -1,47 +0,0 @@ -name: Baichuan-7B -id: baichuan_7b_quantized -status: public -headline: Large language model achieving state-of-the-art performance on Chinese and English language benchmarks. -domain: Generative AI -description: Baichuan-7B is a family of LLMs. It achieves the state-of-the-art performance of - its size on standard Chinese and English authoritative benchmarks (C-EVAL/MMLU). - 4-bit weights and 16-bit activations making it suitable for on-device - The model is quantized to deployment. For Prompt and output length specified below, - the time to first token is Llama-PromptProcessor-Quantized's latency and average - time per addition token is Llama-TokenGenerator-KVCache-Quantized's latency. -use_case: Text Generation -tags: - - llm - - generative-ai - - quantized -research_paper: https://arxiv.org/abs/2309.10305 -research_paper_title: "Baichuan 2: Open Large-scale Language Models" -license: https://github.com/baichuan-inc/Baichuan-7B/blob/main/LICENSE -deploy_license: https://github.com/baichuan-inc/Baichuan-7B/blob/main/LICENSE -source_repo: https://github.com/baichuan-inc/Baichuan-7B/ -technical_details: - Number of parameters: 7B - Model size: 3.9GB - Model-1 (Prompt Processor): Baichuan-PromptProcessor-Quantized - Max context length: 1024 - Prompt processor input: 1024 tokens - Prompt processor output: 1024 output tokens + KVCache for token generator - Model-2 (Token Generator): Baichuan-TokenGenerator-KVCache-Quantized - Token generator input: 1 input token + past KVCache - Token generator output: 1 output token + KVCache for next iteration - Decoding length: 1024 (1 output token + 1023 from KVCache) - Use: Initiate conversation with prompt-processor and then token generator for subsequent iterations. 
-applicable_scenarios: - - Dialogue - - Content Generation - - Customer Support -related_models: [] -form_factors: - - Phone - - Tablet -has_static_banner: true -has_animated_banner: true -license_type: apache-2.0 -deploy_license_type: apache-2.0 -dataset: [] -restrict_model_sharing: true diff --git a/qai_hub_models/models/baichuan_7b_quantized/perf.yaml b/qai_hub_models/models/baichuan_7b_quantized/perf.yaml deleted file mode 100644 index 4f87d7e0..00000000 --- a/qai_hub_models/models/baichuan_7b_quantized/perf.yaml +++ /dev/null @@ -1,77 +0,0 @@ -models: -- name: Baichuan-TokenGenerator-KVCache-Quantized - performance_metrics: - - reference_device_info: - name: Samsung Galaxy S24 Ultra - os: '14' - form_factor: Phone - os_name: Android - manufacturer: Samsung - chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-02-16T22:23:17.643089Z' - torchscript_onnx_qnn: - inference_time: 108059 - throughput: 9.25 - estimated_peak_memory_range: - min: 561152 - max: 112366992 - layer_info: - layers_on_npu: 33820 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 33820 - precision: uint16 - primary_compute_unit: NPU - job_id: "null" - job_status: Passed -- name: Baichuan-PromptProcessor-Quantized - performance_metrics: - - reference_device_info: - name: Samsung Galaxy S24 Ultra - os: '14' - form_factor: Phone - os_name: Android - manufacturer: Samsung - chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-02-16T22:23:17.643089Z' - torchscript_onnx_qnn: - inference_time: 2599326 - throughput: 393.94 - estimated_peak_memory_range: - min: 53248 - max: 40255040 - layer_info: - layers_on_npu: 31772 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 31772 - precision: uint16 - primary_compute_unit: NPU - job_id: "null" - job_status: Passed -aggregated: - supported_devices: - - Samsung Galaxy S24 Ultra - supported_oses: - - Android - supported_chipsets: - - Snapdragon® 8 Gen 3 - performance_metrics: - - reference_device_info: - name: Samsung Galaxy S24 Ultra - os: '14' - form_factor: Phone - os_name: Android - manufacturer: Samsung - chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-02-16T22:23:17.643089Z' - torchscript_onnx_qnn: - inference_time: 108059 - throughput: 9.25 - estimated_peak_memory_range: - min: 561152 - max: 112366992 - precision: uint16 - primary_compute_unit: NPU - job_id: "" - job_status: Passed diff --git a/qai_hub_models/models/common.py b/qai_hub_models/models/common.py index 598158b5..3e076cf9 100644 --- a/qai_hub_models/models/common.py +++ b/qai_hub_models/models/common.py @@ -2,10 +2,12 @@ # Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. 
# SPDX-License-Identifier: BSD-3-Clause # --------------------------------------------------------------------- +from dataclasses import dataclass from enum import Enum, unique -from typing import Dict, List +from typing import Dict, List, Optional import numpy as np +import qai_hub as hub @unique @@ -34,3 +36,12 @@ class SourceModelFormat(Enum): SampleInputsType = Dict[str, List[np.ndarray]] + + +@dataclass +class ExportResult: + compile_job: Optional[hub.CompileJob] = None + quantize_job: Optional[hub.QuantizeJob] = None + profile_job: Optional[hub.ProfileJob] = None + inference_job: Optional[hub.InferenceJob] = None + link_job: Optional[hub.LinkJob] = None diff --git a/qai_hub_models/models/controlnet_quantized/README.md b/qai_hub_models/models/controlnet_quantized/README.md index d37819e2..08c419b4 100644 --- a/qai_hub_models/models/controlnet_quantized/README.md +++ b/qai_hub_models/models/controlnet_quantized/README.md @@ -6,7 +6,7 @@ On-device, high-resolution image synthesis from text and image prompts. ControlNet guides Stable-diffusion with provided input image to generate accurate images from given input prompt. This is based on the implementation of ControlNet found -[here](https://github.com/lllyasviel/ControlNet). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/controlnet_quantized). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.controlnet_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of ControlNet can be found +* The license for the original implementation of ControlNet can be found [here](https://github.com/lllyasviel/ControlNet/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://github.com/lllyasviel/ControlNet/blob/main/LICENSE) +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/lllyasviel/ControlNet/blob/main/LICENSE) + ## References * [Adding Conditional Control to Text-to-Image Diffusion Models](https://arxiv.org/abs/2302.05543) * [Source Model Implementation](https://github.com/lllyasviel/ControlNet) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). 
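The `ExportResult` struct introduced in `qai_hub_models/models/common.py` above replaces the positional tuples that the per-model `export_model` functions used to return. A minimal consumption sketch follows; it is illustrative only (it assumes a configured AI Hub account, the device name is an example, and per the type annotation `export_model` may instead return a `List[str]`, e.g. when Hub access is unavailable):

```python
from qai_hub_models.models.aotgan.export import export_model

result = export_model(device="Samsung Galaxy S23")

# Guard against the List[str] return branch before touching job fields.
if not isinstance(result, list):
    print(f"Compile job: {result.compile_job}")
    if result.profile_job is not None:  # None when profiling is skipped
        print(f"Profile job: {result.profile_job}")
    if result.inference_job is not None:  # None when inferencing is skipped
        print(f"Inference job: {result.inference_job}")
```

Named fields also let component-based models differ without breaking callers; `controlnet_quantized` below, for instance, returns a mapping whose `ExportResult` values fill only `profile_job`.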
diff --git a/qai_hub_models/models/controlnet_quantized/export.py b/qai_hub_models/models/controlnet_quantized/export.py index 41b1cff8..b98f52d3 100644 --- a/qai_hub_models/models/controlnet_quantized/export.py +++ b/qai_hub_models/models/controlnet_quantized/export.py @@ -9,13 +9,14 @@ import warnings from pathlib import Path -from typing import Any, Dict, List, Mapping, Optional, Tuple, cast +from typing import Any, Dict, List, Mapping, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.controlnet_quantized import Model from qai_hub_models.utils.args import export_parser -from qai_hub_models.utils.base_model import BasePrecompiledModel, TargetRuntime +from qai_hub_models.utils.base_model import BasePrecompiledModel from qai_hub_models.utils.printing import print_profile_metrics_from_job from qai_hub_models.utils.qai_hub_helpers import ( can_access_qualcomm_ai_hub, @@ -46,19 +47,16 @@ def export_model( output_dir: Optional[str] = None, profile_options: str = "", **additional_model_kwargs, -) -> Mapping[str, Tuple[Optional[hub.ProfileJob], Optional[hub.InferenceJob]]] | List[ - str -]: +) -> Mapping[str, ExportResult] | List[str]: """ - This function accomplishes 5 main tasks: + This function executes the following recipe: - 1. Initialize model. - 2. Upload model assets to hub. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Summarizes the results from profiling. + 1. Initialize model + 2. Upload model assets to hub + 3. Profiles the model performance on a real device + 4. Summarizes the results from profiling - Each of the last three steps can be optionally skipped using the input options. + Each of the last 2 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -80,9 +78,8 @@ def export_model( `model_cls.from_precompiled` Returns: - A Mapping from component_name to a 2-tuple of: + A Mapping from component_name to a struct of: * A ProfileJob containing metadata about the profile job (None if profiling skipped). - * An InferenceJob containing metadata about the inference job (None if inferencing skipped). """ model_name = "controlnet_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,9 +108,7 @@ def export_model( component_arg, ) - target_runtime = TargetRuntime.TFLITE - # On-device perf improves with I/O in channel_last format except when using ONNX. - use_channel_last_format = target_runtime != TargetRuntime.ONNX + target_runtime = TargetRuntime.QNN # 1. Initialize model print("Initializing model class") @@ -135,8 +130,11 @@ def export_model( uploaded_models[component_name] = hub.upload_model( components_dict[component_name].get_target_model_path() ) + print( + f"The {component_name} model is saved here: {components_dict[component_name].get_target_model_path()}" + ) - # 3. Profile the model assets on real devices + # 3. Profiles the model performance on a real device profile_jobs: Dict[str, hub.client.ProfileJob] = {} if not skip_profiling: for component_name in components: @@ -154,31 +152,7 @@ def export_model( hub.client.ProfileJob, submitted_profile_job ) - # 4. Run inference on-device with sample inputs - inference_jobs: Dict[str, hub.client.InferenceJob] = {} - if not skip_inferencing: - for component_name in components: - print( - f"Running inference for {component_name} on a hosted device with example inputs." 
- ) - profile_options_all = components_dict[ - component_name - ].get_hub_profile_options(target_runtime, profile_options) - sample_inputs = components_dict[component_name].sample_inputs( - use_channel_last_format=use_channel_last_format - ) - submitted_inference_job = hub.submit_inference_job( - model=uploaded_models[component_name], - inputs=sample_inputs, - device=hub_device, - name=f"{model_name}_{component_name}", - options=profile_options_all, - ) - inference_jobs[component_name] = cast( - hub.client.InferenceJob, submitted_inference_job - ) - - # 5. Summarize the results from profiling + # 4. Summarizes the results from profiling if not skip_summary and not skip_profiling: for component_name in components: profile_job = profile_jobs[component_name] @@ -187,9 +161,8 @@ def export_model( print_profile_metrics_from_job(profile_job, profile_data) return { - component_name: ( - profile_jobs.get(component_name, None), - inference_jobs.get(component_name, None), + component_name: ExportResult( + profile_job=profile_jobs.get(component_name, None), ) for component_name in components } diff --git a/qai_hub_models/models/controlnet_quantized/perf.yaml b/qai_hub_models/models/controlnet_quantized/perf.yaml index fae155c5..dfdd181f 100644 --- a/qai_hub_models/models/controlnet_quantized/perf.yaml +++ b/qai_hub_models/models/controlnet_quantized/perf.yaml @@ -11,7 +11,7 @@ aggregated: supported_chipsets: - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 3 - - Qcs8550 Proxy + - QCS8550 Proxy models: - name: TextEncoder_Quantized performance_metrics: @@ -112,7 +112,7 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy + chipset: QCS8550 Proxy timestamp: '2024-10-03T16:19:26Z' - name: UNet_Quantized performance_metrics: @@ -213,7 +213,7 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy + chipset: QCS8550 Proxy timestamp: '2024-10-03T16:19:27Z' - name: VAEDecoder_Quantized performance_metrics: @@ -314,7 +314,7 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy + chipset: QCS8550 Proxy timestamp: '2024-10-03T16:19:26Z' - name: ControlNet_Quantized performance_metrics: @@ -415,5 +415,5 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy + chipset: QCS8550 Proxy timestamp: '2024-10-03T16:19:27Z' diff --git a/qai_hub_models/models/convnext_tiny/README.md b/qai_hub_models/models/convnext_tiny/README.md index 67aed5c6..efee0bb2 100644 --- a/qai_hub_models/models/convnext_tiny/README.md +++ b/qai_hub_models/models/convnext_tiny/README.md @@ -6,7 +6,7 @@ ConvNextTiny is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of ConvNext-Tiny found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/convnext.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/convnext_tiny). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.convnext_tiny.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. 
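+
+For reference, the snippet below mirrors the first step the export script
+performs internally (see `export.py` in this diff); it is a sketch for
+experimentation, not an additional supported entry point:
+
+```python
+import torch
+
+from qai_hub_models.models.convnext_tiny import Model
+from qai_hub_models.utils.input_spec import make_torch_inputs
+
+# Instantiate the pretrained model, then trace it on CPU with dummy inputs
+# generated from the model's declared input spec.
+model = Model.from_pretrained()
+input_spec = model.get_input_spec()
+source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec))
+```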
+ ## License -- The license for the original implementation of ConvNext-Tiny can be found +* The license for the original implementation of ConvNext-Tiny can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/convnext.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/convnext_tiny/export.py b/qai_hub_models/models/convnext_tiny/export.py index db0e6fa9..00a16877 100644 --- a/qai_hub_models/models/convnext_tiny/export.py +++ b/qai_hub_models/models/convnext_tiny/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.convnext_tiny import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
* An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "convnext_tiny" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/convnext_tiny/perf.yaml b/qai_hub_models/models/convnext_tiny/perf.yaml index 132a383b..d832083e 100644 --- a/qai_hub_models/models/convnext_tiny/perf.yaml +++ b/qai_hub_models/models/convnext_tiny/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: ConvNext-Tiny performance_metrics: - torchscript_onnx_tflite: - inference_time: 3313.0 - throughput: 301.84123151222457 + inference_time: 3402.0 + throughput: 293.9447383891828 estimated_peak_memory_range: min: 16384 - max: 34047392 + max: 3571480 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,37 +56,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 328 - job_id: jqpyev28g + job_id: jg9lno3mg job_status: Passed torchscript_onnx_qnn: - inference_time: 3839.0 - throughput: 260.4845011721803 + inference_time: 3892.0 + throughput: 256.9373072970195 estimated_peak_memory_range: - min: 233472 - max: 136630264 + min: 626688 + max: 93060648 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 223 + layers_on_npu: 232 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 223 - job_id: j1p3kq735 + total_layers: 232 + job_id: jpxko07j5 job_status: Passed torchscript_onnx: - inference_time: 16401.0 - throughput: 60.97189195780745 + inference_time: 13414.0 + throughput: 74.5489786789921 estimated_peak_memory_range: - min: 12288 - max: 68924008 + min: 638976 + max: 3750992 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 189 + layers_on_npu: 198 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 189 - job_id: jnp10qm85 + total_layers: 198 + job_id: jglvmwee5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:59:44Z' + timestamp: '2024-10-15T01:03:45Z' - torchscript_onnx_tflite: - inference_time: 2771.0 - throughput: 360.8805485384338 + inference_time: 2577.0 + throughput: 388.04811796662784 estimated_peak_memory_range: - min: 20480 - max: 213364256 + 
min: 16384 + max: 217772624 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,37 +109,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 328 - job_id: j2p0ye99g + job_id: jp14zodnp job_status: Passed torchscript_onnx_qnn: - inference_time: 3194.0 - throughput: 313.08703819661866 + inference_time: 3299.0 + throughput: 303.12215822976657 estimated_peak_memory_range: min: 618496 - max: 31299392 + max: 36684272 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 223 + layers_on_npu: 232 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 223 - job_id: jwgoyewq5 + total_layers: 232 + job_id: j5mnx9wyp job_status: Passed torchscript_onnx: - inference_time: 14123.0 - throughput: 70.80648587410607 + inference_time: 9798.0 + throughput: 102.06164523372117 estimated_peak_memory_range: min: 0 - max: 378934128 + max: 390272624 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 189 + layers_on_npu: 198 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 189 - job_id: jvgdw7mr5 + total_layers: 198 + job_id: j56y4oqvp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:59:45Z' + timestamp: '2024-10-15T01:03:46Z' - torchscript_onnx_tflite: - inference_time: 3253.0 - throughput: 307.40854595757764 + inference_time: 3342.0 + throughput: 299.22202274087374 estimated_peak_memory_range: - min: 0 - max: 2261184 + min: 20480 + max: 2120064 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,22 +162,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 328 - job_id: j1p8owrkg + job_id: jgdx16r6p job_status: Passed torchscript_onnx_qnn: - inference_time: 3397.0 - throughput: 294.3773918163085 + inference_time: 3633.0 + throughput: 275.2546105147261 estimated_peak_memory_range: - min: 638976 - max: 2044160 + min: 634880 + max: 1771984 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 223 + layers_on_npu: 232 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 223 - job_id: j7gjxk8vp + total_layers: 232 + job_id: jprv3x1vg job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:59:39Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T01:03:38Z' - torchscript_onnx_tflite: - inference_time: 9137.0 - throughput: 109.44511327569224 + inference_time: 3385.0 + throughput: 295.4209748892171 estimated_peak_memory_range: - min: 24576 - max: 205835680 + min: 1273856 + max: 3086216 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,37 +200,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 328 - job_id: jogkzr0wg + job_id: jgdx16rkp job_status: Passed torchscript_onnx_qnn: - inference_time: 9739.0 - throughput: 102.67994660642776 + inference_time: 3670.0 + throughput: 272.47956403269757 estimated_peak_memory_range: - min: 634880 - max: 32317280 + min: 643072 + max: 1912072 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 223 + layers_on_npu: 232 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 223 - job_id: jmg9v9qw5 + total_layers: 232 + job_id: jp0z0oe25 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:59:43Z' + chipset: 
SA8255P Proxy + timestamp: '2024-10-15T01:03:41Z' - torchscript_onnx_tflite: - inference_time: 3270.0 - throughput: 305.8103975535168 + inference_time: 3400.0 + throughput: 294.11764705882354 estimated_peak_memory_range: - min: 16384 - max: 2740240 + min: 32768 + max: 2225928 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,37 +238,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 328 - job_id: jn5q891n5 + job_id: jp14zodkp job_status: Passed torchscript_onnx_qnn: - inference_time: 3404.0 - throughput: 293.7720329024677 + inference_time: 3670.0 + throughput: 272.47956403269757 estimated_peak_memory_range: - min: 643072 - max: 1995504 + min: 630784 + max: 1906976 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 223 + layers_on_npu: 232 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 223 - job_id: jlpe94nog + total_layers: 232 + job_id: jpy138vrp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:59:40Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T01:03:40Z' - torchscript_onnx_tflite: - inference_time: 3274.0 - throughput: 305.43677458766035 + inference_time: 3384.0 + throughput: 295.5082742316785 estimated_peak_memory_range: - min: 20480 - max: 2183128 + min: 16384 + max: 2285648 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,37 +276,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 328 - job_id: j1glne8jp + job_id: jg9lno3qg job_status: Passed torchscript_onnx_qnn: - inference_time: 3393.0 - throughput: 294.7244326554671 + inference_time: 3664.0 + throughput: 272.92576419213975 estimated_peak_memory_range: - min: 634880 - max: 1867360 + min: 626688 + max: 1792296 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 223 + layers_on_npu: 232 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 223 - job_id: jygzev0og + total_layers: 232 + job_id: jp2kyo3xp job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:59:41Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T01:03:39Z' - torchscript_onnx_tflite: - inference_time: 3265.0 - throughput: 306.2787136294028 + inference_time: 9206.0 + throughput: 108.62480990658267 estimated_peak_memory_range: - min: 28672 - max: 3297824 + min: 184320 + max: 210138112 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,60 +314,113 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 328 - job_id: jw566qm65 + job_id: j5we6ydz5 job_status: Passed torchscript_onnx_qnn: - inference_time: 3524.0 - throughput: 283.7684449489217 + inference_time: 9842.0 + throughput: 101.6053647632595 estimated_peak_memory_range: - min: 634880 - max: 1936752 + min: 0 + max: 32409280 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 223 + layers_on_npu: 232 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 223 - job_id: jz5womr3p + total_layers: 232 + job_id: jgkex6ryg job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T01:03:43Z' + - torchscript_onnx_tflite: + inference_time: 2159.0 + throughput: 463.1773969430292 + estimated_peak_memory_range: + min: 12288 + max: 
64427792 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 328 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 328 + job_id: jp4lrexq5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 2436.0 + throughput: 410.5090311986864 + estimated_peak_memory_range: + min: 0 + max: 37701840 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 232 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 232 + job_id: j5q6q497p + job_status: Passed + torchscript_onnx: + inference_time: 7452.0 + throughput: 134.19216317767044 + estimated_peak_memory_range: + min: 643072 + max: 132250864 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 198 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 198 + job_id: jpv6k2z75 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:59:42Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T01:03:49Z' - torchscript_onnx_qnn: - inference_time: 3635.0 - throughput: 275.1031636863824 + inference_time: 3891.0 + throughput: 257.0033410434336 estimated_peak_memory_range: min: 602112 max: 602112 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 223 + layers_on_npu: 232 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 223 - job_id: j1pv3znk5 + total_layers: 232 + job_id: jgn6v1rv5 job_status: Passed torchscript_onnx: - inference_time: 17094.0 - throughput: 58.5000585000585 + inference_time: 16264.0 + throughput: 61.48548942449582 estimated_peak_memory_range: - min: 61222912 - max: 61222912 + min: 60178432 + max: 60178432 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 189 + layers_on_npu: 198 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 189 - job_id: jz57zv8vp + total_layers: 198 + job_id: jp3j0oqxg job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:59:46Z' + timestamp: '2024-10-15T01:03:47Z' diff --git a/qai_hub_models/models/convnext_tiny_w8a16_quantized/README.md b/qai_hub_models/models/convnext_tiny_w8a16_quantized/README.md index 0f5910ed..613e7dde 100644 --- a/qai_hub_models/models/convnext_tiny_w8a16_quantized/README.md +++ b/qai_hub_models/models/convnext_tiny_w8a16_quantized/README.md @@ -6,7 +6,7 @@ ConvNextTiny is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of ConvNext-Tiny-w8a16-Quantized found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/convnext.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/convnext_tiny_w8a16_quantized). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.convnext_tiny_w8a16_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub.
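The same export options are available programmatically; a minimal sketch, assuming only the `export_model` signature shown in this model's `export.py` diff below (the device string is illustrative):

```python
# Sketch only: mirrors `python -m qai_hub_models.models.convnext_tiny_w8a16_quantized.export`
# using keyword parameters taken from the export_model signature in this diff.
from qai_hub_models.models.convnext_tiny_w8a16_quantized.export import export_model

export_model(device="Samsung Galaxy S24", skip_profiling=True)
```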
+ ## License -- The license for the original implementation of ConvNext-Tiny-w8a16-Quantized can be found +* The license for the original implementation of ConvNext-Tiny-w8a16-Quantized can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/convnext.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/convnext_tiny_w8a16_quantized/evaluate.py b/qai_hub_models/models/convnext_tiny_w8a16_quantized/evaluate.py index 362002ba..55ba9bac 100644 --- a/qai_hub_models/models/convnext_tiny_w8a16_quantized/evaluate.py +++ b/qai_hub_models/models/convnext_tiny_w8a16_quantized/evaluate.py @@ -28,7 +28,7 @@ def main(): default_split_size=2500, supported_datasets=SUPPORTED_DATASETS, supports_tflite=False, - supports_ort=False, + supports_onnx=False, ) args = parser.parse_args() args.device = None diff --git a/qai_hub_models/models/convnext_tiny_w8a16_quantized/export.py b/qai_hub_models/models/convnext_tiny_w8a16_quantized/export.py index 6de43c32..5b97df43 100644 --- a/qai_hub_models/models/convnext_tiny_w8a16_quantized/export.py +++ b/qai_hub_models/models/convnext_tiny_w8a16_quantized/export.py @@ -10,17 +10,17 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.convnext_tiny_w8a16_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.printing import ( print_inference_metrics, @@ -45,20 +45,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. 
Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -80,10 +78,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "convnext_tiny_w8a16_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,7 +107,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( target_runtime, output_path, input_spec ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -135,7 +133,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +148,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +169,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -202,7 +200,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/convnext_tiny_w8a16_quantized/perf.yaml b/qai_hub_models/models/convnext_tiny_w8a16_quantized/perf.yaml index 163b09a4..5c1b0e76 100644 --- a/qai_hub_models/models/convnext_tiny_w8a16_quantized/perf.yaml +++ b/qai_hub_models/models/convnext_tiny_w8a16_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: ConvNext-Tiny-w8a16-Quantized performance_metrics: - torchscript_onnx_qnn: - inference_time: 3447.0 - throughput: 290.1073397156948 + inference_time: 3622.0 + throughput: 276.09055770292656 estimated_peak_memory_range: - min: 323584 - max: 13088648 + min: 0 + max: 121490384 primary_compute_unit: NPU precision: int8 layer_info: @@ -58,7 +56,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jz5womz6p + job_id: j5mnx9z7p job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -67,13 +65,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:58:56Z' + timestamp: '2024-10-15T01:02:47Z' - torchscript_onnx_qnn: - inference_time: 2447.0 - throughput: 408.6636697997548 + inference_time: 2610.0 + throughput: 383.1417624521073 estimated_peak_memory_range: - min: 0 - max: 29793200 + min: 315392 + max: 36563696 primary_compute_unit: NPU precision: int8 layer_info: @@ -81,7 +79,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jmg9v92l5 + job_id: jgn6v1ej5 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -90,13 +88,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:58:57Z' + timestamp: '2024-10-15T01:02:48Z' - torchscript_onnx_qnn: - inference_time: 3060.0 - throughput: 326.797385620915 + inference_time: 13298.0 + throughput: 75.19927808693036 estimated_peak_memory_range: - min: 323584 - max: 1587888 + min: 315392 + max: 8825552 primary_compute_unit: NPU precision: int8 layer_info: @@ -104,22 +102,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jvgdw74e5 + job_id: 
jglvmw025 job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:59:00Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-15T01:02:58Z' - torchscript_onnx_qnn: - inference_time: 4195.0 - throughput: 238.37902264600714 + inference_time: 3178.0 + throughput: 314.6633102580239 estimated_peak_memory_range: - min: 315392 - max: 33940192 + min: 335872 + max: 1492200 primary_compute_unit: NPU precision: int8 layer_info: @@ -127,22 +125,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jvgdw74r5 + job_id: jp2kyom6p job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:59:04Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T01:02:50Z' - torchscript_onnx_qnn: - inference_time: 3090.0 - throughput: 323.62459546925567 + inference_time: 3204.0 + throughput: 312.10986267166044 estimated_peak_memory_range: - min: 327680 - max: 1678656 + min: 335872 + max: 1702992 primary_compute_unit: NPU precision: int8 layer_info: @@ -150,22 +148,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jz5womz3p + job_id: jp8qyj3qp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:59:01Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T01:02:54Z' - torchscript_onnx_qnn: - inference_time: 3091.0 - throughput: 323.51989647363314 + inference_time: 3204.0 + throughput: 312.10986267166044 estimated_peak_memory_range: - min: 331776 - max: 2146800 + min: 335872 + max: 1630224 primary_compute_unit: NPU precision: int8 layer_info: @@ -173,7 +171,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jmg9v92w5 + job_id: jp0z0o105 job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -181,14 +179,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:59:02Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T01:02:53Z' - torchscript_onnx_qnn: - inference_time: 3074.0 - throughput: 325.30904359141186 + inference_time: 3198.0 + throughput: 312.6954346466542 estimated_peak_memory_range: - min: 323584 - max: 1667072 + min: 335872 + max: 1549344 primary_compute_unit: NPU precision: int8 layer_info: @@ -196,22 +194,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jnp10q185 + job_id: jpy138d0p job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:59:03Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T01:02:52Z' - torchscript_onnx_qnn: - inference_time: 13121.0 - throughput: 76.21370322383964 + inference_time: 4241.0 + throughput: 235.7934449422306 estimated_peak_memory_range: - min: 352256 - max: 8126144 + min: 315392 + max: 42740768 primary_compute_unit: NPU precision: int8 layer_info: @@ -219,19 +217,42 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jz57zvnvp + job_id: j5q6q47ep job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: QCS8450 
(Proxy) + os: '13' + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T01:02:57Z' + - torchscript_onnx_qnn: + inference_time: 2406.0 + throughput: 415.6275976724855 + estimated_peak_memory_range: + min: 0 + max: 36477936 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 215 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 215 + job_id: j56y4o3np + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T12:59:05Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T01:02:59Z' - torchscript_onnx_qnn: - inference_time: 3353.0 - throughput: 298.2403817476886 + inference_time: 3505.0 + throughput: 285.30670470756064 estimated_peak_memory_range: min: 303104 max: 303104 primary_compute_unit: NPU precision: int8 layer_info: layers_on_npu: 215 layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jnp10q125 + job_id: jprv3xykg job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -251,4 +272,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:58:58Z' + timestamp: '2024-10-15T01:02:49Z' diff --git a/qai_hub_models/models/convnext_tiny_w8a8_quantized/README.md b/qai_hub_models/models/convnext_tiny_w8a8_quantized/README.md index 2cc33cf1..7eac8a6d 100644 --- a/qai_hub_models/models/convnext_tiny_w8a8_quantized/README.md +++ b/qai_hub_models/models/convnext_tiny_w8a8_quantized/README.md @@ -6,7 +6,7 @@ ConvNextTiny is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of ConvNext-Tiny-w8a8-Quantized found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/convnext.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/convnext_tiny_w8a8_quantized). @@ -17,11 +17,6 @@ across various devices can be found [here](https://aihub.qualcomm.com/models/c ## Example & Usage -Install the package via pip: -```bash -pip install "qai_hub_models[convnext_tiny_w8a8_quantized]" -``` - Once installed, run the following simple CLI demo: @@ -44,15 +39,19 @@ python -m qai_hub_models.models.convnext_tiny_w8a8_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of ConvNext-Tiny-w8a8-Quantized can be found +* The license for the original implementation of ConvNext-Tiny-w8a8-Quantized can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [A ConvNet for the 2020s](https://arxiv.org/abs/2201.03545) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/convnext.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/convnext_tiny_w8a8_quantized/conftest.py b/qai_hub_models/models/convnext_tiny_w8a8_quantized/conftest.py index e737cdbc..1c81d4b0 100644 --- a/qai_hub_models/models/convnext_tiny_w8a8_quantized/conftest.py +++ b/qai_hub_models/models/convnext_tiny_w8a8_quantized/conftest.py @@ -9,7 +9,6 @@ import pytest from qai_hub_models.models.convnext_tiny_w8a8_quantized import Model -from qai_hub_models.utils.testing import skip_clone_repo_check # Instantiate the model only once for all tests. @@ -22,7 +21,6 @@ def cached_from_pretrained(): from_pretrained = Model.from_pretrained sig = inspect.signature(from_pretrained) - @skip_clone_repo_check def _cached_from_pretrained(*args, **kwargs): cache_key = str(args) + str(kwargs) model = pretrained_cache.get(cache_key, None) diff --git a/qai_hub_models/models/convnext_tiny_w8a8_quantized/evaluate.py b/qai_hub_models/models/convnext_tiny_w8a8_quantized/evaluate.py index 76c29397..87373ea9 100644 --- a/qai_hub_models/models/convnext_tiny_w8a8_quantized/evaluate.py +++ b/qai_hub_models/models/convnext_tiny_w8a8_quantized/evaluate.py @@ -13,10 +13,8 @@ from qai_hub_models.models.convnext_tiny_w8a8_quantized import MODEL_ID, Model from qai_hub_models.utils.args import evaluate_parser, get_hub_device, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.evaluate import evaluate_on_dataset from qai_hub_models.utils.inference import compile_model_from_args -from qai_hub_models.utils.quantization_aimet import AIMETQuantizableMixin SUPPORTED_DATASETS = ["imagenette", "imagenet"] @@ -28,7 +26,8 @@ def main(): default_split_size=2500, supported_datasets=SUPPORTED_DATASETS, supports_tflite=False, - supports_ort=False, + supports_onnx=False, + is_hub_quantized=True, ) args = parser.parse_args() args.device = None @@ -40,13 +39,7 @@ def main(): MODEL_ID, args, get_model_kwargs(Model, vars(args)) ) hub_device = get_hub_device(None, args.chipset) - - # Use Fp16 model for torch inference - for cls in Model.__mro__: - if issubclass(cls, BaseModel) and not issubclass(cls, AIMETQuantizableMixin): - torch_cls = cls - break - torch_model = torch_cls.from_pretrained(**get_model_kwargs(torch_cls, vars(args))) + torch_model = Model.from_pretrained(**get_model_kwargs(Model, vars(args))) evaluate_on_dataset( hub_model, torch_model, diff --git a/qai_hub_models/models/convnext_tiny_w8a8_quantized/export.py b/qai_hub_models/models/convnext_tiny_w8a8_quantized/export.py index 6714b0a4..43fc8aa9 100644 --- a/qai_hub_models/models/convnext_tiny_w8a8_quantized/export.py +++ b/qai_hub_models/models/convnext_tiny_w8a8_quantized/export.py @@ -10,18 +10,20 @@ import os import warnings from 
pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.convnext_tiny_w8a8_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference +from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( print_inference_metrics, print_on_target_demo_cmd, @@ -31,11 +33,14 @@ can_access_qualcomm_ai_hub, export_without_hub_access, ) +from qai_hub_models.utils.quantization import get_calibration_data def export_model( device: str = "Samsung Galaxy S23 (Family)", chipset: Optional[str] = None, + num_calibration_samples: int = 100, + skip_compiling: bool = False, skip_profiling: bool = False, skip_inferencing: bool = False, skip_downloading: bool = False, @@ -45,20 +50,19 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + 3. Compiles the model to an asset that can be run on device + 4. Profiles the model performance on a real device + 5. Inferences the model on sample inputs + 6. Downloads the model asset to the local directory + 7. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 5 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -66,6 +70,9 @@ def export_model( Defaults to DEFAULT_DEVICE if not specified. chipset: If set, will choose a random device with this chipset. Overrides the `device` argument. + num_calibration_samples: The number of calibration data samples + to use for quantization. + skip_compiling: If set, skips compiling model to format that can run on device. skip_profiling: If set, skips profiling of compiled model on real devices. skip_inferencing: If set, skips computing on-device outputs from sample data. skip_downloading: If set, skips downloading of compiled model. @@ -80,10 +87,11 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: - * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). + A struct of: + * A CompileJob object containing metadata about the compile job submitted to hub (None if compiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). 
+ * A ProfileJob containing metadata about the profile job (None if profiling skipped). + * A QuantizeJob object containing metadata about the quantize job submitted to hub """ model_name = "convnext_tiny_w8a8_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,33 +117,52 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) ) # Trace the model - source_model = model.convert_to_hub_source_model( - target_runtime, output_path, input_spec + source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) + + print(f"Quantizing model {model_name} with {num_calibration_samples} samples.") + # 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + onnx_compile_job = hub.submit_compile_job( + model=source_model, + input_specs=input_spec, + device=hub_device, + name=model_name, + options="--target_runtime onnx", + ) + quantize_job = hub.submit_quantize_job( + model=onnx_compile_job.get_target_model(), + calibration_data=get_calibration_data( + input_spec, "imagenette", num_calibration_samples + ), + weights_dtype=model.get_weights_dtype(), + activations_dtype=model.get_activations_dtype(), + name=model_name, + options=model.get_quantize_options(), ) + if skip_compiling: + return ExportResult(quantize_job=quantize_job) - # 2. Compile the model to an on-device asset + # 3. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) print(f"Optimizing model {model_name} to run on-device") submitted_compile_job = hub.submit_compile_job( - model=source_model, + model=quantize_job.get_target_model(), input_specs=input_spec, device=hub_device, name=model_name, - calibration_data=model.get_calibration_data(target_runtime), options=model_compile_options, ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 4. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +177,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 5. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +198,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 6. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 7. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -202,12 +229,22 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + quantize_job=quantize_job, + ) def main(): warnings.filterwarnings("ignore") - parser = export_parser(model_cls=Model, supports_tflite=False, supports_onnx=False) + parser = export_parser( + model_cls=Model, + supports_tflite=False, + supports_onnx=False, + is_hub_quantized=True, + ) args = parser.parse_args() export_model(**vars(args)) diff --git a/qai_hub_models/models/convnext_tiny_w8a8_quantized/model.py b/qai_hub_models/models/convnext_tiny_w8a8_quantized/model.py index 5e332910..a6e459c7 100644 --- a/qai_hub_models/models/convnext_tiny_w8a8_quantized/model.py +++ b/qai_hub_models/models/convnext_tiny_w8a8_quantized/model.py @@ -4,34 +4,11 @@ # --------------------------------------------------------------------- from __future__ import annotations -from pathlib import Path - -from aimet_torch.quantsim import QuantizationSimModel - -from qai_hub_models.models._shared.convnext_tiny_quantized.model import ( - ConvNextTinyQuantizableBase, -) -from qai_hub_models.utils.asset_loaders import CachedWebModelAsset +from qai_hub_models.models.convnext_tiny.model import ConvNextTiny +from qai_hub_models.utils.quantization import HubQuantizableMixin MODEL_ID = __name__.split(".")[-2] -MODEL_ASSET_VERSION = 1 - -DEFAULT_ENCODINGS = "convnext_tiny_w8a8_quantized_encodings.json" - - -class ConvNextTinyW8A8Quantizable(ConvNextTinyQuantizableBase): - def __init__( - self, - quant_sim_model: QuantizationSimModel, - ) -> None: - ConvNextTinyQuantizableBase.__init__(self, quant_sim_model) - @classmethod - def _default_aimet_encodings(cls) -> str | Path: - return CachedWebModelAsset.from_asset_store( - MODEL_ID, MODEL_ASSET_VERSION, DEFAULT_ENCODINGS - ).fetch() - @classmethod - def _output_bw(cls) -> int: - return 8 +class ConvNextTinyW8A8Quantizable(HubQuantizableMixin, ConvNextTiny): + pass diff --git a/qai_hub_models/models/convnext_tiny_w8a8_quantized/perf.yaml b/qai_hub_models/models/convnext_tiny_w8a8_quantized/perf.yaml index 2b25f8df..d41eb63c 100644 --- a/qai_hub_models/models/convnext_tiny_w8a8_quantized/perf.yaml +++ b/qai_hub_models/models/convnext_tiny_w8a8_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,41 +20,33 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - - QCS8250 (Proxy) - - RB5 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Sa8775p Proxy + - 
Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8775P Proxy models: - name: ConvNext-Tiny-w8a8-Quantized performance_metrics: - torchscript_onnx_qnn: - inference_time: 1721.0 - throughput: 581.0575246949448 + inference_time: 1745.0 + throughput: 573.0659025787966 estimated_peak_memory_range: min: 16384 - max: 126115312 + max: 296021224 primary_compute_unit: NPU precision: int8 layer_info: @@ -61,7 +54,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jep283o4p + job_id: jpxk97115 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -70,13 +63,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:58:05Z' + timestamp: '2024-10-17T17:33:01Z' - torchscript_onnx_qnn: - inference_time: 1205.0 - throughput: 829.8755186721992 + inference_time: 1224.0 + throughput: 816.9934640522875 estimated_peak_memory_range: min: 163840 - max: 22532912 + max: 23951088 primary_compute_unit: NPU precision: int8 layer_info: @@ -84,7 +77,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jqpyev87g + job_id: j5mnewzwp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -93,13 +86,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:58:06Z' + timestamp: '2024-10-17T17:33:03Z' - torchscript_onnx_qnn: - inference_time: 1675.0 - throughput: 597.0149253731344 + inference_time: 6437.0 + throughput: 155.35187199005748 estimated_peak_memory_range: - min: 180224 - max: 1408144 + min: 208896 + max: 8283008 primary_compute_unit: NPU precision: int8 layer_info: @@ -107,22 +100,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: j1p8owjxg + job_id: jgn609er5 job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:58:09Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-17T17:33:05Z' - torchscript_onnx_qnn: - inference_time: 2165.0 - throughput: 461.8937644341801 + inference_time: 1675.0 + throughput: 597.0149253731344 estimated_peak_memory_range: - min: 475136 - max: 23927968 + min: 212992 + max: 1423768 primary_compute_unit: NPU precision: int8 layer_info: @@ -130,22 +123,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jw566qo05 + job_id: jprv64y9g job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:58:13Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-17T17:33:07Z' - torchscript_onnx_qnn: - inference_time: 1679.0 - throughput: 595.5926146515783 + inference_time: 1672.0 + throughput: 598.0861244019138 estimated_peak_memory_range: - min: 184320 - max: 1662120 + min: 221184 + max: 1819072 primary_compute_unit: NPU precision: int8 layer_info: @@ -153,22 +146,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jogkzr62g + job_id: jpy1z4d7p job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:58:10Z' + chipset: SA8255P Proxy + timestamp: '2024-10-17T17:33:11Z' - torchscript_onnx_qnn: - inference_time: 1687.0 - throughput: 592.7682276229995 
+ inference_time: 1678.0 + throughput: 595.9475566150179 estimated_peak_memory_range: - min: 184320 - max: 1399112 + min: 188416 + max: 1723008 primary_compute_unit: NPU precision: int8 layer_info: @@ -176,7 +169,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jn5q89445 + job_id: jp0z41r65 job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -184,14 +177,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:58:11Z' + chipset: SA8775P Proxy + timestamp: '2024-10-17T17:33:13Z' - torchscript_onnx_qnn: - inference_time: 1671.0 - throughput: 598.4440454817475 + inference_time: 2142.0 + throughput: 466.8534080298786 estimated_peak_memory_range: - min: 172032 - max: 1383456 + min: 163840 + max: 27302640 primary_compute_unit: NPU precision: int8 layer_info: @@ -199,22 +192,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: j1glnew8p + job_id: jp8q237xp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:58:12Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-17T17:33:15Z' - torchscript_onnx_qnn: - inference_time: 6707.0 - throughput: 149.0979573579842 + inference_time: 1156.0 + throughput: 865.0519031141869 estimated_peak_memory_range: - min: 163840 - max: 8127056 + min: 0 + max: 28172864 primary_compute_unit: NPU precision: int8 layer_info: @@ -222,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: j1p3kqol5 + job_id: jgkevly2g job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T12:58:14Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-17T17:33:17Z' - torchscript_onnx_qnn: - inference_time: 1816.0 - throughput: 550.6607929515418 + inference_time: 1828.0 + throughput: 547.0459518599563 estimated_peak_memory_range: - min: 466944 - max: 466944 + min: 520192 + max: 520192 primary_compute_unit: NPU precision: int8 layer_info: @@ -245,7 +238,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: j2p0yeo6g + job_id: jp2kx7m4p job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -254,4 +247,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:58:07Z' + timestamp: '2024-10-17T17:33:09Z' diff --git a/qai_hub_models/models/convnext_tiny_w8a8_quantized/requirements.txt b/qai_hub_models/models/convnext_tiny_w8a8_quantized/requirements.txt deleted file mode 100644 index de5b80e8..00000000 --- a/qai_hub_models/models/convnext_tiny_w8a8_quantized/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -aimet-torch==1.32.1.post1; sys_platform == "linux" diff --git a/qai_hub_models/models/convnext_tiny_w8a8_quantized/test.py b/qai_hub_models/models/convnext_tiny_w8a8_quantized/test.py deleted file mode 100644 index b7fedd53..00000000 --- a/qai_hub_models/models/convnext_tiny_w8a8_quantized/test.py +++ /dev/null @@ -1,31 +0,0 @@ -# --------------------------------------------------------------------- -# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. 
-# SPDX-License-Identifier: BSD-3-Clause -# --------------------------------------------------------------------- -from qai_hub_models.models._shared.imagenet_classifier.test_utils import ( - run_imagenet_classifier_test, ) -from qai_hub_models.models.convnext_tiny_w8a8_quantized.demo import main as demo_main -from qai_hub_models.models.convnext_tiny_w8a8_quantized.model import ( - MODEL_ASSET_VERSION, - MODEL_ID, - ConvNextTinyW8A8Quantizable, -) -from qai_hub_models.utils.testing import skip_clone_repo_check - - -@skip_clone_repo_check -def test_task(): - run_imagenet_classifier_test( - ConvNextTinyW8A8Quantizable.from_pretrained(), - MODEL_ID, - asset_version=MODEL_ASSET_VERSION, - probability_threshold=0.56, - diff_tol=0.06, - ) - - -@skip_clone_repo_check -def test_demo(): - # Verify demo does not crash - demo_main(is_test=True) diff --git a/qai_hub_models/models/ddrnet23_slim/README.md b/qai_hub_models/models/ddrnet23_slim/README.md index b076dea2..f4c0a382 100644 --- a/qai_hub_models/models/ddrnet23_slim/README.md +++ b/qai_hub_models/models/ddrnet23_slim/README.md @@ -6,7 +6,7 @@ DDRNet23Slim is a machine learning model that segments an image into semantic classes, specifically designed for road-based scenes. It is intended for self-driving applications. This is based on the implementation of DDRNet23-Slim found -[here](https://github.com/chenjun2hao/DDRNet.pytorch). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/ddrnet23_slim). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.ddrnet23_slim.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of DDRNet23-Slim can be found +* The license for the original implementation of DDRNet23-Slim can be found [here](https://github.com/chenjun2hao/DDRNet.pytorch/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes](https://arxiv.org/abs/2101.06085) * [Source Model Implementation](https://github.com/chenjun2hao/DDRNet.pytorch) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
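The `export.py` change repeated across these models replaces the positional 3-tuple return with an `ExportResult` struct. A minimal migration sketch, assuming only the `compile_job`/`profile_job`/`inference_job` fields visible in the `ExportResult(...)` constructions in this diff and access to Qualcomm® AI Hub (without it, `export_model` returns a list of strings, per its `ExportResult | List[str]` signature):

```python
# Sketch only: how a caller of the ddrnet23_slim export entry point migrates
# from tuple unpacking to the named ExportResult fields introduced here.
from qai_hub_models.models.ddrnet23_slim.export import export_model

# Before this diff: compile_job, profile_job, inference_job = export_model(...)
result = export_model(device="Samsung Galaxy S23 (Family)")

compile_job = result.compile_job        # populated whenever compilation ran
if result.profile_job is not None:      # None when profiling was skipped
    assert result.profile_job.wait().success
```

A named struct also lets the hub-quantized variants attach a `quantize_job` field (as the w8a8 export above does) without breaking existing callers.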
diff --git a/qai_hub_models/models/ddrnet23_slim/export.py b/qai_hub_models/models/ddrnet23_slim/export.py index 3f162c6e..b5b34070 100644 --- a/qai_hub_models/models/ddrnet23_slim/export.py +++ b/qai_hub_models/models/ddrnet23_slim/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.ddrnet23_slim import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "ddrnet23_slim" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. 
Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/ddrnet23_slim/perf.yaml b/qai_hub_models/models/ddrnet23_slim/perf.yaml index 00fb7085..55751df0 100644 --- a/qai_hub_models/models/ddrnet23_slim/perf.yaml +++ b/qai_hub_models/models/ddrnet23_slim/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: DDRNet23-Slim performance_metrics: - torchscript_onnx_tflite: - inference_time: 5175.0 - throughput: 193.23671497584542 + inference_time: 5215.0 + throughput: 191.75455417066155 estimated_peak_memory_range: - min: 987136 - max: 3164272 + min: 1929216 + max: 3767840 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 131 - job_id: jmg9v9zl5 + job_id: jgz3dzj45 job_status: Passed 
torchscript_onnx: - inference_time: 9595.0 - throughput: 104.22094841063054 + inference_time: 7422.0 + throughput: 134.73457289140393 estimated_peak_memory_range: - min: 11673600 - max: 218833312 + min: 11857920 + max: 13806568 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,7 +71,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 155 - job_id: j1glney8p + job_id: jp3j0oemg job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -82,13 +80,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:57:10Z' + timestamp: '2024-10-15T01:00:45Z' - torchscript_onnx_tflite: - inference_time: 4469.0 - throughput: 223.76370552696352 + inference_time: 4002.0 + throughput: 249.8750624687656 estimated_peak_memory_range: min: 987136 - max: 74955808 + max: 79422992 primary_compute_unit: NPU precision: fp16 layer_info: @@ -96,14 +94,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 131 - job_id: jnp10qn25 + job_id: j5we6y345 job_status: Passed torchscript_onnx: - inference_time: 7899.0 - throughput: 126.59830358273199 + inference_time: 5648.0 + throughput: 177.05382436260624 estimated_peak_memory_range: min: 0 - max: 82281968 + max: 94016992 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,7 +109,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 155 - job_id: j1p3kqzl5 + job_id: jgo26d31p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -120,13 +118,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:57:11Z' + timestamp: '2024-10-15T01:00:46Z' - torchscript_onnx_tflite: - inference_time: 5059.0 - throughput: 197.667523225934 + inference_time: 5072.0 + throughput: 197.1608832807571 estimated_peak_memory_range: - min: 999424 - max: 2488576 + min: 1036288 + max: 3600920 primary_compute_unit: NPU precision: fp16 layer_info: @@ -134,7 +132,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 131 - job_id: jvgdw7de5 + job_id: jg9lnoymg job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -142,14 +140,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:56:57Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T01:00:26Z' - torchscript_onnx_tflite: - inference_time: 7501.0 - throughput: 133.3155579256099 + inference_time: 5150.0 + throughput: 194.1747572815534 estimated_peak_memory_range: - min: 1028096 - max: 65317360 + min: 12288 + max: 1769952 primary_compute_unit: NPU precision: fp16 layer_info: @@ -157,22 +155,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 131 - job_id: jz57zvelp + job_id: jp4lred25 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:56:58Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T01:00:30Z' - torchscript_onnx_tflite: - inference_time: 5198.0 - throughput: 192.3816852635629 + inference_time: 5152.0 + throughput: 194.09937888198758 estimated_peak_memory_range: - min: 0 - max: 14606376 + min: 995328 + max: 2779824 primary_compute_unit: NPU precision: fp16 layer_info: @@ -180,22 +178,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 131 - job_id: jqp4qjyvg + job_id: j57yroln5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto 
os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:56:59Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T01:00:29Z' - torchscript_onnx_tflite: inference_time: 5027.0 throughput: 198.92580067634773 estimated_peak_memory_range: - min: 999424 - max: 8435688 + min: 528384 + max: 3069008 primary_compute_unit: NPU precision: fp16 layer_info: @@ -203,22 +201,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 131 - job_id: j0pxvel1g + job_id: jgdx16q6p job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:57:00Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T01:00:28Z' - torchscript_onnx_tflite: - inference_time: 5195.0 - throughput: 192.49278152069297 + inference_time: 7487.0 + throughput: 133.56484573260317 estimated_peak_memory_range: - min: 405504 - max: 7332312 + min: 1032192 + max: 67333008 primary_compute_unit: NPU precision: fp16 layer_info: @@ -226,19 +224,57 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 131 - job_id: jo5mrv0wg + job_id: jp14zownp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T01:00:27Z' + - torchscript_onnx_tflite: + inference_time: 3419.0 + throughput: 292.48318221702255 + estimated_peak_memory_range: + min: 983040 + max: 41260000 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 131 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 131 + job_id: j5mnx967p + job_status: Passed + torchscript_onnx: + inference_time: 5009.0 + throughput: 199.64064683569575 + estimated_peak_memory_range: + min: 11886592 + max: 58267120 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 155 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 155 + job_id: jpedm6k85 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:57:01Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T01:00:49Z' - torchscript_onnx: - inference_time: 9452.0 - throughput: 105.79771476936098 + inference_time: 8333.0 + throughput: 120.00480019200768 estimated_peak_memory_range: min: 9859072 max: 9859072 @@ -249,7 +285,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 155 - job_id: jwgoyelx5 + job_id: jpv6k2vz5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -258,4 +294,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:57:12Z' + timestamp: '2024-10-15T01:00:47Z' diff --git a/qai_hub_models/models/deeplabv3_plus_mobilenet/README.md b/qai_hub_models/models/deeplabv3_plus_mobilenet/README.md index a8ce7dc5..ed83d022 100644 --- a/qai_hub_models/models/deeplabv3_plus_mobilenet/README.md +++ b/qai_hub_models/models/deeplabv3_plus_mobilenet/README.md @@ -6,7 +6,7 @@ DeepLabV3 is designed for semantic segmentation at multiple scales, trained on the various datasets. It uses MobileNet as a backbone. This is based on the implementation of DeepLabV3-Plus-MobileNet found -[here](https://github.com/jfzhang95/pytorch-deeplab-xception). This repository contains scripts for optimized on-device +[here]({source_repo}). 
This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/deeplabv3_plus_mobilenet). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.deeplabv3_plus_mobilenet.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of DeepLabV3-Plus-MobileNet can be found +* The license for the original implementation of DeepLabV3-Plus-MobileNet can be found [here](https://github.com/jfzhang95/pytorch-deeplab-xception/blob/master/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Rethinking Atrous Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1706.05587) * [Source Model Implementation](https://github.com/jfzhang95/pytorch-deeplab-xception) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/deeplabv3_plus_mobilenet/export.py b/qai_hub_models/models/deeplabv3_plus_mobilenet/export.py index c5f586f2..d1ce03ae 100644 --- a/qai_hub_models/models/deeplabv3_plus_mobilenet/export.py +++ b/qai_hub_models/models/deeplabv3_plus_mobilenet/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.deeplabv3_plus_mobilenet import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5.
Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "deeplabv3_plus_mobilenet" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/deeplabv3_plus_mobilenet/perf.yaml b/qai_hub_models/models/deeplabv3_plus_mobilenet/perf.yaml index 08441fdb..6a9da31e 100644 --- a/qai_hub_models/models/deeplabv3_plus_mobilenet/perf.yaml +++ b/qai_hub_models/models/deeplabv3_plus_mobilenet/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: DeepLabV3-Plus-MobileNet performance_metrics: - torchscript_onnx_tflite: - inference_time: 13181.0 - throughput: 75.86677793794098 + inference_time: 13441.0 + throughput: 74.39922624804701 estimated_peak_memory_range: - min: 14233600 - max: 23263104 + min: 21590016 + max: 23177024 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 98 - job_id: jvgdw73e5 + job_id: jgjvn3zeg job_status: Passed torchscript_onnx_qnn: - inference_time: 13150.0 - throughput: 76.04562737642586 + inference_time: 13124.0 + throughput: 76.19628162145688 estimated_peak_memory_range: - min: 3190784 - max: 18670512 + min: 3178496 + max: 21314592 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 124 - job_id: jep283w4p + job_id: jp14zoynp job_status: Passed torchscript_onnx: - inference_time: 17923.0 - throughput: 55.794230876527365 + inference_time: 16946.0 + throughput: 59.01097604154373 estimated_peak_memory_range: - min: 52928512 - max: 54745920 + min: 47849472 + max: 346264704 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: j1p3kq9l5 + job_id: jp0z0o205 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:56:27Z' + timestamp: '2024-10-15T00:59:55Z' - torchscript_onnx_tflite: - inference_time: 10842.0 - throughput: 92.23390518354547 + inference_time: 10784.0 + throughput: 92.7299703264095 estimated_peak_memory_range: min: 22142976 - max: 
97061248 + max: 102268128 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 98 - job_id: jz57zv4lp + job_id: jpedm6ev5 job_status: Passed torchscript_onnx_qnn: - inference_time: 10777.0 - throughput: 92.79020135473694 + inference_time: 10749.0 + throughput: 93.03190994511117 estimated_peak_memory_range: min: 3174400 - max: 24408400 + max: 29549824 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 124 - job_id: jqpyevx7g + job_id: jgdx16e6p job_status: Passed torchscript_onnx: - inference_time: 16346.0 - throughput: 61.17704637220115 + inference_time: 15136.0 + throughput: 66.0676532769556 estimated_peak_memory_range: - min: 48263168 - max: 126144864 + min: 958464 + max: 85892848 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: jwgoyerx5 + job_id: jp8qyjmqp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:56:28Z' + timestamp: '2024-10-15T00:59:56Z' - torchscript_onnx_tflite: - inference_time: 13090.0 - throughput: 76.39419404125286 + inference_time: 13166.0 + throughput: 75.95321282090232 estimated_peak_memory_range: - min: 22343680 - max: 23718824 + min: 22347776 + max: 68519216 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 98 - job_id: jqp4qj1vg + job_id: jgz3dzox5 job_status: Passed torchscript_onnx_qnn: - inference_time: 12022.0 - throughput: 83.18083513558476 + inference_time: 12047.0 + throughput: 83.00821781356355 estimated_peak_memory_range: - min: 3223552 - max: 4869352 + min: 3239936 + max: 4477560 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 124 - job_id: j1p8owxxg + job_id: jp4lrek25 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:56:22Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:59:48Z' - torchscript_onnx_tflite: - inference_time: 18229.0 - throughput: 54.85764441274892 + inference_time: 13288.0 + throughput: 75.25586995785672 estimated_peak_memory_range: - min: 22167552 - max: 99317600 + min: 22155264 + max: 34721392 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 98 - job_id: j0pxve41g + job_id: jgdx16ezp job_status: Passed torchscript_onnx_qnn: - inference_time: 18600.0 - throughput: 53.763440860215056 + inference_time: 12206.0 + throughput: 81.92692118630183 estimated_peak_memory_range: - min: 3174400 - max: 28359712 + min: 3194880 + max: 4680256 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 124 - job_id: jw566q705 + job_id: jgn6v1lj5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:56:26Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:59:51Z' - torchscript_onnx_tflite: inference_time: 13223.0 
throughput: 75.62580352416245 estimated_peak_memory_range: - min: 22122496 - max: 30344656 + min: 14749696 + max: 20440648 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 98 - job_id: jo5mrvmwg + job_id: jp14zoy7p job_status: Passed torchscript_onnx_qnn: - inference_time: 12048.0 - throughput: 83.00132802124834 + inference_time: 12296.0 + throughput: 81.32726089785297 estimated_peak_memory_range: - min: 3235840 - max: 4624240 + min: 3260416 + max: 4642640 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 124 - job_id: jogkzr42g + job_id: j5mnx9q7p job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:56:23Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:59:50Z' - torchscript_onnx_tflite: - inference_time: 13101.0 - throughput: 76.33005114113426 + inference_time: 13234.0 + throughput: 75.5629439322956 estimated_peak_memory_range: - min: 22114304 - max: 25047336 + min: 28082176 + max: 30679032 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 98 - job_id: jegn2rnrg + job_id: jg9lnoj8g job_status: Passed torchscript_onnx_qnn: - inference_time: 12119.0 - throughput: 82.51505899826718 + inference_time: 12164.0 + throughput: 82.20979940808944 estimated_peak_memory_range: - min: 3227648 - max: 4558864 + min: 3207168 + max: 4483920 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 124 - job_id: jn5q89y45 + job_id: jpxko0n85 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:56:24Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:59:49Z' - torchscript_onnx_tflite: - inference_time: 13119.0 - throughput: 76.22532205198567 + inference_time: 18816.0 + throughput: 53.14625850340136 estimated_peak_memory_range: min: 22138880 - max: 38378768 + max: 101751920 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 98 - job_id: joprk1095 + job_id: j5we6y2m5 job_status: Passed torchscript_onnx_qnn: - inference_time: 12081.0 - throughput: 82.77460475126232 + inference_time: 18643.0 + throughput: 53.6394357131363 estimated_peak_memory_range: - min: 3252224 - max: 4620112 + min: 3174400 + max: 31557232 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 124 - job_id: j1glnex8p + job_id: jp2kyo06p job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:59:53Z' + - torchscript_onnx_tflite: + inference_time: 7831.0 + throughput: 127.69761205465458 + estimated_peak_memory_range: + min: 20316160 + max: 59031680 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 98 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 98 + job_id: jg9lnojmg + job_status: Passed + torchscript_onnx_qnn: + inference_time: 9188.0 + throughput: 
108.837614279495 + estimated_peak_memory_range: + min: 3170304 + max: 27743296 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 124 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 124 + job_id: jpy138r0p + job_status: Passed + torchscript_onnx: + inference_time: 11971.0 + throughput: 83.53521009105337 + estimated_peak_memory_range: + min: 53542912 + max: 94260304 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 126 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 126 + job_id: jglvmw225 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:56:25Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:59:59Z' - torchscript_onnx_qnn: - inference_time: 12511.0 - throughput: 79.92966189753017 + inference_time: 12380.0 + throughput: 80.77544426494346 estimated_peak_memory_range: min: 3170304 max: 3170304 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 124 - job_id: j2p0yej6g + job_id: j57yro0n5 job_status: Passed torchscript_onnx: - inference_time: 16612.0 - throughput: 60.197447628220566 + inference_time: 16661.0 + throughput: 60.020406938359045 estimated_peak_memory_range: - min: 69386240 - max: 69386240 + min: 69431296 + max: 69431296 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: j1pv3zdj5 + job_id: jgkex6qvg job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:56:29Z' + timestamp: '2024-10-15T00:59:57Z' diff --git a/qai_hub_models/models/deeplabv3_plus_mobilenet_quantized/README.md b/qai_hub_models/models/deeplabv3_plus_mobilenet_quantized/README.md index c371f6c3..f8503741 100644 --- a/qai_hub_models/models/deeplabv3_plus_mobilenet_quantized/README.md +++ b/qai_hub_models/models/deeplabv3_plus_mobilenet_quantized/README.md @@ -6,7 +6,7 @@ DeepLabV3 Quantized is designed for semantic segmentation at multiple scales, trained on various datasets. It uses MobileNet as a backbone. This is based on the implementation of DeepLabV3-Plus-MobileNet-Quantized found -[here](https://github.com/jfzhang95/pytorch-deeplab-xception). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/deeplabv3_plus_mobilenet_quantized). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.deeplabv3_plus_mobilenet_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of DeepLabV3-Plus-MobileNet-Quantized can be found +* The license for the original implementation of DeepLabV3-Plus-MobileNet-Quantized can be found [here](https://github.com/quic/aimet-model-zoo/blob/develop/LICENSE.pdf).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Rethinking Atrous Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1706.05587) * [Source Model Implementation](https://github.com/jfzhang95/pytorch-deeplab-xception) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/deeplabv3_plus_mobilenet_quantized/export.py b/qai_hub_models/models/deeplabv3_plus_mobilenet_quantized/export.py index 7d5193ae..58212a92 100644 --- a/qai_hub_models/models/deeplabv3_plus_mobilenet_quantized/export.py +++ b/qai_hub_models/models/deeplabv3_plus_mobilenet_quantized/export.py @@ -10,17 +10,17 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.deeplabv3_plus_mobilenet_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.printing import ( print_inference_metrics, @@ -45,20 +45,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -80,10 +78,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). 
+ * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "deeplabv3_plus_mobilenet_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,7 +107,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( target_runtime, output_path, input_spec ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -135,7 +133,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +148,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +169,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -198,7 +196,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/deeplabv3_plus_mobilenet_quantized/perf.yaml b/qai_hub_models/models/deeplabv3_plus_mobilenet_quantized/perf.yaml index 105efc74..dba4daa2 100644 --- a/qai_hub_models/models/deeplabv3_plus_mobilenet_quantized/perf.yaml +++ b/qai_hub_models/models/deeplabv3_plus_mobilenet_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,44 +20,41 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: DeepLabV3-Plus-MobileNet-Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 3353.0 - throughput: 298.2403817476886 + inference_time: 3304.0 + throughput: 302.6634382566586 estimated_peak_memory_range: min: 12288 - max: 8840440 + max: 153358872 primary_compute_unit: NPU precision: int8 layer_info: @@ -64,29 +62,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 104 - job_id: jnp10q625 + job_id: jgkex6eng job_status: Passed torchscript_onnx_qnn: - inference_time: 5163.0 - throughput: 193.6858415649816 + inference_time: 5214.0 + throughput: 191.79133103183736 estimated_peak_memory_range: min: 16384 - max: 14733928 + max: 12339320 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 99 + layers_on_npu: 142 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 99 - job_id: jqpyevm7g + total_layers: 142 + job_id: jg9lno08g job_status: Passed torchscript_onnx: - inference_time: 4668.0 - throughput: 214.22450728363324 + inference_time: 4221.0 + throughput: 236.9106846718787 estimated_peak_memory_range: - min: 15290368 - max: 18104776 + min: 11128832 + max: 19101592 primary_compute_unit: NPU precision: int8 layer_info: @@ -94,7 +92,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 103 - job_id: j1pv3z7j5 + job_id: jp0z0o4n5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -103,13 +101,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:55:44Z' + timestamp: '2024-10-15T00:59:05Z' - 
torchscript_onnx_tflite: - inference_time: 2847.0 - throughput: 351.24692658939233 + inference_time: 2825.0 + throughput: 353.98230088495575 estimated_peak_memory_range: min: 12288 - max: 65304512 + max: 68144736 primary_compute_unit: NPU precision: int8 layer_info: @@ -117,29 +115,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 104 - job_id: jvgdw72e5 + job_id: j5q6q46op job_status: Passed torchscript_onnx_qnn: - inference_time: 3858.0 - throughput: 259.2016588906169 + inference_time: 3844.0 + throughput: 260.1456815816857 estimated_peak_memory_range: min: 802816 - max: 29932288 + max: 26509280 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 99 + layers_on_npu: 142 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 99 - job_id: j2p0ye66g + total_layers: 142 + job_id: jp14zo27p job_status: Passed torchscript_onnx: - inference_time: 4064.0 - throughput: 246.06299212598427 + inference_time: 3141.0 + throughput: 318.3699458771092 estimated_peak_memory_range: min: 12288 - max: 72339920 + max: 75157872 primary_compute_unit: NPU precision: int8 layer_info: @@ -147,7 +145,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 103 - job_id: j7gjxkqxp + job_id: jp8qyj2op job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -156,13 +154,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:55:45Z' + timestamp: '2024-10-15T00:59:06Z' - torchscript_onnx_tflite: - inference_time: 3284.0 - throughput: 304.50669914738125 + inference_time: 14162.0 + throughput: 70.61149555147578 estimated_peak_memory_range: - min: 12288 - max: 14577616 + min: 5586944 + max: 50318288 primary_compute_unit: NPU precision: int8 layer_info: @@ -170,37 +168,60 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 104 - job_id: jz57zv9lp + job_id: jpedm6ov5 job_status: Passed torchscript_onnx_qnn: - inference_time: 3939.0 - throughput: 253.87154100025387 + inference_time: 18291.0 + throughput: 54.67169646274124 estimated_peak_memory_range: - min: 847872 - max: 2112960 + min: 827392 + max: 9088768 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 99 + layers_on_npu: 142 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 99 - job_id: jogkzr82g + total_layers: 142 + job_id: jp2kyoxqp job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS6490 Proxy + timestamp: '2024-10-15T00:59:03Z' + - torchscript_onnx_tflite: + inference_time: 127380.0 + throughput: 7.850525985241011 + estimated_peak_memory_range: + min: 11624448 + max: 66169216 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 101 + layers_on_gpu: 3 + layers_on_cpu: 0 + total_layers: 104 + job_id: jgz3dz2x5 + job_status: Passed + reference_device_info: + name: RB5 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:55:37Z' + chipset: QCS8250 Proxy + timestamp: '2024-10-15T00:58:51Z' - torchscript_onnx_tflite: - inference_time: 4263.0 - throughput: 234.57658925639223 + inference_time: 3315.0 + throughput: 301.65912518853696 estimated_peak_memory_range: - min: 20480 - max: 66912736 + min: 16384 + max: 8860536 primary_compute_unit: NPU precision: int8 layer_info: @@ -208,37 +229,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 104 - job_id: jqp4qj3vg + job_id: jglvmw4m5 job_status: Passed 
torchscript_onnx_qnn: - inference_time: 5664.0 - throughput: 176.5536723163842 + inference_time: 3963.0 + throughput: 252.33409033560434 estimated_peak_memory_range: - min: 802816 - max: 30079360 + min: 831488 + max: 2029904 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 99 + layers_on_npu: 142 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 99 - job_id: j1p3kq6l5 + total_layers: 142 + job_id: j57yro295 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:55:42Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:58:57Z' - torchscript_onnx_tflite: - inference_time: 3280.0 - throughput: 304.8780487804878 + inference_time: 3335.0 + throughput: 299.85007496251876 estimated_peak_memory_range: min: 12288 - max: 8695576 + max: 4632240 primary_compute_unit: NPU precision: int8 layer_info: @@ -246,37 +267,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 104 - job_id: j0pxvex1g + job_id: jpv6k2qr5 job_status: Passed torchscript_onnx_qnn: - inference_time: 3980.0 - throughput: 251.25628140703517 + inference_time: 3970.0 + throughput: 251.88916876574308 estimated_peak_memory_range: - min: 831488 - max: 2093912 + min: 827392 + max: 2031208 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 99 + layers_on_npu: 142 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 99 - job_id: jn5q89v45 + total_layers: 142 + job_id: j5mnx9e9p job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:55:39Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:59:00Z' - torchscript_onnx_tflite: - inference_time: 3323.0 - throughput: 300.9328919650918 + inference_time: 3294.0 + throughput: 303.58227079538557 estimated_peak_memory_range: min: 12288 - max: 3232360 + max: 8952312 primary_compute_unit: NPU precision: int8 layer_info: @@ -284,22 +305,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 104 - job_id: jo5mrv8wg + job_id: jgo26dzkp job_status: Passed torchscript_onnx_qnn: - inference_time: 3969.0 - throughput: 251.95263290501387 + inference_time: 3994.0 + throughput: 250.37556334501753 estimated_peak_memory_range: - min: 827392 - max: 2407720 + min: 819200 + max: 2086304 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 99 + layers_on_npu: 142 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 99 - job_id: j1glnel8p + total_layers: 142 + job_id: jpxko09l5 job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -307,14 +328,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:55:40Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:58:59Z' - torchscript_onnx_tflite: - inference_time: 3317.0 - throughput: 301.4772384684956 + inference_time: 3328.0 + throughput: 300.4807692307692 estimated_peak_memory_range: - min: 40960 - max: 148009896 + min: 12288 + max: 120682856 primary_compute_unit: NPU precision: int8 layer_info: @@ -322,37 +343,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 104 - job_id: jegn2rkrg + job_id: jp3j0onng job_status: Passed torchscript_onnx_qnn: - inference_time: 3975.0 - throughput: 251.57232704402514 + inference_time: 3963.0 + throughput: 252.33409033560434 
estimated_peak_memory_range: - min: 847872 - max: 2052464 + min: 843776 + max: 2147872 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 99 + layers_on_npu: 142 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 99 - job_id: jw566qw05 + total_layers: 142 + job_id: jp4lren15 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:55:41Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:58:58Z' - torchscript_onnx_tflite: - inference_time: 14594.0 - throughput: 68.52131012744964 + inference_time: 4166.0 + throughput: 240.03840614498318 estimated_peak_memory_range: - min: 5537792 - max: 49597760 + min: 5566464 + max: 74729504 primary_compute_unit: NPU precision: int8 layer_info: @@ -360,75 +381,105 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 104 - job_id: joprk1w95 + job_id: j56y4o2yp job_status: Passed torchscript_onnx_qnn: - inference_time: 18495.0 - throughput: 54.068667207353336 + inference_time: 5510.0 + throughput: 181.48820326678765 estimated_peak_memory_range: - min: 888832 - max: 8740416 + min: 802816 + max: 33657840 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 99 + layers_on_npu: 142 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 99 - job_id: jwgoye8x5 + total_layers: 142 + job_id: jprv3x67g job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T12:55:43Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:59:02Z' - torchscript_onnx_tflite: - inference_time: 115874.0 - throughput: 8.630063689870031 + inference_time: 2441.0 + throughput: 409.6681687832855 estimated_peak_memory_range: - min: 10899456 - max: 53906400 + min: 8192 + max: 43724352 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 101 - layers_on_gpu: 3 + layers_on_npu: 104 + layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 104 - job_id: jep283e4p + job_id: j5we6ywm5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 3816.0 + throughput: 262.0545073375262 + estimated_peak_memory_range: + min: 815104 + max: 26711376 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 142 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 142 + job_id: jpy138zlp + job_status: Passed + torchscript_onnx: + inference_time: 2494.0 + throughput: 400.962309542903 + estimated_peak_memory_range: + min: 65536 + max: 49631824 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 103 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 103 + job_id: jglvmw6m5 job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T12:55:33Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:59:09Z' - torchscript_onnx_qnn: - inference_time: 4292.0 - throughput: 232.99161230195713 + inference_time: 4324.0 + throughput: 231.26734505087882 estimated_peak_memory_range: - min: 802816 - max: 802816 + min: 815104 + max: 815104 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 99 + layers_on_npu: 142 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 99 - job_id: j1p8ow1xg + 
total_layers: 142 + job_id: jgdx16nzp job_status: Passed torchscript_onnx: - inference_time: 4600.0 - throughput: 217.3913043478261 + inference_time: 4680.0 + throughput: 213.67521367521368 estimated_peak_memory_range: - min: 18182144 - max: 18182144 + min: 18178048 + max: 18178048 primary_compute_unit: NPU precision: int8 layer_info: @@ -436,7 +487,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 103 - job_id: jlpe94y1g + job_id: jgkex6vng job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -445,4 +496,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:55:45Z' + timestamp: '2024-10-15T00:59:07Z' diff --git a/qai_hub_models/models/deeplabv3_resnet50/README.md b/qai_hub_models/models/deeplabv3_resnet50/README.md index 2149f272..e57f8d8a 100644 --- a/qai_hub_models/models/deeplabv3_resnet50/README.md +++ b/qai_hub_models/models/deeplabv3_resnet50/README.md @@ -6,7 +6,7 @@ DeepLabV3 is designed for semantic segmentation at multiple scales, trained on the COCO dataset. It uses ResNet50 as a backbone. This is based on the implementation of DeepLabV3-ResNet50 found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/segmentation/deeplabv3.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/deeplabv3_resnet50). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.deeplabv3_resnet50.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of DeepLabV3-ResNet50 can be found +* The license for the original implementation of DeepLabV3-ResNet50 can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Rethinking Atrous Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1706.05587) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/segmentation/deeplabv3.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
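Taken together, the `export.py` diffs above replace the positional 3-tuple return with the keyword-constructed `ExportResult`. Below is a minimal caller sketch of that migration, assuming only what the diffs show: the `ExportResult` fields `compile_job`, `inference_job`, and `profile_job`, and the documented `device` parameter. The model module, device name, and reliance on default arguments are illustrative.

```python
# Hypothetical call site before and after this change; assumes defaults for
# the remaining export_model arguments.
from qai_hub_models.models.deeplabv3_resnet50.export import export_model

result = export_model(device="Samsung Galaxy S23")

# Old call sites unpacked a 3-tuple positionally:
#   compile_job, profile_job, inference_job = export_model(...)
# ExportResult is accessed by field name, so callers no longer depend on the
# tuple ordering, and skipped steps surface as explicit Nones. (Per the new
# signature the function may instead return List[str]; this sketch assumes
# the ExportResult path.)
print(result.compile_job)
if result.profile_job is not None:
    print(result.profile_job)
if result.inference_job is not None:
    print(result.inference_job)
```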
diff --git a/qai_hub_models/models/deeplabv3_resnet50/export.py b/qai_hub_models/models/deeplabv3_resnet50/export.py index d693b76e..3913a5b5 100644 --- a/qai_hub_models/models/deeplabv3_resnet50/export.py +++ b/qai_hub_models/models/deeplabv3_resnet50/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.deeplabv3_resnet50 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "deeplabv3_resnet50" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. 
Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/deeplabv3_resnet50/perf.yaml b/qai_hub_models/models/deeplabv3_resnet50/perf.yaml index 598764c9..49b7bdcf 100644 --- a/qai_hub_models/models/deeplabv3_resnet50/perf.yaml +++ b/qai_hub_models/models/deeplabv3_resnet50/perf.yaml @@ -18,37 +18,31 @@ aggregated: - Samsung Galaxy S21 - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - - QCS8550 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) - SA8775 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL supported_chipsets: - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: DeepLabV3-ResNet50 performance_metrics: - torchscript_onnx_tflite: - inference_time: 291699.0 - throughput: 3.428191389068869 + inference_time: 291789.0 + throughput: 3.427133990657633 estimated_peak_memory_range: - min: 12288 - max: 148775512 + min: 22183936 + max: 200672256 primary_compute_unit: GPU precision: fp16 layer_info: @@ -56,7 +50,7 @@ models: layers_on_gpu: 95 layers_on_cpu: 0 total_layers: 95 - job_id: jogkzr9og + job_id: j56y4oyyp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -65,13 +59,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:54:35Z' + timestamp: '2024-10-15T00:57:45Z' - torchscript_onnx_tflite: - inference_time: 225905.0 - throughput: 
4.426639516610964 + inference_time: 225775.0 + throughput: 4.429188351234636 estimated_peak_memory_range: - min: 22335488 - max: 44592032 + min: 22384640 + max: 44657360 primary_compute_unit: GPU precision: fp16 layer_info: @@ -79,7 +73,7 @@ models: layers_on_gpu: 95 layers_on_cpu: 0 total_layers: 95 - job_id: jn5q89mm5 + job_id: jp3j0ojng job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -88,13 +82,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:54:36Z' + timestamp: '2024-10-15T00:57:46Z' - torchscript_onnx_tflite: - inference_time: 290320.0 - throughput: 3.444475062000551 + inference_time: 289970.0 + throughput: 3.448632617167293 estimated_peak_memory_range: - min: 253952 - max: 148467448 + min: 32768 + max: 244270728 primary_compute_unit: GPU precision: fp16 layer_info: @@ -102,7 +96,7 @@ models: layers_on_gpu: 95 layers_on_cpu: 0 total_layers: 95 - job_id: j1glne1lp + job_id: jgo26d2kp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -110,14 +104,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:54:37Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:57:47Z' - torchscript_onnx_tflite: - inference_time: 776883.0 - throughput: 1.2871951117478437 + inference_time: 290802.0 + throughput: 3.438765895695353 estimated_peak_memory_range: - min: 73728 - max: 31080016 + min: 49152 + max: 148905792 primary_compute_unit: GPU precision: fp16 layer_info: @@ -125,22 +119,22 @@ models: layers_on_gpu: 95 layers_on_cpu: 0 total_layers: 95 - job_id: jw566qd75 + job_id: jgz3dz3x5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:54:38Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:57:51Z' - torchscript_onnx_tflite: - inference_time: 291442.0 - throughput: 3.431214444040324 + inference_time: 289879.0 + throughput: 3.449715226008093 estimated_peak_memory_range: - min: 110592 - max: 148877688 + min: 2187264 + max: 145774384 primary_compute_unit: GPU precision: fp16 layer_info: @@ -148,22 +142,22 @@ models: layers_on_gpu: 95 layers_on_cpu: 0 total_layers: 95 - job_id: j1p3kqwz5 + job_id: jpedm6dv5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:54:39Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:57:50Z' - torchscript_onnx_tflite: - inference_time: 290379.0 - throughput: 3.4437752041297753 + inference_time: 290181.0 + throughput: 3.446125004738422 estimated_peak_memory_range: - min: 32768 - max: 149249784 + min: 86016 + max: 148819304 primary_compute_unit: GPU precision: fp16 layer_info: @@ -171,22 +165,22 @@ models: layers_on_gpu: 95 layers_on_cpu: 0 total_layers: 95 - job_id: j1pv3z9m5 + job_id: jgjvn3veg job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:54:41Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:57:49Z' - torchscript_onnx_tflite: - inference_time: 290542.0 - throughput: 3.441843175857535 + inference_time: 757728.0 + throughput: 1.3197347860973858 estimated_peak_memory_range: - min: 16384 - max: 304936512 + 
min: 21626880 + max: 53838160 primary_compute_unit: GPU precision: fp16 layer_info: @@ -194,13 +188,21 @@ models: layers_on_gpu: 95 layers_on_cpu: 0 total_layers: 95 - job_id: j7gjxkw8p + job_id: jpv6k26r5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:57:48Z' + - reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:54:42Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:57:53Z' diff --git a/qai_hub_models/models/densenet121/README.md b/qai_hub_models/models/densenet121/README.md index 8da6afd9..a291d310 100644 --- a/qai_hub_models/models/densenet121/README.md +++ b/qai_hub_models/models/densenet121/README.md @@ -6,7 +6,7 @@ Densenet is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of DenseNet-121 found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/densenet.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/densenet121). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.densenet121.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of DenseNet-121 can be found +* The license for the original implementation of DenseNet-121 can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Densely Connected Convolutional Networks](https://arxiv.org/abs/1608.06993) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/densenet.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
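Across all of the `perf.yaml` updates in this change, `throughput` is derived as `1e6 / inference_time`, which implies `inference_time` is recorded in microseconds (for example, `1e6 / 1922.0 == 520.2913631633714` in the DenseNet-121 entry below). The following is a minimal sketch of a reader that cross-checks the two fields; the file path and tolerance are illustrative, and the schema is assumed from these diffs.

```python
# Cross-check inference_time (microseconds, by the convention observed in
# these diffs) against the recorded throughput (inferences per second).
import yaml

with open("qai_hub_models/models/densenet121/perf.yaml") as f:  # illustrative path
    perf = yaml.safe_load(f)

for model in perf["models"]:
    for entry in model["performance_metrics"]:
        device = entry.get("reference_device_info", {}).get("name", "?")
        for runtime, stats in entry.items():
            # Only the torchscript_onnx* keys hold metric dicts with timings;
            # reference_device_info and timestamp entries are skipped here.
            if not isinstance(stats, dict) or "inference_time" not in stats:
                continue
            us = stats["inference_time"]  # microseconds per inference
            qps = 1e6 / us                # implied inferences per second
            assert abs(qps - stats["throughput"]) < 1e-6 * qps
            print(f"{device} / {runtime}: {us:.0f} us -> {qps:.1f} inf/s")
```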
diff --git a/qai_hub_models/models/densenet121/export.py b/qai_hub_models/models/densenet121/export.py index 0c09c8d4..339c295a 100644 --- a/qai_hub_models/models/densenet121/export.py +++ b/qai_hub_models/models/densenet121/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.densenet121 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "densenet121" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. 
Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/densenet121/perf.yaml b/qai_hub_models/models/densenet121/perf.yaml index 363e6c8e..510ac059 100644 --- a/qai_hub_models/models/densenet121/perf.yaml +++ b/qai_hub_models/models/densenet121/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: DenseNet-121 performance_metrics: - torchscript_onnx_tflite: - inference_time: 1930.0 - throughput: 518.1347150259068 + inference_time: 1922.0 + throughput: 520.2913631633714 estimated_peak_memory_range: - min: 28672 - max: 247350448 + min: 16384 + max: 6100120 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 312 - job_id: jogkzroog + job_id: jpy13o0lp job_status: Passed torchscript_onnx_qnn: 
- inference_time: 1994.0 - throughput: 501.5045135406219 + inference_time: 1990.0 + throughput: 502.51256281407035 estimated_peak_memory_range: - min: 12288 - max: 5539288 + min: 622592 + max: 5712216 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 372 - job_id: j7gjxko8p + job_id: jpv6k2kr5 job_status: Passed torchscript_onnx: - inference_time: 1948.0 - throughput: 513.347022587269 + inference_time: 1872.0 + throughput: 534.1880341880342 estimated_peak_memory_range: min: 12288 - max: 18023096 + max: 17858104 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 374 - job_id: jqp4qj9lg + job_id: j5mnx9x9p job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:50:57Z' + timestamp: '2024-10-15T00:57:11Z' - torchscript_onnx_tflite: - inference_time: 1634.0 - throughput: 611.9951040391677 + inference_time: 1425.0 + throughput: 701.7543859649123 estimated_peak_memory_range: - min: 12288 - max: 104142304 + min: 16384 + max: 104650240 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 312 - job_id: jn5q89zm5 + job_id: jp0z0m7n5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1472.0 - throughput: 679.3478260869565 + inference_time: 1474.0 + throughput: 678.42605156038 estimated_peak_memory_range: - min: 626688 - max: 19773136 + min: 0 + max: 19659696 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 372 - job_id: jlpe9480g + job_id: jgjvn3neg job_status: Passed torchscript_onnx: - inference_time: 1466.0 - throughput: 682.1282401091405 + inference_time: 1458.0 + throughput: 685.8710562414266 estimated_peak_memory_range: - min: 0 - max: 108106720 + min: 466944 + max: 110751600 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 374 - job_id: j0pxved9g + job_id: jgn6v1vq5 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:50:58Z' + timestamp: '2024-10-15T00:57:12Z' - torchscript_onnx_tflite: - inference_time: 1912.0 - throughput: 523.0125523012553 + inference_time: 1920.0 + throughput: 520.8333333333334 estimated_peak_memory_range: - min: 12288 - max: 1692536 + min: 20480 + max: 27558888 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 312 - job_id: j1glneolp + job_id: jp8qyevop job_status: Passed torchscript_onnx_qnn: - inference_time: 1789.0 - throughput: 558.9714924538848 + inference_time: 1788.0 + throughput: 559.2841163310962 estimated_peak_memory_range: - min: 647168 - max: 1850192 + min: 626688 + max: 1912584 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 372 - job_id: jz5wom8jp + job_id: jgz3dzdx5 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:50:52Z' + chipset: QCS8550 Proxy + timestamp: 
'2024-10-15T00:57:03Z' - torchscript_onnx_tflite: - inference_time: 2606.0 - throughput: 383.7298541826554 + inference_time: 1928.0 + throughput: 518.6721991701245 estimated_peak_memory_range: - min: 16384 - max: 104512000 + min: 12288 + max: 1484832 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 312 - job_id: jw566qr75 + job_id: j56y4o4yp job_status: Passed torchscript_onnx_qnn: - inference_time: 2677.0 - throughput: 373.55248412401943 + inference_time: 1799.0 + throughput: 555.864369093941 estimated_peak_memory_range: - min: 618496 - max: 20196928 + min: 634880 + max: 2299552 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 372 - job_id: jz57zv7rp + job_id: jp14zoz7p job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:50:56Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:57:06Z' - torchscript_onnx_tflite: - inference_time: 1936.0 - throughput: 516.5289256198347 + inference_time: 1927.0 + throughput: 518.9413596263622 estimated_peak_memory_range: - min: 20480 - max: 6452032 + min: 24576 + max: 1738480 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 312 - job_id: j1p3kqxz5 + job_id: jglvmwmm5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1794.0 - throughput: 557.4136008918617 + inference_time: 1791.0 + throughput: 558.3472920156337 estimated_peak_memory_range: min: 638976 - max: 1871608 + max: 1984776 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 372 - job_id: jmg9v9kv5 + job_id: jg9lnon8g job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:50:53Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:57:05Z' - torchscript_onnx_tflite: - inference_time: 1926.0 - throughput: 519.2107995846313 + inference_time: 1922.0 + throughput: 520.2913631633714 estimated_peak_memory_range: - min: 24576 - max: 1541736 + min: 49152 + max: 223232648 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 312 - job_id: jwgoyeod5 + job_id: j5q6qloop job_status: Passed torchscript_onnx_qnn: - inference_time: 1786.0 - throughput: 559.9104143337066 + inference_time: 1803.0 + throughput: 554.6311702717693 estimated_peak_memory_range: - min: 634880 - max: 1915920 + min: 643072 + max: 2285632 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 372 - job_id: jnp10q7l5 + job_id: j5we6y6m5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:50:54Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:57:04Z' - torchscript_onnx_tflite: - inference_time: 1934.0 - throughput: 517.063081695967 + inference_time: 2617.0 + throughput: 382.11692777990066 estimated_peak_memory_range: - min: 40960 - max: 255189400 + min: 16384 + max: 106824192 
primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 312 - job_id: j1pv3zem5 + job_id: jgkex2mng job_status: Passed torchscript_onnx_qnn: - inference_time: 1940.0 - throughput: 515.4639175257732 + inference_time: 2723.0 + throughput: 367.2420124862284 estimated_peak_memory_range: - min: 655360 - max: 1939584 + min: 0 + max: 24374832 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 372 - job_id: jvgdw78l5 + job_id: j57yror95 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:57:09Z' + - torchscript_onnx_tflite: + inference_time: 1004.0 + throughput: 996.01593625498 + estimated_peak_memory_range: + min: 12288 + max: 27684800 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 312 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 312 + job_id: jgo26d6kp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 1293.0 + throughput: 773.3952049497293 + estimated_peak_memory_range: + min: 0 + max: 19214256 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 372 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 372 + job_id: jpxko0ol5 + job_status: Passed + torchscript_onnx: + inference_time: 1297.0 + throughput: 771.0100231303007 + estimated_peak_memory_range: + min: 0 + max: 32728176 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 374 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 374 + job_id: jpy1383lp + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:50:55Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:57:15Z' - torchscript_onnx_qnn: - inference_time: 2012.0 - throughput: 497.0178926441352 + inference_time: 2019.0 + throughput: 495.2947003467063 estimated_peak_memory_range: min: 602112 max: 602112 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 372 - job_id: jygzev86g + job_id: jpedm6mv5 job_status: Passed torchscript_onnx: - inference_time: 2007.0 - throughput: 498.2561036372696 + inference_time: 2054.0 + throughput: 486.8549172346641 estimated_peak_memory_range: - min: 17166336 - max: 17166336 + min: 17170432 + max: 17170432 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 374 - job_id: jo5mrvdqg + job_id: jprv3x37g job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:50:59Z' + timestamp: '2024-10-15T00:57:13Z' diff --git a/qai_hub_models/models/densenet121_quantized/README.md b/qai_hub_models/models/densenet121_quantized/README.md new file mode 100644 index 00000000..d0ac37c1 --- /dev/null +++ b/qai_hub_models/models/densenet121_quantized/README.md @@ -0,0 +1,59 @@ +[![Qualcomm® AI Hub Models](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/quic-logo.jpg)](../../README.md) + + +# [DenseNet-121-Quantized: Imagenet classifier and general purpose backbone](https://aihub.qualcomm.com/models/densenet121_quantized) + 
+Densenet is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. + +This is based on the implementation of DenseNet-121-Quantized found +[here](https://github.com/pytorch/vision/blob/main/torchvision/models/densenet.py). This repository contains scripts for optimized on-device +export suitable to run on Qualcomm® devices. More details on model performance +across various devices can be found [here](https://aihub.qualcomm.com/models/densenet121_quantized). + +[Sign up](https://myaccount.qualcomm.com/signup) to start using Qualcomm AI Hub and run these models on a hosted Qualcomm® device. + + + + +## Example & Usage + + +Once installed, run the following simple CLI demo: + +```bash +python -m qai_hub_models.models.densenet121_quantized.demo +``` +More details on the CLI tool can be found with the `--help` option. See +[demo.py](demo.py) for sample usage of the model including pre/post processing +scripts. Please refer to our [general instructions on using +models](../../../#getting-started) for more usage instructions. + +## Export for on-device deployment + +This repository contains export scripts that produce a model optimized for +on-device deployment. This can be run as follows: + +```bash +python -m qai_hub_models.models.densenet121_quantized.export +``` +Additional options are documented with the `--help` option. Note that the above +script requires access to Deployment instructions for Qualcomm® AI Hub. + + +## License +* The license for the original implementation of DenseNet-121-Quantized can be found + [here](https://github.com/pytorch/vision/blob/main/LICENSE). +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + + +## References +* [Densely Connected Convolutional Networks](https://arxiv.org/abs/1608.06993) +* [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/densenet.py) + + + +## Community +* Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. +* For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). + + diff --git a/qai_hub_models/models/densenet121_quantized/__init__.py b/qai_hub_models/models/densenet121_quantized/__init__.py new file mode 100644 index 00000000..13778437 --- /dev/null +++ b/qai_hub_models/models/densenet121_quantized/__init__.py @@ -0,0 +1,10 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. +# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +from qai_hub_models.models._shared.imagenet_classifier.app import ( # noqa: F401 + ImagenetClassifierApp as App, +) + +from .model import MODEL_ID # noqa: F401 +from .model import DenseNet121Quantizable as Model # noqa: F401 diff --git a/qai_hub_models/models/densenet121_quantized/conftest.py b/qai_hub_models/models/densenet121_quantized/conftest.py new file mode 100644 index 00000000..2857e300 --- /dev/null +++ b/qai_hub_models/models/densenet121_quantized/conftest.py @@ -0,0 +1,37 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +# THIS FILE WAS AUTO-GENERATED. DO NOT EDIT MANUALLY. + +import inspect + +import pytest + +from qai_hub_models.models.densenet121_quantized import Model + + +# Instantiate the model only once for all tests. +# Mock from_pretrained to always return the initialized model. +# This speeds up tests and limits memory leaks. +@pytest.fixture(scope="module", autouse=True) +def cached_from_pretrained(): + with pytest.MonkeyPatch.context() as mp: + pretrained_cache = {} + from_pretrained = Model.from_pretrained + sig = inspect.signature(from_pretrained) + + def _cached_from_pretrained(*args, **kwargs): + cache_key = str(args) + str(kwargs) + model = pretrained_cache.get(cache_key, None) + if model: + return model + else: + model = from_pretrained(*args, **kwargs) + pretrained_cache[cache_key] = model + return model + + _cached_from_pretrained.__signature__ = sig + + mp.setattr(Model, "from_pretrained", _cached_from_pretrained) + yield mp diff --git a/qai_hub_models/models/densenet121_quantized/demo.py b/qai_hub_models/models/densenet121_quantized/demo.py new file mode 100644 index 00000000..adc48957 --- /dev/null +++ b/qai_hub_models/models/densenet121_quantized/demo.py @@ -0,0 +1,17 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. +# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +from qai_hub_models.models._shared.imagenet_classifier.demo import imagenet_demo +from qai_hub_models.models.densenet121_quantized.model import ( + MODEL_ID, + DenseNet121Quantizable, +) + + +def main(is_test: bool = False): + imagenet_demo(DenseNet121Quantizable, MODEL_ID, is_test) + + +if __name__ == "__main__": + main() diff --git a/qai_hub_models/models/densenet121_quantized/evaluate.py b/qai_hub_models/models/densenet121_quantized/evaluate.py new file mode 100644 index 00000000..b6133faa --- /dev/null +++ b/qai_hub_models/models/densenet121_quantized/evaluate.py @@ -0,0 +1,56 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. +# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +# THIS FILE WAS AUTO-GENERATED. DO NOT EDIT MANUALLY.
+ + +from __future__ import annotations + +import warnings + +import qai_hub as hub + +from qai_hub_models.models.densenet121_quantized import MODEL_ID, Model +from qai_hub_models.utils.args import evaluate_parser, get_hub_device, get_model_kwargs +from qai_hub_models.utils.evaluate import evaluate_on_dataset +from qai_hub_models.utils.inference import compile_model_from_args + +SUPPORTED_DATASETS = ["imagenette", "imagenet"] + + +def main(): + warnings.filterwarnings("ignore") + parser = evaluate_parser( + model_cls=Model, + default_split_size=2500, + supported_datasets=SUPPORTED_DATASETS, + supports_tflite=False, + is_hub_quantized=True, + ) + args = parser.parse_args() + args.device = None + + if args.hub_model_id is not None: + hub_model = hub.get_model(args.hub_model_id) + else: + hub_model = compile_model_from_args( + MODEL_ID, args, get_model_kwargs(Model, vars(args)) + ) + hub_device = get_hub_device(None, args.chipset) + torch_model = Model.from_pretrained(**get_model_kwargs(Model, vars(args))) + evaluate_on_dataset( + hub_model, + torch_model, + hub_device, + args.dataset_name, + args.split_size, + args.num_samples, + args.seed, + args.profile_options, + args.use_cache, + ) + + +if __name__ == "__main__": + main() diff --git a/qai_hub_models/models/densenet121_quantized/export.py b/qai_hub_models/models/densenet121_quantized/export.py new file mode 100644 index 00000000..423e8c87 --- /dev/null +++ b/qai_hub_models/models/densenet121_quantized/export.py @@ -0,0 +1,250 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. +# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +# THIS FILE WAS AUTO-GENERATED. DO NOT EDIT MANUALLY. + + +from __future__ import annotations + +import os +import warnings +from pathlib import Path +from typing import Any, Dict, List, Optional, cast + +import qai_hub as hub +import torch + +from qai_hub_models.models.common import ExportResult, TargetRuntime +from qai_hub_models.models.densenet121_quantized import Model +from qai_hub_models.utils.args import ( + export_parser, + get_input_spec_kwargs, + get_model_kwargs, +) +from qai_hub_models.utils.compare import torch_inference +from qai_hub_models.utils.input_spec import make_torch_inputs +from qai_hub_models.utils.printing import ( + print_inference_metrics, + print_on_target_demo_cmd, + print_profile_metrics_from_job, +) +from qai_hub_models.utils.qai_hub_helpers import ( + can_access_qualcomm_ai_hub, + export_without_hub_access, +) +from qai_hub_models.utils.quantization import get_calibration_data + + +def export_model( + device: str = "Samsung Galaxy S23 (Family)", + chipset: Optional[str] = None, + num_calibration_samples: int = 100, + skip_compiling: bool = False, + skip_profiling: bool = False, + skip_inferencing: bool = False, + skip_downloading: bool = False, + skip_summary: bool = False, + output_dir: Optional[str] = None, + target_runtime: TargetRuntime = TargetRuntime.QNN, + compile_options: str = "", + profile_options: str = "", + **additional_model_kwargs, +) -> ExportResult | List[str]: + """ + This function executes the following recipe: + + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + 3. Compiles the model to an asset that can be run on device + 4. Profiles the model performance on a real device + 5. 
Inferences the model on sample inputs + 6. Downloads the model asset to the local directory + 7. Summarizes the results from profiling and inference + + Each of the last 5 steps can be optionally skipped using the input options. + + Parameters: + device: Device for which to export the model. + Full list of available devices can be found by running `hub.get_devices()`. + Defaults to "Samsung Galaxy S23 (Family)" if not specified. + chipset: If set, will choose a random device with this chipset. + Overrides the `device` argument. + num_calibration_samples: The number of calibration data samples + to use for quantization. + skip_compiling: If set, skips compiling the model to a format that can run on device. + skip_profiling: If set, skips profiling of compiled model on real devices. + skip_inferencing: If set, skips computing on-device outputs from sample data. + skip_downloading: If set, skips downloading of compiled model. + skip_summary: If set, skips waiting for and summarizing results + from profiling and inference. + output_dir: Directory to store generated assets (e.g. compiled model). + Defaults to `<cwd>/build/<model name>`. + target_runtime: Which on-device runtime to target. Default is QNN. + compile_options: Additional options to pass when submitting the compile job. + profile_options: Additional options to pass when submitting the profile job. + **additional_model_kwargs: Additional optional kwargs used to customize + `model_cls.from_pretrained` and `model.get_input_spec` + + Returns: + A struct of: + * A CompileJob object containing metadata about the compile job submitted to hub (None if compiling skipped). + * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). + * A QuantizeJob object containing metadata about the quantize job submitted to hub. + """ + model_name = "densenet121_quantized" + output_path = Path(output_dir or Path.cwd() / "build" / model_name) + if chipset: + hub_device = hub.Device(attributes=f"chipset:{chipset}") + else: + hub_device = hub.Device(name=device) + if not can_access_qualcomm_ai_hub(): + return export_without_hub_access( + "densenet121_quantized", + "DenseNet-121-Quantized", + device, + skip_profiling, + skip_inferencing, + skip_downloading, + skip_summary, + output_path, + target_runtime, + compile_options, + profile_options, + ) + + # On-device perf improves with I/O in channel_last format except when using ONNX. + use_channel_last_format = target_runtime != TargetRuntime.ONNX + + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) + input_spec = model.get_input_spec( + **get_input_spec_kwargs(model, additional_model_kwargs) + ) + + # Trace the model + source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) + + print(f"Quantizing model {model_name} with {num_calibration_samples} samples.") + # 2. Converts the PyTorch model to ONNX and quantizes the ONNX model.
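+ # The quantization recipe is two Hub jobs: the traced TorchScript model is + # first compiled to an ONNX asset (--target_runtime onnx), and that asset is + # then quantized against num_calibration_samples imagenette images, using the + # weight/activation dtypes reported by the model's quantization mixin.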
+ onnx_compile_job = hub.submit_compile_job( + model=source_model, + input_specs=input_spec, + device=hub_device, + name=model_name, + options="--target_runtime onnx", + ) + quantize_job = hub.submit_quantize_job( + model=onnx_compile_job.get_target_model(), + calibration_data=get_calibration_data( + input_spec, "imagenette", num_calibration_samples + ), + weights_dtype=model.get_weights_dtype(), + activations_dtype=model.get_activations_dtype(), + name=model_name, + options=model.get_quantize_options(), + ) + if skip_compiling: + return ExportResult(quantize_job=quantize_job) + + # 3. Compiles the model to an asset that can be run on device + model_compile_options = model.get_hub_compile_options( + target_runtime, compile_options, hub_device + ) + print(f"Optimizing model {model_name} to run on-device") + submitted_compile_job = hub.submit_compile_job( + model=quantize_job.get_target_model(), + input_specs=input_spec, + device=hub_device, + name=model_name, + options=model_compile_options, + ) + compile_job = cast(hub.client.CompileJob, submitted_compile_job) + + # 4. Profiles the model performance on a real device + profile_job: Optional[hub.client.ProfileJob] = None + if not skip_profiling: + profile_options_all = model.get_hub_profile_options( + target_runtime, profile_options + ) + print(f"Profiling model {model_name} on a hosted device.") + submitted_profile_job = hub.submit_profile_job( + model=compile_job.get_target_model(), + device=hub_device, + name=model_name, + options=profile_options_all, + ) + profile_job = cast(hub.client.ProfileJob, submitted_profile_job) + + # 5. Inferences the model on sample inputs + inference_job: Optional[hub.client.InferenceJob] = None + if not skip_inferencing: + profile_options_all = model.get_hub_profile_options( + target_runtime, profile_options + ) + print( + f"Running inference for {model_name} on a hosted device with example inputs." + ) + sample_inputs = model.sample_inputs( + input_spec, use_channel_last_format=use_channel_last_format + ) + submitted_inference_job = hub.submit_inference_job( + model=compile_job.get_target_model(), + inputs=sample_inputs, + device=hub_device, + name=model_name, + options=profile_options_all, + ) + inference_job = cast(hub.client.InferenceJob, submitted_inference_job) + + # 6. Downloads the model asset to the local directory + if not skip_downloading: + os.makedirs(output_path, exist_ok=True) + target_model: hub.Model = compile_job.get_target_model() # type: ignore + target_model.download(str(output_path / model_name)) + + # 7. 
Summarizes the results from profiling and inference + if not skip_summary and not skip_profiling: + assert profile_job is not None and profile_job.wait().success + profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore + print_profile_metrics_from_job(profile_job, profile_data) + + if not skip_summary and not skip_inferencing: + sample_inputs = model.sample_inputs(use_channel_last_format=False) + torch_out = torch_inference( + model, sample_inputs, return_channel_last_output=use_channel_last_format + ) + assert inference_job is not None and inference_job.wait().success + inference_result: hub.client.DatasetEntries = inference_job.download_output_data() # type: ignore + + print_inference_metrics( + inference_job, + inference_result, + torch_out, + model.get_output_names(), + metrics="psnr,top1,top5", + ) + + if not skip_summary: + print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) + + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + quantize_job=quantize_job, + ) + + +def main(): + warnings.filterwarnings("ignore") + parser = export_parser( + model_cls=Model, supports_tflite=False, is_hub_quantized=True + ) + args = parser.parse_args() + export_model(**vars(args)) + + +if __name__ == "__main__": + main() diff --git a/qai_hub_models/models/densenet121_quantized/info.yaml b/qai_hub_models/models/densenet121_quantized/info.yaml new file mode 100644 index 00000000..37aadad5 --- /dev/null +++ b/qai_hub_models/models/densenet121_quantized/info.yaml @@ -0,0 +1,43 @@ +name: DenseNet-121-Quantized +# id must match with the model dir name in qai_hub_models +id: densenet121_quantized +status: public +headline: Imagenet classifier and general purpose backbone. +domain: Computer Vision +description: Densenet is a machine learning model that can classify images from the + Imagenet dataset. It can also be used as a backbone in building more complex models + for specific use cases. +use_case: Image Classification +tags: + - backbone + - quantized +research_paper: https://arxiv.org/abs/1608.06993 +research_paper_title: Densely Connected Convolutional Networks +license: https://github.com/pytorch/vision/blob/main/LICENSE +deploy_license: https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf +source_repo: https://github.com/pytorch/vision/blob/main/torchvision/models/densenet.py +technical_details: + Model checkpoint: Imagenet + Input resolution: 224x224 + Number of parameters: 7.97M + Model size: 9.4 MB +applicable_scenarios: + - Medical Imaging + - Anomaly Detection + - Inventory Management +related_models: + - mobilenet_v2 + - squeezenet1_1 + - googlenet +form_factors: + - Phone + - Tablet + - IoT +has_static_banner: true +has_animated_banner: true +license_type: bsd-3-clause +deploy_license_type: AI Model Hub License +dataset: + - imagenet-1k + - imagenet-22k +labels_file: imagenet_labels.txt diff --git a/qai_hub_models/models/densenet121_quantized/model.py b/qai_hub_models/models/densenet121_quantized/model.py new file mode 100644 index 00000000..a6e459c7 --- /dev/null +++ b/qai_hub_models/models/densenet121_quantized/model.py @@ -0,0 +1,14 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. 
+# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +from __future__ import annotations + +from qai_hub_models.models.densenet121.model import DenseNet +from qai_hub_models.utils.quantization import HubQuantizableMixin + +MODEL_ID = __name__.split(".")[-2] + + +class DenseNet121Quantizable(HubQuantizableMixin, DenseNet): + pass diff --git a/qai_hub_models/models/densenet121_quantized/perf.yaml b/qai_hub_models/models/densenet121_quantized/perf.yaml new file mode 100644 index 00000000..408a8eb0 --- /dev/null +++ b/qai_hub_models/models/densenet121_quantized/perf.yaml @@ -0,0 +1,298 @@ +aggregated: + supported_oses: + - Android + supported_devices: + - Snapdragon 8 Elite QRD + - Samsung Galaxy S24 + - Samsung Galaxy S24 Ultra + - Samsung Galaxy S24+ + - Snapdragon 8 Gen 3 QRD + - Samsung Galaxy S23 + - Samsung Galaxy S23 Ultra + - Samsung Galaxy S23+ + - Samsung Galaxy S22 5G + - Samsung Galaxy S22 Ultra 5G + - Samsung Galaxy S22+ 5G + - Samsung Galaxy Tab S8 + - Xiaomi 12 + - Xiaomi 12 Pro + - Samsung Galaxy S21 + - Samsung Galaxy S21 Ultra + - Samsung Galaxy S21+ + - Snapdragon X Elite CRD + - Snapdragon X Plus 8-Core CRD + - QCS6490 (Proxy) + - RB3 Gen 2 (Proxy) + - QCS8450 (Proxy) + - XR2 Gen 2 (Proxy) + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8775 (Proxy) + supported_chipsets: + - Snapdragon® 8 Elite + - Snapdragon® 8 Gen 3 + - Snapdragon® 8 Gen 2 + - Snapdragon® 8 Gen 1 + - Snapdragon® 888 + - Snapdragon® X Elite + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8775P Proxy +models: +- name: DenseNet-121-Quantized + performance_metrics: + - torchscript_onnx_qnn: + inference_time: 1745.0 + throughput: 573.0659025787966 + estimated_peak_memory_range: + min: 16384 + max: 285175608 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 215 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 215 + job_id: jp2kx7lmp + job_status: Passed + torchscript_onnx: + inference_time: 29847.0 + throughput: 33.5042047776996 + estimated_peak_memory_range: + min: 7876608 + max: 12774376 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 325 + layers_on_gpu: 0 + layers_on_cpu: 27 + total_layers: 352 + job_id: jgjvd0e8g + job_status: Passed + reference_device_info: + name: Samsung Galaxy S23 + os: '13' + form_factor: Phone + os_name: Android + manufacturer: Samsung + chipset: Snapdragon® 8 Gen 2 + timestamp: '2024-10-17T17:32:14Z' + - torchscript_onnx_qnn: + inference_time: 1218.0 + throughput: 821.0180623973728 + estimated_peak_memory_range: + min: 163840 + max: 23146192 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 215 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 215 + job_id: jpy1z464p + job_status: Passed + torchscript_onnx: + inference_time: 22391.0 + throughput: 44.66080121477379 + estimated_peak_memory_range: + min: 9449472 + max: 1067915744 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 325 + layers_on_gpu: 0 + layers_on_cpu: 27 + total_layers: 352 + job_id: jpedork05 + job_status: Passed + reference_device_info: + name: Samsung Galaxy S24 + os: '14' + form_factor: Phone + os_name: Android + manufacturer: Samsung + chipset: Snapdragon® 8 Gen 3 + timestamp: '2024-10-17T17:32:15Z' + - torchscript_onnx_qnn: + inference_time: 6521.0 + throughput: 153.35071308081584 + estimated_peak_memory_range: + min: 212992 + max: 8568672 + primary_compute_unit:
NPU + precision: int8 + layer_info: + layers_on_npu: 215 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 215 + job_id: jp0z41le5 + job_status: Passed + reference_device_info: + name: RB3 Gen 2 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS6490 Proxy + timestamp: '2024-10-17T17:31:57Z' + - torchscript_onnx_qnn: + inference_time: 1672.0 + throughput: 598.0861244019138 + estimated_peak_memory_range: + min: 180224 + max: 1556048 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 215 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 215 + job_id: jp8q23z8p + job_status: Passed + reference_device_info: + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8550 Proxy + timestamp: '2024-10-17T17:31:59Z' + - torchscript_onnx_qnn: + inference_time: 1670.0 + throughput: 598.8023952095808 + estimated_peak_memory_range: + min: 172032 + max: 1472536 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 215 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 215 + job_id: j5q6073mp + job_status: Passed + reference_device_info: + name: SA8255 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8255P Proxy + timestamp: '2024-10-17T17:32:03Z' + - torchscript_onnx_qnn: + inference_time: 1684.0 + throughput: 593.8242280285035 + estimated_peak_memory_range: + min: 196608 + max: 1433416 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 215 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 215 + job_id: jglv403l5 + job_status: Passed + reference_device_info: + name: SA8775 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8775P Proxy + timestamp: '2024-10-17T17:32:05Z' + - torchscript_onnx_qnn: + inference_time: 2130.0 + throughput: 469.4835680751174 + estimated_peak_memory_range: + min: 167936 + max: 28636608 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 215 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 215 + job_id: j56y23n7p + job_status: Passed + reference_device_info: + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-17T17:32:06Z' + - torchscript_onnx_qnn: + inference_time: 1160.0 + throughput: 862.0689655172414 + estimated_peak_memory_range: + min: 0 + max: 27657328 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 215 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 215 + job_id: jp3jn4ezg + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-17T17:32:19Z' + - torchscript_onnx_qnn: + inference_time: 1822.0 + throughput: 548.847420417124 + estimated_peak_memory_range: + min: 487424 + max: 487424 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 215 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 215 + job_id: jgkevl3og + job_status: Passed + torchscript_onnx: + inference_time: 32525.0 + throughput: 30.745580322828594 + estimated_peak_memory_range: + min: 48742400 + max: 48742400 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 325 + layers_on_gpu: 0 + layers_on_cpu: 27 + total_layers: 352 + job_id: jgz32xr65 + job_status: Passed + 
reference_device_info: + name: Snapdragon X Elite CRD + os: '11' + form_factor: Compute + os_name: Windows + manufacturer: Qualcomm + chipset: Snapdragon® X Elite + timestamp: '2024-10-17T17:32:17Z' diff --git a/qai_hub_models/models/detr_resnet101/README.md b/qai_hub_models/models/detr_resnet101/README.md index 7b1057a5..3ca26748 100644 --- a/qai_hub_models/models/detr_resnet101/README.md +++ b/qai_hub_models/models/detr_resnet101/README.md @@ -6,7 +6,7 @@ DETR is a machine learning model that can detect objects (trained on COCO dataset). This is based on the implementation of DETR-ResNet101 found [here](https://github.com/facebookresearch/detr). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/detr_resnet101). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.detr_resnet101.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of DETR-ResNet101 can be found +* The license for the original implementation of DETR-ResNet101 can be found [here](https://github.com/facebookresearch/detr/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) * [Source Model Implementation](https://github.com/facebookresearch/detr) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/detr_resnet101/export.py b/qai_hub_models/models/detr_resnet101/export.py index 2e00aede..bbb2ab54 100644 --- a/qai_hub_models/models/detr_resnet101/export.py +++ b/qai_hub_models/models/detr_resnet101/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.detr_resnet101 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1.
Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "detr_resnet101" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/detr_resnet101/perf.yaml b/qai_hub_models/models/detr_resnet101/perf.yaml index 20087154..da277769 100644 --- a/qai_hub_models/models/detr_resnet101/perf.yaml +++ b/qai_hub_models/models/detr_resnet101/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: DETR-ResNet101 performance_metrics: - torchscript_onnx_tflite: - inference_time: 17324.0 - throughput: 57.723389517432466 + inference_time: 15179.0 + throughput: 65.88049278608604 estimated_peak_memory_range: - min: 73728 - max: 3077384 + min: 90112 + max: 2967160 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,22 +56,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 856 - job_id: jw566qv75 + job_id: jp0z0mq95 job_status: Passed torchscript_onnx: - inference_time: 20465.0 - throughput: 48.86391399951136 + inference_time: 16036.0 + throughput: 62.35969069593415 estimated_peak_memory_range: - min: 40960 - max: 133388992 + min: 28672 + max: 133747336 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 856 + layers_on_npu: 886 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 856 - job_id: jegn2romg + total_layers: 886 + job_id: jgdx1dlzp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -82,13 +80,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:50:16Z' + timestamp: '2024-10-15T00:56:23Z' - torchscript_onnx_tflite: - inference_time: 14027.0 - throughput: 71.29108148570614 + inference_time: 12443.0 + throughput: 80.36647110825363 estimated_peak_memory_range: min: 53248 - max: 304006048 + max: 316370960 primary_compute_unit: NPU precision: fp16 layer_info: @@ -96,22 +94,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 856 - job_id: j1p3kq8z5 + job_id: jp8qye9kp job_status: Passed torchscript_onnx: - inference_time: 16238.0 - throughput: 61.583938908732605 + inference_time: 14577.0 + throughput: 68.60122110173562 estimated_peak_memory_range: - min: 606208 - max: 257867888 + min: 0 + max: 285366128 
primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 856 + layers_on_npu: 886 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 856 - job_id: joprk1oe5 + total_layers: 886 + job_id: j57yre395 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -120,13 +118,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:50:17Z' + timestamp: '2024-10-15T00:56:24Z' - torchscript_onnx_tflite: - inference_time: 17375.0 - throughput: 57.55395683453237 + inference_time: 15100.0 + throughput: 66.2251655629139 estimated_peak_memory_range: - min: 77824 - max: 3496320 + min: 94208 + max: 2422152 primary_compute_unit: NPU precision: fp16 layer_info: @@ -134,7 +132,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 856 - job_id: jwgoyemd5 + job_id: jgkex2nwg job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -142,14 +140,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:50:03Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:56:04Z' - torchscript_onnx_tflite: - inference_time: 23451.0 - throughput: 42.64210481429363 + inference_time: 15078.0 + throughput: 66.32179334129195 estimated_peak_memory_range: - min: 73728 - max: 248929136 + min: 81920 + max: 2476136 primary_compute_unit: NPU precision: fp16 layer_info: @@ -157,22 +155,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 856 - job_id: j1pv3z4m5 + job_id: jp3j0z33g job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:50:04Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:56:09Z' - torchscript_onnx_tflite: - inference_time: 17341.0 - throughput: 57.666801222536186 + inference_time: 15122.0 + throughput: 66.12881893929375 estimated_peak_memory_range: - min: 106496 - max: 2650776 + min: 57344 + max: 2307120 primary_compute_unit: NPU precision: fp16 layer_info: @@ -180,22 +178,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 856 - job_id: j7gjxk18p + job_id: j56y48j6p job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:50:05Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:56:07Z' - torchscript_onnx_tflite: - inference_time: 17418.0 - throughput: 57.41187277528993 + inference_time: 15140.0 + throughput: 66.05019815059445 estimated_peak_memory_range: - min: 73728 - max: 2739360 + min: 86016 + max: 2745760 primary_compute_unit: NPU precision: fp16 layer_info: @@ -203,22 +201,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 856 - job_id: jlpe9420g + job_id: jglvmyzj5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:50:06Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:56:06Z' - torchscript_onnx_tflite: - inference_time: 17358.0 - throughput: 57.61032377001959 + inference_time: 21263.0 + throughput: 47.03005220335795 estimated_peak_memory_range: - min: 126976 - max: 3318264 + min: 86016 + max: 256037744 primary_compute_unit: NPU precision: fp16 layer_info: @@ -226,22 +224,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 
856 - job_id: jygzevw6g + job_id: j5q6qlknp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:50:07Z' - - torchscript_onnx: - inference_time: 21172.0 - throughput: 47.23219346306443 + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:56:05Z' + - torchscript_onnx_tflite: + inference_time: 8835.0 + throughput: 113.18619128466327 estimated_peak_memory_range: - min: 121675776 - max: 121675776 + min: 77824 + max: 121667136 primary_compute_unit: NPU precision: fp16 layer_info: @@ -249,7 +247,45 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 856 - job_id: jep2834mp + job_id: jpv6klxk5 + job_status: Passed + torchscript_onnx: + inference_time: 11385.0 + throughput: 87.83487044356609 + estimated_peak_memory_range: + min: 2883584 + max: 123746080 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 886 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 886 + job_id: j5mnx0y9p + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:56:28Z' + - torchscript_onnx: + inference_time: 17968.0 + throughput: 55.65449688334817 + estimated_peak_memory_range: + min: 121643008 + max: 121643008 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 886 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 886 + job_id: jp4lry015 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -258,4 +294,4 @@ os: '11' form_factor: Compute os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:50:18Z' + timestamp: '2024-10-15T00:56:25Z' diff --git a/qai_hub_models/models/detr_resnet101_dc5/README.md b/qai_hub_models/models/detr_resnet101_dc5/README.md index 8a40c445..17aabd5f 100644 --- a/qai_hub_models/models/detr_resnet101_dc5/README.md +++ b/qai_hub_models/models/detr_resnet101_dc5/README.md @@ -6,7 +6,7 @@ DETR is a machine learning model that can detect objects (trained on COCO dataset). This is based on the implementation of DETR-ResNet101-DC5 found [here](https://github.com/facebookresearch/detr). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/detr_resnet101_dc5). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.detr_resnet101_dc5.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of DETR-ResNet101-DC5 can be found +* The license for the original implementation of DETR-ResNet101-DC5 can be found [here](https://github.com/facebookresearch/detr/blob/main/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) * [Source Model Implementation](https://github.com/facebookresearch/detr) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/detr_resnet101_dc5/export.py b/qai_hub_models/models/detr_resnet101_dc5/export.py index 6757d732..3b1b6587 100644 --- a/qai_hub_models/models/detr_resnet101_dc5/export.py +++ b/qai_hub_models/models/detr_resnet101_dc5/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.detr_resnet101_dc5 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
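A minimal, hedged sketch of consuming the `ExportResult` described above (the device name is illustrative, the `List[str]` branch is handled only because the `ExportResult | List[str]` signature permits it, and `job_id` is assumed to be the identifier attribute exposed by `qai_hub` job objects):

```python
# Hypothetical caller of the updated export_model; not taken from this diff.
from qai_hub_models.models.detr_resnet101_dc5.export import export_model

result = export_model(device="Samsung Galaxy S24")  # illustrative device
if isinstance(result, list):
    print("export returned messages:", result)
else:
    print("compile job:", result.compile_job.job_id)
    if result.profile_job is not None:    # None when profiling is skipped
        print("profile job:", result.profile_job.job_id)
    if result.inference_job is not None:  # None when inferencing is skipped
        print("inference job:", result.inference_job.job_id)
```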
""" model_name = "detr_resnet101_dc5" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/detr_resnet101_dc5/perf.yaml b/qai_hub_models/models/detr_resnet101_dc5/perf.yaml index b9d1383d..23686df3 100644 --- a/qai_hub_models/models/detr_resnet101_dc5/perf.yaml +++ b/qai_hub_models/models/detr_resnet101_dc5/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: DETR-ResNet101-DC5 performance_metrics: - torchscript_onnx_tflite: - inference_time: 117649.0 - throughput: 8.499859752314087 + inference_time: 92491.0 + throughput: 10.81186277583765 estimated_peak_memory_range: - min: 270336 - max: 2720080 + min: 180224 + max: 2968360 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,22 +56,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 857 - job_id: j1pv3zzm5 + job_id: jgn6vz2k5 job_status: Passed torchscript_onnx: - inference_time: 130513.0 - throughput: 7.662071977504157 + inference_time: 91901.0 + throughput: 10.881274414859469 estimated_peak_memory_range: - min: 135168 - max: 134580736 + min: 147456 + max: 133899832 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 856 + layers_on_npu: 886 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 856 - job_id: jqpyevn4g + total_layers: 886 + job_id: jgdx1d9rp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -82,13 +80,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:49:24Z' + timestamp: '2024-10-15T00:55:19Z' - torchscript_onnx_tflite: - inference_time: 108892.0 - throughput: 9.183411086213864 + inference_time: 67087.0 + throughput: 14.906017559288685 estimated_peak_memory_range: - min: 237568 - max: 503637600 + min: 184320 + max: 574980688 primary_compute_unit: NPU precision: fp16 layer_info: @@ -96,7 +94,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 857 - job_id: j7gjxkk8p + job_id: jprv3lk0g + job_status: Passed + torchscript_onnx: + inference_time: 81298.0 + throughput: 12.300425594725578 + estimated_peak_memory_range: + min: 1597440 + max: 589120736 + primary_compute_unit: NPU + 
precision: fp16 + layer_info: + layers_on_npu: 886 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 886 + job_id: j57yrewv5 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -105,13 +118,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:49:10Z' + timestamp: '2024-10-15T00:55:20Z' - torchscript_onnx_tflite: - inference_time: 116654.0 - throughput: 8.57235928472234 + inference_time: 81085.0 + throughput: 12.332737251032867 estimated_peak_memory_range: - min: 106496 - max: 2419008 + min: 49152 + max: 2737600 primary_compute_unit: NPU precision: fp16 layer_info: @@ -119,7 +132,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 857 - job_id: jlpe9440g + job_id: jp2kyr8rp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -127,14 +140,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:49:11Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:54:59Z' - torchscript_onnx_tflite: - inference_time: 137101.0 - throughput: 7.293892823538851 + inference_time: 82049.0 + throughput: 12.187838974271472 estimated_peak_memory_range: - min: 110592 - max: 445147040 + min: 32768 + max: 3290264 primary_compute_unit: NPU precision: fp16 layer_info: @@ -142,22 +155,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 857 - job_id: jygzevv6g + job_id: jgkex2zwg job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:49:11Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:55:04Z' - torchscript_onnx_tflite: - inference_time: 126921.0 - throughput: 7.87891680651744 + inference_time: 90194.0 + throughput: 11.087212009668049 estimated_peak_memory_range: - min: 176128 - max: 3387008 + min: 24576 + max: 2621128 primary_compute_unit: NPU precision: fp16 layer_info: @@ -165,22 +178,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 857 - job_id: jz5wommjp + job_id: jp8qyeokp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:49:13Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:55:03Z' - torchscript_onnx_tflite: - inference_time: 127138.0 - throughput: 7.865469017917539 + inference_time: 81200.0 + throughput: 12.31527093596059 estimated_peak_memory_range: - min: 155648 - max: 3586816 + min: 57344 + max: 3438264 primary_compute_unit: NPU precision: fp16 layer_info: @@ -188,22 +201,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 857 - job_id: jmg9v99v5 + job_id: jp0z0my95 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:49:14Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:55:01Z' - torchscript_onnx_tflite: - inference_time: 117019.0 - throughput: 8.54562079662277 + inference_time: 96012.0 + throughput: 10.415364746073408 estimated_peak_memory_range: - min: 86016 - max: 2450616 + min: 655360 + max: 511797968 primary_compute_unit: NPU precision: fp16 layer_info: @@ -211,30 +224,68 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 857 - job_id: jnp10qql5 + job_id: jpy13oe8p job_status: Passed 
reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:55:00Z' + - torchscript_onnx_tflite: + inference_time: 61375.0 + throughput: 16.293279022403258 + estimated_peak_memory_range: + min: 0 + max: 290416816 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 857 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 857 + job_id: jglvmynj5 + job_status: Passed + torchscript_onnx: + inference_time: 66684.0 + throughput: 14.996101013736428 + estimated_peak_memory_range: + min: 622592 + max: 337442992 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 886 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 886 + job_id: jp144vxnp + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:49:15Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-16T09:31:10Z' - torchscript_onnx: - inference_time: 122356.0 - throughput: 8.172872601261892 + inference_time: 69984.0 + throughput: 14.288980338363054 estimated_peak_memory_range: - min: 125001728 - max: 125001728 + min: 125333504 + max: 125333504 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 856 + layers_on_npu: 886 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 856 - job_id: j1p8ow88g + total_layers: 886 + job_id: jp4lryo85 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -243,4 +294,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:49:26Z' + timestamp: '2024-10-15T00:55:21Z' diff --git a/qai_hub_models/models/detr_resnet50/README.md b/qai_hub_models/models/detr_resnet50/README.md index db07b7cf..5086b09b 100644 --- a/qai_hub_models/models/detr_resnet50/README.md +++ b/qai_hub_models/models/detr_resnet50/README.md @@ -6,7 +6,7 @@ DETR is a machine learning model that can detect objects (trained on COCO dataset). This is based on the implementation of DETR-ResNet50 found -[here](https://github.com/facebookresearch/detr). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/detr_resnet50). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.detr_resnet50.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of DETR-ResNet50 can be found +* The license for the original implementation of DETR-ResNet50 can be found [here](https://github.com/facebookresearch/detr/blob/main/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) * [Source Model Implementation](https://github.com/facebookresearch/detr) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/detr_resnet50/export.py b/qai_hub_models/models/detr_resnet50/export.py index cf88ac19..9a364c3c 100644 --- a/qai_hub_models/models/detr_resnet50/export.py +++ b/qai_hub_models/models/detr_resnet50/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.detr_resnet50 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
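Since the function body below gates each stage on `skip_profiling`, `skip_inferencing`, and `skip_summary`, a compile-and-download-only run is a matter of setting those flags. A sketch, assuming they are plain keyword parameters of `export_model` (their declarations sit in the elided part of the signature, but the body uses them directly):

```python
# Compile-and-download-only sketch; flag names match their use in the body.
from qai_hub_models.models.detr_resnet50.export import export_model

result = export_model(
    device="Samsung Galaxy S23",      # illustrative device
    output_dir="build/detr_resnet50",
    skip_profiling=True,
    skip_inferencing=True,
    skip_summary=True,
)
# With these flags set, profile_job and inference_job on the result are None.
```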
""" model_name = "detr_resnet50" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/detr_resnet50/perf.yaml b/qai_hub_models/models/detr_resnet50/perf.yaml index bfcdcd33..15246f6a 100644 --- a/qai_hub_models/models/detr_resnet50/perf.yaml +++ b/qai_hub_models/models/detr_resnet50/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: DETR-ResNet50 performance_metrics: - torchscript_onnx_tflite: - inference_time: 13273.0 - throughput: 75.340917652377 + inference_time: 10837.0 + throughput: 92.27646027498385 estimated_peak_memory_range: min: 53248 - max: 2855424 + max: 6342384 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,22 +56,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 788 - job_id: jvgdw7rk5 + job_id: jp4lryz85 job_status: Passed torchscript_onnx: - inference_time: 16264.0 - throughput: 61.48548942449582 + inference_time: 12140.0 + throughput: 82.37232289950576 estimated_peak_memory_range: - min: 49152 - max: 99946208 + min: 20480 + max: 100370800 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 737 + layers_on_npu: 767 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 737 - job_id: jogkzrrog + total_layers: 767 + job_id: j5we6lo35 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -82,13 +80,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:48:27Z' + timestamp: '2024-10-15T00:54:18Z' - torchscript_onnx_tflite: - inference_time: 9929.0 - throughput: 100.71507704703394 + inference_time: 8412.0 + throughput: 118.87779362815026 estimated_peak_memory_range: - min: 73728 - max: 246665024 + min: 53248 + max: 258123680 primary_compute_unit: NPU precision: fp16 layer_info: @@ -96,22 +94,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 788 - job_id: jz5womdjp + job_id: jpxkolw35 job_status: Passed torchscript_onnx: - inference_time: 12148.0 - throughput: 82.31807704972012 + inference_time: 9784.0 + throughput: 102.20768601798855 estimated_peak_memory_range: - min: 770048 - max: 205117088 + min: 1183744 + max: 222319216 
primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 737 + layers_on_npu: 767 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 737 - job_id: jn5q899m5 + total_layers: 767 + job_id: jg9lnzvwg job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -120,13 +118,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:48:28Z' + timestamp: '2024-10-15T00:54:19Z' - torchscript_onnx_tflite: - inference_time: 13107.0 - throughput: 76.29510948348211 + inference_time: 10824.0 + throughput: 92.38728750923873 estimated_peak_memory_range: - min: 90112 - max: 2641592 + min: 57344 + max: 2919216 primary_compute_unit: NPU precision: fp16 layer_info: @@ -134,7 +132,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 788 - job_id: jmg9v93v5 + job_id: j5mnx0jdp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -142,14 +140,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:48:15Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:53:58Z' - torchscript_onnx_tflite: - inference_time: 16617.0 - throughput: 60.17933441656135 + inference_time: 10814.0 + throughput: 92.4727205474385 estimated_peak_memory_range: - min: 53248 - max: 213364224 + min: 61440 + max: 2394160 primary_compute_unit: NPU precision: fp16 layer_info: @@ -157,22 +155,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 788 - job_id: jnp10qdl5 + job_id: jpy13o98p job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:48:16Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:54:03Z' - torchscript_onnx_tflite: - inference_time: 13086.0 - throughput: 76.41754546843956 + inference_time: 10889.0 + throughput: 91.8357975939021 estimated_peak_memory_range: - min: 57344 - max: 2252832 + min: 65536 + max: 2615152 primary_compute_unit: NPU precision: fp16 layer_info: @@ -180,22 +178,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 788 - job_id: jvgdw7rl5 + job_id: jp2kyr2rp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:48:16Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:54:02Z' - torchscript_onnx_tflite: - inference_time: 13093.0 - throughput: 76.37668983426258 + inference_time: 10901.0 + throughput: 91.73470323823503 estimated_peak_memory_range: - min: 847872 - max: 2912120 + min: 73728 + max: 2930952 primary_compute_unit: NPU precision: fp16 layer_info: @@ -203,22 +201,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 788 - job_id: jz57zvvrp + job_id: jprv3lz0g job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:48:17Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:54:01Z' - torchscript_onnx_tflite: - inference_time: 13125.0 - throughput: 76.19047619047619 + inference_time: 14348.0 + throughput: 69.69612489545581 estimated_peak_memory_range: - min: 65536 - max: 2501112 + min: 53248 + max: 223624288 primary_compute_unit: NPU precision: fp16 layer_info: @@ -226,30 +224,68 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 788 - 
job_id: jqp4qjjlg + job_id: jgn6vzjk5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:53:59Z' + - torchscript_onnx_tflite: + inference_time: 7199.0 + throughput: 138.90818169190166 + estimated_peak_memory_range: + min: 53248 + max: 94489632 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 788 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 788 + job_id: jp8qyelkp + job_status: Passed + torchscript_onnx: + inference_time: 8554.0 + throughput: 116.90437222352116 + estimated_peak_memory_range: + min: 0 + max: 91226368 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 767 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 767 + job_id: j57yrezv5 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:48:18Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:54:22Z' - torchscript_onnx: - inference_time: 16799.0 - throughput: 59.52735281862016 + inference_time: 13315.0 + throughput: 75.10326699211416 estimated_peak_memory_range: - min: 83038208 - max: 83038208 + min: 83030016 + max: 83030016 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 737 + layers_on_npu: 767 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 737 - job_id: j1glneelp + total_layers: 767 + job_id: jp14zn08p job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -258,4 +294,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:48:29Z' + timestamp: '2024-10-15T00:54:20Z' diff --git a/qai_hub_models/models/detr_resnet50_dc5/README.md b/qai_hub_models/models/detr_resnet50_dc5/README.md index e0be8280..7c4b569f 100644 --- a/qai_hub_models/models/detr_resnet50_dc5/README.md +++ b/qai_hub_models/models/detr_resnet50_dc5/README.md @@ -6,7 +6,7 @@ DETR is a machine learning model that can detect objects (trained on COCO dataset). This is based on the implementation of DETR-ResNet50-DC5 found -[here](https://github.com/facebookresearch/detr). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/detr_resnet50_dc5). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.detr_resnet50_dc5.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of DETR-ResNet50-DC5 can be found +* The license for the original implementation of DETR-ResNet50-DC5 can be found [here](https://github.com/facebookresearch/detr/blob/main/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) * [Source Model Implementation](https://github.com/facebookresearch/detr) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/detr_resnet50_dc5/export.py b/qai_hub_models/models/detr_resnet50_dc5/export.py index 2f3caed7..bac47113 100644 --- a/qai_hub_models/models/detr_resnet50_dc5/export.py +++ b/qai_hub_models/models/detr_resnet50_dc5/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.detr_resnet50_dc5 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
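One behavioral detail worth noting: later in this file's diff, `export_model` keeps I/O in channel-last format for every runtime except ONNX (`use_channel_last_format = target_runtime != TargetRuntime.ONNX`), with `TargetRuntime` now imported from `qai_hub_models.models.common`. A small sketch of that decision; the enum member names below are assumed from common AI Hub usage, not shown in the hunk:

```python
# Channel-last I/O is used everywhere except the ONNX runtime.
from qai_hub_models.models.common import TargetRuntime  # per the new import

for runtime in (TargetRuntime.TFLITE, TargetRuntime.QNN, TargetRuntime.ONNX):
    use_channel_last_format = runtime != TargetRuntime.ONNX
    print(runtime.name, use_channel_last_format)
```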
""" model_name = "detr_resnet50_dc5" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/detr_resnet50_dc5/perf.yaml b/qai_hub_models/models/detr_resnet50_dc5/perf.yaml index b0fba8fb..e56ef18a 100644 --- a/qai_hub_models/models/detr_resnet50_dc5/perf.yaml +++ b/qai_hub_models/models/detr_resnet50_dc5/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: DETR-ResNet50-DC5 performance_metrics: - torchscript_onnx_tflite: - inference_time: 111119.0 - throughput: 8.999361045365779 + inference_time: 75052.0 + throughput: 13.324095293929542 estimated_peak_memory_range: - min: 28672 - max: 2167856 + min: 81920 + max: 2203896 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,22 +56,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 789 - job_id: jqp4qjwqg + job_id: j5we6lk35 job_status: Passed torchscript_onnx: - inference_time: 114154.0 - throughput: 8.760096010652276 + inference_time: 92231.0 + throughput: 10.842341512072947 estimated_peak_memory_range: - min: 40960 - max: 100199608 + min: 131072 + max: 101161016 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 737 + layers_on_npu: 767 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 737 - job_id: j1pv3z175 + total_layers: 767 + job_id: jgo26lxqp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -82,13 +80,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:47:41Z' + timestamp: '2024-10-15T00:53:24Z' - torchscript_onnx_tflite: - inference_time: 101554.0 - throughput: 9.84697796246332 + inference_time: 68196.0 + throughput: 14.66361663440671 estimated_peak_memory_range: - min: 262144 - max: 448896640 + min: 167936 + max: 517087552 primary_compute_unit: NPU precision: fp16 layer_info: @@ -96,22 +94,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 789 - job_id: j0pxve1jg + job_id: jg9lnzrwg job_status: Passed torchscript_onnx: - inference_time: 89688.0 - throughput: 11.14976362501115 + inference_time: 81134.0 + throughput: 12.325289028027708 estimated_peak_memory_range: - min: 1613824 - max: 
413587120 + min: 2596864 + max: 529878240 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 737 + layers_on_npu: 767 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 737 - job_id: j7gjxk07p + total_layers: 767 + job_id: jpv6kljk5 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -120,13 +118,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:47:41Z' + timestamp: '2024-10-15T00:53:25Z' - torchscript_onnx_tflite: - inference_time: 109750.0 - throughput: 9.111617312072893 + inference_time: 74434.0 + throughput: 13.43472069215681 estimated_peak_memory_range: - min: 2879488 - max: 5519040 + min: 86016 + max: 2524712 primary_compute_unit: NPU precision: fp16 layer_info: @@ -134,7 +132,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 789 - job_id: jo5mrvzyg + job_id: jp14zn98p job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -142,14 +140,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:47:27Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:53:04Z' - torchscript_onnx_tflite: - inference_time: 131028.0 - throughput: 7.631956528375614 + inference_time: 85586.0 + throughput: 11.684153950412451 estimated_peak_memory_range: - min: 524288 - max: 416029872 + min: 184320 + max: 3234288 primary_compute_unit: NPU precision: fp16 layer_info: @@ -157,22 +155,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 789 - job_id: jegn2r9vg + job_id: jpxkolq35 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:47:28Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:53:09Z' - torchscript_onnx_tflite: - inference_time: 120615.0 - throughput: 8.290842764166978 + inference_time: 74502.0 + throughput: 13.422458457491073 estimated_peak_memory_range: - min: 192512 - max: 3228888 + min: 147456 + max: 2657944 primary_compute_unit: NPU precision: fp16 layer_info: @@ -180,22 +178,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 789 - job_id: joprk14v5 + job_id: jp4lry785 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:47:29Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:53:07Z' - torchscript_onnx_tflite: - inference_time: 114460.0 - throughput: 8.736676568233444 + inference_time: 80512.0 + throughput: 12.420508744038155 estimated_peak_memory_range: - min: 12288 - max: 1852912 + min: 147456 + max: 3016376 primary_compute_unit: NPU precision: fp16 layer_info: @@ -203,22 +201,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 789 - job_id: jep2837xp + job_id: j57yremv5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:47:30Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:53:06Z' - torchscript_onnx_tflite: - inference_time: 117557.0 - throughput: 8.506511734732937 + inference_time: 91938.0 + throughput: 10.876895299005852 estimated_peak_memory_range: - min: 192512 - max: 2849040 + min: 16384 + max: 477056736 primary_compute_unit: NPU precision: fp16 layer_info: @@ -226,30 +224,68 @@ 
models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 789 - job_id: jqpyev4rg + job_id: jgdx1dkrp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:53:05Z' + - torchscript_onnx_tflite: + inference_time: 49532.0 + throughput: 20.18896874747638 + estimated_peak_memory_range: + min: 81920 + max: 265154736 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 789 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 789 + job_id: jgn6vz4k5 + job_status: Passed + torchscript_onnx: + inference_time: 66336.0 + throughput: 15.074770863482875 + estimated_peak_memory_range: + min: 2220032 + max: 309790256 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 767 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 767 + job_id: jprvvnjkg + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:47:31Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-16T09:31:45Z' - torchscript_onnx: - inference_time: 115048.0 - throughput: 8.69202419859537 + inference_time: 65232.0 + throughput: 15.329899435859701 estimated_peak_memory_range: - min: 86614016 - max: 86614016 + min: 86802432 + max: 86802432 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 737 + layers_on_npu: 767 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 737 - job_id: jlpe94r7g + total_layers: 767 + job_id: jgjvnrjvg job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -258,4 +294,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:47:42Z' + timestamp: '2024-10-15T00:53:26Z' diff --git a/qai_hub_models/models/efficientnet_b0/README.md b/qai_hub_models/models/efficientnet_b0/README.md index b2ffa91b..6a3dac7b 100644 --- a/qai_hub_models/models/efficientnet_b0/README.md +++ b/qai_hub_models/models/efficientnet_b0/README.md @@ -6,7 +6,7 @@ EfficientNetB0 is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of EfficientNet-B0 found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/efficientnet.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/efficientnet_b0). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.efficientnet_b0.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of EfficientNet-B0 can be found +* The license for the original implementation of EfficientNet-B0 can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/efficientnet.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/efficientnet_b0/export.py b/qai_hub_models/models/efficientnet_b0/export.py index 80d94f65..b7d112a1 100644 --- a/qai_hub_models/models/efficientnet_b0/export.py +++ b/qai_hub_models/models/efficientnet_b0/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.efficientnet_b0 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
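A note on reading the `perf.yaml` hunks further down: each `inference_time` is paired with a `throughput`, and the two are consistent when `inference_time` is taken as microseconds and `throughput` as inferences per second. A quick check against the EfficientNet-B0 Galaxy S23 figure recorded below:

```python
# Reciprocal relationship between the two perf.yaml metrics.
inference_time_us = 1603.0            # EfficientNet-B0 TFLite, Galaxy S23
throughput = 1e6 / inference_time_us  # inferences per second
print(throughput)                     # 623.8303181534623, matching the YAML
```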
""" model_name = "efficientnet_b0" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/efficientnet_b0/perf.yaml b/qai_hub_models/models/efficientnet_b0/perf.yaml index 61118a09..0921e8f4 100644 --- a/qai_hub_models/models/efficientnet_b0/perf.yaml +++ b/qai_hub_models/models/efficientnet_b0/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: EfficientNet-B0 performance_metrics: - torchscript_onnx_tflite: - inference_time: 1604.0 - throughput: 623.4413965087282 + inference_time: 1603.0 + throughput: 623.8303181534623 estimated_peak_memory_range: - min: 12288 - max: 1636936 + min: 49152 + max: 1629576 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: jo5mrv3yg + job_id: j56y4jryp job_status: Passed torchscript_onnx_qnn: - inference_time: 1681.0 - throughput: 594.883997620464 + inference_time: 1673.0 + throughput: 597.7286312014345 estimated_peak_memory_range: - min: 618496 - max: 87307416 + min: 12288 + max: 85201864 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 243 - job_id: jogkzryyg + job_id: jgdx198zp job_status: Passed torchscript_onnx: - inference_time: 1612.0 - throughput: 620.3473945409429 + inference_time: 1591.0 + throughput: 628.5355122564425 estimated_peak_memory_range: - min: 626688 - max: 17485456 + min: 12288 + max: 15613000 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: jlpe94v7g + job_id: j5mnx2d7p job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:46:53Z' + timestamp: '2024-10-15T17:27:55Z' - torchscript_onnx_tflite: - inference_time: 1537.0 - throughput: 650.6180871828237 + inference_time: 1159.0 + throughput: 862.8127696289905 estimated_peak_memory_range: - min: 20480 - max: 78350944 + min: 16384 + max: 79371328 primary_compute_unit: NPU 
precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: jegn2revg + job_id: jp3j03xng job_status: Passed torchscript_onnx_qnn: - inference_time: 1590.0 - throughput: 628.930817610063 + inference_time: 1392.0 + throughput: 718.3908045977012 estimated_peak_memory_range: min: 618496 - max: 19453120 + max: 19653984 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 243 - job_id: jn5q89275 + job_id: jp14zl7np job_status: Passed torchscript_onnx: - inference_time: 1202.0 - throughput: 831.9467554076539 + inference_time: 1200.0 + throughput: 833.3333333333334 estimated_peak_memory_range: min: 0 - max: 84180960 + max: 84684048 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: jygzev7zg + job_id: jpy13w70p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:46:54Z' + timestamp: '2024-10-15T17:27:56Z' - torchscript_onnx_tflite: - inference_time: 1599.0 - throughput: 625.3908692933084 + inference_time: 1596.0 + throughput: 626.5664160401003 estimated_peak_memory_range: - min: 20480 - max: 1383992 + min: 28672 + max: 1894072 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: joprk1yv5 + job_id: jgo260okp job_status: Passed torchscript_onnx_qnn: - inference_time: 1590.0 - throughput: 628.930817610063 + inference_time: 1560.0 + throughput: 641.025641025641 estimated_peak_memory_range: - min: 626688 - max: 2414336 + min: 663552 + max: 1744304 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 243 - job_id: jw566q1v5 + job_id: j5mnx2o7p job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:46:48Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T17:27:48Z' - torchscript_onnx_tflite: - inference_time: 3063.0 - throughput: 326.47730982696703 + inference_time: 1606.0 + throughput: 622.66500622665 estimated_peak_memory_range: - min: 16384 - max: 87640400 + min: 28672 + max: 1469416 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: jep283mxp + job_id: j5we6v8m5 job_status: Passed torchscript_onnx_qnn: - inference_time: 3154.0 - throughput: 317.0577045022194 + inference_time: 1572.0 + throughput: 636.1323155216285 estimated_peak_memory_range: - min: 618496 - max: 22421040 + min: 630784 + max: 2023896 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 243 - job_id: j7gjxkl7p + job_id: j5q6qkmep job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:46:52Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T17:27:51Z' - torchscript_onnx_tflite: - inference_time: 1610.0 - throughput: 621.1180124223603 + inference_time: 1606.0 + throughput: 622.66500622665 
estimated_peak_memory_range: - min: 24576 - max: 1478496 + min: 45056 + max: 305898320 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: jqpyevdrg + job_id: jgz3d98x5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1574.0 - throughput: 635.3240152477764 + inference_time: 1572.0 + throughput: 636.1323155216285 estimated_peak_memory_range: - min: 626688 - max: 2065992 + min: 659456 + max: 2003088 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 243 - job_id: j1p3kqmx5 + job_id: jp0z0qv05 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:46:49Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T17:27:50Z' - torchscript_onnx_tflite: - inference_time: 1604.0 - throughput: 623.4413965087282 + inference_time: 1606.0 + throughput: 622.66500622665 estimated_peak_memory_range: - min: 12288 - max: 1437320 + min: 16384 + max: 24360160 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: j2p0yer2g + job_id: jgjvnmoeg job_status: Passed torchscript_onnx_qnn: - inference_time: 1574.0 - throughput: 635.3240152477764 + inference_time: 1580.0 + throughput: 632.9113924050633 estimated_peak_memory_range: min: 643072 - max: 2196424 + max: 2374184 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 243 - job_id: jwgoyev45 + job_id: jp2ky646p job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:46:50Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T17:27:49Z' - torchscript_onnx_tflite: - inference_time: 1608.0 - throughput: 621.8905472636816 + inference_time: 3074.0 + throughput: 325.30904359141186 estimated_peak_memory_range: - min: 20480 - max: 7748824 + min: 16384 + max: 88034752 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: j1p8ow7zg + job_id: jpv6koer5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1568.0 - throughput: 637.7551020408164 + inference_time: 3166.0 + throughput: 315.8559696778269 estimated_peak_memory_range: - min: 634880 - max: 1954144 + min: 618496 + max: 23178912 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 243 - job_id: j1pv3zw75 + job_id: jgz3d9445 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T17:27:53Z' + - torchscript_onnx_tflite: + inference_time: 1118.0 + throughput: 894.4543828264758 + estimated_peak_memory_range: + min: 8192 + max: 32749360 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 245 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 245 + job_id: jp14zl77p + job_status: Passed + torchscript_onnx_qnn: + inference_time: 1171.0 + throughput: 853.9709649871904 + estimated_peak_memory_range: + min: 614400 + 
max: 15861920 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 243 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 243 + job_id: jp14zlvnp + job_status: Passed + torchscript_onnx: + inference_time: 962.0 + throughput: 1039.5010395010395 + estimated_peak_memory_range: + min: 0 + max: 36186352 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 245 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 245 + job_id: jp3j036mg + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:46:51Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T17:27:59Z' - torchscript_onnx_qnn: - inference_time: 1743.0 - throughput: 573.7234652897304 + inference_time: 1776.0 + throughput: 563.063063063063 estimated_peak_memory_range: min: 602112 max: 602112 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 243 - job_id: j1glnekep + job_id: j57yrwkn5 job_status: Passed torchscript_onnx: - inference_time: 1707.0 - throughput: 585.8230814294083 + inference_time: 1683.0 + throughput: 594.1770647653001 estimated_peak_memory_range: - min: 14692352 - max: 14692352 + min: 14618624 + max: 14618624 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: jz5wom9zp + job_id: jgkexn8vg job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:46:55Z' + timestamp: '2024-10-15T17:27:57Z' diff --git a/qai_hub_models/models/esrgan/README.md b/qai_hub_models/models/esrgan/README.md index 524a2fa3..80590084 100644 --- a/qai_hub_models/models/esrgan/README.md +++ b/qai_hub_models/models/esrgan/README.md @@ -6,7 +6,7 @@ ESRGAN is a machine learning model that upscales an image with minimal loss in quality. This is based on the implementation of ESRGAN found -[here](https://github.com/xinntao/ESRGAN/). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/esrgan). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.esrgan.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of ESRGAN can be found +* The license for the original implementation of ESRGAN can be found [here](https://github.com/xinntao/ESRGAN/blob/master/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks](https://arxiv.org/abs/1809.00219) * [Source Model Implementation](https://github.com/xinntao/ESRGAN/) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/esrgan/export.py b/qai_hub_models/models/esrgan/export.py index ff1189fa..6a55cf77 100644 --- a/qai_hub_models/models/esrgan/export.py +++ b/qai_hub_models/models/esrgan/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.esrgan import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
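The `ExportResult` change above is what every per-model `export.py` in this diff migrates to. A minimal sketch of consuming it, using only calls that appear in this diff; the device string and the no-hub-access check are illustrative assumptions:

```python
# Sketch: driving one generated exporter and unpacking the ExportResult
# fields named in this diff (compile_job, inference_job, profile_job).
from qai_hub_models.models.esrgan.export import export_model

result = export_model(device="Samsung Galaxy S23")  # device name is illustrative
if not isinstance(result, list):  # a List[str] comes back without AI Hub access
    target_model = result.compile_job.get_target_model()  # compiled asset handle
    if result.profile_job is not None:  # None when profiling was skipped
        assert result.profile_job.wait().success
```

Field access by name is the point of the change: callers no longer need to remember the positional order of the old 3-tuple.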
""" model_name = "esrgan" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/esrgan/perf.yaml b/qai_hub_models/models/esrgan/perf.yaml index f8b4c070..ec2ed366 100644 --- a/qai_hub_models/models/esrgan/perf.yaml +++ b/qai_hub_models/models/esrgan/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: ESRGAN performance_metrics: - torchscript_onnx_tflite: - inference_time: 70500.0 - throughput: 14.184397163120567 + inference_time: 67448.0 + throughput: 14.826236508124778 estimated_peak_memory_range: - min: 3276800 - max: 5950536 + min: 3194880 + max: 6302928 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1024 - job_id: j0pxve6jg + job_id: jp14zlq7p job_status: Passed torchscript_onnx_qnn: - inference_time: 70092.0 - throughput: 14.266963419505792 + inference_time: 70723.0 + throughput: 14.139671676823664 estimated_peak_memory_range: - min: 118784 - max: 108858496 + min: 122880 + max: 115569424 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1026 - job_id: j1p8owzzg + job_id: jpy13wvlp job_status: Passed torchscript_onnx: - inference_time: 66911.0 - throughput: 14.945225747634918 + inference_time: 70475.0 + throughput: 14.189428875487762 estimated_peak_memory_range: - min: 110592 - max: 43964008 + min: 159744 + max: 44247240 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1028 - job_id: j7gjxke7p + job_id: jgjvnm1eg job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:46:13Z' + timestamp: '2024-10-15T17:27:12Z' - torchscript_onnx_tflite: - inference_time: 54928.0 - throughput: 18.20565103408098 + inference_time: 55287.0 + throughput: 18.087434659142293 estimated_peak_memory_range: - min: 3260416 - max: 627269248 + min: 3272704 + max: 690795968 primary_compute_unit: NPU precision: fp16 
layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1024 - job_id: jo5mrv6yg + job_id: jgdx197zp job_status: Passed torchscript_onnx_qnn: - inference_time: 56060.0 - throughput: 17.83803068141277 + inference_time: 55722.0 + throughput: 17.946233085675317 estimated_peak_memory_range: - min: 69632 - max: 100600272 + min: 90112 + max: 114988816 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1026 - job_id: jogkzr3yg + job_id: jp0z0qen5 job_status: Passed torchscript_onnx: - inference_time: 55529.0 - throughput: 18.008608114678818 + inference_time: 59118.0 + throughput: 16.91532189857573 estimated_peak_memory_range: - min: 6418432 - max: 655419728 + min: 6443008 + max: 728828640 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1028 - job_id: jlpe94k7g + job_id: jpedm12v5 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:46:14Z' + timestamp: '2024-10-15T17:27:13Z' - torchscript_onnx_tflite: - inference_time: 64316.0 - throughput: 15.548230611356427 + inference_time: 60535.0 + throughput: 16.519368960105723 estimated_peak_memory_range: - min: 3203072 - max: 5437424 + min: 3186688 + max: 846716424 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1024 - job_id: jegn2r3vg + job_id: j57yrwv95 job_status: Passed torchscript_onnx_qnn: - inference_time: 62756.0 - throughput: 15.9347313404296 + inference_time: 62046.0 + throughput: 16.117074428649712 estimated_peak_memory_range: - min: 348160 - max: 1620992 + min: 434176 + max: 1666728 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1026 - job_id: j1glne3ep + job_id: jgkexnrng job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:46:07Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T17:27:05Z' - torchscript_onnx_tflite: - inference_time: 162662.0 - throughput: 6.147717352547 + inference_time: 64451.0 + throughput: 15.515663061860948 estimated_peak_memory_range: - min: 3223552 - max: 593337776 + min: 3473408 + max: 7950592 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1024 - job_id: joprk1ev5 + job_id: jgn6vyrq5 job_status: Passed torchscript_onnx_qnn: - inference_time: 135659.0 - throughput: 7.3714239379620965 + inference_time: 62024.0 + throughput: 16.12279117760867 estimated_peak_memory_range: - min: 237568 - max: 78143568 + min: 368640 + max: 2043176 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1026 - job_id: j1pv3zv75 + job_id: j56y4jvyp job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:46:12Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T17:27:08Z' - torchscript_onnx_tflite: - inference_time: 63613.0 - throughput: 15.720057221008284 + 
inference_time: 68788.0 + throughput: 14.537419317322788 estimated_peak_memory_range: - min: 3293184 - max: 5430712 + min: 3166208 + max: 6302720 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1024 - job_id: jep283lxp + job_id: j5mnx2v9p job_status: Passed torchscript_onnx_qnn: - inference_time: 62831.0 - throughput: 15.915710397733603 + inference_time: 63011.0 + throughput: 15.870244877878466 estimated_peak_memory_range: - min: 425984 - max: 1684216 + min: 348160 + max: 1969064 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1026 - job_id: jw566qnv5 + job_id: jglvmz7m5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:46:08Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T17:27:07Z' - torchscript_onnx_tflite: - inference_time: 67352.0 - throughput: 14.847369046205012 + inference_time: 64819.0 + throughput: 15.427575247998272 estimated_peak_memory_range: - min: 3252224 - max: 6000000 + min: 3256320 + max: 6156008 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1024 - job_id: jqpyev6rg + job_id: jpxkojel5 job_status: Passed torchscript_onnx_qnn: - inference_time: 64446.0 - throughput: 15.516866834248829 + inference_time: 63190.0 + throughput: 15.82528881152081 estimated_peak_memory_range: - min: 425984 - max: 5189448 + min: 409600 + max: 1637688 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1026 - job_id: j1p3kqex5 + job_id: j5q6qk9op job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:46:10Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T17:27:06Z' - torchscript_onnx_tflite: - inference_time: 65640.0 - throughput: 15.234613040828762 + inference_time: 136101.0 + throughput: 7.347484588651075 estimated_peak_memory_range: - min: 3268608 - max: 6235928 + min: 3162112 + max: 648556912 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1024 - job_id: j2p0yel2g + job_id: jp4lroj15 job_status: Passed torchscript_onnx_qnn: - inference_time: 63738.0 - throughput: 15.689227776208854 + inference_time: 134526.0 + throughput: 7.433507277403624 estimated_peak_memory_range: - min: 405504 - max: 1688576 + min: 331776 + max: 92125040 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,22 +329,75 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1026 - job_id: jwgoye345 + job_id: jgo260mkp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T17:27:10Z' + - torchscript_onnx_tflite: + inference_time: 42308.0 + throughput: 23.636191736787367 + estimated_peak_memory_range: + min: 12288 + max: 188887088 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 1024 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 1024 + job_id: jp2ky63qp + job_status: Passed + 
torchscript_onnx_qnn: + inference_time: 37764.0 + throughput: 26.480245736680438 + estimated_peak_memory_range: + min: 831488 + max: 137073696 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 1026 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 1026 + job_id: jpv6ko4r5 + job_status: Passed + torchscript_onnx: + inference_time: 38328.0 + throughput: 26.09058651638489 + estimated_peak_memory_range: + min: 8298496 + max: 194769792 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 1028 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 1028 + job_id: jg9ln188g + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:46:11Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T17:27:15Z' - torchscript_onnx_qnn: - inference_time: 65269.0 - throughput: 15.321209149826105 + inference_time: 64824.0 + throughput: 15.426385289398988 estimated_peak_memory_range: - min: 208896 - max: 208896 + min: 204800 + max: 204800 primary_compute_unit: NPU precision: fp16 layer_info: @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1026 - job_id: jn5q89375 + job_id: jp8qy9wop job_status: Passed torchscript_onnx: - inference_time: 65506.0 - throughput: 15.26577718071627 + inference_time: 65670.0 + throughput: 15.227653418608192 estimated_peak_memory_range: - min: 40529920 - max: 40529920 + min: 39833600 + max: 39833600 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1028 - job_id: jygzevrzg + job_id: jgz3d9wx5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:46:15Z' + timestamp: '2024-10-15T17:27:14Z' diff --git a/qai_hub_models/models/facemap_3dmm/README.md b/qai_hub_models/models/facemap_3dmm/README.md index fadc852d..d320f639 100644 --- a/qai_hub_models/models/facemap_3dmm/README.md +++ b/qai_hub_models/models/facemap_3dmm/README.md @@ -1,14 +1,14 @@ [![Qualcomm® AI Hub Models](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/quic-logo.jpg)](../../README.md) -# [FaceMap_3DMM: Facial landmark predictor with 3DMM](#) +# [Facial-Landmark-Detection: Facial landmark predictor with 3DMM](https://aihub.qualcomm.com/models/facemap_3dmm) Facial landmark is a deep learning model that can predict 68 landmarks from a single image. It can also be used as a backbone in building more complex models for specific use cases. -This is based on the implementation of FaceMap_3DMM found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py). This repository contains scripts for optimized on-device +This is based on the implementation of Facial-Landmark-Detection found +[here](https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/facemap_3dmm/model.py). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance -accross various devices, can be found [here](#). +across various devices can be found [here](https://aihub.qualcomm.com/models/facemap_3dmm). [Sign up](https://myaccount.qualcomm.com/signup) to start using Qualcomm AI Hub and run these models on a hosted Qualcomm® device. @@ -39,14 +39,18 @@ python -m qai_hub_models.models.facemap_3dmm.export Additional options are documented with the `--help` option.
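For readers who want the Python-side equivalent of the CLI invocation above, a minimal sketch distilled from the generated export scripts in this diff (the argument-free `from_pretrained()` call assumes default pretrained weights):

```python
# Sketch: step 1 of the export recipe, exactly as the generated scripts do it.
import torch

from qai_hub_models.models.facemap_3dmm import Model
from qai_hub_models.utils.input_spec import make_torch_inputs

model = Model.from_pretrained()      # assumes default pretrained weights
input_spec = model.get_input_spec()  # 128x128 input per the info.yaml in this diff
source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec))
```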
Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of FaceMap_3DMM can be found - [here](https://github.com/pytorch/vision/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the original implementation of Facial-Landmark-Detection can be found + [here](https://github.com/quic/ai-hub-models/blob/main/LICENSE). +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References -* [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385.) -* [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py) +* [Source Model Implementation](https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/facemap_3dmm/model.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. diff --git a/qai_hub_models/models/facemap_3dmm/export.py b/qai_hub_models/models/facemap_3dmm/export.py index 6ab29d08..c21ddb3c 100644 --- a/qai_hub_models/models/facemap_3dmm/export.py +++ b/qai_hub_models/models/facemap_3dmm/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.facemap_3dmm import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model.
@@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "facemap_3dmm" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -96,7 +94,7 @@ def export_model( if not can_access_qualcomm_ai_hub(): return export_without_hub_access( "facemap_3dmm", - "FaceMap_3DMM", + "Facial-Landmark-Detection", device, skip_profiling, skip_inferencing, @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/facemap_3dmm/info.yaml b/qai_hub_models/models/facemap_3dmm/info.yaml index 68142447..1d9f2a4c 100644 --- a/qai_hub_models/models/facemap_3dmm/info.yaml +++ b/qai_hub_models/models/facemap_3dmm/info.yaml @@ -1,20 +1,18 @@ -name: FaceMap_3DMM +name: Facial-Landmark-Detection # id must match with the model dir name in qai_hub_models id: facemap_3dmm status: public headline: Facial landmark predictor with 3DMM. domain: Computer Vision -use_case: POSE_ESTIMATION +use_case: Pose Estimation description: Facial landmark is a deep learning model that can predict 68 landmarks from a single image. It can also be used as a backbone in building more complex models for specific use cases. tags: - backbone -research_paper: https://arxiv.org/abs/1512.03385. -research_paper_title: Deep Residual Learning for Image Recognition -license: https://github.com/pytorch/vision/blob/main/LICENSE +source_repo: https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/facemap_3dmm/model.py +license: https://github.com/quic/ai-hub-models/blob/main/LICENSE deploy_license: https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf -source_repo: https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py technical_details: Input resolution: 128x128 Number of parameters: 5.424M @@ -29,10 +27,8 @@ form_factors: - Phone - Tablet - IoT - XR -has_static_banner: false -has_animated_banner: false +has_static_banner: true +has_animated_banner: true license_type: bsd-3-clause deploy_license_type: AI Model Hub License -dataset: - - QDMS images - - R channel images from Datatang captured images, Getty images, Multipie Images +dataset: [] diff --git a/qai_hub_models/models/facemap_3dmm/model.py b/qai_hub_models/models/facemap_3dmm/model.py index cad3dee9..5c9264af 100644 --- a/qai_hub_models/models/facemap_3dmm/model.py +++ b/qai_hub_models/models/facemap_3dmm/model.py @@ -63,4 +63,4 @@ def get_input_spec() -> InputSpec: @staticmethod def get_output_names() -> List[str]: - return ["3dmm_parameters"] + return ["parameters_3dmm"] diff --git a/qai_hub_models/models/facemap_3dmm/perf.yaml b/qai_hub_models/models/facemap_3dmm/perf.yaml new file mode 100644 index 00000000..4be519d1 --- /dev/null +++ b/qai_hub_models/models/facemap_3dmm/perf.yaml @@ -0,0 +1,432 @@ +aggregated: + supported_oses: + - Android + supported_devices: + - Snapdragon 8 Elite QRD + - Samsung Galaxy S24 + - Samsung Galaxy S24 Ultra + - Samsung Galaxy S24+ + - Snapdragon 8 Gen 3 QRD + - Samsung Galaxy S23 + - Samsung Galaxy S23 Ultra + - Samsung Galaxy S23+ + - Samsung Galaxy S22 5G + - Samsung Galaxy S22 Ultra 5G + - Samsung Galaxy S22+ 5G + - Samsung Galaxy Tab S8 + - Xiaomi 12 + - Xiaomi 12 Pro + - Samsung Galaxy S21 + - Samsung Galaxy S21 Ultra + - Samsung Galaxy S21+ + - Snapdragon X Elite CRD + - Snapdragon X Plus 8-Core CRD + - QCS8450 (Proxy) + - XR2 Gen 2 (Proxy) + - QCS8550 (Proxy) + - SA8255 (Proxy) + - 
SA8650 (Proxy) + - SA8775 (Proxy) + supported_chipsets: + - Snapdragon® 8 Elite + - Snapdragon® 8 Gen 3 + - Snapdragon® 8 Gen 2 + - Snapdragon® 8 Gen 1 + - Snapdragon® 888 + - Snapdragon® X Elite + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy +models: +- name: Facial-Landmark-Detection + performance_metrics: + - torchscript_onnx_tflite: + inference_time: 347.0 + throughput: 2881.844380403458 + estimated_peak_memory_range: + min: 24576 + max: 3974056 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 47 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 47 + job_id: j5q6qllmp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 364.0 + throughput: 2747.252747252747 + estimated_peak_memory_range: + min: 233472 + max: 26049896 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 60 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 60 + job_id: j5we6llj5 + job_status: Passed + torchscript_onnx: + inference_time: 474.0 + throughput: 2109.7046413502107 + estimated_peak_memory_range: + min: 40960 + max: 12298000 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 59 + layers_on_gpu: 0 + layers_on_cpu: 1 + total_layers: 60 + job_id: jpxkol015 + job_status: Passed + reference_device_info: + name: Samsung Galaxy S23 + os: '13' + form_factor: Phone + os_name: Android + manufacturer: Samsung + chipset: Snapdragon® 8 Gen 2 + timestamp: '2024-10-15T00:49:51Z' + - torchscript_onnx_tflite: + inference_time: 274.0 + throughput: 3649.6350364963505 + estimated_peak_memory_range: + min: 16384 + max: 26486288 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 47 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 47 + job_id: jglvmyyl5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 306.0 + throughput: 3267.97385620915 + estimated_peak_memory_range: + min: 208896 + max: 11125488 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 60 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 60 + job_id: jg9lnzzvg + job_status: Passed + torchscript_onnx: + inference_time: 378.0 + throughput: 2645.5026455026455 + estimated_peak_memory_range: + min: 0 + max: 27464624 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 59 + layers_on_gpu: 0 + layers_on_cpu: 1 + total_layers: 60 + job_id: j5mnx09wp + job_status: Passed + reference_device_info: + name: Samsung Galaxy S24 + os: '14' + form_factor: Phone + os_name: Android + manufacturer: Samsung + chipset: Snapdragon® 8 Gen 3 + timestamp: '2024-10-15T00:49:52Z' + - torchscript_onnx_tflite: + inference_time: 352.0 + throughput: 2840.909090909091 + estimated_peak_memory_range: + min: 28672 + max: 1533216 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 47 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 47 + job_id: j56y4887p + job_status: Passed + torchscript_onnx_qnn: + inference_time: 351.0 + throughput: 2849.002849002849 + estimated_peak_memory_range: + min: 229376 + max: 1547376 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 60 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 60 + job_id: jgdx1ddlp + job_status: Passed + reference_device_info: + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:49:43Z' + - torchscript_onnx_tflite: + inference_time: 
348.0 + throughput: 2873.5632183908046 + estimated_peak_memory_range: + min: 28672 + max: 1568024 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 47 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 47 + job_id: jgjvnrr8g + job_status: Passed + torchscript_onnx_qnn: + inference_time: 359.0 + throughput: 2785.515320334262 + estimated_peak_memory_range: + min: 221184 + max: 1919480 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 60 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 60 + job_id: jp14zno2p + job_status: Passed + reference_device_info: + name: SA8255 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:49:47Z' + - torchscript_onnx_tflite: + inference_time: 344.0 + throughput: 2906.9767441860463 + estimated_peak_memory_range: + min: 28672 + max: 166357928 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 47 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 47 + job_id: jpv6kllm5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 351.0 + throughput: 2849.002849002849 + estimated_peak_memory_range: + min: 225280 + max: 1447848 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 60 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 60 + job_id: jg9lnzolg + job_status: Passed + reference_device_info: + name: SA8775 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:49:45Z' + - torchscript_onnx_tflite: + inference_time: 342.0 + throughput: 2923.9766081871344 + estimated_peak_memory_range: + min: 28672 + max: 1508248 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 47 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 47 + job_id: jgo26lldp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 351.0 + throughput: 2849.002849002849 + estimated_peak_memory_range: + min: 221184 + max: 1473168 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 60 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 60 + job_id: j5we6ly65 + job_status: Passed + reference_device_info: + name: SA8650 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:49:44Z' + - torchscript_onnx_tflite: + inference_time: 450.0 + throughput: 2222.222222222222 + estimated_peak_memory_range: + min: 24576 + max: 26895376 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 47 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 47 + job_id: jp3j0zzzg + job_status: Passed + torchscript_onnx_qnn: + inference_time: 476.0 + throughput: 2100.840336134454 + estimated_peak_memory_range: + min: 0 + max: 13865344 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 60 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 60 + job_id: j57yreol5 + job_status: Passed + reference_device_info: + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:49:49Z' + - torchscript_onnx_tflite: + inference_time: 266.0 + throughput: 3759.3984962406016 + estimated_peak_memory_range: + min: 12288 + max: 15695424 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 47 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 47 + job_id: jgz3dll65 + 
job_status: Passed + torchscript_onnx_qnn: + inference_time: 299.0 + throughput: 3344.4816053511704 + estimated_peak_memory_range: + min: 0 + max: 9206240 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 60 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 60 + job_id: jp4lryev5 + job_status: Passed + torchscript_onnx: + inference_time: 369.0 + throughput: 2710.027100271003 + estimated_peak_memory_range: + min: 0 + max: 16134224 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 59 + layers_on_gpu: 0 + layers_on_cpu: 1 + total_layers: 60 + job_id: jp2kyro4p + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:49:55Z' + - torchscript_onnx_qnn: + inference_time: 450.0 + throughput: 2222.222222222222 + estimated_peak_memory_range: + min: 585728 + max: 585728 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 60 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 60 + job_id: jp14znnlp + job_status: Passed + torchscript_onnx: + inference_time: 527.0 + throughput: 1897.5332068311195 + estimated_peak_memory_range: + min: 12500992 + max: 12500992 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 59 + layers_on_gpu: 0 + layers_on_cpu: 1 + total_layers: 60 + job_id: jgn6vz1r5 + job_status: Passed + reference_device_info: + name: Snapdragon X Elite CRD + os: '11' + form_factor: Compute + os_name: Windows + manufacturer: Qualcomm + chipset: Snapdragon® X Elite + timestamp: '2024-10-15T00:49:53Z' diff --git a/qai_hub_models/models/fastsam_s/README.md b/qai_hub_models/models/fastsam_s/README.md index a2e3760d..882d25fb 100644 --- a/qai_hub_models/models/fastsam_s/README.md +++ b/qai_hub_models/models/fastsam_s/README.md @@ -6,7 +6,7 @@ The Fast Segment Anything Model (FastSAM) is a novel, real-time CNN-based solution for the Segment Anything task. This task is designed to segment any object within an image based on various possible user interaction prompts. The model performs competitively despite significantly reduced computation, making it a practical choice for a variety of vision tasks. This is based on the implementation of FastSam-S found -[here](https://github.com/CASIA-IVA-Lab/FastSAM). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/fastsam_s). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.fastsam_s.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of FastSam-S can be found +* The license for the original implementation of FastSam-S can be found [here](https://github.com/CASIA-IVA-Lab/FastSAM/blob/main/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://github.com/CASIA-IVA-Lab/FastSAM/blob/main/LICENSE) +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/CASIA-IVA-Lab/FastSAM/blob/main/LICENSE) + ## References * [Fast Segment Anything](https://arxiv.org/abs/2306.12156) * [Source Model Implementation](https://github.com/CASIA-IVA-Lab/FastSAM) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/fastsam_s/export.py b/qai_hub_models/models/fastsam_s/export.py index 11a05633..d38cdfd2 100644 --- a/qai_hub_models/models/fastsam_s/export.py +++ b/qai_hub_models/models/fastsam_s/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.fastsam_s import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "fastsam_s" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. 
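As a quick aside on the comment above (it precedes the `use_channel_last_format` assignment in each exporter): channel_last means NHWC rather than PyTorch's default NCHW. A self-contained illustration follows; the shape is an arbitrary example, not FastSam's actual input spec:

```python
# Illustration of channel_last (NHWC) versus PyTorch's default NCHW layout.
import torch

nchw = torch.randn(1, 3, 640, 640)            # batch, channels, height, width
nhwc = nchw.permute(0, 2, 3, 1).contiguous()  # batch, height, width, channels
assert nhwc.shape == (1, 640, 640, 3)
```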
use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -122,7 +120,7 @@ def export_model( model.to("cpu"), make_torch_inputs(input_spec), check_trace=False ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -136,7 +134,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -151,7 +149,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -172,13 +170,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -199,7 +197,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/fastsam_s/perf.yaml b/qai_hub_models/models/fastsam_s/perf.yaml index 839aa124..10c4cd5b 100644 --- a/qai_hub_models/models/fastsam_s/perf.yaml +++ b/qai_hub_models/models/fastsam_s/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: FastSam-S performance_metrics: - torchscript_onnx_qnn: - inference_time: 8062.0 - throughput: 124.03870007442322 + inference_time: 8064.0 + throughput: 124.0079365079365 estimated_peak_memory_range: - min: 6352896 - max: 23213592 + min: 4218880 + max: 19390024 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 286 - job_id: jogkzrqyg + job_id: jgjvnr78g job_status: Passed torchscript_onnx: - inference_time: 10484.0 - throughput: 95.38344143456696 + inference_time: 9580.0 + throughput: 104.38413361169103 estimated_peak_memory_range: - min: 327680 - max: 28540592 + min: 4132864 + max: 25901272 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,7 +71,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 289 - job_id: jygzevjzg + job_id: j5mnx00qp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -82,13 +80,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:45:21Z' + timestamp: '2024-10-15T00:49:05Z' - torchscript_onnx_qnn: - inference_time: 6560.0 - throughput: 152.4390243902439 + inference_time: 6960.0 + throughput: 143.67816091954023 estimated_peak_memory_range: min: 4931584 - max: 36809936 + max: 40159504 primary_compute_unit: NPU precision: fp16 layer_info: @@ -96,14 +94,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 286 - job_id: jn5q89r75 + job_id: jpedm7z05 job_status: Passed torchscript_onnx: - inference_time: 8041.0 - throughput: 124.36264146250467 + inference_time: 7273.0 + throughput: 137.49484394335212 estimated_peak_memory_range: - min: 14278656 - max: 89357632 + min: 1331200 + max: 84592272 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,7 
+109,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 289 - job_id: jz5wom3zp + job_id: jgn6vzzm5 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -120,13 +118,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:45:22Z' + timestamp: '2024-10-15T00:49:06Z' - torchscript_onnx_qnn: - inference_time: 7924.0 - throughput: 126.19888944977284 + inference_time: 7380.0 + throughput: 135.50135501355012 estimated_peak_memory_range: - min: 4952064 - max: 10352832 + min: 4947968 + max: 10193600 primary_compute_unit: NPU precision: fp16 layer_info: @@ -134,7 +132,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 286 - job_id: jw566qzv5 + job_id: j5we6l7j5 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -142,14 +140,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:45:16Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:48:58Z' - torchscript_onnx_qnn: - inference_time: 13435.0 - throughput: 74.4324525493115 + inference_time: 7689.0 + throughput: 130.05592404734037 estimated_peak_memory_range: - min: 4952064 - max: 35733824 + min: 4968448 + max: 10193296 primary_compute_unit: NPU precision: fp16 layer_info: @@ -157,22 +155,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 286 - job_id: jlpe94w7g + job_id: jgdx1d3lp job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:45:20Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:49:01Z' - torchscript_onnx_qnn: - inference_time: 8030.0 - throughput: 124.53300124533001 + inference_time: 7719.0 + throughput: 129.55045990413265 estimated_peak_memory_range: - min: 4988928 - max: 8364352 + min: 4997120 + max: 9779592 primary_compute_unit: NPU precision: fp16 layer_info: @@ -180,22 +178,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 286 - job_id: j1p3kq1x5 + job_id: jp14znjlp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:45:17Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:49:00Z' - torchscript_onnx_qnn: - inference_time: 8004.0 - throughput: 124.9375312343828 + inference_time: 7618.0 + throughput: 131.26804935678655 estimated_peak_memory_range: - min: 4997120 - max: 8482232 + min: 4972544 + max: 8709760 primary_compute_unit: NPU precision: fp16 layer_info: @@ -203,22 +201,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 286 - job_id: jwgoyen45 + job_id: jg9lnzmvg job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:45:18Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:48:59Z' - torchscript_onnx_qnn: - inference_time: 7833.0 - throughput: 127.66500702157539 + inference_time: 13749.0 + throughput: 72.73256236817222 estimated_peak_memory_range: - min: 4947968 - max: 8111160 + min: 4960256 + max: 44328528 primary_compute_unit: NPU precision: fp16 layer_info: @@ -226,19 +224,57 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 286 - job_id: j1pv3zr75 + job_id: jp4lryyl5 job_status: Passed reference_device_info: - name: SA8255 
(Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:49:03Z' + - torchscript_onnx_qnn: + inference_time: 5490.0 + throughput: 182.14936247723134 + estimated_peak_memory_range: + min: 4927488 + max: 37555888 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 286 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 286 + job_id: jpxkoll95 + job_status: Passed + torchscript_onnx: + inference_time: 5354.0 + throughput: 186.77624206200971 + estimated_peak_memory_range: + min: 16953344 + max: 65201088 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 289 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 289 + job_id: jpy13oo4p + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:45:19Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:49:09Z' - torchscript_onnx_qnn: - inference_time: 8378.0 - throughput: 119.36022917164001 + inference_time: 8317.0 + throughput: 120.23566189731875 estimated_peak_memory_range: min: 4923392 max: 4923392 @@ -249,14 +285,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 286 - job_id: j1glne2ep + job_id: jgz3dlm65 job_status: Passed torchscript_onnx: - inference_time: 10647.0 - throughput: 93.92317084624777 + inference_time: 9903.0 + throughput: 100.97950116126427 estimated_peak_memory_range: - min: 21442560 - max: 21442560 + min: 22536192 + max: 22536192 primary_compute_unit: NPU precision: fp16 layer_info: @@ -264,7 +300,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 289 - job_id: jmg9v9yq5 + job_id: jprv3lleg job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -273,4 +309,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:45:23Z' + timestamp: '2024-10-15T00:49:07Z' diff --git a/qai_hub_models/models/fastsam_x/README.md b/qai_hub_models/models/fastsam_x/README.md index b6890348..0ebcaa2c 100644 --- a/qai_hub_models/models/fastsam_x/README.md +++ b/qai_hub_models/models/fastsam_x/README.md @@ -6,7 +6,7 @@ The Fast Segment Anything Model (FastSAM) is a novel, real-time CNN-based solution for the Segment Anything task. This task is designed to segment any object within an image based on various possible user interaction prompts. The model performs competitively despite significantly reduced computation, making it a practical choice for a variety of vision tasks. This is based on the implementation of FastSam-X found -[here](https://github.com/CASIA-IVA-Lab/FastSAM). This repository contains scripts for optimized on-device +[here](https://github.com/CASIA-IVA-Lab/FastSAM). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/fastsam_x). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.fastsam_x.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of FastSam-X can be found +* The license for the original implementation of FastSam-X can be found [here](https://github.com/CASIA-IVA-Lab/FastSAM/blob/main/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://github.com/CASIA-IVA-Lab/FastSAM/blob/main/LICENSE) +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/CASIA-IVA-Lab/FastSAM/blob/main/LICENSE) + ## References * [Fast Segment Anything](https://arxiv.org/abs/2306.12156) * [Source Model Implementation](https://github.com/CASIA-IVA-Lab/FastSAM) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/fastsam_x/export.py b/qai_hub_models/models/fastsam_x/export.py index 5a8cba67..46438c56 100644 --- a/qai_hub_models/models/fastsam_x/export.py +++ b/qai_hub_models/models/fastsam_x/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.fastsam_x import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "fastsam_x" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. 
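# Rationale for the assignment below: TFLite and QNN backends typically prefer NHWC
# (channel-last) activations on the NPU, so requesting channel-last I/O avoids a
# per-inference layout transpose; the ONNX path keeps the framework-native NCHW layout.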
use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -122,7 +120,7 @@ def export_model( model.to("cpu"), make_torch_inputs(input_spec), check_trace=False ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -136,7 +134,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -151,7 +149,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -172,13 +170,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -199,7 +197,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/fastsam_x/perf.yaml b/qai_hub_models/models/fastsam_x/perf.yaml index 90071a5c..7b95d17f 100644 --- a/qai_hub_models/models/fastsam_x/perf.yaml +++ b/qai_hub_models/models/fastsam_x/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: FastSam-X performance_metrics: - torchscript_onnx_qnn: - inference_time: 45786.0 - throughput: 21.84073734329271 + inference_time: 45671.0 + throughput: 21.895732521731514 estimated_peak_memory_range: - min: 4968448 - max: 21532680 + min: 4980736 + max: 20363240 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 418 - job_id: jogkzd8og + job_id: jp3j0z6zg job_status: Passed torchscript_onnx: - inference_time: 49871.0 - throughput: 20.051733472358684 + inference_time: 48826.0 + throughput: 20.480891328390612 estimated_peak_memory_range: - min: 106496 - max: 165346544 + min: 28672 + max: 164570040 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,7 +71,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 421 - job_id: jlpe92y0g + job_id: j57yre4r5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -82,13 +80,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T23:10:24Z' + timestamp: '2024-10-15T00:48:17Z' - torchscript_onnx_qnn: - inference_time: 38576.0 - throughput: 25.922853587722937 + inference_time: 38249.0 + throughput: 26.144474365342884 estimated_peak_memory_range: - min: 4931584 - max: 54882560 + min: 4952064 + max: 66109264 primary_compute_unit: NPU precision: fp16 layer_info: @@ -96,14 +94,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 418 - job_id: jn5q8wvm5 + job_id: jgo26l8dp job_status: Passed torchscript_onnx: - inference_time: 40684.0 - throughput: 24.579687346376954 + inference_time: 39188.0 + throughput: 25.518015719097683 estimated_peak_memory_range: - min: 479232 - max: 145914784 + min: 585728 + max: 164283632 primary_compute_unit: NPU precision: fp16 
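# layer_info describes how the compiler partitioned the graph across compute units;
# all layers landing on the NPU (zero on GPU/CPU) is the fully accelerated case.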
layer_info: @@ -111,7 +109,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 421 - job_id: jygzewn6g + job_id: jp4lry1l5 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -120,13 +118,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T23:10:25Z' + timestamp: '2024-10-15T00:48:18Z' - torchscript_onnx_qnn: - inference_time: 40305.0 - throughput: 24.810817516437165 + inference_time: 43275.0 + throughput: 23.108030040439054 estimated_peak_memory_range: - min: 5017600 - max: 6240864 + min: 5120000 + max: 6442816 primary_compute_unit: NPU precision: fp16 layer_info: @@ -134,7 +132,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 418 - job_id: jw566vw75 + job_id: jgjvnrq8g job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -142,14 +140,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T23:10:20Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:48:09Z' - torchscript_onnx_qnn: - inference_time: 86347.0 - throughput: 11.581178269077096 + inference_time: 43253.0 + throughput: 23.119783598825514 estimated_peak_memory_range: - min: 4931584 - max: 53543440 + min: 5042176 + max: 11567880 primary_compute_unit: NPU precision: fp16 layer_info: @@ -157,22 +155,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 418 - job_id: j7gjx1q8p + job_id: j5we6l4j5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T23:10:23Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:48:13Z' - torchscript_onnx_qnn: - inference_time: 42954.0 - throughput: 23.280718908599898 + inference_time: 42992.0 + throughput: 23.260141421659842 estimated_peak_memory_range: - min: 5115904 - max: 6705216 + min: 5033984 + max: 12084568 primary_compute_unit: NPU precision: fp16 layer_info: @@ -180,22 +178,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 418 - job_id: j1p3k86z5 + job_id: jgz3dln65 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T23:10:21Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:48:12Z' - torchscript_onnx_qnn: - inference_time: 43626.0 - throughput: 22.922110667950307 + inference_time: 43064.0 + throughput: 23.22125208991269 estimated_peak_memory_range: - min: 5058560 - max: 6374936 + min: 5038080 + max: 13404360 primary_compute_unit: NPU precision: fp16 layer_info: @@ -203,22 +201,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 418 - job_id: jwgoym8d5 + job_id: jpedm7y05 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T23:10:21Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:48:10Z' - torchscript_onnx_qnn: - inference_time: 43428.0 - throughput: 23.02661877129962 + inference_time: 90623.0 + throughput: 11.034726283614535 estimated_peak_memory_range: - min: 5099520 - max: 6388424 + min: 4931584 + max: 70253824 primary_compute_unit: NPU precision: fp16 layer_info: @@ -226,19 +224,57 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 418 - job_id: j1pv347m5 + job_id: jp14zn6lp job_status: Passed 
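# Throughput values in this file are 1e6 / inference_time: times are reported in
# microseconds, so throughput is inferences per second.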
reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:48:15Z' + - torchscript_onnx_qnn: + inference_time: 30823.0 + throughput: 32.443305323946404 + estimated_peak_memory_range: + min: 4927488 + max: 64419072 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 418 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 418 + job_id: jgdx1d2lp + job_status: Passed + torchscript_onnx: + inference_time: 31678.0 + throughput: 31.567649472820253 + estimated_peak_memory_range: + min: 753664 + max: 80827200 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 421 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 421 + job_id: jgn6vznm5 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T23:10:22Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:48:21Z' - torchscript_onnx_qnn: - inference_time: 44461.0 - throughput: 22.491621870853105 + inference_time: 44484.0 + throughput: 22.4799928064023 estimated_peak_memory_range: min: 4923392 max: 4923392 @@ -249,14 +285,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 418 - job_id: j1gln7llp + job_id: jpv6kl7m5 job_status: Passed torchscript_onnx: - inference_time: 49409.0 - throughput: 20.239227671072072 + inference_time: 49500.0 + throughput: 20.2020202020202 estimated_peak_memory_range: - min: 146132992 - max: 146132992 + min: 146264064 + max: 146264064 primary_compute_unit: NPU precision: fp16 layer_info: @@ -264,7 +300,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 421 - job_id: jz5wox4jp + job_id: jpxkol495 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -273,4 +309,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T23:10:26Z' + timestamp: '2024-10-15T00:48:19Z' diff --git a/qai_hub_models/models/fcn_resnet50/README.md b/qai_hub_models/models/fcn_resnet50/README.md index 5f781abd..862ed945 100644 --- a/qai_hub_models/models/fcn_resnet50/README.md +++ b/qai_hub_models/models/fcn_resnet50/README.md @@ -6,7 +6,7 @@ FCN_ResNet50 is a machine learning model that can segment images from the COCO dataset. It uses ResNet50 as a backbone. This is based on the implementation of FCN-ResNet50 found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/segmentation/fcn.py). This repository contains scripts for optimized on-device +[here](https://github.com/pytorch/vision/blob/main/torchvision/models/segmentation/fcn.py). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/fcn_resnet50). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.fcn_resnet50.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of FCN-ResNet50 can be found +* The license for the original implementation of FCN-ResNet50 can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Fully Convolutional Networks for Semantic Segmentation](https://arxiv.org/abs/1411.4038) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/segmentation/fcn.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/fcn_resnet50/export.py b/qai_hub_models/models/fcn_resnet50/export.py index 6d9a42c3..ecb04253 100644 --- a/qai_hub_models/models/fcn_resnet50/export.py +++ b/qai_hub_models/models/fcn_resnet50/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.fcn_resnet50 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
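        For reference, ExportResult is the small container imported above from
        qai_hub_models.models.common. An illustrative sketch of its shape, assuming
        a plain dataclass (see models.common for the real definition):

            @dataclass
            class ExportResult:
                compile_job: hub.CompileJob
                inference_job: Optional[hub.InferenceJob] = None
                profile_job: Optional[hub.ProfileJob] = None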
""" model_name = "fcn_resnet50" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/fcn_resnet50/perf.yaml b/qai_hub_models/models/fcn_resnet50/perf.yaml index 8161c75a..7f89e516 100644 --- a/qai_hub_models/models/fcn_resnet50/perf.yaml +++ b/qai_hub_models/models/fcn_resnet50/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: FCN-ResNet50 performance_metrics: - torchscript_onnx_tflite: - inference_time: 41802.0 - throughput: 23.922300368403427 + inference_time: 41294.0 + throughput: 24.216593209667263 estimated_peak_memory_range: - min: 98304 - max: 2318400 + min: 22110208 + max: 24248904 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 86 - job_id: j1glne42p + job_id: jp4lry9l5 job_status: Passed torchscript_onnx_qnn: - inference_time: 42427.0 - throughput: 23.569896528154242 + inference_time: 42202.0 + throughput: 23.695559452158665 estimated_peak_memory_range: - min: 3215360 - max: 18722408 + min: 3166208 + max: 18804520 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 127 - job_id: jygzev24g + job_id: jgkex29og job_status: Passed torchscript_onnx: - inference_time: 42829.0 - throughput: 23.3486656237596 + inference_time: 43019.0 + throughput: 23.245542667193565 estimated_peak_memory_range: - min: 16384 - max: 124852056 + min: 47304704 + max: 49692968 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 129 - job_id: jo5mrve7g + job_id: j5we6l1j5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:43:52Z' + timestamp: '2024-10-15T00:47:19Z' - torchscript_onnx_tflite: - inference_time: 36249.0 - throughput: 27.586967916356315 + inference_time: 36401.0 + throughput: 27.471772753495785 estimated_peak_memory_range: - min: 7905280 - max: 152998672 + min: 22093824 + max: 191262144 
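# estimated_peak_memory_range min/max values are byte counts.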
primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 86 - job_id: jw566q2n5 + job_id: jpxkold95 job_status: Passed torchscript_onnx_qnn: - inference_time: 36793.0 - throughput: 27.179082977740332 + inference_time: 38829.0 + throughput: 25.75394679234593 estimated_peak_memory_range: - min: 2564096 - max: 57565264 + min: 3162112 + max: 77455616 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 127 - job_id: jz5womw4p + job_id: j5q6qlmmp job_status: Passed torchscript_onnx: - inference_time: 39140.0 - throughput: 25.549310168625446 + inference_time: 39035.0 + throughput: 25.618035096708084 estimated_peak_memory_range: - min: 49319936 - max: 195369312 + min: 1544192 + max: 174381568 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 129 - job_id: jegn2r0jg + job_id: jg9lnzxvg job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:43:53Z' + timestamp: '2024-10-15T00:47:20Z' - torchscript_onnx_tflite: - inference_time: 41582.0 - throughput: 24.04886729835025 + inference_time: 41355.0 + throughput: 24.180872929512756 estimated_peak_memory_range: - min: 22102016 - max: 24466560 + min: 22097920 + max: 24380520 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 86 - job_id: j1p3kqnm5 + job_id: j5mnx0dqp job_status: Passed torchscript_onnx_qnn: - inference_time: 38881.0 - throughput: 25.719503099200125 + inference_time: 39263.0 + throughput: 25.469271324147417 estimated_peak_memory_range: - min: 3219456 - max: 4360256 + min: 3276800 + max: 4505048 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 127 - job_id: jnp10q2n5 + job_id: j56y48d7p job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:43:47Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:47:11Z' - torchscript_onnx_tflite: - inference_time: 65469.0 - throughput: 15.274404680077595 + inference_time: 41195.0 + throughput: 24.274790629930816 estimated_peak_memory_range: - min: 22151168 - max: 109518784 + min: 22126592 + max: 24212112 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 86 - job_id: jwgoyez15 + job_id: jpy13o74p job_status: Passed torchscript_onnx_qnn: - inference_time: 68055.0 - throughput: 14.693997502020425 + inference_time: 39366.0 + throughput: 25.40263171264543 estimated_peak_memory_range: - min: 1740800 - max: 36045680 + min: 3289088 + max: 4504280 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 127 - job_id: j0pxve98g + job_id: jpv6kl9m5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:43:51Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:47:14Z' - torchscript_onnx_tflite: - inference_time: 
41353.0 - throughput: 24.182042415302398 + inference_time: 41439.0 + throughput: 24.131856463717753 estimated_peak_memory_range: - min: 0 - max: 1795512 + min: 22040576 + max: 23827488 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 86 - job_id: j1pv3zqz5 + job_id: jp2kyrvmp job_status: Passed torchscript_onnx_qnn: - inference_time: 39684.0 - throughput: 25.199072674125592 + inference_time: 38937.0 + throughput: 25.682512777050107 estimated_peak_memory_range: - min: 3334144 - max: 4604568 + min: 3313664 + max: 4747336 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 127 - job_id: jvgdw7n65 + job_id: jgo26l4dp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:43:48Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:47:13Z' - torchscript_onnx_tflite: - inference_time: 41532.0 - throughput: 24.077819512664934 + inference_time: 41541.0 + throughput: 24.072602970559206 estimated_peak_memory_range: - min: 22126592 - max: 24172080 + min: 22110208 + max: 23911848 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 86 - job_id: j7gjxkd1p + job_id: jprv3lneg job_status: Passed torchscript_onnx_qnn: - inference_time: 39565.0 - throughput: 25.274864147605207 + inference_time: 39506.0 + throughput: 25.312610742672 estimated_peak_memory_range: - min: 3289088 - max: 4588256 + min: 3309568 + max: 4568112 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 127 - job_id: jz57zv2np + job_id: jp3j0zwzg job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:43:49Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:47:12Z' - torchscript_onnx_tflite: - inference_time: 41181.0 - throughput: 24.28304315096768 + inference_time: 65773.0 + throughput: 15.203807033281134 estimated_peak_memory_range: - min: 22126592 - max: 24436800 + min: 22204416 + max: 119273184 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 86 - job_id: jlpe94o8g + job_id: jgn6vz7m5 job_status: Passed torchscript_onnx_qnn: - inference_time: 38834.0 - throughput: 25.750630890456815 + inference_time: 65372.0 + throughput: 15.297069081563972 estimated_peak_memory_range: - min: 3317760 - max: 4564432 + min: 3260416 + max: 41841584 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 127 - job_id: jqp4qjn2g + job_id: jpedm7l05 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:47:17Z' + - torchscript_onnx_tflite: + inference_time: 29946.0 + throughput: 33.39344152808388 + estimated_peak_memory_range: + min: 15872000 + max: 118544416 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 86 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 86 + job_id: 
jp8qye48p + job_status: Passed + torchscript_onnx_qnn: + inference_time: 29775.0 + throughput: 33.58522250209908 + estimated_peak_memory_range: + min: 3194880 + max: 74920576 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 127 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 127 + job_id: jgz3dl465 + job_status: Passed + torchscript_onnx: + inference_time: 26793.0 + throughput: 37.32318142798492 + estimated_peak_memory_range: + min: 35823616 + max: 137945296 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 129 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 129 + job_id: j57yre9r5 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:43:50Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:47:23Z' - torchscript_onnx_qnn: - inference_time: 39297.0 - throughput: 25.447235157900096 + inference_time: 39296.0 + throughput: 25.44788273615635 estimated_peak_memory_range: min: 3153920 max: 3153920 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 127 - job_id: jmg9v90m5 + job_id: jglvmy1l5 job_status: Passed torchscript_onnx: - inference_time: 42125.0 - throughput: 23.73887240356083 + inference_time: 42238.0 + throughput: 23.67536341682845 estimated_peak_memory_range: - min: 69451776 - max: 69451776 + min: 69459968 + max: 69459968 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 129 - job_id: joprk16k5 + job_id: jp14znvlp job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:43:54Z' + timestamp: '2024-10-15T00:47:21Z' diff --git a/qai_hub_models/models/fcn_resnet50_quantized/README.md b/qai_hub_models/models/fcn_resnet50_quantized/README.md index f2a318f8..94047c54 100644 --- a/qai_hub_models/models/fcn_resnet50_quantized/README.md +++ b/qai_hub_models/models/fcn_resnet50_quantized/README.md @@ -6,7 +6,7 @@ FCN_ResNet50 is a quantized machine learning model that can segment images from the COCO dataset. It uses ResNet50 as a backbone. This is based on the implementation of FCN-ResNet50-Quantized found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/segmentation/fcn.py). This repository contains scripts for optimized on-device +[here](https://github.com/pytorch/vision/blob/main/torchvision/models/segmentation/fcn.py). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/fcn_resnet50_quantized). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.fcn_resnet50_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of FCN-ResNet50-Quantized can be found +* The license for the original implementation of FCN-ResNet50-Quantized can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Fully Convolutional Networks for Semantic Segmentation](https://arxiv.org/abs/1411.4038) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/segmentation/fcn.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/fcn_resnet50_quantized/export.py b/qai_hub_models/models/fcn_resnet50_quantized/export.py index d81e827e..3a559df3 100644 --- a/qai_hub_models/models/fcn_resnet50_quantized/export.py +++ b/qai_hub_models/models/fcn_resnet50_quantized/export.py @@ -10,17 +10,17 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.fcn_resnet50_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.printing import ( print_inference_metrics, @@ -45,20 +45,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -80,10 +78,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
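        Example (illustrative sketch; the device name is one of the supported
        devices listed in perf.yaml, not a default):

            result = export_model(device="Samsung Galaxy S24", skip_inferencing=True)
            if isinstance(result, ExportResult):
                print(result.compile_job.job_id)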
""" model_name = "fcn_resnet50_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,7 +107,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( target_runtime, output_path, input_spec ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -135,7 +133,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +148,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +169,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -198,7 +196,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/fcn_resnet50_quantized/perf.yaml b/qai_hub_models/models/fcn_resnet50_quantized/perf.yaml index fcb23e38..4df69318 100644 --- a/qai_hub_models/models/fcn_resnet50_quantized/perf.yaml +++ b/qai_hub_models/models/fcn_resnet50_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,41 +20,38 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: FCN-ResNet50-Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 12976.0 - throughput: 77.06535141800246 + inference_time: 12962.0 + throughput: 77.14858818083628 estimated_peak_memory_range: - min: 6324224 - max: 7662360 + min: 5533696 + max: 7846280 primary_compute_unit: NPU precision: int8 layer_info: @@ -61,29 +59,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 89 - job_id: jn5q896e5 + job_id: jg9lnz8qg job_status: Passed torchscript_onnx_qnn: - inference_time: 14822.0 - throughput: 67.46727836999055 + inference_time: 14804.0 + throughput: 67.54931099702783 estimated_peak_memory_range: - min: 36864 - max: 16490184 + min: 12288 + max: 138402352 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 128 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jz5wome4p + total_layers: 128 + job_id: jp0z0md25 job_status: Passed torchscript_onnx: - inference_time: 25046.0 - throughput: 39.926535175277486 + inference_time: 22000.0 + throughput: 45.45454545454545 estimated_peak_memory_range: - min: 61440 - max: 43208608 + min: 0 + max: 43668568 primary_compute_unit: NPU precision: int8 layer_info: @@ -91,7 +89,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 144 - job_id: joprk1vk5 + job_id: jgz3dl8z5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -100,13 +98,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:43:06Z' + timestamp: '2024-10-15T00:46:25Z' - torchscript_onnx_tflite: - inference_time: 12479.0 - throughput: 80.1346261719689 + inference_time: 10618.0 
+ throughput: 94.17969485778866 estimated_peak_memory_range: - min: 32768 - max: 89827680 + min: 49152 + max: 94352608 primary_compute_unit: NPU precision: int8 layer_info: @@ -114,29 +112,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 89 - job_id: j1glnev2p + job_id: jp14zn3kp job_status: Passed torchscript_onnx_qnn: - inference_time: 10866.0 - throughput: 92.03018590097552 + inference_time: 10827.0 + throughput: 92.36168837166343 estimated_peak_memory_range: - min: 802816 - max: 34130528 + min: 819200 + max: 38062816 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 128 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jmg9v9lm5 + total_layers: 128 + job_id: jp8qye6zp job_status: Passed torchscript_onnx: - inference_time: 18811.0 - throughput: 53.16038488118654 + inference_time: 16828.0 + throughput: 59.424768243403854 estimated_peak_memory_range: min: 12288 - max: 165767904 + max: 183845840 primary_compute_unit: NPU precision: int8 layer_info: @@ -144,7 +142,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 144 - job_id: jep283k6p + job_id: j5we6l8z5 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -153,13 +151,44 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:43:07Z' + timestamp: '2024-10-15T00:46:26Z' + - torchscript_onnx_qnn: + inference_time: 113680.0 + throughput: 8.796622097114708 + estimated_peak_memory_range: + min: 1269760 + max: 9004752 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 128 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 128 + job_id: jgjvnro7g + job_status: Passed + reference_device_info: + name: RB3 Gen 2 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS6490 Proxy + timestamp: '2024-10-15T00:46:23Z' + - reference_device_info: + name: RB5 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8250 Proxy + timestamp: '2024-10-15T00:46:11Z' - torchscript_onnx_tflite: - inference_time: 13041.0 - throughput: 76.68123610152595 + inference_time: 12985.0 + throughput: 77.01193685021178 estimated_peak_memory_range: - min: 5550080 - max: 320087936 + min: 5566464 + max: 9482200 primary_compute_unit: NPU precision: int8 layer_info: @@ -167,22 +196,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 89 - job_id: jw566qyn5 + job_id: jgdx1d0kp job_status: Passed torchscript_onnx_qnn: - inference_time: 12659.0 - throughput: 78.99518129394107 + inference_time: 13182.0 + throughput: 75.86102260658474 estimated_peak_memory_range: - min: 819200 - max: 2540176 + min: 847872 + max: 2106504 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 128 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jvgdw7x65 + total_layers: 128 + job_id: j5q6qlz7p job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -190,14 +219,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:43:00Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:46:16Z' - torchscript_onnx_tflite: - inference_time: 15359.0 - throughput: 65.10840549514943 + inference_time: 13045.0 + throughput: 76.65772326561901 estimated_peak_memory_range: - min: 5558272 - max: 98656656 + min: 5525504 + max: 11347872 primary_compute_unit: NPU precision: int8 layer_info: @@ -205,37 +234,37 @@ models: layers_on_gpu: 0 
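# Zero GPU/CPU counts mean the whole network runs on the NPU with no fallback layers.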
layers_on_cpu: 0 total_layers: 89 - job_id: j1p3kqjm5 + job_id: j5mnx04yp job_status: Passed torchscript_onnx_qnn: - inference_time: 17161.0 - throughput: 58.27166249053086 + inference_time: 13235.0 + throughput: 75.55723460521345 estimated_peak_memory_range: - min: 811008 - max: 36156400 + min: 819200 + max: 2086088 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 128 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jo5mrvn7g + total_layers: 128 + job_id: jp3j0zxxg job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:43:04Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:46:20Z' - torchscript_onnx_tflite: - inference_time: 12926.0 - throughput: 77.36345350456445 + inference_time: 12924.0 + throughput: 77.37542556484061 estimated_peak_memory_range: - min: 5545984 - max: 18638280 + min: 5550080 + max: 6911416 primary_compute_unit: NPU precision: int8 layer_info: @@ -243,37 +272,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 89 - job_id: jwgoye215 + job_id: jpxkolmj5 job_status: Passed torchscript_onnx_qnn: - inference_time: 12796.0 - throughput: 78.14942169427947 + inference_time: 13286.0 + throughput: 75.2671985548698 estimated_peak_memory_range: - min: 823296 - max: 2186032 + min: 819200 + max: 2199288 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 128 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jz57zvynp + total_layers: 128 + job_id: j56y48rvp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:43:01Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:46:19Z' - torchscript_onnx_tflite: - inference_time: 12914.0 - throughput: 77.43534148985597 + inference_time: 13019.0 + throughput: 76.81081496274676 estimated_peak_memory_range: - min: 5545984 - max: 93051008 + min: 5521408 + max: 19730848 primary_compute_unit: NPU precision: int8 layer_info: @@ -281,37 +310,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 89 - job_id: j1pv3z6z5 + job_id: jp4lry8q5 job_status: Passed torchscript_onnx_qnn: - inference_time: 12828.0 - throughput: 77.95447458684129 + inference_time: 13214.0 + throughput: 75.67731194187982 estimated_peak_memory_range: - min: 847872 - max: 2227720 + min: 819200 + max: 2320624 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 128 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jqp4qjl2g + total_layers: 128 + job_id: jglvmyoe5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:43:02Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:46:18Z' - torchscript_onnx_tflite: - inference_time: 12957.0 - throughput: 77.17835918808366 + inference_time: 15187.0 + throughput: 65.8457891617831 estimated_peak_memory_range: - min: 2605056 - max: 4449936 + min: 5627904 + max: 101214848 primary_compute_unit: NPU precision: int8 layer_info: @@ -319,75 +348,105 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 89 - job_id: j7gjxkv1p + job_id: j57yre6q5 job_status: Passed torchscript_onnx_qnn: - 
inference_time: 12273.0 - throughput: 81.47967082212988 + inference_time: 17021.0 + throughput: 58.750954703013925 estimated_peak_memory_range: - min: 880640 - max: 2430368 + min: 716800 + max: 37925024 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 128 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: j0pxvek8g + total_layers: 128 + job_id: jpv6kle75 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:43:03Z' - - torchscript_onnx_qnn: - inference_time: 117049.0 - throughput: 8.543430529094653 + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:46:22Z' + - torchscript_onnx_tflite: + inference_time: 9021.0 + throughput: 110.85245538188671 estimated_peak_memory_range: - min: 1277952 - max: 9028416 + min: 5517312 + max: 51723280 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 89 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jegn2r6jg + total_layers: 89 + job_id: jpy13oqrp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 10772.0 + throughput: 92.8332714444857 + estimated_peak_memory_range: + min: 835584 + max: 34962752 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 128 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 128 + job_id: jpedm7875 + job_status: Passed + torchscript_onnx: + inference_time: 14952.0 + throughput: 66.88068485821294 + estimated_peak_memory_range: + min: 1753088 + max: 103152720 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 144 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 144 + job_id: j5we6l8j5 job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T12:43:05Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:46:29Z' - torchscript_onnx_qnn: - inference_time: 12992.0 - throughput: 76.9704433497537 + inference_time: 13359.0 + throughput: 74.85590238790328 estimated_peak_memory_range: min: 794624 max: 794624 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 128 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jnp10q4n5 + total_layers: 128 + job_id: jgkex2oyg job_status: Passed torchscript_onnx: - inference_time: 25865.0 - throughput: 38.662284941040014 + inference_time: 21562.0 + throughput: 46.37788702346721 estimated_peak_memory_range: - min: 35233792 - max: 35233792 + min: 36794368 + max: 36794368 primary_compute_unit: NPU precision: int8 layer_info: @@ -395,7 +454,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 144 - job_id: jqpyev10g + job_id: jp14zn7kp job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -404,4 +463,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:43:08Z' + timestamp: '2024-10-15T00:46:27Z' diff --git a/qai_hub_models/models/ffnet_122ns_lowres/README.md b/qai_hub_models/models/ffnet_122ns_lowres/README.md index 4bf440e2..2b69b08a 100644 --- a/qai_hub_models/models/ffnet_122ns_lowres/README.md +++ b/qai_hub_models/models/ffnet_122ns_lowres/README.md @@ -6,7 +6,7 @@ FFNet-122NS-LowRes is a "fuss-free network" that segments street 
scene images with per-pixel classes like road, sidewalk, and pedestrian. Trained on the Cityscapes dataset. This is based on the implementation of FFNet-122NS-LowRes found -[here](https://github.com/Qualcomm-AI-research/FFNet). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/ffnet_122ns_lowres). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.ffnet_122ns_lowres.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of FFNet-122NS-LowRes can be found +* The license for the original implementation of FFNet-122NS-LowRes can be found [here](https://github.com/Qualcomm-AI-research/FFNet/blob/master/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Simple and Efficient Architectures for Semantic Segmentation](https://arxiv.org/abs/2206.08236) * [Source Model Implementation](https://github.com/Qualcomm-AI-research/FFNet) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/ffnet_122ns_lowres/export.py b/qai_hub_models/models/ffnet_122ns_lowres/export.py index 4d467cd3..2a304b3f 100644 --- a/qai_hub_models/models/ffnet_122ns_lowres/export.py +++ b/qai_hub_models/models/ffnet_122ns_lowres/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.ffnet_122ns_lowres import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1.
Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "ffnet_122ns_lowres" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/ffnet_122ns_lowres/perf.yaml b/qai_hub_models/models/ffnet_122ns_lowres/perf.yaml index ec86f95c..29d342b0 100644 --- a/qai_hub_models/models/ffnet_122ns_lowres/perf.yaml +++ b/qai_hub_models/models/ffnet_122ns_lowres/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: FFNet-122NS-LowRes performance_metrics: - torchscript_onnx_tflite: - inference_time: 7331.0 - throughput: 136.40703860319192 + inference_time: 7435.0 + throughput: 134.49899125756556 estimated_peak_memory_range: - min: 667648 - max: 2865608 + min: 647168 + max: 2559144 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 216 - job_id: jw566q4n5 + job_id: jgo26le4p job_status: Passed torchscript_onnx_qnn: - inference_time: 7187.0 - throughput: 139.14011409489356 + inference_time: 7220.0 + throughput: 138.50415512465375 estimated_peak_memory_range: min: 6307840 - max: 33284944 + max: 37875312 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 348 - job_id: jz5wom64p + job_id: j57yrevq5 job_status: Passed torchscript_onnx: - inference_time: 7783.0 - throughput: 128.48515996402415 + inference_time: 7563.0 + throughput: 132.22266296443212 estimated_peak_memory_range: - min: 6324224 - max: 9185048 + min: 6316032 + max: 68449600 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 350 - job_id: jegn2rvjg + job_id: jgkex2dyg job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:41:22Z' + timestamp: '2024-10-15T00:44:36Z' - torchscript_onnx_tflite: - inference_time: 5667.0 - throughput: 176.4602082230457 + inference_time: 6103.0 + throughput: 163.85384237260365 estimated_peak_memory_range: - min: 659456 - max: 67127136 + min: 667648 + max: 71345056 
primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 216 - job_id: j1p3kq0m5 + job_id: jpv6klz75 job_status: Passed torchscript_onnx_qnn: - inference_time: 5656.0 - throughput: 176.8033946251768 + inference_time: 5973.0 + throughput: 167.42005692281936 estimated_peak_memory_range: min: 6307840 - max: 29405056 + max: 31171600 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 348 - job_id: jmg9v9nm5 + job_id: jp4lryjq5 job_status: Passed torchscript_onnx: - inference_time: 8050.0 - throughput: 124.22360248447205 + inference_time: 6501.0 + throughput: 153.82248884786955 estimated_peak_memory_range: - min: 7589888 - max: 92935840 + min: 999424 + max: 95885696 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 350 - job_id: joprk13k5 + job_id: j5q6qlw7p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:41:23Z' + timestamp: '2024-10-15T00:44:37Z' - torchscript_onnx_tflite: - inference_time: 7260.0 - throughput: 137.7410468319559 + inference_time: 7253.0 + throughput: 137.87398317937405 estimated_peak_memory_range: - min: 651264 - max: 2269632 + min: 647168 + max: 2878104 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 216 - job_id: jwgoye615 + job_id: jgjvnrk7g job_status: Passed torchscript_onnx_qnn: - inference_time: 6788.0 - throughput: 147.3187978786093 + inference_time: 6724.0 + throughput: 148.720999405116 estimated_peak_memory_range: - min: 6328320 - max: 7716224 + min: 6365184 + max: 7488816 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 348 - job_id: jvgdw7165 + job_id: j5mnx0vyp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:41:17Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:44:28Z' - torchscript_onnx_tflite: - inference_time: 10766.0 - throughput: 92.88500835965075 + inference_time: 7262.0 + throughput: 137.70311209033324 estimated_peak_memory_range: - min: 638976 - max: 61768912 + min: 671744 + max: 2912336 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 216 - job_id: j1pv3zkz5 + job_id: jg9lnz9qg job_status: Passed torchscript_onnx_qnn: - inference_time: 11175.0 - throughput: 89.48545861297539 + inference_time: 6725.0 + throughput: 148.6988847583643 estimated_peak_memory_range: - min: 6422528 - max: 27181728 + min: 6340608 + max: 8135168 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 348 - job_id: jo5mrvx7g + job_id: jp2kyrjxp job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:41:21Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:44:31Z' - torchscript_onnx_tflite: - inference_time: 7252.0 - throughput: 137.89299503585218 + 
inference_time: 7274.0 + throughput: 137.47594171020071 estimated_peak_memory_range: - min: 0 - max: 5974544 + min: 647168 + max: 2696992 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 216 - job_id: j7gjxkn1p + job_id: j5we6lmz5 job_status: Passed torchscript_onnx_qnn: - inference_time: 6911.0 - throughput: 144.6968600781363 + inference_time: 6701.0 + throughput: 149.23145799134457 estimated_peak_memory_range: - min: 6336512 - max: 7479032 + min: 6340608 + max: 7749680 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 348 - job_id: jz57zvrnp + job_id: jprv3l9vg job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:41:18Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:44:30Z' - torchscript_onnx_tflite: - inference_time: 7373.0 - throughput: 135.63000135630003 + inference_time: 7383.0 + throughput: 135.4462955438169 estimated_peak_memory_range: - min: 647168 - max: 9586848 + min: 638976 + max: 2937272 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 216 - job_id: jlpe94m8g + job_id: jgz3dlvz5 job_status: Passed torchscript_onnx_qnn: - inference_time: 6887.0 - throughput: 145.20110352838682 + inference_time: 6722.0 + throughput: 148.76524843796489 estimated_peak_memory_range: - min: 6365184 - max: 7608704 + min: 6373376 + max: 7866616 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 348 - job_id: jqp4qjr2g + job_id: jgn6vzxv5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:41:19Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:44:29Z' - torchscript_onnx_tflite: - inference_time: 7255.0 - throughput: 137.83597518952448 + inference_time: 10742.0 + throughput: 93.0925339787749 estimated_peak_memory_range: - min: 675840 - max: 2804416 + min: 638976 + max: 63506224 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 216 - job_id: jygzevd4g + job_id: jpedm7475 job_status: Passed torchscript_onnx_qnn: - inference_time: 6817.0 - throughput: 146.69209329617135 + inference_time: 11012.0 + throughput: 90.81002542680712 estimated_peak_memory_range: - min: 6332416 - max: 7559904 + min: 1155072 + max: 23134992 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 348 - job_id: j0pxveo8g + job_id: jp0z0mk25 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:44:34Z' + - torchscript_onnx_tflite: + inference_time: 4901.0 + throughput: 204.03999183840034 + estimated_peak_memory_range: + min: 622592 + max: 30095232 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 216 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 216 + job_id: jgdx1d7kp + job_status: Passed + torchscript_onnx_qnn: + 
inference_time: 5054.0 + throughput: 197.86307874950535 + estimated_peak_memory_range: + min: 6291456 + max: 28683552 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 348 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 348 + job_id: jp8qye8zp + job_status: Passed + torchscript_onnx: + inference_time: 5389.0 + throughput: 185.56318426424198 + estimated_peak_memory_range: + min: 7593984 + max: 53147632 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 350 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 350 + job_id: jp3j0z8xg + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:41:20Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:44:40Z' - torchscript_onnx_qnn: - inference_time: 7181.0 - throughput: 139.25637097897229 + inference_time: 7133.0 + throughput: 140.19346698443852 estimated_peak_memory_range: min: 6303744 max: 6303744 primary_compute_unit: NPU precision: fp16 layer_info: @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 348 - job_id: jnp10qzn5 + job_id: jpxkolej5 job_status: Passed torchscript_onnx: - inference_time: 7591.0 - throughput: 131.73494928204454 + inference_time: 7697.0 + throughput: 129.92074834351047 estimated_peak_memory_range: - min: 60002304 - max: 60002304 + min: 61521920 + max: 61521920 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 350 - job_id: jegn2rvjg + job_id: jglvmy7e5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:41:24Z' + timestamp: '2024-10-15T00:44:38Z' diff --git a/qai_hub_models/models/ffnet_40s/README.md b/qai_hub_models/models/ffnet_40s/README.md index f9ee034a..d2b952c7 100644 --- a/qai_hub_models/models/ffnet_40s/README.md +++ b/qai_hub_models/models/ffnet_40s/README.md @@ -6,7 +6,7 @@ FFNet-40S is a "fuss-free network" that segments street scene images with per-pixel classes like road, sidewalk, and pedestrian. Trained on the Cityscapes dataset. This is based on the implementation of FFNet-40S found -[here](https://github.com/Qualcomm-AI-research/FFNet). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/ffnet_40s). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.ffnet_40s.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of FFNet-40S can be found +* The license for the original implementation of FFNet-40S can be found [here](https://github.com/Qualcomm-AI-research/FFNet/blob/master/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Simple and Efficient Architectures for Semantic Segmentation](https://arxiv.org/abs/2206.08236) * [Source Model Implementation](https://github.com/Qualcomm-AI-research/FFNet) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/ffnet_40s/export.py b/qai_hub_models/models/ffnet_40s/export.py index a6e519b6..530eb87c 100644 --- a/qai_hub_models/models/ffnet_40s/export.py +++ b/qai_hub_models/models/ffnet_40s/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.ffnet_40s import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
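+
+    Example:
+        A minimal sketch of consuming the returned struct; the device name is
+        an illustrative assumption, and any Qualcomm AI Hub device name may be
+        passed:
+
+            result = export_model(device="Samsung Galaxy S23")
+            if isinstance(result, ExportResult):
+                assert result.compile_job is not None  # compilation always runs
+                if result.profile_job is not None and result.profile_job.wait().success:
+                    profile_data = result.profile_job.download_profile()
+
+        As a side note, `throughput` in the accompanying perf.yaml files is
+        derived as 1e6 / `inference_time`, with `inference_time` measured in
+        microseconds (e.g. 17007.0 us gives 58.799... inferences per second).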
""" model_name = "ffnet_40s" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/ffnet_40s/perf.yaml b/qai_hub_models/models/ffnet_40s/perf.yaml index f3c5f185..c186f2c9 100644 --- a/qai_hub_models/models/ffnet_40s/perf.yaml +++ b/qai_hub_models/models/ffnet_40s/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: FFNet-40S performance_metrics: - torchscript_onnx_tflite: - inference_time: 17077.0 - throughput: 58.55829478245594 + inference_time: 17007.0 + throughput: 58.799317927912035 estimated_peak_memory_range: - min: 2621440 - max: 4673240 + min: 2519040 + max: 4726640 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 92 - job_id: jygzexkxg + job_id: j5q6ql77p job_status: Passed torchscript_onnx_qnn: - inference_time: 17566.0 - throughput: 56.928156666287144 + inference_time: 17621.0 + throughput: 56.75046819136258 estimated_peak_memory_range: - min: 27770880 - max: 42248864 + min: 26689536 + max: 51292232 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 140 - job_id: jnp10dxn5 + job_id: j5we6ldz5 job_status: Passed torchscript_onnx: - inference_time: 27096.0 - throughput: 36.90581635665781 + inference_time: 24964.0 + throughput: 40.0576830636116 estimated_peak_memory_range: - min: 0 - max: 31581472 + min: 27320320 + max: 29771632 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 142 - job_id: jep287n6p + job_id: jp2kyr3xp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:40:32Z' + timestamp: '2024-10-15T00:43:39Z' - torchscript_onnx_tflite: - inference_time: 14933.0 - throughput: 66.96578048617157 + inference_time: 14889.0 + throughput: 67.16367788300087 estimated_peak_memory_range: - min: 2527232 - max: 97092528 + min: 1675264 + max: 107263440 primary_compute_unit: NPU precision: fp16 
layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 92 - job_id: jz5wodnmp + job_id: jglvmy0e5 job_status: Passed torchscript_onnx_qnn: - inference_time: 15207.0 - throughput: 65.75918984678108 + inference_time: 15114.0 + throughput: 66.16382162233691 estimated_peak_memory_range: - min: 25206784 - max: 55393184 + min: 25198592 + max: 60290624 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 140 - job_id: jvgdwrl65 + job_id: jg9lnz3qg job_status: Passed torchscript_onnx: - inference_time: 22259.0 - throughput: 44.92564805247316 + inference_time: 22020.0 + throughput: 45.41326067211626 estimated_peak_memory_range: - min: 28938240 - max: 143201440 + min: 28663808 + max: 158544848 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 142 - job_id: jqpye400g + job_id: jpy13ovrp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:40:33Z' + timestamp: '2024-10-15T00:43:40Z' - torchscript_onnx_tflite: - inference_time: 16980.0 - throughput: 58.89281507656066 + inference_time: 16785.0 + throughput: 59.577003276735184 estimated_peak_memory_range: - min: 2527232 - max: 4745208 + min: 2539520 + max: 4771248 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 92 - job_id: jmg9v3e85 + job_id: j56y483vp job_status: Passed torchscript_onnx_qnn: - inference_time: 17375.0 - throughput: 57.55395683453237 + inference_time: 16327.0 + throughput: 61.2482391131255 estimated_peak_memory_range: - min: 25239552 - max: 26556672 + min: 25235456 + max: 26408256 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 140 - job_id: jqp4qx02g + job_id: jgdx1drkp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:40:27Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:43:32Z' - torchscript_onnx_tflite: - inference_time: 27754.0 - throughput: 36.030842401095335 + inference_time: 16799.0 + throughput: 59.52735281862016 estimated_peak_memory_range: - min: 2547712 - max: 88737440 + min: 2531328 + max: 4632704 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 92 - job_id: jnp10dx75 + job_id: jgjvnr07g job_status: Passed torchscript_onnx_qnn: - inference_time: 28663.0 - throughput: 34.888183372291806 + inference_time: 16511.0 + throughput: 60.56568348373811 estimated_peak_memory_range: - min: 25202688 - max: 53468224 + min: 25264128 + max: 26509960 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 140 - job_id: joprk4jk5 + job_id: jpxkol7j5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:40:31Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:43:35Z' - torchscript_onnx_tflite: - inference_time: 16954.0 - throughput: 58.983130824584165 + 
inference_time: 16774.0 + throughput: 59.61607249314415 estimated_peak_memory_range: - min: 2535424 - max: 4722736 + min: 2531328 + max: 4739104 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 92 - job_id: jvgdwrlz5 + job_id: jpv6kl175 job_status: Passed torchscript_onnx_qnn: - inference_time: 17597.0 - throughput: 56.82786838665682 + inference_time: 16850.0 + throughput: 59.347181008902076 estimated_peak_memory_range: - min: 25264128 - max: 26983544 + min: 25284608 + max: 26491568 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 140 - job_id: j0pxv728g + job_id: jp4lryxq5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:40:28Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:43:34Z' - torchscript_onnx_tflite: - inference_time: 16989.0 - throughput: 58.86161633998469 + inference_time: 16854.0 + throughput: 59.33309600094933 estimated_peak_memory_range: - min: 2519040 - max: 4551392 + min: 2527232 + max: 4802984 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 92 - job_id: jz5wodn4p + job_id: jgo26l14p job_status: Passed torchscript_onnx_qnn: - inference_time: 17515.0 - throughput: 57.09391949757351 + inference_time: 16816.0 + throughput: 59.467174119885826 estimated_peak_memory_range: - min: 25268224 - max: 26538128 + min: 25210880 + max: 26464240 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 140 - job_id: jo5mrwy7g + job_id: j57yrejq5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:40:29Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:43:33Z' - torchscript_onnx_tflite: - inference_time: 16916.0 - throughput: 59.11563017261764 + inference_time: 27915.0 + throughput: 35.82303421099767 estimated_peak_memory_range: - min: 2539520 - max: 4768152 + min: 2555904 + max: 97240432 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 92 - job_id: jmg9v3em5 + job_id: jp3j0z4xg job_status: Passed torchscript_onnx_qnn: - inference_time: 17492.0 - throughput: 57.16899153898925 + inference_time: 28352.0 + throughput: 35.270880361173816 estimated_peak_memory_range: - min: 25264128 - max: 26549120 + min: 23126016 + max: 57746960 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,22 +329,75 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 140 - job_id: jegn298jg + job_id: jgn6vzrv5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:43:37Z' + - torchscript_onnx_tflite: + inference_time: 11794.0 + throughput: 84.78887569950822 + estimated_peak_memory_range: + min: 872448 + max: 45721472 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 92 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 92 + job_id: jgz3dlxz5 + job_status: Passed + 
torchscript_onnx_qnn: + inference_time: 12082.0 + throughput: 82.76775368316504 + estimated_peak_memory_range: + min: 25178112 + max: 58235520 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 140 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 140 + job_id: jprv3l1vg + job_status: Passed + torchscript_onnx: + inference_time: 15466.0 + throughput: 64.6579593948015 + estimated_peak_memory_range: + min: 33693696 + max: 87935584 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 142 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 142 + job_id: jgkex2ryg + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:40:30Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:43:43Z' - torchscript_onnx_qnn: - inference_time: 17860.0 - throughput: 55.99104143337066 + inference_time: 16542.0 + throughput: 60.45218232378189 estimated_peak_memory_range: - min: 25223168 - max: 25223168 + min: 25219072 + max: 25219072 primary_compute_unit: NPU precision: fp16 layer_info: @@ -354,11 +405,11 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 140 - job_id: jz57zj3np + job_id: jp14zndkp job_status: Passed torchscript_onnx: - inference_time: 26179.0 - throughput: 38.19855609457962 + inference_time: 30278.0 + throughput: 33.027280533720855 estimated_peak_memory_range: min: 25223168 max: 25223168 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 142 - job_id: j2p0ye00g + job_id: jp0z0me25 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:40:34Z' + timestamp: '2024-10-15T00:43:41Z' diff --git a/qai_hub_models/models/ffnet_40s_quantized/README.md b/qai_hub_models/models/ffnet_40s_quantized/README.md index 5237d76f..1772b22d 100644 --- a/qai_hub_models/models/ffnet_40s_quantized/README.md +++ b/qai_hub_models/models/ffnet_40s_quantized/README.md @@ -6,7 +6,7 @@ FFNet-40S-Quantized is a "fuss-free network" that segments street scene images with per-pixel classes like road, sidewalk, and pedestrian. Trained on the Cityscapes dataset. This is based on the implementation of FFNet-40S-Quantized found -[here](https://github.com/Qualcomm-AI-research/FFNet). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/ffnet_40s_quantized). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.ffnet_40s_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of FFNet-40S-Quantized can be found +* The license for the original implementation of FFNet-40S-Quantized can be found [here](https://github.com/Qualcomm-AI-research/FFNet/blob/master/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Simple and Efficient Architectures for Semantic Segmentation](https://arxiv.org/abs/2206.08236) * [Source Model Implementation](https://github.com/Qualcomm-AI-research/FFNet) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/ffnet_40s_quantized/export.py b/qai_hub_models/models/ffnet_40s_quantized/export.py index 2f335230..0f4c39fd 100644 --- a/qai_hub_models/models/ffnet_40s_quantized/export.py +++ b/qai_hub_models/models/ffnet_40s_quantized/export.py @@ -10,17 +10,17 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.ffnet_40s_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.printing import ( print_inference_metrics, @@ -45,20 +45,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -80,10 +78,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
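+
+    Example:
+        A hedged sketch of skipping the on-device steps; the flags mirror the
+        `skip_profiling` / `skip_inferencing` checks in the function body, and
+        the device name is illustrative:
+
+            result = export_model(
+                device="Samsung Galaxy S24",
+                skip_profiling=True,
+                skip_inferencing=True,
+            )
+            if isinstance(result, ExportResult):
+                # Only the compile step ran, so the optional jobs remain None.
+                assert result.profile_job is None
+                assert result.inference_job is None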
""" model_name = "ffnet_40s_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,7 +107,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( target_runtime, output_path, input_spec ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -135,7 +133,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +148,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +169,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -198,7 +196,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/ffnet_40s_quantized/perf.yaml b/qai_hub_models/models/ffnet_40s_quantized/perf.yaml index b7fc36e3..779f9d20 100644 --- a/qai_hub_models/models/ffnet_40s_quantized/perf.yaml +++ b/qai_hub_models/models/ffnet_40s_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,44 +20,41 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: FFNet-40S-Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 4110.0 - throughput: 243.30900243309003 + inference_time: 4177.0 + throughput: 239.40627244433804 estimated_peak_memory_range: - min: 655360 - max: 17844600 + min: 675840 + max: 3087992 primary_compute_unit: NPU precision: int8 layer_info: @@ -64,14 +62,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 99 - job_id: jygzex9xg + job_id: jp8qye7qp job_status: Passed torchscript_onnx: - inference_time: 9631.0 - throughput: 103.83137784238397 + inference_time: 8966.0 + throughput: 111.5324559446799 estimated_peak_memory_range: - min: 7917568 - max: 17182824 + min: 110592 + max: 11835736 primary_compute_unit: NPU precision: int8 layer_info: @@ -79,7 +77,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 168 - job_id: jw5663ly5 + job_id: j5mnx0zyp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -88,13 +86,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:39:47Z' + timestamp: '2024-10-15T00:42:47Z' - torchscript_onnx_tflite: - inference_time: 2910.0 - throughput: 343.64261168384877 + inference_time: 2927.0 + throughput: 341.646737273659 estimated_peak_memory_range: - min: 659456 - max: 65961408 + min: 655360 + max: 66216384 primary_compute_unit: NPU precision: int8 layer_info: @@ -102,7 +100,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 99 - job_id: jz5wodvmp + job_id: jgkex2yvg + job_status: Passed + torchscript_onnx: + inference_time: 6409.0 + throughput: 156.03058199407084 + 
estimated_peak_memory_range: + min: 4567040 + max: 112039248 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 168 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 168 + job_id: jgn6vz9v5 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -111,13 +124,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:39:30Z' + timestamp: '2024-10-15T00:42:49Z' - torchscript_onnx_tflite: - inference_time: 4064.0 - throughput: 246.06299212598427 + inference_time: 27414.0 + throughput: 36.47771211789597 estimated_peak_memory_range: - min: 638976 - max: 12263072 + min: 3170304 + max: 43533152 primary_compute_unit: NPU precision: int8 layer_info: @@ -125,22 +138,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 99 - job_id: jmg9v3185 + job_id: jgjvnrl1g job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:39:31Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-15T00:42:32Z' - torchscript_onnx_tflite: - inference_time: 5114.0 - throughput: 195.54165037152913 + inference_time: 189840.0 + throughput: 5.267593763168985 estimated_peak_memory_range: - min: 704512 - max: 66771808 + min: 929792 + max: 14978752 primary_compute_unit: NPU precision: int8 layer_info: @@ -148,22 +161,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 99 - job_id: jnp10dl75 + job_id: jpedm7v85 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: RB5 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:39:32Z' + chipset: QCS8250 Proxy + timestamp: '2024-10-15T00:42:33Z' - torchscript_onnx_tflite: - inference_time: 4097.0 - throughput: 244.081034903588 + inference_time: 4061.0 + throughput: 246.2447672986949 estimated_peak_memory_range: - min: 647168 - max: 2774536 + min: 684032 + max: 1949272 primary_compute_unit: NPU precision: int8 layer_info: @@ -171,22 +184,45 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 99 - job_id: jvgdwr9z5 + job_id: j5q6ql2ep job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:42:26Z' + - torchscript_onnx_tflite: + inference_time: 4154.0 + throughput: 240.73182474723157 + estimated_peak_memory_range: + min: 638976 + max: 8262920 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 99 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 99 + job_id: jgo26lv1p + job_status: Passed + reference_device_info: + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:39:33Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:42:30Z' - torchscript_onnx_tflite: - inference_time: 4159.0 - throughput: 240.44241404183697 + inference_time: 4147.0 + throughput: 241.13817217265492 estimated_peak_memory_range: - min: 663552 - max: 5066760 + min: 659456 + max: 2704744 primary_compute_unit: NPU precision: int8 layer_info: @@ -194,7 +230,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 99 - job_id: jz57zjw9p + job_id: jp3j0zmmg job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -202,14 +238,14 @@ models: 
form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:39:34Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:42:29Z' - torchscript_onnx_tflite: - inference_time: 4080.0 - throughput: 245.09803921568627 + inference_time: 4123.0 + throughput: 242.5418384671356 estimated_peak_memory_range: - min: 651264 - max: 6841376 + min: 16384 + max: 194057624 primary_compute_unit: NPU precision: int8 layer_info: @@ -217,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 99 - job_id: jqp4qxo1g + job_id: j56y481np job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:39:35Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:42:28Z' - torchscript_onnx_tflite: - inference_time: 26480.0 - throughput: 37.764350453172206 + inference_time: 5144.0 + throughput: 194.4012441679627 estimated_peak_memory_range: - min: 1085440 - max: 43239040 + min: 0 + max: 70203216 primary_compute_unit: NPU precision: int8 layer_info: @@ -240,22 +276,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 99 - job_id: j0pxv7jlg + job_id: jglvmyk25 job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T12:39:35Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:42:27Z' - torchscript_onnx_tflite: - inference_time: 187748.0 - throughput: 5.326288429171017 + inference_time: 2516.0 + throughput: 397.456279809221 estimated_peak_memory_range: - min: 716800 - max: 8401296 + min: 651264 + max: 32985312 primary_compute_unit: NPU precision: int8 layer_info: @@ -263,22 +299,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 99 - job_id: jo5mrw29g + job_id: jgz3dl745 + job_status: Passed + torchscript_onnx: + inference_time: 5339.0 + throughput: 187.30099269526127 + estimated_peak_memory_range: + min: 0 + max: 53719008 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 168 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 168 + job_id: jpy13o4rp job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T12:39:37Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:42:52Z' - torchscript_onnx: - inference_time: 9749.0 - throughput: 102.57462303826034 + inference_time: 9156.0 + throughput: 109.217999126256 estimated_peak_memory_range: - min: 9482240 - max: 9482240 + min: 10833920 + max: 10833920 primary_compute_unit: NPU precision: int8 layer_info: @@ -286,7 +337,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 168 - job_id: jwgoy1qk5 + job_id: jprv3l4vg job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -295,4 +346,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:39:48Z' + timestamp: '2024-10-15T00:42:50Z' diff --git a/qai_hub_models/models/ffnet_54s/README.md b/qai_hub_models/models/ffnet_54s/README.md index 90096232..a652acc3 100644 --- a/qai_hub_models/models/ffnet_54s/README.md +++ b/qai_hub_models/models/ffnet_54s/README.md @@ -6,7 +6,7 @@ FFNet-54S is a "fuss-free network" that segments 
street scene images with per-pixel classes like road, sidewalk, and pedestrian. Trained on the Cityscapes dataset. This is based on the implementation of FFNet-54S found -[here](https://github.com/Qualcomm-AI-research/FFNet). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/ffnet_54s). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.ffnet_54s.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of FFNet-54S can be found +* The license for the original implementation of FFNet-54S can be found [here](https://github.com/Qualcomm-AI-research/FFNet/blob/master/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Simple and Efficient Architectures for Semantic Segmentation](https://arxiv.org/abs/2206.08236) * [Source Model Implementation](https://github.com/Qualcomm-AI-research/FFNet) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/ffnet_54s/export.py b/qai_hub_models/models/ffnet_54s/export.py index d199dbaa..0389e2ba 100644 --- a/qai_hub_models/models/ffnet_54s/export.py +++ b/qai_hub_models/models/ffnet_54s/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.ffnet_54s import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2.
Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "ffnet_54s" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/ffnet_54s/perf.yaml b/qai_hub_models/models/ffnet_54s/perf.yaml index 257544e0..f4d12321 100644 --- a/qai_hub_models/models/ffnet_54s/perf.yaml +++ b/qai_hub_models/models/ffnet_54s/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: FFNet-54S performance_metrics: - torchscript_onnx_tflite: - inference_time: 19620.0 - throughput: 50.9683995922528 + inference_time: 19975.0 + throughput: 50.06257822277847 estimated_peak_memory_range: - min: 2158592 - max: 4310984 + min: 2146304 + max: 4449216 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 113 - job_id: jmg9v3v85 + job_id: jprv3l2kg job_status: Passed torchscript_onnx_qnn: - inference_time: 20290.0 - throughput: 49.28536224741252 + inference_time: 20164.0 + throughput: 49.59333465582226 estimated_peak_memory_range: - min: 25223168 - max: 50812504 + min: 25219072 + max: 47290376 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 175 - job_id: jegn292qg + job_id: jp3j0zemg job_status: Passed torchscript_onnx: - inference_time: 29790.0 - throughput: 33.56831151393085 + inference_time: 28053.0 + throughput: 35.64681139272092 estimated_peak_memory_range: - min: 25997312 - max: 41080456 + min: 25911296 + max: 28680032 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 177 - job_id: j1gln0zmp + job_id: j57yrexn5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:38:42Z' + timestamp: '2024-10-15T00:41:35Z' - torchscript_onnx_tflite: - inference_time: 17535.0 - throughput: 57.0287995437696 + inference_time: 17729.0 + throughput: 56.40476056179141 estimated_peak_memory_range: - min: 2523136 - max: 107213376 + min: 2535424 + max: 120832592 primary_compute_unit: NPU 
precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 113 - job_id: jnp10d075 + job_id: jp2kyr96p job_status: Passed torchscript_onnx_qnn: - inference_time: 17684.0 - throughput: 56.5482922415743 + inference_time: 17811.0 + throughput: 56.14507888383583 estimated_peak_memory_range: min: 21004288 - max: 52449872 + max: 56553616 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 175 - job_id: joprk4k75 + job_id: jgo26l31p job_status: Passed torchscript_onnx: - inference_time: 25465.0 - throughput: 39.2695857058708 + inference_time: 24675.0 + throughput: 40.52684903748734 estimated_peak_memory_range: - min: 405504 - max: 122913152 + min: 589824 + max: 142903376 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 177 - job_id: jw5663jy5 + job_id: jp4lryv25 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:38:43Z' + timestamp: '2024-10-15T00:41:36Z' - torchscript_onnx_tflite: - inference_time: 19638.0 - throughput: 50.921682452388225 + inference_time: 19858.0 + throughput: 50.35753852351697 estimated_peak_memory_range: - min: 262144 - max: 14650120 + min: 2547712 + max: 7829712 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 113 - job_id: jvgdwrwz5 + job_id: jpy13oj0p job_status: Passed torchscript_onnx_qnn: - inference_time: 19843.0 - throughput: 50.39560550320012 + inference_time: 18980.0 + throughput: 52.68703898840885 estimated_peak_memory_range: - min: 25231360 - max: 26398024 + min: 25247744 + max: 26457440 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 175 - job_id: jqpye4elg + job_id: jgjvnre1g job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:38:37Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:41:27Z' - torchscript_onnx_tflite: - inference_time: 31911.0 - throughput: 31.337156466422236 + inference_time: 19960.0 + throughput: 50.100200400801604 estimated_peak_memory_range: - min: 2580480 - max: 95283264 + min: 2543616 + max: 4964736 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 113 - job_id: jz57zjz9p + job_id: j5q6ql3ep job_status: Passed torchscript_onnx_qnn: - inference_time: 36419.0 - throughput: 27.458194898267386 + inference_time: 19252.0 + throughput: 51.94265530853937 estimated_peak_memory_range: - min: 25182208 - max: 53629712 + min: 25264128 + max: 26769104 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 175 - job_id: jn5q878o5 + job_id: j5we6lq45 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:38:41Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:41:31Z' - torchscript_onnx_tflite: - inference_time: 19639.0 - throughput: 50.91908956667855 + 
inference_time: 19841.0 + throughput: 50.40068544932211 estimated_peak_memory_range: - min: 2461696 - max: 4436168 + min: 2539520 + max: 4601856 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 113 - job_id: jqp4qxq1g + job_id: jgkex23vg job_status: Passed torchscript_onnx_qnn: - inference_time: 20167.0 - throughput: 49.58595725690484 + inference_time: 19187.0 + throughput: 52.11862198363475 estimated_peak_memory_range: - min: 25223168 - max: 26881960 + min: 25264128 + max: 26467192 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 175 - job_id: j2p0y1yng + job_id: jgz3dlr45 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:38:38Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:41:30Z' - torchscript_onnx_tflite: - inference_time: 19617.0 - throughput: 50.9761941173472 + inference_time: 20016.0 + throughput: 49.96003197442047 estimated_peak_memory_range: - min: 2461696 - max: 4670472 + min: 2551808 + max: 4598328 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 113 - job_id: j0pxv7vlg + job_id: jp8qyezqp job_status: Passed torchscript_onnx_qnn: - inference_time: 19890.0 - throughput: 50.27652086475616 + inference_time: 19257.0 + throughput: 51.92916861401049 estimated_peak_memory_range: - min: 25251840 - max: 26492392 + min: 25276416 + max: 26660264 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 175 - job_id: j1p8o3oog + job_id: jpedm7k85 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:38:39Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:41:28Z' - torchscript_onnx_tflite: - inference_time: 19580.0 - throughput: 51.07252298263534 + inference_time: 32162.0 + throughput: 31.092593744170138 estimated_peak_memory_range: - min: 2535424 - max: 4331176 + min: 2560000 + max: 104041296 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 113 - job_id: jo5mrwr9g + job_id: jp0z0ml05 job_status: Passed torchscript_onnx_qnn: - inference_time: 20061.0 - throughput: 49.84796371068242 + inference_time: 32442.0 + throughput: 30.824240182479503 estimated_peak_memory_range: - min: 25247744 - max: 26485640 + min: 25153536 + max: 58919248 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 175 - job_id: jogkzlzng + job_id: jp14znenp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:41:33Z' + - torchscript_onnx_tflite: + inference_time: 14186.0 + throughput: 70.49203440011279 + estimated_peak_memory_range: + min: 454656 + max: 48936256 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 113 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 113 + job_id: j56y48nnp + job_status: Passed + 
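Worth noting when reading these metrics: in every `layer_info` block in this file, the per-unit counts sum to `total_layers` (`layers_on_npu + layers_on_gpu + layers_on_cpu == total_layers`), and `primary_compute_unit` appears to name the unit carrying those layers. A minimal sanity-check sketch (the helper name is ours, not part of the repo):

```python
# Check the layer_info invariant seen throughout perf.yaml:
# the per-unit layer counts should sum to total_layers.
def check_layer_info(layer_info: dict) -> bool:
    placed = (
        layer_info["layers_on_npu"]
        + layer_info["layers_on_gpu"]
        + layer_info["layers_on_cpu"]
    )
    return placed == layer_info["total_layers"]


# e.g. the Snapdragon 8 Elite TFLite entry above: 113 + 0 + 0 == 113
assert check_layer_info(
    {"layers_on_npu": 113, "layers_on_gpu": 0, "layers_on_cpu": 0, "total_layers": 113}
)
```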
torchscript_onnx_qnn: + inference_time: 11828.0 + throughput: 84.54514710855597 + estimated_peak_memory_range: + min: 25202688 + max: 63362416 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 175 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 175 + job_id: jgdx1do6p + job_status: Passed + torchscript_onnx: + inference_time: 22041.0 + throughput: 45.36999228710131 + estimated_peak_memory_range: + min: 31764480 + max: 84906384 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 177 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 177 + job_id: jgn6vz3j5 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:38:40Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:41:39Z' - torchscript_onnx_qnn: - inference_time: 20259.0 - throughput: 49.36077792586011 + inference_time: 19271.0 + throughput: 51.89144310103264 estimated_peak_memory_range: min: 25223168 max: 25223168 @@ -354,11 +405,11 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 175 - job_id: jep2878qp + job_id: jpv6klvz5 job_status: Passed torchscript_onnx: - inference_time: 29226.0 - throughput: 34.21610894409088 + inference_time: 32787.0 + throughput: 30.499893250373624 estimated_peak_memory_range: min: 25223168 max: 25223168 @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 177 - job_id: j1p3k43n5 + job_id: jpxkoly85 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:38:44Z' + timestamp: '2024-10-15T00:41:37Z' diff --git a/qai_hub_models/models/ffnet_54s_quantized/README.md b/qai_hub_models/models/ffnet_54s_quantized/README.md index 1f6912e8..9306773a 100644 --- a/qai_hub_models/models/ffnet_54s_quantized/README.md +++ b/qai_hub_models/models/ffnet_54s_quantized/README.md @@ -6,7 +6,7 @@ FFNet-54S-Quantized is a "fuss-free network" that segments street scene images with per-pixel classes like road, sidewalk, and pedestrian. Trained on the Cityscapes dataset. This is based on the implementation of FFNet-54S-Quantized found -[here](https://github.com/Qualcomm-AI-research/FFNet). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/ffnet_54s_quantized). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.ffnet_54s_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of FFNet-54S-Quantized can be found +* The license for the original implementation of FFNet-54S-Quantized can be found [here](https://github.com/Qualcomm-AI-research/FFNet/blob/master/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Simple and Efficient Architectures for Semantic Segmentation](https://arxiv.org/abs/2206.08236) * [Source Model Implementation](https://github.com/Qualcomm-AI-research/FFNet) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/ffnet_54s_quantized/export.py b/qai_hub_models/models/ffnet_54s_quantized/export.py index 83412cea..333e5f15 100644 --- a/qai_hub_models/models/ffnet_54s_quantized/export.py +++ b/qai_hub_models/models/ffnet_54s_quantized/export.py @@ -10,17 +10,17 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.ffnet_54s_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.printing import ( print_inference_metrics, @@ -45,20 +45,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -80,10 +78,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
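For callers, the practical effect of this signature change is that positional tuple unpacking gives way to attribute access on the returned struct. A minimal sketch of the new call pattern (the device name is illustrative, and the condition under which a `List[str]` comes back is not shown in this hunk):

```python
from qai_hub_models.models.ffnet_54s_quantized.export import export_model

# Old: compile_job, profile_job, inference_job = export_model(...)
# New: the same jobs come back as named attributes on an ExportResult.
result = export_model(device="Samsung Galaxy S23")  # illustrative device choice
if not isinstance(result, list):   # the List[str] branch is handled elsewhere
    print(result.compile_job)      # always populated
    print(result.profile_job)      # None when skip_profiling=True
    print(result.inference_job)    # None when skip_inferencing=True
```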
""" model_name = "ffnet_54s_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,7 +107,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( target_runtime, output_path, input_spec ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -135,7 +133,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +148,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +169,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -198,7 +196,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/ffnet_54s_quantized/perf.yaml b/qai_hub_models/models/ffnet_54s_quantized/perf.yaml index cd23609b..6c399fe9 100644 --- a/qai_hub_models/models/ffnet_54s_quantized/perf.yaml +++ b/qai_hub_models/models/ffnet_54s_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,44 +20,41 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: FFNet-54S-Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 4787.0 - throughput: 208.89910173386255 + inference_time: 4790.0 + throughput: 208.76826722338205 estimated_peak_memory_range: - min: 671744 - max: 2785320 + min: 638976 + max: 3180224 primary_compute_unit: NPU precision: int8 layer_info: @@ -64,14 +62,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 120 - job_id: jmg9v3685 + job_id: jgdx1de6p job_status: Passed torchscript_onnx: - inference_time: 11208.0 - throughput: 89.22198429693077 + inference_time: 11064.0 + throughput: 90.38322487346349 estimated_peak_memory_range: - min: 131072 - max: 15625080 + min: 32768 + max: 16535720 primary_compute_unit: NPU precision: int8 layer_info: @@ -79,7 +77,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 217 - job_id: jwgoy1yk5 + job_id: jg9lnzymg job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -88,13 +86,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:37:57Z' + timestamp: '2024-10-15T00:40:42Z' - torchscript_onnx_tflite: inference_time: 3364.0 throughput: 297.2651605231867 estimated_peak_memory_range: - min: 659456 - max: 73994144 + min: 434176 + max: 75581088 primary_compute_unit: NPU precision: int8 layer_info: @@ -102,14 +100,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 120 - job_id: jnp10dr75 + job_id: j57yre0n5 job_status: Passed torchscript_onnx: - inference_time: 8324.0 - throughput: 120.1345506967804 + inference_time: 7967.0 + throughput: 125.51776076314799 
estimated_peak_memory_range: - min: 4820992 - max: 119541024 + min: 4702208 + max: 130219984 primary_compute_unit: NPU precision: int8 layer_info: @@ -117,7 +115,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 217 - job_id: j1pv313r5 + job_id: jp14znwnp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -126,13 +124,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:37:58Z' + timestamp: '2024-10-15T00:40:43Z' - torchscript_onnx_tflite: - inference_time: 4704.0 - throughput: 212.58503401360545 + inference_time: 31917.0 + throughput: 31.331265469812326 estimated_peak_memory_range: - min: 659456 - max: 1998880 + min: 696320 + max: 46927552 primary_compute_unit: NPU precision: int8 layer_info: @@ -140,22 +138,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 120 - job_id: jvgdwrjz5 + job_id: jpy13or0p job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:37:41Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-15T00:40:27Z' - torchscript_onnx_tflite: - inference_time: 5923.0 - throughput: 168.83336147222693 + inference_time: 201523.0 + throughput: 4.9622127499094395 estimated_peak_memory_range: - min: 675840 - max: 78116048 + min: 1114112 + max: 2992344 primary_compute_unit: NPU precision: int8 layer_info: @@ -163,22 +161,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 120 - job_id: jz57zjq9p + job_id: jp0z0m205 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: RB5 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:37:42Z' + chipset: QCS8250 Proxy + timestamp: '2024-10-15T00:40:29Z' - torchscript_onnx_tflite: - inference_time: 4761.0 - throughput: 210.03990758244066 + inference_time: 4692.0 + throughput: 213.12872975277068 estimated_peak_memory_range: - min: 638976 - max: 26838784 + min: 643072 + max: 13561432 primary_compute_unit: NPU precision: int8 layer_info: @@ -186,22 +184,45 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 120 - job_id: jqp4qxz1g + job_id: jp4lryk25 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:40:21Z' + - torchscript_onnx_tflite: + inference_time: 4711.0 + throughput: 212.26915729144557 + estimated_peak_memory_range: + min: 643072 + max: 2704944 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 120 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 120 + job_id: jprv3l8kg + job_status: Passed + reference_device_info: + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:37:43Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:40:25Z' - torchscript_onnx_tflite: - inference_time: 4859.0 - throughput: 205.80366330520684 + inference_time: 4773.0 + throughput: 209.51183741881417 estimated_peak_memory_range: - min: 692224 - max: 2767248 + min: 12288 + max: 18584840 primary_compute_unit: NPU precision: int8 layer_info: @@ -209,7 +230,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 120 - job_id: j0pxv7wlg + job_id: jgn6vzlj5 job_status: Passed reference_device_info: 
name: SA8775 (Proxy) @@ -217,14 +238,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:37:44Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:40:24Z' - torchscript_onnx_tflite: - inference_time: 4696.0 - throughput: 212.94718909710392 + inference_time: 4698.0 + throughput: 212.85653469561515 estimated_peak_memory_range: - min: 647168 - max: 2673432 + min: 651264 + max: 12307840 primary_compute_unit: NPU precision: int8 layer_info: @@ -232,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 120 - job_id: jo5mrwj9g + job_id: j5mnx0q7p job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:37:45Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:40:23Z' - torchscript_onnx_tflite: - inference_time: 29982.0 - throughput: 33.35334534053766 + inference_time: 5950.0 + throughput: 168.0672268907563 estimated_peak_memory_range: - min: 1241088 - max: 46676736 + min: 671744 + max: 79706752 primary_compute_unit: NPU precision: int8 layer_info: @@ -255,22 +276,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 120 - job_id: jegn29jqg + job_id: jpxkoln85 job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T12:37:46Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:40:22Z' - torchscript_onnx_tflite: - inference_time: 202726.0 - throughput: 4.932766394049111 + inference_time: 2871.0 + throughput: 348.31069313827936 estimated_peak_memory_range: - min: 970752 - max: 2884360 + min: 634880 + max: 35734560 primary_compute_unit: NPU precision: int8 layer_info: @@ -278,22 +299,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 120 - job_id: joprk4z75 + job_id: jp8qyemqp + job_status: Passed + torchscript_onnx: + inference_time: 7316.0 + throughput: 136.6867140513942 + estimated_peak_memory_range: + min: 7585792 + max: 68578848 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 217 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 217 + job_id: jp4lryd25 job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T12:37:47Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:40:46Z' - torchscript_onnx: - inference_time: 11477.0 - throughput: 87.13078330574191 + inference_time: 11112.0 + throughput: 89.99280057595392 estimated_peak_memory_range: - min: 14020608 - max: 14020608 + min: 13795328 + max: 13795328 primary_compute_unit: NPU precision: int8 layer_info: @@ -301,7 +337,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 217 - job_id: j7gjx0xep + job_id: jgdx1dq6p job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -310,4 +346,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:37:59Z' + timestamp: '2024-10-15T00:40:44Z' diff --git a/qai_hub_models/models/ffnet_78s/README.md b/qai_hub_models/models/ffnet_78s/README.md index 0f2d79dc..15383f97 100644 --- a/qai_hub_models/models/ffnet_78s/README.md +++ 
b/qai_hub_models/models/ffnet_78s/README.md @@ -6,7 +6,7 @@ FFNet-78S is a "fuss-free network" that segments street scene images with per-pixel classes like road, sidewalk, and pedestrian. Trained on the Cityscapes dataset. This is based on the implementation of FFNet-78S found -[here](https://github.com/Qualcomm-AI-research/FFNet). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/ffnet_78s). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.ffnet_78s.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of FFNet-78S can be found +* The license for the original implementation of FFNet-78S can be found [here](https://github.com/Qualcomm-AI-research/FFNet/blob/master/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Simple and Efficient Architectures for Semantic Segmentation](https://arxiv.org/abs/2206.08236) * [Source Model Implementation](https://github.com/Qualcomm-AI-research/FFNet) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/ffnet_78s/export.py b/qai_hub_models/models/ffnet_78s/export.py index 6478f4eb..ba9b2de6 100644 --- a/qai_hub_models/models/ffnet_78s/export.py +++ b/qai_hub_models/models/ffnet_78s/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.ffnet_78s import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. 
Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "ffnet_78s" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/ffnet_78s/perf.yaml b/qai_hub_models/models/ffnet_78s/perf.yaml index db839eb8..4df0ad09 100644 --- a/qai_hub_models/models/ffnet_78s/perf.yaml +++ b/qai_hub_models/models/ffnet_78s/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: FFNet-78S performance_metrics: - torchscript_onnx_tflite: - inference_time: 23714.0 - throughput: 42.16918276123809 + inference_time: 23254.0 + throughput: 43.00335426163241 estimated_peak_memory_range: - min: 2568192 - max: 4545248 + min: 2560000 + max: 4265008 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 149 - job_id: jvgdwrkz5 + job_id: jp14zn27p job_status: Passed torchscript_onnx_qnn: - inference_time: 23669.0 - throughput: 42.24935569732562 + inference_time: 23635.0 + throughput: 42.31013327691982 estimated_peak_memory_range: - min: 2555904 - max: 24837272 + min: 25231360 + max: 49994480 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 235 - job_id: jep2871qp + job_id: j5mnx0e7p job_status: Passed torchscript_onnx: - inference_time: 33511.0 - throughput: 29.840947748500493 + inference_time: 32700.0 + throughput: 30.581039755351682 estimated_peak_memory_range: - min: 25206784 - max: 27484456 + min: 25268224 + max: 57157440 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 237 - job_id: j1p3k4yn5 + job_id: j56y48enp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:36:47Z' + timestamp: '2024-10-15T00:39:25Z' - torchscript_onnx_tflite: - inference_time: 21171.0 - throughput: 47.23442444853809 + inference_time: 21162.0 + throughput: 47.25451280597297 estimated_peak_memory_range: - min: 2560000 - max: 121156992 + min: 2543616 + max: 136007200 primary_compute_unit: NPU 
precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 149 - job_id: jz57zjm9p + job_id: jgdx1dnzp job_status: Passed torchscript_onnx_qnn: - inference_time: 21243.0 - throughput: 47.07433036765052 + inference_time: 21440.0 + throughput: 46.64179104477612 estimated_peak_memory_range: - min: 21016576 - max: 58204224 + min: 21008384 + max: 63699088 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 235 - job_id: jqpye4llg + job_id: jgn6vz0j5 job_status: Passed torchscript_onnx: - inference_time: 28905.0 - throughput: 34.596090641757485 + inference_time: 29103.0 + throughput: 34.36071882623784 estimated_peak_memory_range: - min: 1335296 - max: 139201984 + min: 2121728 + max: 159953264 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 237 - job_id: jwgoy1jk5 + job_id: jp3j0zvmg job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:36:48Z' + timestamp: '2024-10-15T00:39:26Z' - torchscript_onnx_tflite: - inference_time: 23807.0 - throughput: 42.004452471962026 + inference_time: 23104.0 + throughput: 43.282548476454295 estimated_peak_memory_range: - min: 2539520 - max: 8656216 + min: 2560000 + max: 4869240 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 149 - job_id: jqp4qx71g + job_id: j5we6lw45 job_status: Passed torchscript_onnx_qnn: - inference_time: 23508.0 - throughput: 42.53871022630594 + inference_time: 23037.0 + throughput: 43.4084299170899 estimated_peak_memory_range: - min: 25264128 - max: 26553344 + min: 25272320 + max: 26476200 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 235 - job_id: j1p8o3nog + job_id: jp2kyrx6p job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:36:42Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:39:17Z' - torchscript_onnx_tflite: - inference_time: 39381.0 - throughput: 25.392955994007263 + inference_time: 23077.0 + throughput: 43.33318888937037 estimated_peak_memory_range: - min: 2699264 - max: 106880720 + min: 2572288 + max: 4742120 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 149 - job_id: j0pxv7qlg + job_id: j57yre2n5 job_status: Passed torchscript_onnx_qnn: - inference_time: 39522.0 - throughput: 25.302363240726685 + inference_time: 23073.0 + throughput: 43.34070125254627 estimated_peak_memory_range: - min: 25300992 - max: 55599376 + min: 25268224 + max: 26886872 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 235 - job_id: jw5663ky5 + job_id: jp8qye0qp job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:36:46Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:39:20Z' - torchscript_onnx_tflite: - inference_time: 23157.0 - throughput: 
43.18348663471089 + inference_time: 23169.0 + throughput: 43.16112046268721 estimated_peak_memory_range: - min: 2547712 - max: 4615128 + min: 2543616 + max: 4705344 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 149 - job_id: jo5mrw79g + job_id: jgdx1dn6p job_status: Passed torchscript_onnx_qnn: - inference_time: 23725.0 - throughput: 42.14963119072708 + inference_time: 23407.0 + throughput: 42.72226257102576 estimated_peak_memory_range: - min: 25309184 - max: 26654208 + min: 25280512 + max: 26525808 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 235 - job_id: jogkzl1ng + job_id: jp0z0m305 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:36:43Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:39:19Z' - torchscript_onnx_tflite: - inference_time: 23288.0 - throughput: 42.94057025077293 + inference_time: 23240.0 + throughput: 43.029259896729776 estimated_peak_memory_range: - min: 2547712 - max: 4718832 + min: 2555904 + max: 4660656 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 149 - job_id: jegn294qg + job_id: jp14zn2np job_status: Passed torchscript_onnx_qnn: - inference_time: 23516.0 - throughput: 42.52423881612519 + inference_time: 23501.0 + throughput: 42.55138079230671 estimated_peak_memory_range: min: 25284608 - max: 26604224 + max: 29540720 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 235 - job_id: jn5q87no5 + job_id: jpy13oz0p job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:36:44Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:39:18Z' - torchscript_onnx_tflite: - inference_time: 23346.0 - throughput: 42.833890173905594 + inference_time: 39261.0 + throughput: 25.47056875780036 estimated_peak_memory_range: - min: 2560000 - max: 4387696 + min: 1220608 + max: 113735920 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 149 - job_id: joprk4r75 + job_id: jg9lnz0mg job_status: Passed torchscript_onnx_qnn: - inference_time: 23922.0 - throughput: 41.802524872502296 + inference_time: 39234.0 + throughput: 25.4880970586736 estimated_peak_memory_range: - min: 25272320 - max: 26547768 + min: 25079808 + max: 61194848 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 235 - job_id: j1gln0jmp + job_id: j5q6qleep job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:39:23Z' + - torchscript_onnx_tflite: + inference_time: 16614.0 + throughput: 60.19020103527146 + estimated_peak_memory_range: + min: 2174976 + max: 57745728 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 149 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 149 + job_id: jpxkol985 + job_status: Passed + 
torchscript_onnx_qnn: + inference_time: 16705.0 + throughput: 59.86231667165519 + estimated_peak_memory_range: + min: 25178112 + max: 67295488 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 235 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 235 + job_id: jglvmy625 + job_status: Passed + torchscript_onnx: + inference_time: 20857.0 + throughput: 47.94553387351968 + estimated_peak_memory_range: + min: 27004928 + max: 90335184 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 237 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 237 + job_id: jgjvnrz1g + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:36:45Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:39:29Z' - torchscript_onnx_qnn: - inference_time: 24051.0 - throughput: 41.57831275206852 + inference_time: 23035.0 + throughput: 43.41219882787063 estimated_peak_memory_range: min: 25219072 max: 25219072 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 235 - job_id: j2p0y1wng + job_id: jprv3l6kg job_status: Passed torchscript_onnx: - inference_time: 33007.0 - throughput: 30.296603750719544 + inference_time: 36440.0 + throughput: 27.442371020856204 estimated_peak_memory_range: - min: 33652736 - max: 33652736 + min: 32493568 + max: 32493568 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 237 - job_id: j1pv31jr5 + job_id: jgo26lk1p job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:36:49Z' + timestamp: '2024-10-15T00:39:27Z' diff --git a/qai_hub_models/models/ffnet_78s_lowres/README.md b/qai_hub_models/models/ffnet_78s_lowres/README.md index d39d8172..cb86eb48 100644 --- a/qai_hub_models/models/ffnet_78s_lowres/README.md +++ b/qai_hub_models/models/ffnet_78s_lowres/README.md @@ -6,7 +6,7 @@ FFNet-78S-LowRes is a "fuss-free network" that segments street scene images with per-pixel classes like road, sidewalk, and pedestrian. Trained on the Cityscapes dataset. This is based on the implementation of FFNet-78S-LowRes found -[here](https://github.com/Qualcomm-AI-research/FFNet). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/ffnet_78s_lowres). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.ffnet_78s_lowres.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of FFNet-78S-LowRes can be found +* The license for the original implementation of FFNet-78S-LowRes can be found [here](https://github.com/Qualcomm-AI-research/FFNet/blob/master/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Simple and Efficient Architectures for Semantic Segmentation](https://arxiv.org/abs/2206.08236) * [Source Model Implementation](https://github.com/Qualcomm-AI-research/FFNet) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/ffnet_78s_lowres/export.py b/qai_hub_models/models/ffnet_78s_lowres/export.py index 1d8469f1..42c7e301 100644 --- a/qai_hub_models/models/ffnet_78s_lowres/export.py +++ b/qai_hub_models/models/ffnet_78s_lowres/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.ffnet_78s_lowres import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
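`ExportResult` itself comes from `qai_hub_models.models.common`, which these diffs import but never show. Judging only by how it is constructed in these export scripts, it is a plain holder for the three job handles; a speculative sketch (the dataclass form and the defaults are our assumptions, not confirmed by this diff):

```python
from dataclasses import dataclass
from typing import Optional

import qai_hub as hub


@dataclass
class ExportResult:
    # Assumed shape, inferred from ExportResult(compile_job=...,
    # inference_job=..., profile_job=...) in these export scripts.
    # The real definition lives in qai_hub_models.models.common.
    compile_job: Optional[hub.CompileJob] = None
    profile_job: Optional[hub.ProfileJob] = None
    inference_job: Optional[hub.InferenceJob] = None
```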
""" model_name = "ffnet_78s_lowres" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/ffnet_78s_lowres/perf.yaml b/qai_hub_models/models/ffnet_78s_lowres/perf.yaml index 014f110d..3c41bf2e 100644 --- a/qai_hub_models/models/ffnet_78s_lowres/perf.yaml +++ b/qai_hub_models/models/ffnet_78s_lowres/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: FFNet-78S-LowRes performance_metrics: - torchscript_onnx_tflite: - inference_time: 8311.0 - throughput: 120.3224642040669 + inference_time: 8330.0 + throughput: 120.04801920768307 estimated_peak_memory_range: - min: 651264 - max: 2762192 + min: 638976 + max: 2660728 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 149 - job_id: joprk4m05 + job_id: jpedm7dv5 job_status: Passed torchscript_onnx_qnn: - inference_time: 8366.0 - throughput: 119.53143676786995 + inference_time: 8398.0 + throughput: 119.07597046915933 estimated_peak_memory_range: min: 6311936 - max: 35352632 + max: 29553584 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 236 - job_id: j1gln09jp + job_id: j5mnx0n9p job_status: Passed torchscript_onnx: - inference_time: 8924.0 - throughput: 112.05737337516808 + inference_time: 8025.0 + throughput: 124.61059190031153 estimated_peak_memory_range: - min: 6307840 - max: 8953200 + min: 6320128 + max: 9269528 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 238 - job_id: jz5wodk3p + job_id: j56y482yp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:36:00Z' + timestamp: '2024-10-15T00:38:28Z' - torchscript_onnx_tflite: - inference_time: 6668.0 - throughput: 149.97000599880025 + inference_time: 6595.0 + throughput: 151.6300227445034 estimated_peak_memory_range: min: 655360 - max: 59796448 + max: 63719536 primary_compute_unit: NPU precision: fp16 
layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 149 - job_id: jep287qrp + job_id: jgz3dl3x5 job_status: Passed torchscript_onnx_qnn: - inference_time: 6652.0 - throughput: 150.3307276007216 + inference_time: 7132.0 + throughput: 140.21312394840157 estimated_peak_memory_range: min: 6307840 - max: 32223456 + max: 33817856 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 236 - job_id: jw5663965 + job_id: jgn6vz6q5 job_status: Passed torchscript_onnx: - inference_time: 7365.0 - throughput: 135.77732518669382 + inference_time: 6923.0 + throughput: 144.4460494005489 estimated_peak_memory_range: - min: 7581696 - max: 80764752 + min: 2400256 + max: 85574912 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 238 - job_id: jmg9v3rw5 + job_id: jp3j0znng job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:36:01Z' + timestamp: '2024-10-15T00:38:29Z' - torchscript_onnx_tflite: - inference_time: 8162.0 - throughput: 122.51899044351875 + inference_time: 8303.0 + throughput: 120.43839576056847 estimated_peak_memory_range: - min: 651264 - max: 2121592 + min: 188416 + max: 21398488 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 149 - job_id: jqpye4k8g + job_id: j5we6lem5 job_status: Passed torchscript_onnx_qnn: - inference_time: 7694.0 - throughput: 129.97140629061607 + inference_time: 7635.0 + throughput: 130.97576948264572 estimated_peak_memory_range: - min: 6361088 - max: 7466096 + min: 6365184 + max: 7587240 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 236 - job_id: jwgoy17q5 + job_id: jp2kyrkqp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:35:55Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:38:21Z' - torchscript_onnx_tflite: - inference_time: 12057.0 - throughput: 82.9393713195654 + inference_time: 8344.0 + throughput: 119.84659635666347 estimated_peak_memory_range: - min: 12288 - max: 52558496 + min: 0 + max: 1809984 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 149 - job_id: j2p0y189g + job_id: j57yrey95 job_status: Passed torchscript_onnx_qnn: - inference_time: 12604.0 - throughput: 79.33989209774674 + inference_time: 7638.0 + throughput: 130.92432573972243 estimated_peak_memory_range: - min: 6307840 - max: 24217200 + min: 6356992 + max: 8020696 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 236 - job_id: jygzex6og + job_id: jp8qyeqop job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:35:59Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:38:24Z' - torchscript_onnx_tflite: - inference_time: 8301.0 - throughput: 120.46741356463076 + inference_time: 8181.0 + throughput: 
122.2344456667889 estimated_peak_memory_range: - min: 49152 - max: 2351024 + min: 24576 + max: 4455456 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 149 - job_id: j1p8o3dkg + job_id: jgdx1dxzp job_status: Passed torchscript_onnx_qnn: - inference_time: 7841.0 - throughput: 127.53475322025253 + inference_time: 7735.0 + throughput: 129.2824822236587 estimated_peak_memory_range: - min: 6365184 - max: 7682824 + min: 6393856 + max: 7584728 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 236 - job_id: j1pv318k5 + job_id: jp0z0mzn5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:35:56Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:38:23Z' - torchscript_onnx_tflite: - inference_time: 8278.0 - throughput: 120.80212611741966 + inference_time: 8180.0 + throughput: 122.24938875305624 estimated_peak_memory_range: - min: 671744 - max: 2711488 + min: 172032 + max: 9611728 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 149 - job_id: jogkzlwwg + job_id: jp14zn47p job_status: Passed torchscript_onnx_qnn: - inference_time: 7693.0 - throughput: 129.98830105290523 + inference_time: 7747.0 + throughput: 129.08222537756552 estimated_peak_memory_range: - min: 6393856 - max: 7764184 + min: 6414336 + max: 7559240 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 236 - job_id: j7gjx09vp + job_id: jpy13o1lp job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:35:57Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:38:22Z' - torchscript_onnx_tflite: - inference_time: 8301.0 - throughput: 120.46741356463076 + inference_time: 11977.0 + throughput: 83.49336227769892 estimated_peak_memory_range: - min: 16384 - max: 1772056 + min: 663552 + max: 56111328 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 149 - job_id: jn5q87xn5 + job_id: jg9lnzl8g job_status: Passed torchscript_onnx_qnn: - inference_time: 7825.0 - throughput: 127.79552715654953 + inference_time: 12509.0 + throughput: 79.94244144216164 estimated_peak_memory_range: - min: 6369280 - max: 7692072 + min: 6320128 + max: 26534272 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 236 - job_id: jlpe9rqog + job_id: j5q6ql6op job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:38:26Z' + - torchscript_onnx_tflite: + inference_time: 5656.0 + throughput: 176.8033946251768 + estimated_peak_memory_range: + min: 57344 + max: 30200688 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 149 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 149 + job_id: jpxkolkl5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 5912.0 + throughput: 
169.14749661705008 + estimated_peak_memory_range: + min: 6303744 + max: 27599472 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 236 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 236 + job_id: jglvmy4m5 + job_status: Passed + torchscript_onnx: + inference_time: 4997.0 + throughput: 200.12007204322595 + estimated_peak_memory_range: + min: 7557120 + max: 52304048 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 238 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 238 + job_id: jgjvnrdeg + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:35:58Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:38:32Z' - torchscript_onnx_qnn: - inference_time: 8193.0 - throughput: 122.05541315757354 + inference_time: 8198.0 + throughput: 121.98097096852891 estimated_peak_memory_range: min: 6303744 max: 6303744 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 236 - job_id: j1p3k4l35 + job_id: jprv3lv7g job_status: Passed torchscript_onnx: - inference_time: 8769.0 - throughput: 114.03808872163303 + inference_time: 8793.0 + throughput: 113.72682815876266 estimated_peak_memory_range: - min: 50311168 - max: 50311168 + min: 50634752 + max: 50634752 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 238 - job_id: jnp10d985 + job_id: jgo26lzkp job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:36:02Z' + timestamp: '2024-10-15T00:38:30Z' diff --git a/qai_hub_models/models/ffnet_78s_quantized/README.md b/qai_hub_models/models/ffnet_78s_quantized/README.md index f2c25927..f370cdde 100644 --- a/qai_hub_models/models/ffnet_78s_quantized/README.md +++ b/qai_hub_models/models/ffnet_78s_quantized/README.md @@ -6,7 +6,7 @@ FFNet-78S-Quantized is a "fuss-free network" that segments street scene images with per-pixel classes like road, sidewalk, and pedestrian. Trained on the Cityscapes dataset. This is based on the implementation of FFNet-78S-Quantized found -[here](https://github.com/Qualcomm-AI-research/FFNet). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/ffnet_78s_quantized). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.ffnet_78s_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of FFNet-78S-Quantized can be found +* The license for the original implementation of FFNet-78S-Quantized can be found [here](https://github.com/Qualcomm-AI-research/FFNet/blob/master/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Simple and Efficient Architectures for Semantic Segmentation](https://arxiv.org/abs/2206.08236) * [Source Model Implementation](https://github.com/Qualcomm-AI-research/FFNet) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/ffnet_78s_quantized/export.py b/qai_hub_models/models/ffnet_78s_quantized/export.py index 8b6958f4..cc26a492 100644 --- a/qai_hub_models/models/ffnet_78s_quantized/export.py +++ b/qai_hub_models/models/ffnet_78s_quantized/export.py @@ -10,17 +10,17 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.ffnet_78s_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.printing import ( print_inference_metrics, @@ -44,20 +44,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -79,10 +77,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
""" model_name = "ffnet_78s_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -108,7 +106,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -119,7 +117,7 @@ def export_model( target_runtime, output_path, input_spec ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -194,7 +192,11 @@ def export_model( inference_job, inference_result, torch_out, model.get_output_names() ) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/ffnet_78s_quantized/perf.yaml b/qai_hub_models/models/ffnet_78s_quantized/perf.yaml index 925abbd0..b9089de1 100644 --- a/qai_hub_models/models/ffnet_78s_quantized/perf.yaml +++ b/qai_hub_models/models/ffnet_78s_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,44 +20,41 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: FFNet-78S-Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 5845.0 - throughput: 171.0863986313088 + inference_time: 5745.0 + throughput: 174.06440382941688 estimated_peak_memory_range: - min: 643072 - max: 2934216 + min: 12288 + max: 2606296 primary_compute_unit: NPU precision: int8 layer_info: @@ -64,14 +62,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 156 - job_id: jegn29wkg + job_id: jglvmymm5 job_status: Passed torchscript_onnx: - inference_time: 13450.0 - throughput: 74.34944237918215 + inference_time: 11963.0 + throughput: 83.59107247345983 estimated_peak_memory_range: - min: 94208 - max: 24616864 + min: 126976 + max: 24740320 primary_compute_unit: NPU precision: int8 layer_info: @@ -79,7 +77,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 301 - job_id: jnp10dk85 + job_id: jgkex2xng job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -88,13 +86,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:35:15Z' + timestamp: '2024-10-15T00:37:32Z' - torchscript_onnx_tflite: - inference_time: 4063.0 - throughput: 246.1235540241201 + inference_time: 4089.0 + throughput: 244.5585717779408 estimated_peak_memory_range: min: 638976 - max: 88888720 + max: 90400272 primary_compute_unit: NPU precision: int8 layer_info: @@ -102,14 +100,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 156 - job_id: joprk4705 + job_id: j56y484yp job_status: Passed torchscript_onnx: - inference_time: 10176.0 - throughput: 98.27044025157232 + inference_time: 8580.0 + throughput: 
116.55011655011656 estimated_peak_memory_range: - min: 4907008 - max: 143410896 + min: 5042176 + max: 158745840 primary_compute_unit: NPU precision: int8 layer_info: @@ -117,7 +115,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 301 - job_id: jvgdwryr5 + job_id: j5q6qlqop job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -126,13 +124,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:35:16Z' + timestamp: '2024-10-15T00:37:33Z' - torchscript_onnx_tflite: - inference_time: 5706.0 - throughput: 175.2541184717841 + inference_time: 35597.0 + throughput: 28.092254965306065 estimated_peak_memory_range: - min: 626688 - max: 14584768 + min: 700416 + max: 50412352 primary_compute_unit: NPU precision: int8 layer_info: @@ -140,22 +138,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 156 - job_id: jep287zrp + job_id: j5we6l6m5 job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:34:59Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-15T00:37:17Z' - torchscript_onnx_tflite: - inference_time: 7065.0 - throughput: 141.54281670205236 + inference_time: 218427.0 + throughput: 4.578188593900937 estimated_peak_memory_range: - min: 995328 - max: 91954720 + min: 905216 + max: 3281984 primary_compute_unit: NPU precision: int8 layer_info: @@ -163,22 +161,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 156 - job_id: jqpye4y8g + job_id: jg9lnzn8g job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: RB5 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:35:00Z' + chipset: QCS8250 Proxy + timestamp: '2024-10-15T00:37:18Z' - torchscript_onnx_tflite: - inference_time: 5779.0 - throughput: 173.04031839418585 + inference_time: 5683.0 + throughput: 175.96339961288052 estimated_peak_memory_range: - min: 655360 - max: 2276784 + min: 638976 + max: 2532944 primary_compute_unit: NPU precision: int8 layer_info: @@ -186,22 +184,45 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 156 - job_id: j2p0y1x9g + job_id: jp3j0z0ng job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:37:10Z' + - torchscript_onnx_tflite: + inference_time: 5752.0 + throughput: 173.85257301808068 + estimated_peak_memory_range: + min: 651264 + max: 2770552 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 156 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 156 + job_id: jpedm7mv5 + job_status: Passed + reference_device_info: + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:35:01Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:37:15Z' - torchscript_onnx_tflite: - inference_time: 5823.0 - throughput: 171.73278378842522 + inference_time: 5787.0 + throughput: 172.80110592707794 estimated_peak_memory_range: - min: 28672 - max: 2202960 + min: 20480 + max: 2510840 primary_compute_unit: NPU precision: int8 layer_info: @@ -209,7 +230,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 156 - job_id: j1p8o3kkg + job_id: jgjvnrneg job_status: Passed 
reference_device_info: name: SA8775 (Proxy) @@ -217,14 +238,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:35:02Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:37:13Z' - torchscript_onnx_tflite: - inference_time: 5701.0 - throughput: 175.40782318891422 + inference_time: 5717.0 + throughput: 174.91691446562882 estimated_peak_memory_range: - min: 663552 - max: 2493992 + min: 655360 + max: 2904768 primary_compute_unit: NPU precision: int8 layer_info: @@ -232,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 156 - job_id: jogkzlkwg + job_id: jpv6klkr5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:35:03Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:37:12Z' - torchscript_onnx_tflite: - inference_time: 35341.0 - throughput: 28.295747149203475 + inference_time: 7035.0 + throughput: 142.14641080312722 estimated_peak_memory_range: - min: 12288 - max: 48758736 + min: 868352 + max: 94568160 primary_compute_unit: NPU precision: int8 layer_info: @@ -255,22 +276,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 156 - job_id: jn5q87dn5 + job_id: jgo26l6kp job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T12:35:04Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:37:11Z' - torchscript_onnx_tflite: - inference_time: 222511.0 - throughput: 4.494159839288844 + inference_time: 3501.0 + throughput: 285.6326763781777 estimated_peak_memory_range: - min: 888832 - max: 12590936 + min: 659456 + max: 40289216 primary_compute_unit: NPU precision: int8 layer_info: @@ -278,22 +299,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 156 - job_id: j1gln0qjp + job_id: jp14znz7p + job_status: Passed + torchscript_onnx: + inference_time: 8055.0 + throughput: 124.14649286157666 + estimated_peak_memory_range: + min: 7507968 + max: 80165664 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 301 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 301 + job_id: jp3j0zjng job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T12:35:05Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:37:36Z' - torchscript_onnx: - inference_time: 14011.0 - throughput: 71.37249304118193 + inference_time: 12351.0 + throughput: 80.96510404015869 estimated_peak_memory_range: - min: 23351296 - max: 23351296 + min: 23576576 + max: 23576576 primary_compute_unit: NPU precision: int8 layer_info: @@ -301,7 +337,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 301 - job_id: jz57zj1vp + job_id: jglvmyvm5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -310,4 +346,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:35:17Z' + timestamp: '2024-10-15T00:37:35Z' diff --git a/qai_hub_models/models/foot_track_net/README.md b/qai_hub_models/models/foot_track_net/README.md new file mode 100644 index 00000000..f073400f --- /dev/null +++ 
b/qai_hub_models/models/foot_track_net/README.md
@@ -0,0 +1,59 @@
+[![Qualcomm® AI Hub Models](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/quic-logo.jpg)](../../README.md)
+
+
+# [Person-Foot-Detection: Multi-task Human Detector](https://aihub.qualcomm.com/models/foot_track_net)
+
+FootTrackNet can detect person and face bounding boxes, head and feet landmark locations, and feet visibility.
+
+This is based on the implementation of Person-Foot-Detection found
+[here]({source_repo}). This repository contains scripts for optimized on-device
+export suitable to run on Qualcomm® devices. More details on model performance
+across various devices can be found [here](https://aihub.qualcomm.com/models/foot_track_net).
+
+[Sign up](https://myaccount.qualcomm.com/signup) to start using Qualcomm AI Hub and run these models on a hosted Qualcomm® device.
+
+
+
+
+## Example & Usage
+
+
+Once installed, run the following simple CLI demo:
+
+```bash
+python -m qai_hub_models.models.foot_track_net.demo
+```
+More details on the CLI tool can be found with the `--help` option. See
+[demo.py](demo.py) for sample usage of the model including pre/post processing
+scripts. Please refer to our [general instructions on using
+models](../../../#getting-started) for more usage instructions.
+
+## Export for on-device deployment
+
+This repository contains export scripts that produce a model optimized for
+on-device deployment. This can be run as follows:
+
+```bash
+python -m qai_hub_models.models.foot_track_net.export
+```
+Additional options are documented with the `--help` option. Note that the above
+script requires access to Deployment instructions for Qualcomm® AI Hub.
+
+
+## License
+* The license for the original implementation of Person-Foot-Detection can be found
+  [here](https://github.com/qcom-ai-hub/ai-hub-models-internal/blob/main/LICENSE).
+* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf)
+
+
+## References
+* [Source Model Implementation](https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/foot_track_net/model.py)
+
+
+
+## Community
+* Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI.
+* For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
+
+
diff --git a/qai_hub_models/models/foot_track_net/__init__.py b/qai_hub_models/models/foot_track_net/__init__.py
new file mode 100644
index 00000000..a300223f
--- /dev/null
+++ b/qai_hub_models/models/foot_track_net/__init__.py
@@ -0,0 +1,7 @@
+# ---------------------------------------------------------------------
+# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+from .app import FootTrackNet_App as App  # noqa: F401
+from .model import MODEL_ID  # noqa: F401
+from .model import FootTrackNet_model as Model  # noqa: F401
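The `__init__.py` above re-exports the app and model classes under the package-level names `App` and `Model` used throughout this repository. A minimal import sketch, assuming the default pretrained weights resolve via `from_pretrained()`:

```python
# Package-level aliases give a uniform entry point across models.
from qai_hub_models.models.foot_track_net import App, Model

model = Model.from_pretrained()  # assumes default weights are downloadable
app = App(model)                 # wraps pre/post-processing around the network
```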
diff --git a/qai_hub_models/models/foot_track_net/app.py b/qai_hub_models/models/foot_track_net/app.py
new file mode 100644
index 00000000..fbc15d48
--- /dev/null
+++ b/qai_hub_models/models/foot_track_net/app.py
@@ -0,0 +1,366 @@
+# ---------------------------------------------------------------------
+# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+from __future__ import annotations
+
+from typing import Callable, List, Tuple
+
+import cv2
+import numpy as np
+import torch
+from PIL import Image
+
+from qai_hub_models.utils.bounding_box_processing import get_iou
+from qai_hub_models.utils.image_processing import (
+    app_to_net_image_inputs,
+    normalize_image_transform,
+)
+
+CLASSNAME_TO_ID_MAP = {"face": 0, "person": 1}
+
+
+def id_to_classname(id: int) -> str:
+    """Traverse CLASSNAME_TO_ID_MAP and return the class name for the given ID."""
+    for k, v in CLASSNAME_TO_ID_MAP.items():
+        if v == id:
+            return k
+
+
+def restructure_topk(scores: torch.Tensor, K: int = 20) -> tuple:
+    """
+    Customized top-k for FootTrackNet: run top-k over the flattened heatmap,
+    then decode the class id and spatial coordinates back from the flat indices.
+    parameters:
+        scores: the heatmap scores, shape (batch, classes, height, width).
+        K: how many top detections to keep.
+    return:
+        topk_scores: the scores of the top k detections.
+        topk_inds: the flat spatial index of each top k detection.
+        topk_clses: the class id of each top k detection.
+        topk_ys: the y coordinate of each top k detection.
+        topk_xs: the x coordinate of each top k detection.
+    """
+    batch, cat, height, width = scores.size()
+
+    topk_scores, topk_inds = torch.topk(
+        scores.reshape(batch, -1), min(K, batch * cat * height * width)
+    )
+    topk_clses = (topk_inds // (height * width)).int()
+
+    topk_inds = topk_inds % (height * width)
+    topk_ys = (topk_inds // width).int().float()
+    topk_xs = (topk_inds % width).int().float()
+    return topk_scores, topk_inds, topk_clses, topk_ys, topk_xs
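The flat-index decode above is plain integer arithmetic: `class = i // (H * W)`, `y = (i % (H * W)) // W`, `x = i % W`. A toy check with an illustrative 2-class 4x5 heatmap:

```python
# Toy check of the flat-index decode: class = i // (H*W), y = rem // W, x = rem % W.
import torch

scores = torch.zeros(1, 2, 4, 5)  # (batch, classes, H, W)
scores[0, 1, 2, 3] = 0.9          # peak at class 1, y=2, x=3
_, inds = torch.topk(scores.reshape(1, -1), 1)
i = int(inds[0, 0])               # 1*(4*5) + 2*5 + 3 = 33
cls, rem = divmod(i, 4 * 5)
y, x = divmod(rem, 5)
assert (cls, y, x) == (1, 2, 3)
```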
+
+
+class BBox_landmarks:
+    def __init__(
+        self,
+        label: str,
+        xyrb: list | np.ndarray,
+        score: float | int = 0,
+        landmark: list | np.ndarray | None = None,
+        vis: list | np.ndarray | None = None,
+    ):
+        """
+        A bounding box plus landmarks structure to hold the hierarchical result.
+        parameters:
+            label: str, the class label
+            xyrb: length-4 array or list of bbox left, top, right, bottom coordinates
+            score: the score of the detection
+            landmark: 17x2 landmarks of the joints, [[x1, y1], [x2, y2], ...]
+            vis: length-17 visibility of the joints.
+        """
+        self.label = label
+        self.score = score
+        self.landmark = landmark
+        self.vis = vis
+        self.x, self.y, self.r, self.b = xyrb
+        minx = min(self.x, self.r)
+        maxx = max(self.x, self.r)
+        miny = min(self.y, self.b)
+        maxy = max(self.y, self.b)
+        self.x, self.y, self.r, self.b = minx, miny, maxx, maxy
+
+    @property
+    def label_prop(self):
+        return self.label
+
+    @property
+    def haslandmark(self):
+        return self.landmark is not None
+
+    @property
+    def box(self):
+        return [self.x, self.y, self.r, self.b]
+
+    @box.setter
+    def box(self, newvalue):
+        self.x, self.y, self.r, self.b = newvalue
+
+    @label_prop.setter
+    def label_prop(self, newvalue):
+        self.label = newvalue
+
+
+def nms_bbox_landmark(
+    objs: list[BBox_landmarks], iou: float = 0.5
+) -> list[BBox_landmarks]:
+    """
+    NMS customized to work on a list of BBox_landmarks objects.
+    parameters:
+        objs: the list of BBox_landmarks objects.
+        iou: IoU threshold above which a lower-scoring box is suppressed.
+    return:
+        the BBox_landmarks objects kept after NMS.
+    """
+    if objs is None or len(objs) <= 1:
+        return objs
+
+    objs = sorted(objs, key=lambda obj: obj.score, reverse=True)
+    keep = []
+    flags = [0] * len(objs)
+    for index, obj in enumerate(objs):
+        if flags[index] != 0:
+            continue
+
+        keep.append(obj)
+        for j in range(index + 1, len(objs)):
+            if (
+                flags[j] == 0
+                and get_iou(np.array(obj.box), np.array(objs[j].box)) > iou
+            ):
+                flags[j] = 1
+    return keep
+
+
+def drawbbox(
+    image: np.ndarray,
+    bbox: BBox_landmarks,
+    color: list | tuple | None = None,
+    thickness: int = 2,
+    landmarkcolor: tuple | list = (0, 0, 255),
+    visibility: list | np.ndarray | None = None,
+    joint_to_visualize: list = [0, 15, 16],
+    visibility_thresh: float = 0.05,
+) -> np.ndarray:
+    """
+    Draw a bounding box and landmarks on the input image based on a
+    BBox_landmarks detection result.
+    parameters:
+        image: the input image in cv2 format.
+        bbox: the detection result as a BBox_landmarks.
+        color: the color for the box.
+        thickness: the thickness of the box border.
+        landmarkcolor: the color for the landmarks.
+        visibility: the visibility of the landmarks.
+        joint_to_visualize: which joints to visualize.
+        visibility_thresh: threshold above which a landmark is drawn as visible.
+    return:
+        the image with the result drawn on it.
+    """
+
+    x, y, r, b = [int(bb + 0.5) for bb in np.array(bbox.box).astype(int)]
+    # 3DMM adjustment, reuse the bbox structure
+    if bbox.label_prop == 0:
+        cx, cy = (r + x) // 2, (b + y) // 2
+        offset = max(r - x, b - y) // 2
+        x2 = cx - offset
+        y2 = cy - offset
+        r2 = cx + offset
+        b2 = cy + offset
+        cv2.rectangle(image, (x2, y2, r2 - x2 + 1, b2 - y2 + 1), color, thickness, 16)
+
+    else:
+        cv2.rectangle(image, (x, y, r - x + 1, b - y + 1), color, thickness, 16)
+
+    if bbox.haslandmark:
+        for i in range(len(bbox.landmark)):
+            x, y = bbox.landmark[i][:2]
+
+            if not joint_to_visualize or i not in joint_to_visualize:
+                continue
+            if visibility is not None and visibility[i] > visibility_thresh:
+                cv2.circle(image, (int(x), int(y)), 4, landmarkcolor, -1, 16)
+            else:
+                cv2.circle(image, (int(x), int(y)), 4, (0, 0, 255), -1, 16)
+    return image
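A short usage sketch for `nms_bbox_landmark` with two hypothetical overlapping person boxes (coordinates are illustrative; `get_iou` is the same repository utility the function uses internally):

```python
# Two heavily overlapping person boxes; NMS keeps only the higher score.
import numpy as np

from qai_hub_models.models.foot_track_net.app import BBox_landmarks, nms_bbox_landmark

a = BBox_landmarks(label="1", xyrb=np.array([10, 10, 110, 210]), score=0.9)
b = BBox_landmarks(label="1", xyrb=np.array([12, 14, 112, 208]), score=0.6)
kept = nms_bbox_landmark([a, b], iou=0.5)  # IoU here is roughly 0.93 > 0.5
assert len(kept) == 1 and kept[0].score == 0.9
```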
+
+
+def detect_images_multiclass_fb(
+    output_hm: torch.Tensor,
+    output_tlrb: torch.Tensor,
+    output_landmark: torch.Tensor | None = None,
+    vis: torch.Tensor | None = None,
+    threshold: list | np.ndarray = [0.7, 0.7, 0.7],
+    stride: int = 4,
+    n_lmk: int = 17,
+) -> list:
+    """
+    Get the detection result from the model's raw output tensors.
+    parameters:
+        output_hm: N,C,H,W the model heatmap output.
+        output_tlrb: N,12,H,W the model bbox output.
+        output_landmark: N,34,H,W the model landmark output.
+        vis: N,17,H,W the model visibility output.
+        threshold: per-class score thresholds (length 3).
+        stride: the stride of the output map relative to the input.
+        n_lmk: the number of landmarks.
+    return:
+        detection result: list[BBox_landmarks]
+    """
+    _, num_classes, hm_height, hm_width = output_hm.shape
+    hm = output_hm[0].reshape(1, num_classes, hm_height, hm_width)
+    hm = hm[:, :2]
+
+    tlrb = (
+        output_tlrb[0]
+        .cpu()
+        .data.numpy()
+        .reshape(1, num_classes * 4, hm_height, hm_width)
+    )
+
+    landmark = output_landmark[0].cpu().data.numpy().reshape(1, -1, hm_height, hm_width)
+    vis = vis[0].cpu().data.numpy().reshape(1, -1, hm_height, hm_width)
+    nmskey = hm
+
+    kscore, kinds, kcls, kys, kxs = restructure_topk(nmskey, 1000)
+
+    kys = kys.cpu().data.numpy().astype(int)
+    kxs = kxs.cpu().data.numpy().astype(int)
+    kcls = kcls.cpu().data.numpy().astype(int)
+    kscore = kscore.cpu().data.numpy().astype(np.float32)
+    kinds = kinds.cpu().data.numpy().astype(int)
+
+    key = [[], [], [], [], []]  # [[kys..], [kxs..], [score..], [class..], [inds..]]
+
+    score_fc = []
+    for ind in range(kscore.shape[1]):
+        score = kscore[0, ind]
+        thr = threshold[kcls[0, ind]]
+        if kcls[0, ind] == 0:
+            score_fc.append(kscore[0, ind])
+        if score > thr:
+            key[0].append(kys[0, ind])
+            key[1].append(kxs[0, ind])
+            key[2].append(score)
+            key[3].append(kcls[0, ind])
+            key[4].append(kinds[0, ind])
+
+    imboxs = []
+    if key[0] is not None and len(key[0]) > 0:
+        ky, kx = key[0], key[1]
+        classes = key[3]
+        scores = key[2]
+
+        for i in range(len(kx)):
+            class_ = classes[i]
+            cx, cy = kx[i], ky[i]
+            x1, y1, x2, y2 = tlrb[0, class_ * 4 : (class_ + 1) * 4, cy, cx]
+            x1, y1, x2, y2 = (
+                np.array([cx, cy, cx, cy]) + np.array([-x1, -y1, x2, y2])
+            ) * stride  # back to world
+
+            if class_ == 1:  # only the person class has landmarks; face has none
+                x5y5 = landmark[0, : n_lmk * 2, cy, cx]
+                x5y5 = (x5y5 + np.array([cx] * n_lmk + [cy] * n_lmk)) * stride
+                boxlandmark = np.array(list(zip(x5y5[:n_lmk], x5y5[n_lmk:])))
+                box_vis = vis[0, :, cy, cx].tolist()
+            else:
+                boxlandmark = None
+                box_vis = None
+            imboxs.append(
+                BBox_landmarks(
+                    label=str(class_),
+                    xyrb=np.array([x1, y1, x2, y2]),
+                    score=scores[i].item(),
+                    landmark=boxlandmark,
+                    vis=box_vis,
+                )
+            )
+    return imboxs
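The box decode in the loop above is also simple arithmetic: the head predicts left/top/right/bottom offsets in feature-map cells around each peak, and multiplying by the stride maps cells back to input pixels. A worked example with assumed values:

```python
# Decoding one detection at heatmap peak (cx, cy) with predicted tlrb offsets.
# Values are illustrative; offsets are in feature-map cells, stride is 4.
cx, cy = 40, 30
left, top, right, bottom = 8.0, 6.0, 10.0, 12.0
stride = 4
x1, y1 = (cx - left) * stride, (cy - top) * stride      # (128.0, 96.0)
x2, y2 = (cx + right) * stride, (cy + bottom) * stride  # (200.0, 168.0)
```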
+
+
+class FootTrackNet_App:
+    """
+    This class consists of light-weight "app code" that is required to perform
+    end-to-end inference with FootTrackNet.
+
+    The app uses 1 model:
+        * FootTrackNet
+
+    For a given image input, the app will:
+        * pre-process the image (convert to range [0, 1])
+        * run FootTrackNet inference
+        * convert the output to two lists of BBox_landmarks objects, for face and body.
+    """
+
+    def __init__(self, model: Callable[[torch.Tensor], torch.Tensor]):
+        self.model = model
+
+    def predict(self, *args, **kwargs):
+        return self.det_image(*args, **kwargs)
+
+    def det_image(
+        self,
+        pixel_values_or_image: torch.Tensor
+        | np.ndarray
+        | Image.Image
+        | List[Image.Image],
+    ) -> Tuple[List[BBox_landmarks], List[BBox_landmarks]]:
+        """
+        Return two lists, objs_face and objs_person. Each list contains
+        BBox_landmarks objects holding the bbox and landmark info; see the
+        BBox_landmarks definition.
+
+        Parameters:
+            pixel_values_or_image
+                PIL image(s)
+                or
+                numpy array (N H W C x uint8) or (H W C x uint8) -- both RGB channel layout
+                or
+                PyTorch tensor (N C H W x fp32, value range is [0, 1]), RGB channel layout
+
+        Returns:
+            objs_face: a list of BBox_landmarks for faces, list[BBox_landmarks]
+            objs_person: a list of BBox_landmarks for persons, list[BBox_landmarks]
+        """
+        NHWC_int_numpy_frames, NCHW_fp32_torch_frames = app_to_net_image_inputs(
+            pixel_values_or_image
+        )
+        input_transform = normalize_image_transform()
+        NCHW_fp32_torch_frames = input_transform(NCHW_fp32_torch_frames)  # normalize
+        threshold = [0.6, 0.7, 0.7]  # score threshold for each detector
+        iou_thr = [0.2, 0.5, 0.5]  # iou threshold
+        output = self.model(NCHW_fp32_torch_frames)
+
+        heatmap = output[0]
+        bbox = output[1]
+        landmark = output[2]
+        landmark_visibility = output[3]
+
+        stride = 4
+        num_landmarks = 17
+        objs = detect_images_multiclass_fb(
+            heatmap,
+            bbox,
+            landmark,
+            threshold=threshold,
+            stride=stride,
+            n_lmk=num_landmarks,
+            vis=landmark_visibility,
+        )
+
+        objs_face = []
+        objs_person = []
+
+        for obj in objs:
+            label = id_to_classname(int(obj.label_prop))
+            if label == "face":
+                objs_face.append(obj)
+            elif label == "person":
+                objs_person.append(obj)
+
+        objs_face = nms_bbox_landmark(objs_face, iou=iou_thr[0])
+        objs_person = nms_bbox_landmark(objs_person, iou=iou_thr[1])
+
+        return objs_face, objs_person
diff --git a/qai_hub_models/models/foot_track_net/conftest.py b/qai_hub_models/models/foot_track_net/conftest.py
new file mode 100644
index 00000000..811b52bd
--- /dev/null
+++ b/qai_hub_models/models/foot_track_net/conftest.py
@@ -0,0 +1,39 @@
+# ---------------------------------------------------------------------
+# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+# THIS FILE WAS AUTO-GENERATED. DO NOT EDIT MANUALLY.
+
+import inspect
+
+import pytest
+
+from qai_hub_models.models.foot_track_net import Model
+from qai_hub_models.utils.testing import skip_clone_repo_check
+
+
+# Instantiate the model only once for all tests.
+# Mock from_pretrained to always return the initialized model.
+# This speeds up tests and limits memory leaks.
+@pytest.fixture(scope="module", autouse=True)
+def cached_from_pretrained():
+    with pytest.MonkeyPatch.context() as mp:
+        pretrained_cache = {}
+        from_pretrained = Model.from_pretrained
+        sig = inspect.signature(from_pretrained)
+
+        @skip_clone_repo_check
+        def _cached_from_pretrained(*args, **kwargs):
+            cache_key = str(args) + str(kwargs)
+            model = pretrained_cache.get(cache_key, None)
+            if model:
+                return model
+            else:
+                model = from_pretrained(*args, **kwargs)
+                pretrained_cache[cache_key] = model
+                return model
+
+        _cached_from_pretrained.__signature__ = sig
+
+        mp.setattr(Model, "from_pretrained", _cached_from_pretrained)
+        yield mp
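Before the full CLI demo below, a minimal programmatic sketch of driving the app (input layout per the `det_image` docstring; a blank frame should yield empty result lists):

```python
# Smoke test: run the app on a blank frame and inspect the result structure.
import torch

from qai_hub_models.models.foot_track_net import App, Model

app = App(Model.from_pretrained())   # assumes weights are downloadable
frame = torch.zeros(1, 3, 480, 640)  # N,C,H,W fp32 in [0, 1], RGB layout
faces, persons = app.predict(frame)  # two lists of BBox_landmarks
print(len(faces), len(persons))      # most likely "0 0" on a blank frame
```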
diff --git a/qai_hub_models/models/foot_track_net/demo.py b/qai_hub_models/models/foot_track_net/demo.py
new file mode 100644
index 00000000..26ee52aa
--- /dev/null
+++ b/qai_hub_models/models/foot_track_net/demo.py
@@ -0,0 +1,107 @@
+# ---------------------------------------------------------------------
+# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+import numpy as np
+from PIL import Image
+
+from qai_hub_models.models.foot_track_net.app import (
+    BBox_landmarks,
+    FootTrackNet_App,
+    drawbbox,
+)
+from qai_hub_models.models.foot_track_net.model import (
+    MODEL_ASSET_VERSION,
+    MODEL_ID,
+    FootTrackNet_model,
+)
+from qai_hub_models.utils.args import (
+    demo_model_from_cli_args,
+    get_model_cli_parser,
+    get_on_device_demo_parser,
+    validate_on_device_demo_args,
+)
+from qai_hub_models.utils.asset_loaders import CachedWebModelAsset, load_image
+from qai_hub_models.utils.display import display_or_save_image
+from qai_hub_models.utils.draw import create_color_map
+from qai_hub_models.utils.image_processing import pil_resize_pad
+
+INPUT_IMAGE_ADDRESS = CachedWebModelAsset.from_asset_store(
+    MODEL_ID, MODEL_ASSET_VERSION, "test1.jpg"
+)
+
+
+def undo_resize_pad_BBox(bbox: BBox_landmarks, scale: float, padding: list):
+    """
+    Undo the resize and padding of a BBox_landmarks object, in place: the inner
+    coordinates are mapped back to the original image frame.
+    Parameters:
+        bbox: the BBox_landmarks object to modify.
+        scale: single scale factor from the original to the target image.
+        padding: (left, top) padding sizes.
+    Return:
+        None.
+    """
+    if bbox.haslandmark:
+        for lmk in bbox.landmark:
+            lmk[0] = (lmk[0] + padding[0]) / scale
+            lmk[1] = (lmk[1] + padding[1]) / scale
+    bbox.x = (bbox.x + padding[0]) / scale
+    bbox.y = (bbox.y + padding[1]) / scale
+    bbox.r = (bbox.r + padding[0]) / scale
+    bbox.b = (bbox.b + padding[1]) / scale
+
+    return
+
+
+def main(is_test: bool = False):
+    parser = get_model_cli_parser(FootTrackNet_model)
+    parser = get_on_device_demo_parser(parser, add_output_dir=True)
+    parser.add_argument(
+        "--image",
+        type=str,
+        default=INPUT_IMAGE_ADDRESS,
+        help="image file path or URL",
+    )
+    args = parser.parse_args([] if is_test else None)
+    model = demo_model_from_cli_args(FootTrackNet_model, MODEL_ID, args)
+    validate_on_device_demo_args(args, MODEL_ID)
+
+    # Load image
+    (_, _, height, width) = FootTrackNet_model.get_input_spec()["image"][0]
+    orig_image = load_image(args.image)
+    image, scale, padding = pil_resize_pad(orig_image, (height, width))
+    print("Model Loaded")
+
+    app = FootTrackNet_App(model)
+    objs_face, objs_person = app.det_image(image)
+    objs = objs_face + objs_person
+
+    img_out = np.array(orig_image)[:, :, ::-1].copy()  # to BGR
+    jt_vis = [0, 15, 16]
+    vis_thr = 0.5
+    color_maps = create_color_map(2)
+
+    for obj in objs:
+        undo_resize_pad_BBox(obj, scale, padding)
+        color = color_maps[int(obj.label)]
+        color = [int(e) for e in color]
+        vis = obj.vis
+        img_out = drawbbox(
+            img_out,
+            obj,
+            color=color,
+            landmarkcolor=color,
+            visibility=vis,
+            joint_to_visualize=jt_vis,
+            visibility_thresh=vis_thr,
+        )
+    img_out_PIL = Image.fromarray(img_out[:, :, ::-1])
+
+    if not is_test:
+        display_or_save_image(
+            img_out_PIL, args.output_dir, "FootTrackNet_demo_output.png"
+        )
+
+
+if __name__ == "__main__":
+    main()
diff --git a/qai_hub_models/models/foot_track_net/export.py b/qai_hub_models/models/foot_track_net/export.py
new file mode 100644
index 00000000..82ca1a0f
--- /dev/null
+++ b/qai_hub_models/models/foot_track_net/export.py
@@ -0,0 +1,209 @@
+# ---------------------------------------------------------------------
+# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+# THIS FILE WAS AUTO-GENERATED. DO NOT EDIT MANUALLY.
+ + +from __future__ import annotations + +import os +import warnings +from pathlib import Path +from typing import Any, Dict, List, Optional, cast + +import qai_hub as hub +import torch + +from qai_hub_models.models.common import ExportResult, TargetRuntime +from qai_hub_models.models.foot_track_net import Model +from qai_hub_models.utils.args import ( + export_parser, + get_input_spec_kwargs, + get_model_kwargs, +) +from qai_hub_models.utils.compare import torch_inference +from qai_hub_models.utils.input_spec import make_torch_inputs +from qai_hub_models.utils.printing import ( + print_inference_metrics, + print_profile_metrics_from_job, +) +from qai_hub_models.utils.qai_hub_helpers import ( + can_access_qualcomm_ai_hub, + export_without_hub_access, +) + + +def export_model( + device: str = "Samsung Galaxy S23 (Family)", + chipset: Optional[str] = None, + skip_profiling: bool = False, + skip_inferencing: bool = False, + skip_downloading: bool = False, + skip_summary: bool = False, + output_dir: Optional[str] = None, + target_runtime: TargetRuntime = TargetRuntime.TFLITE, + compile_options: str = "", + profile_options: str = "", + **additional_model_kwargs, +) -> ExportResult | List[str]: + """ + This function executes the following recipe: + + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference + + Each of the last 4 steps can be optionally skipped using the input options. + + Parameters: + device: Device for which to export the model. + Full list of available devices can be found by running `hub.get_devices()`. + Defaults to DEFAULT_DEVICE if not specified. + chipset: If set, will choose a random device with this chipset. + Overrides the `device` argument. + skip_profiling: If set, skips profiling of compiled model on real devices. + skip_inferencing: If set, skips computing on-device outputs from sample data. + skip_downloading: If set, skips downloading of compiled model. + skip_summary: If set, skips waiting for and summarizing results + from profiling and inference. + output_dir: Directory to store generated assets (e.g. compiled model). + Defaults to `/build/`. + target_runtime: Which on-device runtime to target. Default is TFLite. + compile_options: Additional options to pass when submitting the compile job. + profile_options: Additional options to pass when submitting the profile job. + **additional_model_kwargs: Additional optional kwargs used to customize + `model_cls.from_pretrained` and `model.get_input_spec` + + Returns: + A struct of: + * A CompileJob object containing metadata about the compile job submitted to hub. + * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
+ """ + model_name = "foot_track_net" + output_path = Path(output_dir or Path.cwd() / "build" / model_name) + if chipset: + hub_device = hub.Device(attributes=f"chipset:{chipset}") + else: + hub_device = hub.Device(name=device) + if not can_access_qualcomm_ai_hub(): + return export_without_hub_access( + "foot_track_net", + "Person-Foot-Detection", + device, + skip_profiling, + skip_inferencing, + skip_downloading, + skip_summary, + output_path, + target_runtime, + compile_options, + profile_options, + ) + + # On-device perf improves with I/O in channel_last format except when using ONNX. + use_channel_last_format = target_runtime != TargetRuntime.ONNX + + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) + input_spec = model.get_input_spec( + **get_input_spec_kwargs(model, additional_model_kwargs) + ) + + # Trace the model + source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) + + # 2. Compiles the model to an asset that can be run on device + model_compile_options = model.get_hub_compile_options( + target_runtime, compile_options, hub_device + ) + print(f"Optimizing model {model_name} to run on-device") + submitted_compile_job = hub.submit_compile_job( + model=source_model, + input_specs=input_spec, + device=hub_device, + name=model_name, + options=model_compile_options, + ) + compile_job = cast(hub.client.CompileJob, submitted_compile_job) + + # 3. Profiles the model performance on a real device + profile_job: Optional[hub.client.ProfileJob] = None + if not skip_profiling: + profile_options_all = model.get_hub_profile_options( + target_runtime, profile_options + ) + print(f"Profiling model {model_name} on a hosted device.") + submitted_profile_job = hub.submit_profile_job( + model=compile_job.get_target_model(), + device=hub_device, + name=model_name, + options=profile_options_all, + ) + profile_job = cast(hub.client.ProfileJob, submitted_profile_job) + + # 4. Inferences the model on sample inputs + inference_job: Optional[hub.client.InferenceJob] = None + if not skip_inferencing: + profile_options_all = model.get_hub_profile_options( + target_runtime, profile_options + ) + print( + f"Running inference for {model_name} on a hosted device with example inputs." + ) + sample_inputs = model.sample_inputs( + input_spec, use_channel_last_format=use_channel_last_format + ) + submitted_inference_job = hub.submit_inference_job( + model=compile_job.get_target_model(), + inputs=sample_inputs, + device=hub_device, + name=model_name, + options=profile_options_all, + ) + inference_job = cast(hub.client.InferenceJob, submitted_inference_job) + + # 5. Downloads the model asset to the local directory + if not skip_downloading: + os.makedirs(output_path, exist_ok=True) + target_model: hub.Model = compile_job.get_target_model() # type: ignore + target_model.download(str(output_path / model_name)) + + # 6. 
Summarizes the results from profiling and inference
+    if not skip_summary and not skip_profiling:
+        assert profile_job is not None and profile_job.wait().success
+        profile_data: Dict[str, Any] = profile_job.download_profile()  # type: ignore
+        print_profile_metrics_from_job(profile_job, profile_data)
+
+    if not skip_summary and not skip_inferencing:
+        sample_inputs = model.sample_inputs(use_channel_last_format=False)
+        torch_out = torch_inference(
+            model, sample_inputs, return_channel_last_output=use_channel_last_format
+        )
+        assert inference_job is not None and inference_job.wait().success
+        inference_result: hub.client.DatasetEntries = inference_job.download_output_data()  # type: ignore
+
+        print_inference_metrics(
+            inference_job, inference_result, torch_out, model.get_output_names()
+        )
+
+    return ExportResult(
+        compile_job=compile_job,
+        inference_job=inference_job,
+        profile_job=profile_job,
+    )
+
+
+def main():
+    warnings.filterwarnings("ignore")
+    parser = export_parser(model_cls=Model)
+    args = parser.parse_args()
+    export_model(**vars(args))
+
+
+if __name__ == "__main__":
+    main()
diff --git a/qai_hub_models/models/foot_track_net/foot_track_net.py b/qai_hub_models/models/foot_track_net/foot_track_net.py
new file mode 100644
index 00000000..d899cc7b
--- /dev/null
+++ b/qai_hub_models/models/foot_track_net/foot_track_net.py
@@ -0,0 +1,162 @@
+# ---------------------------------------------------------------------
+# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+import os
+
+import numpy as np
+import torch
+import torch.nn as nn
+
+from .layers import CBAModule, DetectModule, HeadModule, Mbv3SmallFast, UpModule
+
+
+class FootTrackNet(nn.Module):
+    def __init__(
+        self,
+        wide: int = 64,
+        has_ext: bool = True,
+        upmode: str = "UCBA",
+        act: str = "relu",
+        RGB: bool = True,
+        strict: bool = False,
+        n_lmk: int = 17,
+    ):
+        super(FootTrackNet, self).__init__()
+        """
+        FootTrackNet multi-task human detector model for person and face detection,
+        plus head and feet landmark detection.
+
+        Parameters:
+            wide: the channel width of the intermediate layers.
+            has_ext: whether to add an extension layer in the head module.
+            upmode: upsampling mode.
+            act: activation function.
+            RGB: whether the input is 3-channel RGB.
+            strict: whether to load the model weights strictly.
+            n_lmk: the number of landmarks for detection.
+
+        Returns:
+            FootTrackNet model instance.
+ """ + self.use_rgb = RGB + self.strict = strict + + self.mean = nn.Parameter( + torch.tensor( + np.array([0.408, 0.447, 0.47]).reshape(1, 3, 1, 1).astype(np.float32) + ), + requires_grad=False, + ) + self.std = nn.Parameter( + torch.tensor( + np.array([0.289, 0.274, 0.278]).reshape(1, 3, 1, 1).astype(np.float32) + ), + requires_grad=False, + ) + + # define backbone + self.bb = Mbv3SmallFast(act, RGB) + + # Get the number of branch node channels stride 4, 8, 16 + c0, c1, c2 = self.bb.uplayer_shape + act = "relu" if act == "hswish" else act + self.conv3 = CBAModule( + self.bb.output_channels, + wide, + kernel_size=1, + stride=1, + padding=0, + bias=False, + act=act, + ) # s32 + self.connect0 = CBAModule(c0, wide, kernel_size=1, act=act) # s4 + self.connect1 = CBAModule(c1, wide, kernel_size=1, act=act) # s8 + self.connect2 = CBAModule( + c2, wide, kernel_size=1, act=act + ) # s16, conv, batchnorm activation. + + self.up0 = UpModule( + wide, wide, kernel_size=2, stride=2, mode=upmode, act=act + ) # s16 nearest + self.up1 = UpModule( + wide, wide, kernel_size=2, stride=2, mode=upmode, act=act + ) # s8 + self.up2 = UpModule( + wide, wide, kernel_size=2, stride=2, mode=upmode, act=act + ) # s4 + self.detect = DetectModule(wide, act=act) + + self.heatmap1 = HeadModule(wide, 1, has_ext=has_ext) + self.box1 = HeadModule(wide, 4, has_ext=has_ext) + self.heatmap2 = HeadModule(wide, 1, has_ext=has_ext) + self.box2 = HeadModule(wide, 4, has_ext=has_ext) + self.heatmap3 = HeadModule(wide, 1, has_ext=has_ext) + self.box3 = HeadModule(wide, 4, has_ext=has_ext) + + self.landmark = HeadModule(wide, 2 * n_lmk, has_ext=has_ext) + self.landmark_vis = HeadModule(wide, n_lmk, has_ext=has_ext) + + def forward(self, x: torch.Tensor) -> list: + """ + x: N,C,H,W (1,3,480,640) tensor of input image + return: 4 tensors including + heatmap: N,C,H,W (1,3,120,160) + bbox: N,C,H,W (1,12,120,160) + landmark: N,C,H,W (1,34,120,160) + landmark: N,C,H,W (1,17,120,160) + """ + + s4, s8, s16, s32 = self.bb(x) + s32 = self.conv3(s32) + s16 = self.up0(s32) + self.connect2(s16) + s8 = self.up1(s16) + self.connect1(s8) + s4 = self.up2(s8) + self.connect0(s4) + x = self.detect(s4) + + # simplify with sigmoid + center1 = self.heatmap1(x).sigmoid() + center2 = self.heatmap2(x).sigmoid() + center3 = self.heatmap3(x).sigmoid() + + box1 = self.box1(x) + box2 = self.box2(x) + box3 = self.box3(x) # when demo, no hand + landmark = self.landmark(x) # 2 * 17 + landmark_vis = self.landmark_vis(x).sigmoid() + return ( + torch.cat((center1, center2, center3), dim=1), + torch.cat((box1, box2, box3), dim=1), + landmark, + landmark_vis, + ) # simple one landmark + + def load_weights(self, base_file): + """load pretrined weights""" + other, ext = os.path.splitext(base_file) + if ext == ".pkl" or ".pth": + print("Loading pretrained weights into state dict...") + + pretrained_dict = torch.load( + base_file, map_location=lambda storage, loc: storage + ) + model_dict = self.state_dict() + + if not self.strict: + pretrained_dict = { + k: v for k, v in pretrained_dict.items() if k in model_dict + } + if ( + self.use_rgb and pretrained_dict["bb.conv1.weight"].shape[1] == 1 + ): # single channel. + pretrained_dict["bb.conv1.weight"] = torch.tile( + pretrained_dict["bb.conv1.weight"], [1, 3, 1, 1] + ) # the input channel to 3. 
+            model_dict.update(pretrained_dict)
+
+            self.load_state_dict(model_dict, strict=self.strict)
+            print("Finished!")
+        else:
+            raise ValueError("Only .pth and .pkl files are supported.")
diff --git a/qai_hub_models/models/foot_track_net/info.yaml b/qai_hub_models/models/foot_track_net/info.yaml
new file mode 100644
index 00000000..ecacb51d
--- /dev/null
+++ b/qai_hub_models/models/foot_track_net/info.yaml
@@ -0,0 +1,34 @@
+name: Person-Foot-Detection
+# id must match with the model dir name in qai_hub_models
+id: foot_track_net
+status: public
+headline: Multi-task human detector.
+domain: Computer Vision
+description: FootTrackNet can detect person and face bounding boxes, head and feet landmark
+  locations, and feet visibility.
+use_case: Object Detection
+tags:
+  - real-time
+license: https://github.com/qcom-ai-hub/ai-hub-models-internal/blob/main/LICENSE
+deploy_license: https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf
+source_repo: https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/foot_track_net/model.py
+technical_details:
+  Inference latency: RealTime
+  Input resolution: 640x480
+  Number of output classes: 2
+  Number of parameters: 2.53M
+  Model size: 9.69 MB
+applicable_scenarios:
+  - Restricted zone
+  - Safety zone
+related_models: []
+form_factors:
+  - Phone
+  - Tablet
+  - IoT
+has_static_banner: true
+has_animated_banner: true
+license_type: bsd-3-clause
+deploy_license_type: AI Model Hub License
+dataset:
+  - coco
diff --git a/qai_hub_models/models/foot_track_net/layers.py b/qai_hub_models/models/foot_track_net/layers.py
new file mode 100644
index 00000000..1259ffb0
--- /dev/null
+++ b/qai_hub_models/models/foot_track_net/layers.py
@@ -0,0 +1,399 @@
+# ---------------------------------------------------------------------
+# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+
+
+import math
+
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+import torch.nn.init as init
+
+
+class SeModule(nn.Module):
+    """Customized squeeze-and-excitation module."""
+
+    def __init__(self, in_size: int, reduction: int = 4):
+        super(SeModule, self).__init__()
+        self.pool = nn.AdaptiveAvgPool2d(1)
+        self.se = nn.Sequential(
+            nn.Conv2d(
+                in_size,
+                in_size // reduction,
+                kernel_size=1,
+                stride=1,
+                padding=0,
+                bias=False,
+            ),
+            nn.BatchNorm2d(in_size // reduction),
+            nn.ReLU(inplace=True),
+            nn.Conv2d(
+                in_size // reduction,
+                in_size,
+                kernel_size=1,
+                stride=1,
+                padding=0,
+                bias=False,
+            ),
+            nn.BatchNorm2d(in_size),
+            nn.Sigmoid(),
+        )
+
+    def forward(self, x: torch.Tensor):
+        """
+        x: N,C,H,W tensor
+        return: N,C,H,W tensor
+        """
+        return x * self.se(self.pool(x))
+
+
+class Block3x3(nn.Module):
+    """Modified and simplified MobileNetV3 block."""
+
+    def __init__(
+        self,
+        kernel_size: int,
+        in_size: int,
+        expand_size: int,
+        out_size: int,
+        nolinear: nn.Module,
+        semodule: nn.Module,
+        stride: int,
+    ):
+        super(Block3x3, self).__init__()
+        self.kernel_size = kernel_size
+        self.stride = stride
+        self.se = semodule
+        self.conv1 = nn.Conv2d(
+            in_size, expand_size, kernel_size=1, stride=1, padding=0, bias=False
+        )
+        self.bn1 = nn.BatchNorm2d(expand_size)
+        self.nolinear1 = nolinear
+        if kernel_size == 3:
+            self.conv2 = nn.Conv2d(
+                expand_size,
+                out_size,
+                kernel_size=3,
+                stride=stride,
+                padding=1,
+                bias=False,
+            )
+        else:
+            self.conv2 = nn.Conv2d(
+                expand_size, out_size, kernel_size=3, stride=1, padding=1, bias=False
+            )
+            self.conv3 = nn.Conv2d(
+                out_size, out_size, kernel_size=3, stride=stride, padding=1, bias=False
+            )
+            self.bn3 = nn.BatchNorm2d(out_size)
+            self.nolinear3 = nolinear
+        self.bn2 = nn.BatchNorm2d(out_size)
+        self.nolinear2 = nolinear
+        self.shortcut = nn.Sequential()
+        if stride == 1 and in_size != out_size:
+            self.shortcut = nn.Sequential(
+                nn.Conv2d(
+                    in_size, out_size, kernel_size=1, stride=1, padding=0, bias=False
+                ),
+                nn.BatchNorm2d(out_size),
+            )
+
+    def forward(self, x: torch.Tensor):
+        """
+        x: N,C,H,W input feature
+        return: N,C,H,W tensor
+        """
+        out = self.nolinear1(self.bn1(self.conv1(x)))
+        out = self.nolinear2(self.bn2(self.conv2(out)))
+        if self.kernel_size == 5:
+            out = self.nolinear3(self.bn3(self.conv3(out)))
+
+        if self.se is not None:
+            out = self.se(out)
+        out = out + self.shortcut(x) if self.stride == 1 else out
+        return out
+
+
+class Mbv3SmallFast(nn.Module):
+    """
+    Certain layers are borrowed and modified from MobileNetV3.
+    For details of each layer's functionality, see: https://arxiv.org/abs/1905.02244
+    """
+
+    def __init__(self, act: str = "relu", RGB: bool = True):
+        super(Mbv3SmallFast, self).__init__()
+        self.keep = [2, 5, 12]
+        self.uplayer_shape = [16, 32, 64]
+        self.output_channels = 96
+        if RGB:
+            self.conv1 = nn.Conv2d(
+                3, 16, kernel_size=3, stride=2, padding=1, bias=False
+            )
+        else:
+            self.conv1 = nn.Conv2d(
+                1, 16, kernel_size=3, stride=2, padding=1, bias=False
+            )
+
+        self.bn1 = nn.BatchNorm2d(16)
+        if act == "relu":
+            self.hs1 = nn.ReLU(inplace=True)
+        elif act == "prelu":
+            self.hs1 = nn.PReLU()
+        elif act == "hswish":
+            self.hs1 = nn.Hardswish()
+
+        self.bneck = nn.Sequential(
+            Block3x3(3, 16, 16, 16, self.hs1, None, 1),
+            Block3x3(3, 16, 64, 16, self.hs1, None, 2),  # 1
+            Block3x3(3, 16, 64, 16, self.hs1, None, 1),  # 2*
+            Block3x3(5, 16, 96, 32, self.hs1, SeModule(32), 2),  # 3
+            Block3x3(5, 32, 96, 32, self.hs1, SeModule(32), 1),  # 4
+            Block3x3(5, 32, 128, 32, self.hs1, SeModule(32), 1),  # 5*
+            Block3x3(3, 32, 128, 64, self.hs1, None, 2),  # 6
+            Block3x3(3, 64, 128, 64, self.hs1, None, 1),  # 7
+            Block3x3(3, 64, 160, 64, self.hs1, None, 1),  # 8
+            Block3x3(3, 64, 160, 64, self.hs1, None, 1),  # 9
+            Block3x3(3, 64, 256, 64, self.hs1, SeModule(64), 1),  # 10
+            Block3x3(3, 64, 320, 64, self.hs1, SeModule(64), 1),  # 11
+            Block3x3(5, 64, 320, 64, self.hs1, SeModule(64), 1),  # 12*
+            Block3x3(5, 64, 320, 96, self.hs1, SeModule(96), 2),  # 13
+            Block3x3(5, 96, 480, 96, self.hs1, SeModule(96), 1),  # 14
+        )
+
+    def initialize_weights(self):
+        print("random init...")
+        for m in self.modules():
+            if isinstance(m, nn.Conv2d):
+                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
+                m.weight.data.normal_(0, math.sqrt(2.0 / n))
+                if m.bias is not None:
+                    m.bias.data.zero_()
+            elif isinstance(m, nn.BatchNorm2d):
+                m.weight.data.fill_(1)
+                m.bias.data.zero_()
+            elif isinstance(m, nn.Linear):
+                n = m.weight.size(1)
+                m.weight.data.normal_(0, 0.01)
+                m.bias.data.zero_()
+
+    def forward(self, x: torch.Tensor):
+        """
+        x: N,C,480,640 image tensor
+        return: list of N,C,H,W tensors, one per stage; specifically:
+            0: N,16,120,160
+            1: N,32,60,80
+            2: N,64,30,40
+            3: N,96,15,20
+        """
+        x = self.hs1(self.bn1(self.conv1(x)))
+        outs = []
+        for index, item in enumerate(self.bneck):
+            x = item(x)
+
+            if index in self.keep:
+                outs.append(x)
+        outs.append(x)
+        return outs
+
+
+class CBAModule(nn.Module):
+    """Conv-BatchNorm-Activation block."""
+
+    def __init__(
+        self,
+        in_channels: int,
+        out_channels: int = 24,
+        kernel_size: int = 3,
+        stride: int = 1,
+        padding: int = 0,
+        bias: bool = False,
+        act: str = "relu",
+    ):
+        super(CBAModule, self).__init__()
+        self.conv = nn.Conv2d(
+            in_channels, out_channels, kernel_size, stride, padding=padding, bias=bias
+        )
+        self.bn = nn.BatchNorm2d(out_channels)
+        if act == "relu":
+            self.act = nn.ReLU(inplace=True)
+        elif act == "identity":
+            self.act = nn.Identity()
+        else:
+            self.act = nn.PReLU()
+
+        init.xavier_uniform_(self.conv.weight.data)
+        if self.conv.bias is not None:
+            self.conv.bias.data.zero_()
+
+    def forward(self, x: torch.Tensor):
+        """
+        x: N,C,H,W tensor
+        return: N,C,H,W tensor
+        """
+        x = self.conv(x)
+        x = self.bn(x)
+        x = self.act(x)
+        return x
+
+
+class UpModule(nn.Module):
+    """Upsampling module."""
+
+    def __init__(
+        self,
+        in_channels: int,
+        out_channels: int,
+        kernel_size: int = 2,
+        stride: int = 2,
+        bias: bool = False,
+        mode: str = "UCBA",
+        act: str = "relu",
+    ):
+        super(UpModule, self).__init__()
+        self.mode = mode
+
+        if self.mode == "UCBA":
+            self.up = nn.Upsample(size=None, scale_factor=2, mode="nearest")
+            self.conv = CBAModule(
+                in_channels, out_channels, 3, padding=1, bias=bias, act=act
+            )
+        elif self.mode == "DeconvBN":
+            self.dconv = nn.ConvTranspose2d(
+                in_channels, out_channels, kernel_size, stride, bias=bias
+            )
+            self.bn = nn.BatchNorm2d(out_channels)
+        elif self.mode == "DeCBA":
+            self.dconv = nn.ConvTranspose2d(
+                in_channels, out_channels, kernel_size, stride, bias=bias
+            )
+            self.conv = CBAModule(out_channels, out_channels, 3, padding=1, bias=bias)
+        else:
+            raise RuntimeError(f"Unsupported mode: {mode}")
+
+    def forward(self, x: torch.Tensor):
+        """
+        x: N,C,H,W tensor
+        return: N,C,H,W tensor.
+ """ + if self.mode == "UCBA": + return self.conv(self.up(x)) + elif self.mode == "DeconvBN": + return F.relu(self.bn(self.dconv(x))) + elif self.mode == "DeCBA": + return self.conv(self.dconv(x)) + + +class ContextModule(nn.Module): + """single stage headless face detector context module""" + + def __init__(self, in_channels: int, act: str = "relu"): + super(ContextModule, self).__init__() + + block_wide = in_channels // 4 + self.inconv = CBAModule(in_channels, block_wide, 3, 1, padding=1, act=act) + self.upconv = CBAModule(block_wide, block_wide, 3, 1, padding=1, act=act) + self.downconv = CBAModule(block_wide, block_wide, 3, 1, padding=1, act="relu") + self.downconv2 = CBAModule(block_wide, block_wide, 3, 1, padding=1, act=act) + + def forward(self, x: torch.Tensor): + """ + x: N,C,H,W tensor + return: N,C,H,W tensor + """ + x = self.inconv(x) + up = self.upconv(x) + down = self.downconv(x) + down = self.downconv2(down) + return torch.cat([up, down], dim=1) + + +class DetectModule(nn.Module): + def __init__(self, in_channels: int, act: str = "relu"): + super(DetectModule, self).__init__() + + self.upconv = CBAModule(in_channels, in_channels // 2, 3, 1, padding=1, act=act) + self.context = ContextModule(in_channels, act=act) + + def forward(self, x: torch.Tensor): + """ + x: N,C,H,W tensor + return: N,C,H,W tensor + """ + up = self.upconv(x) + down = self.context(x) + return torch.cat([up, down], dim=1) + + +class CropLayer(nn.Module): + """ + crop layer. crop the input tensor based onthe specified number of rows and columns + E.g., (-1, 0) means this layer should crop the first and last rows of the feature map. And (0, -1) crops the first and last columns + """ + + def __init__(self, crop_set: list): + super(CropLayer, self).__init__() + self.rows_to_crop = -crop_set[0] + self.cols_to_crop = -crop_set[1] + assert self.rows_to_crop >= 0 + assert self.cols_to_crop >= 0 + + def forward(self, input): + """ + x: N,C,H,W tensor + return: N,C,H,W tensor + """ + if self.rows_to_crop == 0 and self.cols_to_crop == 0: + return input + elif self.rows_to_crop > 0 and self.cols_to_crop == 0: + return input[:, :, self.rows_to_crop : -self.rows_to_crop, :] + elif self.rows_to_crop == 0 and self.cols_to_crop > 0: + return input[:, :, :, self.cols_to_crop : -self.cols_to_crop] + else: + return input[ + :, + :, + self.rows_to_crop : -self.rows_to_crop, + self.cols_to_crop : -self.cols_to_crop, + ] + + +class HeadModule(nn.Module): + """head module for specific task assignment""" + + def __init__( + self, + in_channels: int, + out_channels: int, + has_ext: bool = False, + act: str = "relu", + ): + super(HeadModule, self).__init__() + self.head = nn.Conv2d(in_channels, out_channels, kernel_size=1) + self.has_ext = has_ext + + if has_ext: + self.ext = CBAModule( + in_channels, + in_channels, + kernel_size=3, + padding=1, + bias=False, + act=act, + ) + + def init_normal(self, std: float, bias: float): + nn.init.normal_(self.head.weight, std=std) + nn.init.constant_(self.head.bias, bias) + + def forward(self, x: torch.Tensor): + """ + x: N,C,H,W tensor + return: N,C,H,W tensor + """ + if self.has_ext: + x = self.ext(x) + return self.head(x) diff --git a/qai_hub_models/models/foot_track_net/model.py b/qai_hub_models/models/foot_track_net/model.py new file mode 100644 index 00000000..b6de34aa --- /dev/null +++ b/qai_hub_models/models/foot_track_net/model.py @@ -0,0 +1,90 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. 
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+
+from __future__ import annotations
+
+from typing import List
+
+import torch
+import torch.nn as nn
+
+from qai_hub_models.models.foot_track_net.foot_track_net import FootTrackNet
+from qai_hub_models.utils.asset_loaders import CachedWebModelAsset
+from qai_hub_models.utils.base_model import BaseModel
+from qai_hub_models.utils.input_spec import InputSpec
+
+MODEL_ID = __name__.split(".")[-2]
+
+DEFAULT_WEIGHTS = "SA-e30_finetune50.pth"
+MODEL_ASSET_VERSION = 1
+
+
+class FootTrackNet_model(BaseModel):
+    """
+    Qualcomm multi-task human detector model.
+    Detects bounding boxes for person and face,
+    and landmarks (head, feet) together with their visibility.
+    The output is four maps, which are decoded into the final result by FootTrackNet_App.
+    """
+
+    def __init__(self, model: nn.Module) -> None:
+        super().__init__()
+        self.model = model
+
+    @classmethod
+    def from_pretrained(cls, checkpoint_path: str | None = None):
+        """Load FootTrackNet from a weight file created by the source FootTrackNet repository."""
+
+        if not checkpoint_path:
+            checkpoint_path = CachedWebModelAsset.from_asset_store(
+                MODEL_ID, MODEL_ASSET_VERSION, DEFAULT_WEIGHTS
+            ).fetch()
+        foot_track_net_model = FootTrackNet()  # original definition
+        foot_track_net_model.load_weights(checkpoint_path)
+        foot_track_net_model.to(torch.device("cpu"))
+
+        return cls(foot_track_net_model)
+
+    def forward(self, image: torch.Tensor):
+        """
+        Run FootTrackNet on `image` and produce bounding boxes for face and body.
+
+        Parameters:
+            image: Pixel values pre-processed for encoder consumption.
+                   Range: float[0, 1]
+                   3-channel Color Space: RGB
+
+        Returns:
+            heatmap: N,C,H,W the heatmap for the person/face detection.
+            bbox: N,C*4,H,W the bounding box coordinates as a map.
+            landmark: N,C*34,H,W the coordinates of landmarks as a map.
+            landmark_visibility: N,C*17,H,W the visibility of the landmarks as a map.
+        """
+        return self.model(image)
+
+    @staticmethod
+    def get_input_spec(
+        batch_size: int = 1,
+        height: int = 480,
+        width: int = 640,
+    ) -> InputSpec:
+        """
+        Returns the input specification (name -> (shape, type)). This can be
+        used to submit a profiling job on Qualcomm AI Hub. The default resolution
+        is 640x480 (width x height).
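+        With the defaults, the spec is {"image": ((1, 3, 480, 640), "float32")}.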
+ """ + return {"image": ((batch_size, 3, height, width), "float32")} + + @staticmethod + def get_output_names() -> List[str]: + return ["heatmap", "bbox", "landmark", "landmark_visibility"] + + @staticmethod + def get_channel_last_inputs() -> List[str]: + return ["image"] + + @staticmethod + def get_channel_last_outputs() -> List[str]: + return ["heatmap", "bbox", "landmark", "landmark_visibility"] diff --git a/qai_hub_models/models/foot_track_net/perf.yaml b/qai_hub_models/models/foot_track_net/perf.yaml new file mode 100644 index 00000000..68296b36 --- /dev/null +++ b/qai_hub_models/models/foot_track_net/perf.yaml @@ -0,0 +1,432 @@ +aggregated: + supported_oses: + - Android + supported_devices: + - Snapdragon 8 Elite QRD + - Samsung Galaxy S24 + - Samsung Galaxy S24 Ultra + - Samsung Galaxy S24+ + - Snapdragon 8 Gen 3 QRD + - Samsung Galaxy S23 + - Samsung Galaxy S23 Ultra + - Samsung Galaxy S23+ + - Samsung Galaxy S22 5G + - Samsung Galaxy S22 Ultra 5G + - Samsung Galaxy S22+ 5G + - Samsung Galaxy Tab S8 + - Xiaomi 12 + - Xiaomi 12 Pro + - Samsung Galaxy S21 + - Samsung Galaxy S21 Ultra + - Samsung Galaxy S21+ + - Snapdragon X Elite CRD + - Snapdragon X Plus 8-Core CRD + - QCS8450 (Proxy) + - XR2 Gen 2 (Proxy) + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) + supported_chipsets: + - Snapdragon® 8 Elite + - Snapdragon® 8 Gen 3 + - Snapdragon® 8 Gen 2 + - Snapdragon® 8 Gen 1 + - Snapdragon® 888 + - Snapdragon® X Elite + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy +models: +- name: Person-Foot-Detection + performance_metrics: + - torchscript_onnx_tflite: + inference_time: 3484.0 + throughput: 287.0264064293915 + estimated_peak_memory_range: + min: 16384 + max: 25709488 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 134 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 134 + job_id: jp2kyw6qp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 3592.0 + throughput: 278.39643652561244 + estimated_peak_memory_range: + min: 4210688 + max: 12355864 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 196 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 196 + job_id: jgo26rqkp + job_status: Passed + torchscript_onnx: + inference_time: 5293.0 + throughput: 188.9287738522577 + estimated_peak_memory_range: + min: 15429632 + max: 19234088 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 201 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 201 + job_id: jp4lr1015 + job_status: Passed + reference_device_info: + name: Samsung Galaxy S23 + os: '13' + form_factor: Phone + os_name: Android + manufacturer: Samsung + chipset: Snapdragon® 8 Gen 2 + timestamp: '2024-10-15T00:35:54Z' + - torchscript_onnx_tflite: + inference_time: 2884.0 + throughput: 346.74063800277395 + estimated_peak_memory_range: + min: 12288 + max: 58082624 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 134 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 134 + job_id: jpy13xwlp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 3046.0 + throughput: 328.29940906106367 + estimated_peak_memory_range: + min: 3702784 + max: 21906704 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 196 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 196 + job_id: jpv6kdxr5 + job_status: Passed + torchscript_onnx: + inference_time: 4623.0 + throughput: 216.3097555699762 + 
estimated_peak_memory_range: + min: 0 + max: 68916192 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 201 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 201 + job_id: jpxko42l5 + job_status: Passed + reference_device_info: + name: Samsung Galaxy S24 + os: '14' + form_factor: Phone + os_name: Android + manufacturer: Samsung + chipset: Snapdragon® 8 Gen 3 + timestamp: '2024-10-15T00:35:56Z' + - torchscript_onnx_tflite: + inference_time: 3339.0 + throughput: 299.4908655286014 + estimated_peak_memory_range: + min: 12288 + max: 1968528 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 134 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 134 + job_id: jp0z0jqn5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 3293.0 + throughput: 303.67446097783176 + estimated_peak_memory_range: + min: 2080768 + max: 3247880 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 196 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 196 + job_id: jpedmz3v5 + job_status: Passed + reference_device_info: + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:35:47Z' + - torchscript_onnx_tflite: + inference_time: 3382.0 + throughput: 295.68302779420463 + estimated_peak_memory_range: + min: 36864 + max: 114761312 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 134 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 134 + job_id: jglvmxrm5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 3365.0 + throughput: 297.1768202080238 + estimated_peak_memory_range: + min: 2957312 + max: 4465480 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 196 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 196 + job_id: jg9lnme8g + job_status: Passed + reference_device_info: + name: SA8255 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:35:50Z' + - torchscript_onnx_tflite: + inference_time: 3443.0 + throughput: 290.4443799012489 + estimated_peak_memory_range: + min: 28672 + max: 43066584 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 134 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 134 + job_id: j5q6qykop + job_status: Passed + torchscript_onnx_qnn: + inference_time: 3387.0 + throughput: 295.24653085326247 + estimated_peak_memory_range: + min: 2310144 + max: 3595752 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 196 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 196 + job_id: j5we67nm5 + job_status: Passed + reference_device_info: + name: SA8775 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:35:49Z' + - torchscript_onnx_tflite: + inference_time: 3504.0 + throughput: 285.38812785388126 + estimated_peak_memory_range: + min: 12288 + max: 4233992 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 134 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 134 + job_id: jgkex4nng + job_status: Passed + torchscript_onnx_qnn: + inference_time: 3384.0 + throughput: 295.5082742316785 + estimated_peak_memory_range: + min: 3698688 + max: 5094096 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 196 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 196 + job_id: jgz3dmkx5 
+ job_status: Passed + reference_device_info: + name: SA8650 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:35:48Z' + - torchscript_onnx_tflite: + inference_time: 5660.0 + throughput: 176.67844522968198 + estimated_peak_memory_range: + min: 5107712 + max: 60626976 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 134 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 134 + job_id: jp8qyx9op + job_status: Passed + torchscript_onnx_qnn: + inference_time: 5772.0 + throughput: 173.25017325017325 + estimated_peak_memory_range: + min: 3702784 + max: 25944608 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 196 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 196 + job_id: jgdx13lzp + job_status: Passed + reference_device_info: + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:35:52Z' + - torchscript_onnx_tflite: + inference_time: 2376.0 + throughput: 420.8754208754209 + estimated_peak_memory_range: + min: 8192 + max: 29916112 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 134 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 134 + job_id: jp3j092ng + job_status: Passed + torchscript_onnx_qnn: + inference_time: 2491.0 + throughput: 401.4452027298274 + estimated_peak_memory_range: + min: 0 + max: 17896576 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 196 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 196 + job_id: j57yr4395 + job_status: Passed + torchscript_onnx: + inference_time: 3680.0 + throughput: 271.7391304347826 + estimated_peak_memory_range: + min: 18051072 + max: 53326816 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 201 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 201 + job_id: jprv30j7g + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:35:58Z' + - torchscript_onnx_qnn: + inference_time: 3669.0 + throughput: 272.5538293813028 + estimated_peak_memory_range: + min: 3690496 + max: 3690496 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 196 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 196 + job_id: jgjvn74eg + job_status: Passed + torchscript_onnx: + inference_time: 5762.0 + throughput: 173.55085039916696 + estimated_peak_memory_range: + min: 17518592 + max: 17518592 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 201 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 201 + job_id: j5mnxmy9p + job_status: Passed + reference_device_info: + name: Snapdragon X Elite CRD + os: '11' + form_factor: Compute + os_name: Windows + manufacturer: Qualcomm + chipset: Snapdragon® X Elite + timestamp: '2024-10-15T00:35:57Z' diff --git a/qai_hub_models/models/foot_track_net/test.py b/qai_hub_models/models/foot_track_net/test.py new file mode 100644 index 00000000..d2fdc8e9 --- /dev/null +++ b/qai_hub_models/models/foot_track_net/test.py @@ -0,0 +1,267 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. 
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+import pickle as pkl
+
+import numpy as np
+import pytest
+
+from qai_hub_models.models.foot_track_net.app import FootTrackNet_App
+from qai_hub_models.models.foot_track_net.demo import INPUT_IMAGE_ADDRESS
+from qai_hub_models.models.foot_track_net.demo import main as demo_main
+from qai_hub_models.models.foot_track_net.model import (
+    MODEL_ASSET_VERSION,
+    MODEL_ID,
+    FootTrackNet_model,
+)
+from qai_hub_models.utils.asset_loaders import (
+    CachedWebModelAsset,
+    load_image,
+    load_path,
+)
+from qai_hub_models.utils.testing import assert_most_close, skip_clone_repo_check
+
+OUTPUT_RST_ADDRESS = CachedWebModelAsset.from_asset_store(
+    MODEL_ID, MODEL_ASSET_VERSION, "oracle_rst1.pkl"
+)
+
+
+# Verify that the output from Torch is as expected: bbox, landmark, visibility.
+@skip_clone_repo_check
+def test_task():
+    app = FootTrackNet_App(FootTrackNet_model.from_pretrained())
+    original_image = load_image(INPUT_IMAGE_ADDRESS)
+    objs_face, objs_person = app.det_image(original_image)
+
+    pth = load_path(OUTPUT_RST_ADDRESS, "tmp")
+    with open(pth, "rb") as handle:
+        objs_face_oracle, objs_person_oracle = pkl.load(handle)
+
+    # extract the oracle result
+    faces_bbox_ora = np.array(
+        [
+            objs_face_oracle[0].box,
+            objs_face_oracle[1].box,
+        ]
+    )
+
+    persons_bbox_ora = np.array(
+        [
+            objs_person_oracle[0].box,
+            objs_person_oracle[1].box,
+        ]
+    )
+
+    persons_landmark_ora = np.array(
+        [
+            [
+                objs_person_oracle[0].landmark[0],
+                objs_person_oracle[0].landmark[15],
+                objs_person_oracle[0].landmark[16],
+            ],
+            [
+                objs_person_oracle[1].landmark[0],
+                objs_person_oracle[1].landmark[15],
+                objs_person_oracle[1].landmark[16],
+            ],
+        ]
+    )
+
+    persons_visibility_ora = np.array(
+        [
+            [objs_person_oracle[0].vis[15], objs_person_oracle[0].vis[16]],
+            [objs_person_oracle[1].vis[15], objs_person_oracle[1].vis[16]],
+        ]
+    )
+
+    # extract the key detection result
+    faces_bbox = np.array(
+        [
+            objs_face[0].box,
+            objs_face[1].box,
+        ]
+    )
+
+    persons_bbox = np.array(
+        [
+            objs_person[0].box,
+            objs_person[1].box,
+        ]
+    )
+
+    persons_landmark = np.array(
+        [
+            [
+                objs_person[0].landmark[0],
+                objs_person[0].landmark[15],
+                objs_person[0].landmark[16],
+            ],
+            [
+                objs_person[1].landmark[0],
+                objs_person[1].landmark[15],
+                objs_person[1].landmark[16],
+            ],
+        ]
+    )
+
+    persons_visibility = np.array(
+        [
+            [objs_person[0].vis[15], objs_person[0].vis[16]],
+            [objs_person[1].vis[15], objs_person[1].vis[16]],
+        ]
+    )
+
+    # assert face_bbox, person_bbox, person_landmark and person_landmark_visibility
+    assert_most_close(
+        np.asarray(faces_bbox_ora),
+        np.asarray(faces_bbox),
+        diff_tol=0.01,
+        atol=0.001,
+        rtol=0.001,
+    )
+    assert_most_close(
+        np.asarray(persons_bbox_ora),
+        np.asarray(persons_bbox),
+        diff_tol=0.01,
+        atol=0.001,
+        rtol=0.001,
+    )
+    assert_most_close(
+        np.asarray(persons_landmark_ora),
+        np.asarray(persons_landmark),
+        diff_tol=0.01,
+        atol=0.001,
+        rtol=0.001,
+    )
+    assert_most_close(
+        np.asarray(persons_visibility_ora),
+        np.asarray(persons_visibility),
+        diff_tol=0.01,
+        atol=0.001,
+        rtol=0.001,
+    )
+
+
+@pytest.mark.trace
+@skip_clone_repo_check
+def test_trace():
+    app = FootTrackNet_App(
+        FootTrackNet_model.from_pretrained().convert_to_torchscript()
+    )
+    original_image = load_image(INPUT_IMAGE_ADDRESS)
+    objs_face, objs_person = app.det_image(original_image)
+
+    pth = load_path(OUTPUT_RST_ADDRESS, "tmp")
+    with open(pth, "rb") as handle:
+        objs_face_oracle, objs_person_oracle = pkl.load(handle)
+
+    # extract the oracle result
+    faces_bbox_ora = np.array(
+        [
+            objs_face_oracle[0].box,
+            objs_face_oracle[1].box,
+        ]
+    )
+
+    persons_bbox_ora = np.array(
+        [
+            objs_person_oracle[0].box,
+            objs_person_oracle[1].box,
+        ]
+    )
+
+    persons_landmark_ora = np.array(
+        [
+            [
+                objs_person_oracle[0].landmark[0],
+                objs_person_oracle[0].landmark[15],
+                objs_person_oracle[0].landmark[16],
+            ],
+            [
+                objs_person_oracle[1].landmark[0],
+                objs_person_oracle[1].landmark[15],
+                objs_person_oracle[1].landmark[16],
+            ],
+        ]
+    )
+
+    persons_visibility_ora = np.array(
+        [
+            [objs_person_oracle[0].vis[15], objs_person_oracle[0].vis[16]],
+            [objs_person_oracle[1].vis[15], objs_person_oracle[1].vis[16]],
+        ]
+    )
+
+    # extract the key detection result
+    faces_bbox = np.array(
+        [
+            objs_face[0].box,
+            objs_face[1].box,
+        ]
+    )
+
+    persons_bbox = np.array(
+        [
+            objs_person[0].box,
+            objs_person[1].box,
+        ]
+    )
+
+    persons_landmark = np.array(
+        [
+            [
+                objs_person[0].landmark[0],
+                objs_person[0].landmark[15],
+                objs_person[0].landmark[16],
+            ],
+            [
+                objs_person[1].landmark[0],
+                objs_person[1].landmark[15],
+                objs_person[1].landmark[16],
+            ],
+        ]
+    )
+
+    persons_visibility = np.array(
+        [
+            [objs_person[0].vis[15], objs_person[0].vis[16]],
+            [objs_person[1].vis[15], objs_person[1].vis[16]],
+        ]
+    )
+
+    # assert face_bbox, person_bbox, person_landmark and person_landmark_visibility
+    assert_most_close(
+        np.asarray(faces_bbox_ora),
+        np.asarray(faces_bbox),
+        diff_tol=0.01,
+        atol=0.001,
+        rtol=0.001,
+    )
+    assert_most_close(
+        np.asarray(persons_bbox_ora),
+        np.asarray(persons_bbox),
+        diff_tol=0.01,
+        atol=0.001,
+        rtol=0.001,
+    )
+    assert_most_close(
+        np.asarray(persons_landmark_ora),
+        np.asarray(persons_landmark),
+        diff_tol=0.01,
+        atol=0.001,
+        rtol=0.001,
+    )
+    assert_most_close(
+        np.asarray(persons_visibility_ora),
+        np.asarray(persons_visibility),
+        diff_tol=0.01,
+        atol=0.001,
+        rtol=0.001,
+    )
+
+
+@skip_clone_repo_check
+def test_demo():
+    demo_main(is_test=True)
diff --git a/qai_hub_models/models/gear_guard_net/README.md b/qai_hub_models/models/gear_guard_net/README.md
new file mode 100644
index 00000000..cd7ecdda
--- /dev/null
+++ b/qai_hub_models/models/gear_guard_net/README.md
@@ -0,0 +1,59 @@
+[![Qualcomm® AI Hub Models](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/quic-logo.jpg)](../../README.md)
+
+
+# [PPE-Detection: Object detection for personal protective equipment (PPE)](https://aihub.qualcomm.com/models/gear_guard_net)
+
+Detect if a person is wearing personal protective equipment (PPE) in real-time.
+
+This is based on the implementation of PPE-Detection found
+[here](https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/gear_guard_net/model.py). This repository contains scripts for optimized on-device
+export suitable to run on Qualcomm® devices. More details on model performance
+across various devices can be found [here](https://aihub.qualcomm.com/models/gear_guard_net).
+
+[Sign up](https://myaccount.qualcomm.com/signup) to start using Qualcomm AI Hub and run these models on a hosted Qualcomm® device.
+
+
+
+
+## Example & Usage
+
+
+Once installed, run the following simple CLI demo:
+
+```bash
+python -m qai_hub_models.models.gear_guard_net.demo
+```
+More details on the CLI tool can be found with the `--help` option. See
+[demo.py](demo.py) for sample usage of the model including pre/post processing
+scripts. Please refer to our [general instructions on using
+models](../../../#getting-started) for more usage instructions.
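+
+For quick experimentation, the model can also be used directly from Python. A
+minimal sketch (illustrative only; it feeds random dummy data rather than a
+real image):
+
+```python
+import torch
+
+from qai_hub_models.models.gear_guard_net import Model
+
+# Downloads the pretrained checkpoint on first use.
+model = Model.from_pretrained()
+
+# Build a dummy input matching the model's input spec (N, C, H, W).
+shape, _dtype = Model.get_input_spec()["image"]
+image = torch.rand(shape)
+
+# The forward pass returns the multi-scale detection tensors
+# (bbox_8x, bbox_16x, bbox_32x).
+outputs = model(image)
+print([o.shape for o in outputs])
+```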
+
+## Export for on-device deployment
+
+This repository contains export scripts that produce a model optimized for
+on-device deployment. This can be run as follows:
+
+```bash
+python -m qai_hub_models.models.gear_guard_net.export
+```
+Additional options are documented with the `--help` option. Note that the above
+script requires access to Qualcomm® AI Hub.
+
+
+## License
+* The license for the original implementation of PPE-Detection can be found
+  [here](https://github.com/qcom-ai-hub/ai-hub-models-internal/blob/main/LICENSE).
+* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf)
+
+
+## References
+* [Source Model Implementation](https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/gear_guard_net/model.py)
+
+
+
+## Community
+* Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI.
+* For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
+
+
diff --git a/qai_hub_models/models/gear_guard_net/__init__.py b/qai_hub_models/models/gear_guard_net/__init__.py
new file mode 100644
index 00000000..b4b59da2
--- /dev/null
+++ b/qai_hub_models/models/gear_guard_net/__init__.py
@@ -0,0 +1,10 @@
+# ---------------------------------------------------------------------
+# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+from qai_hub_models.models._shared.body_detection.app import (  # noqa: F401
+    BodyDetectionApp as App,
+)
+
+from .model import MODEL_ID  # noqa: F401
+from .model import GearGuardNet as Model  # noqa: F401
diff --git a/qai_hub_models/models/gear_guard_net/conftest.py b/qai_hub_models/models/gear_guard_net/conftest.py
new file mode 100644
index 00000000..62e6b22e
--- /dev/null
+++ b/qai_hub_models/models/gear_guard_net/conftest.py
@@ -0,0 +1,39 @@
+# ---------------------------------------------------------------------
+# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+# THIS FILE WAS AUTO-GENERATED. DO NOT EDIT MANUALLY.
+
+import inspect
+
+import pytest
+
+from qai_hub_models.models.gear_guard_net import Model
+from qai_hub_models.utils.testing import skip_clone_repo_check
+
+
+# Instantiate the model only once for all tests.
+# Mock from_pretrained to always return the initialized model.
+# This speeds up tests and limits memory leaks.
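+# Note: the cache key below is the stringified (args, kwargs), so each distinct
+# from_pretrained configuration gets its own cached instance.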
+@pytest.fixture(scope="module", autouse=True) +def cached_from_pretrained(): + with pytest.MonkeyPatch.context() as mp: + pretrained_cache = {} + from_pretrained = Model.from_pretrained + sig = inspect.signature(from_pretrained) + + @skip_clone_repo_check + def _cached_from_pretrained(*args, **kwargs): + cache_key = str(args) + str(kwargs) + model = pretrained_cache.get(cache_key, None) + if model: + return model + else: + model = from_pretrained(*args, **kwargs) + pretrained_cache[cache_key] = model + return model + + _cached_from_pretrained.__signature__ = sig + + mp.setattr(Model, "from_pretrained", _cached_from_pretrained) + yield mp diff --git a/qai_hub_models/models/gear_guard_net/demo.py b/qai_hub_models/models/gear_guard_net/demo.py new file mode 100644 index 00000000..7dc1f4a6 --- /dev/null +++ b/qai_hub_models/models/gear_guard_net/demo.py @@ -0,0 +1,33 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. +# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +from qai_hub_models.models._shared.body_detection.app import BodyDetectionApp +from qai_hub_models.models._shared.body_detection.demo import BodyDetectionDemo +from qai_hub_models.models.gear_guard_net.model import ( + MODEL_ASSET_VERSION, + MODEL_ID, + GearGuardNet, +) +from qai_hub_models.utils.asset_loaders import CachedWebModelAsset + +INPUT_IMAGE_ADDRESS = CachedWebModelAsset.from_asset_store( + MODEL_ID, MODEL_ASSET_VERSION, "test_image.jpg" +) + + +def main(is_test: bool = False): + BodyDetectionDemo( + is_test, + GearGuardNet, + MODEL_ID, + BodyDetectionApp, + INPUT_IMAGE_ADDRESS, + 320, + 192, + 0.9, + ) + + +if __name__ == "__main__": + main() diff --git a/qai_hub_models/models/gear_guard_net/export.py b/qai_hub_models/models/gear_guard_net/export.py new file mode 100644 index 00000000..78308196 --- /dev/null +++ b/qai_hub_models/models/gear_guard_net/export.py @@ -0,0 +1,213 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. +# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +# THIS FILE WAS AUTO-GENERATED. DO NOT EDIT MANUALLY. 
+ + +from __future__ import annotations + +import os +import warnings +from pathlib import Path +from typing import Any, Dict, List, Optional, cast + +import qai_hub as hub +import torch + +from qai_hub_models.models.common import ExportResult, TargetRuntime +from qai_hub_models.models.gear_guard_net import Model +from qai_hub_models.utils.args import ( + export_parser, + get_input_spec_kwargs, + get_model_kwargs, +) +from qai_hub_models.utils.compare import torch_inference +from qai_hub_models.utils.input_spec import make_torch_inputs +from qai_hub_models.utils.printing import ( + print_inference_metrics, + print_on_target_demo_cmd, + print_profile_metrics_from_job, +) +from qai_hub_models.utils.qai_hub_helpers import ( + can_access_qualcomm_ai_hub, + export_without_hub_access, +) + + +def export_model( + device: str = "Samsung Galaxy S23 (Family)", + chipset: Optional[str] = None, + skip_profiling: bool = False, + skip_inferencing: bool = False, + skip_downloading: bool = False, + skip_summary: bool = False, + output_dir: Optional[str] = None, + target_runtime: TargetRuntime = TargetRuntime.TFLITE, + compile_options: str = "", + profile_options: str = "", + **additional_model_kwargs, +) -> ExportResult | List[str]: + """ + This function executes the following recipe: + + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference + + Each of the last 4 steps can be optionally skipped using the input options. + + Parameters: + device: Device for which to export the model. + Full list of available devices can be found by running `hub.get_devices()`. + Defaults to DEFAULT_DEVICE if not specified. + chipset: If set, will choose a random device with this chipset. + Overrides the `device` argument. + skip_profiling: If set, skips profiling of compiled model on real devices. + skip_inferencing: If set, skips computing on-device outputs from sample data. + skip_downloading: If set, skips downloading of compiled model. + skip_summary: If set, skips waiting for and summarizing results + from profiling and inference. + output_dir: Directory to store generated assets (e.g. compiled model). + Defaults to `/build/`. + target_runtime: Which on-device runtime to target. Default is TFLite. + compile_options: Additional options to pass when submitting the compile job. + profile_options: Additional options to pass when submitting the profile job. + **additional_model_kwargs: Additional optional kwargs used to customize + `model_cls.from_pretrained` and `model.get_input_spec` + + Returns: + A struct of: + * A CompileJob object containing metadata about the compile job submitted to hub. + * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
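+
+    Example (illustrative; assumes Qualcomm AI Hub access is configured):
+        export_model(chipset="qualcomm-snapdragon-8gen2", skip_profiling=True)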
+ """ + model_name = "gear_guard_net" + output_path = Path(output_dir or Path.cwd() / "build" / model_name) + if chipset: + hub_device = hub.Device(attributes=f"chipset:{chipset}") + else: + hub_device = hub.Device(name=device) + if not can_access_qualcomm_ai_hub(): + return export_without_hub_access( + "gear_guard_net", + "PPE-Detection", + device, + skip_profiling, + skip_inferencing, + skip_downloading, + skip_summary, + output_path, + target_runtime, + compile_options, + profile_options, + ) + + # On-device perf improves with I/O in channel_last format except when using ONNX. + use_channel_last_format = target_runtime != TargetRuntime.ONNX + + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) + input_spec = model.get_input_spec( + **get_input_spec_kwargs(model, additional_model_kwargs) + ) + + # Trace the model + source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) + + # 2. Compiles the model to an asset that can be run on device + model_compile_options = model.get_hub_compile_options( + target_runtime, compile_options, hub_device + ) + print(f"Optimizing model {model_name} to run on-device") + submitted_compile_job = hub.submit_compile_job( + model=source_model, + input_specs=input_spec, + device=hub_device, + name=model_name, + options=model_compile_options, + ) + compile_job = cast(hub.client.CompileJob, submitted_compile_job) + + # 3. Profiles the model performance on a real device + profile_job: Optional[hub.client.ProfileJob] = None + if not skip_profiling: + profile_options_all = model.get_hub_profile_options( + target_runtime, profile_options + ) + print(f"Profiling model {model_name} on a hosted device.") + submitted_profile_job = hub.submit_profile_job( + model=compile_job.get_target_model(), + device=hub_device, + name=model_name, + options=profile_options_all, + ) + profile_job = cast(hub.client.ProfileJob, submitted_profile_job) + + # 4. Inferences the model on sample inputs + inference_job: Optional[hub.client.InferenceJob] = None + if not skip_inferencing: + profile_options_all = model.get_hub_profile_options( + target_runtime, profile_options + ) + print( + f"Running inference for {model_name} on a hosted device with example inputs." + ) + sample_inputs = model.sample_inputs( + input_spec, use_channel_last_format=use_channel_last_format + ) + submitted_inference_job = hub.submit_inference_job( + model=compile_job.get_target_model(), + inputs=sample_inputs, + device=hub_device, + name=model_name, + options=profile_options_all, + ) + inference_job = cast(hub.client.InferenceJob, submitted_inference_job) + + # 5. Downloads the model asset to the local directory + if not skip_downloading: + os.makedirs(output_path, exist_ok=True) + target_model: hub.Model = compile_job.get_target_model() # type: ignore + target_model.download(str(output_path / model_name)) + + # 6. 
Summarizes the results from profiling and inference
+    if not skip_summary and not skip_profiling:
+        assert profile_job is not None and profile_job.wait().success
+        profile_data: Dict[str, Any] = profile_job.download_profile()  # type: ignore
+        print_profile_metrics_from_job(profile_job, profile_data)
+
+    if not skip_summary and not skip_inferencing:
+        sample_inputs = model.sample_inputs(use_channel_last_format=False)
+        torch_out = torch_inference(
+            model, sample_inputs, return_channel_last_output=use_channel_last_format
+        )
+        assert inference_job is not None and inference_job.wait().success
+        inference_result: hub.client.DatasetEntries = inference_job.download_output_data()  # type: ignore
+
+        print_inference_metrics(
+            inference_job, inference_result, torch_out, model.get_output_names()
+        )
+
+    if not skip_summary:
+        print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device)
+
+    return ExportResult(
+        compile_job=compile_job,
+        inference_job=inference_job,
+        profile_job=profile_job,
+    )
+
+
+def main():
+    warnings.filterwarnings("ignore")
+    parser = export_parser(model_cls=Model)
+    args = parser.parse_args()
+    export_model(**vars(args))
+
+
+if __name__ == "__main__":
+    main()
diff --git a/qai_hub_models/models/gear_guard_net/info.yaml b/qai_hub_models/models/gear_guard_net/info.yaml
new file mode 100644
index 00000000..ee874a07
--- /dev/null
+++ b/qai_hub_models/models/gear_guard_net/info.yaml
@@ -0,0 +1,32 @@
+name: PPE-Detection
+# id must match with the model dir name in qai_hub_models
+id: gear_guard_net
+status: public
+headline: Object detection for personal protective equipment (PPE).
+domain: Computer Vision
+description: Detect if a person is wearing personal protective equipment (PPE) in real-time.
+use_case: Object Detection
+tags:
+  - real-time
+license: https://github.com/qcom-ai-hub/ai-hub-models-internal/blob/main/LICENSE
+deploy_license: https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf
+source_repo: https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/gear_guard_net/model.py
+technical_details:
+  Inference latency: RealTime
+  Input resolution: 320x192
+  Number of parameters: 7.02M
+  Model size: 13.5 MB
+  Number of output classes: 2
+applicable_scenarios:
+  - IoT
+related_models:
+  - face_body_net
+form_factors:
+  - Phone
+  - Tablet
+  - IoT
+has_static_banner: true
+has_animated_banner: true
+license_type: bsd-3-clause
+deploy_license_type: AI Model Hub License
+dataset: []
diff --git a/qai_hub_models/models/gear_guard_net/model.py b/qai_hub_models/models/gear_guard_net/model.py
new file mode 100644
index 00000000..12cd9617
--- /dev/null
+++ b/qai_hub_models/models/gear_guard_net/model.py
@@ -0,0 +1,124 @@
+# ---------------------------------------------------------------------
+# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+from typing import List, Optional
+
+import torch
+import torch.nn as nn
+
+from qai_hub_models.models._shared.body_detection.model import Model
+from qai_hub_models.utils.asset_loaders import CachedWebModelAsset, load_torch
+from qai_hub_models.utils.base_model import BaseModel
+from qai_hub_models.utils.input_spec import InputSpec
+
+MODEL_ID = __name__.split(".")[-2]
+MODEL_ASSET_VERSION = 1
+DEFAULT_WEIGHTS = CachedWebModelAsset.from_asset_store(
+    MODEL_ID, MODEL_ASSET_VERSION, "weights_v1.1.pt"
+)
+
+
+class GearGuardNet(BaseModel):
+    """GearGuardNet model"""
+
+    def __init__(self, model: nn.Module) -> None:
+        """
+        Initialize GearGuardNet.
+
+        Inputs:
+            model: nn.Module
+                GearGuardNet model.
+        """
+        super().__init__()
+        self.model = model
+
+    @classmethod
+    def from_pretrained(cls, checkpoint_path: Optional[str] = None) -> "GearGuardNet":
+        """
+        Load model from pretrained weights.
+
+        Inputs:
+            checkpoint_path: str
+                Checkpoint path of pretrained weights.
+        Output: GearGuardNet
+            Detection model.
+        """
+        cfg = {
+            "nc": 2,
+            "depth_multiple": 0.33,
+            "width_multiple": 0.5,
+            "anchors": [
+                [10, 13, 16, 30, 33, 23],
+                [30, 61, 62, 45, 59, 119],
+                [116, 90, 156, 198, 373, 326],
+            ],
+            "backbone": [
+                [-1, 1, "FusedConvBatchNorm", [64, 6, 2, 2]],
+                [-1, 1, "FusedConvBatchNorm", [128, 3, 2]],
+                [-1, 3, "DoubleBlazeBlock", [128]],
+                [-1, 1, "FusedConvBatchNorm", [256, 3, 2]],
+                [-1, 3, "DoubleBlazeBlock", [256]],
+                [-1, 1, "FusedConvBatchNorm", [512, 3, 2]],
+                [-1, 9, "DoubleBlazeBlock", [512]],
+                [-1, 1, "FusedConvBatchNorm", [1024, 3, 2]],
+                [-1, 3, "DoubleBlazeBlock", [1024]],
+                [-1, 1, "FusedConvBatchNorm", [1024, 3, 1]],
+            ],
+            "head": [
+                [-1, 1, "FusedConvBatchNorm", [512, 1, 1]],
+                [-1, 1, "nn.Upsample", [None, 2, "nearest"]],
+                [[-1, 6], 1, "Concat", [1]],
+                [-1, 3, "DoubleBlazeBlock", [512]],
+                [-1, 1, "FusedConvBatchNorm", [256, 1, 1]],
+                [-1, 1, "nn.Upsample", [None, 2, "nearest"]],
+                [[-1, 4], 1, "Concat", [1]],
+                [-1, 3, "DoubleBlazeBlock", [256]],
+                [-1, 1, "FusedConvBatchNorm", [256, 3, 2]],
+                [[-1, 14], 1, "Concat", [1]],
+                [-1, 3, "DoubleBlazeBlock", [512]],
+                [-1, 1, "FusedConvBatchNorm", [512, 3, 2]],
+                [[-1, 10], 1, "Concat", [1]],
+                [-1, 3, "DoubleBlazeBlock", [1024]],
+                [[17, 20, 23], 1, "Detect", ["nc", "anchors"]],
+            ],
+        }
+        model = Model(cfg)
+        if checkpoint_path is None:
+            checkpoint_path = DEFAULT_WEIGHTS
+        ckpt = load_torch(checkpoint_path)
+        model.load_state_dict(ckpt)
+        model.eval()
+        return cls(model)
+
+    def forward(self, image: torch.Tensor) -> List[torch.Tensor]:
+        """
+        Forward computation of GearGuardNet.
+
+        Inputs:
+            image: torch.Tensor
+                Input image.
+        Outputs: List[torch.Tensor]
+            Multi-scale detection result.
+        """
+        return self.model(image)
+
+    @staticmethod
+    def get_input_spec(
+        batch_size: int = 1,
+        height: int = 320,
+        width: int = 192,
+    ) -> InputSpec:
+        """
+        Returns the input specification (name -> (shape, type)). This can be
+        used to submit a profiling job on Qualcomm AI Hub.
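+        With the defaults, the spec is {"image": ((1, 3, 320, 192), "float32")}.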
+ """ + return {"image": ((batch_size, 3, height, width), "float32")} + + @staticmethod + def get_output_names() -> List[str]: + return ["bbox_8x", "bbox_16x", "bbox_32x"] + + @staticmethod + def get_channel_last_inputs() -> List[str]: + return ["image"] diff --git a/qai_hub_models/models/gear_guard_net/perf.yaml b/qai_hub_models/models/gear_guard_net/perf.yaml new file mode 100644 index 00000000..e01b7b5b --- /dev/null +++ b/qai_hub_models/models/gear_guard_net/perf.yaml @@ -0,0 +1,432 @@ +aggregated: + supported_oses: + - Android + supported_devices: + - Snapdragon 8 Elite QRD + - Samsung Galaxy S24 + - Samsung Galaxy S24 Ultra + - Samsung Galaxy S24+ + - Snapdragon 8 Gen 3 QRD + - Samsung Galaxy S23 + - Samsung Galaxy S23 Ultra + - Samsung Galaxy S23+ + - Samsung Galaxy S22 5G + - Samsung Galaxy S22 Ultra 5G + - Samsung Galaxy S22+ 5G + - Samsung Galaxy Tab S8 + - Xiaomi 12 + - Xiaomi 12 Pro + - Samsung Galaxy S21 + - Samsung Galaxy S21 Ultra + - Samsung Galaxy S21+ + - Snapdragon X Elite CRD + - Snapdragon X Plus 8-Core CRD + - QCS8450 (Proxy) + - XR2 Gen 2 (Proxy) + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) + supported_chipsets: + - Snapdragon® 8 Elite + - Snapdragon® 8 Gen 3 + - Snapdragon® 8 Gen 2 + - Snapdragon® 8 Gen 1 + - Snapdragon® 888 + - Snapdragon® X Elite + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy +models: +- name: PPE-Detection + performance_metrics: + - torchscript_onnx_tflite: + inference_time: 670.0 + throughput: 1492.5373134328358 + estimated_peak_memory_range: + min: 28672 + max: 243162120 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 80 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 80 + job_id: jp2kyw8rp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 725.0 + throughput: 1379.3103448275863 + estimated_peak_memory_range: + min: 12288 + max: 51797976 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 106 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 106 + job_id: jgo26ryqp + job_status: Passed + torchscript_onnx: + inference_time: 1105.0 + throughput: 904.9773755656108 + estimated_peak_memory_range: + min: 12288 + max: 15443864 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 107 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 107 + job_id: jg9lnm18g + job_status: Passed + reference_device_info: + name: Samsung Galaxy S23 + os: '13' + form_factor: Phone + os_name: Android + manufacturer: Samsung + chipset: Snapdragon® 8 Gen 2 + timestamp: '2024-10-15T00:33:20Z' + - torchscript_onnx_tflite: + inference_time: 561.0 + throughput: 1782.5311942959001 + estimated_peak_memory_range: + min: 12288 + max: 43883088 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 80 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 80 + job_id: jpy13xe8p + job_status: Passed + torchscript_onnx_qnn: + inference_time: 613.0 + throughput: 1631.3213703099511 + estimated_peak_memory_range: + min: 757760 + max: 17179328 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 106 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 106 + job_id: jpv6kdok5 + job_status: Passed + torchscript_onnx: + inference_time: 943.0 + throughput: 1060.4453870625662 + estimated_peak_memory_range: + min: 0 + max: 48287296 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 107 + layers_on_gpu: 0 + layers_on_cpu: 0 
+ total_layers: 107 + job_id: jp14zjl7p + job_status: Passed + reference_device_info: + name: Samsung Galaxy S24 + os: '14' + form_factor: Phone + os_name: Android + manufacturer: Samsung + chipset: Snapdragon® 8 Gen 3 + timestamp: '2024-10-15T00:33:21Z' + - torchscript_onnx_tflite: + inference_time: 666.0 + throughput: 1501.5015015015015 + estimated_peak_memory_range: + min: 12288 + max: 4694624 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 80 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 80 + job_id: jp0z0jy95 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 722.0 + throughput: 1385.0415512465374 + estimated_peak_memory_range: + min: 765952 + max: 2413360 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 106 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 106 + job_id: jpedmz1o5 + job_status: Passed + reference_device_info: + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:33:12Z' + - torchscript_onnx_tflite: + inference_time: 671.0 + throughput: 1490.312965722802 + estimated_peak_memory_range: + min: 135168 + max: 253906056 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 80 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 80 + job_id: jglvmxnj5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 728.0 + throughput: 1373.6263736263736 + estimated_peak_memory_range: + min: 782336 + max: 2112144 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 106 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 106 + job_id: jg9lnm1wg + job_status: Passed + reference_device_info: + name: SA8255 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:33:15Z' + - torchscript_onnx_tflite: + inference_time: 669.0 + throughput: 1494.7683109118086 + estimated_peak_memory_range: + min: 28672 + max: 181989176 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 80 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 80 + job_id: j5q6qy8np + job_status: Passed + torchscript_onnx_qnn: + inference_time: 731.0 + throughput: 1367.9890560875513 + estimated_peak_memory_range: + min: 770048 + max: 1966424 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 106 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 106 + job_id: j5we67v35 + job_status: Passed + reference_device_info: + name: SA8775 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:33:14Z' + - torchscript_onnx_tflite: + inference_time: 666.0 + throughput: 1501.5015015015015 + estimated_peak_memory_range: + min: 16384 + max: 7082824 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 80 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 80 + job_id: jgkex4zwg + job_status: Passed + torchscript_onnx_qnn: + inference_time: 730.0 + throughput: 1369.86301369863 + estimated_peak_memory_range: + min: 770048 + max: 2102016 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 106 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 106 + job_id: jgz3dm9o5 + job_status: Passed + reference_device_info: + name: SA8650 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8650P Proxy + timestamp: 
'2024-10-15T00:33:13Z' + - torchscript_onnx_tflite: + inference_time: 1428.0 + throughput: 700.2801120448179 + estimated_peak_memory_range: + min: 16384 + max: 40736464 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 80 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 80 + job_id: jp8qyxokp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 1492.0 + throughput: 670.2412868632708 + estimated_peak_memory_range: + min: 753664 + max: 17038528 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 106 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 106 + job_id: jgdx139rp + job_status: Passed + reference_device_info: + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:33:18Z' + - torchscript_onnx_tflite: + inference_time: 480.0 + throughput: 2083.3333333333335 + estimated_peak_memory_range: + min: 8192 + max: 23210944 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 80 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 80 + job_id: jp3j09k3g + job_status: Passed + torchscript_onnx_qnn: + inference_time: 440.0 + throughput: 2272.7272727272725 + estimated_peak_memory_range: + min: 0 + max: 14730448 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 106 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 106 + job_id: j5we67vm5 + job_status: Passed + torchscript_onnx: + inference_time: 713.0 + throughput: 1402.5245441795232 + estimated_peak_memory_range: + min: 0 + max: 24912112 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 107 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 107 + job_id: jp4lr1o15 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:33:24Z' + - torchscript_onnx_qnn: + inference_time: 853.0 + throughput: 1172.3329425556858 + estimated_peak_memory_range: + min: 737280 + max: 737280 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 106 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 106 + job_id: jgjvn7mvg + job_status: Passed + torchscript_onnx: + inference_time: 1173.0 + throughput: 852.5149190110827 + estimated_peak_memory_range: + min: 13352960 + max: 13352960 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 107 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 107 + job_id: jgdx139zp + job_status: Passed + reference_device_info: + name: Snapdragon X Elite CRD + os: '11' + form_factor: Compute + os_name: Windows + manufacturer: Qualcomm + chipset: Snapdragon® X Elite + timestamp: '2024-10-15T00:33:22Z' diff --git a/qai_hub_models/models/gear_guard_net/test.py b/qai_hub_models/models/gear_guard_net/test.py new file mode 100644 index 00000000..534809d0 --- /dev/null +++ b/qai_hub_models/models/gear_guard_net/test.py @@ -0,0 +1,48 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. 
+# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +import numpy as np +import pytest + +from qai_hub_models.models._shared.body_detection.app import BodyDetectionApp +from qai_hub_models.models.gear_guard_net.demo import main as demo_main +from qai_hub_models.models.gear_guard_net.model import ( + MODEL_ASSET_VERSION, + MODEL_ID, + GearGuardNet, +) +from qai_hub_models.utils.asset_loaders import CachedWebModelAsset, load_raw_file +from qai_hub_models.utils.bounding_box_processing import get_iou +from qai_hub_models.utils.testing import skip_clone_repo_check + +INPUT_IMAGE_ADDRESS = CachedWebModelAsset.from_asset_store( + MODEL_ID, MODEL_ASSET_VERSION, "test_image.jpg" +) +GROUND_TRUTH_RESULT = CachedWebModelAsset.from_asset_store( + MODEL_ID, MODEL_ASSET_VERSION, "ground_truth.txt" +) + + +@skip_clone_repo_check +def test_task(): + app = BodyDetectionApp(GearGuardNet.from_pretrained()) + result = app.detect(INPUT_IMAGE_ADDRESS, 320, 192, 0.9) + assert len(result) == 2 + + +@pytest.mark.trace +@skip_clone_repo_check +def test_trace(): + app = BodyDetectionApp(GearGuardNet.from_pretrained().convert_to_torchscript()) + result = app.detect(INPUT_IMAGE_ADDRESS, 320, 192, 0.9) + gt = load_raw_file(GROUND_TRUTH_RESULT) + gt = np.array(gt.split(), dtype=int) + result = result.astype(int) + assert result[0][0] == gt[0] + assert get_iou(result[0][1:5], gt[1:5]) > 0.8 + + +@skip_clone_repo_check +def test_demo(): + demo_main(is_test=True) diff --git a/qai_hub_models/models/googlenet/README.md b/qai_hub_models/models/googlenet/README.md index bf12b13f..fc59c1fd 100644 --- a/qai_hub_models/models/googlenet/README.md +++ b/qai_hub_models/models/googlenet/README.md @@ -6,7 +6,7 @@ GoogLeNet is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of GoogLeNet found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/googlenet.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/googlenet). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.googlenet.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of GoogLeNet can be found +* The license for the original implementation of GoogLeNet can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Going Deeper with Convolutions](https://arxiv.org/abs/1409.4842) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/googlenet.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/googlenet/export.py b/qai_hub_models/models/googlenet/export.py index b53c56a8..7e96979f 100644 --- a/qai_hub_models/models/googlenet/export.py +++ b/qai_hub_models/models/googlenet/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.googlenet import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
""" model_name = "googlenet" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/googlenet/perf.yaml b/qai_hub_models/models/googlenet/perf.yaml index da2445b1..bec69247 100644 --- a/qai_hub_models/models/googlenet/perf.yaml +++ b/qai_hub_models/models/googlenet/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: GoogLeNet performance_metrics: - torchscript_onnx_tflite: - inference_time: 1013.0 - throughput: 987.1668311944719 + inference_time: 1015.0 + throughput: 985.2216748768473 estimated_peak_memory_range: - min: 32768 - max: 1365224 + min: 36864 + max: 25018896 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 84 - job_id: jep287drp + job_id: jpxko4w35 job_status: Passed torchscript_onnx_qnn: - inference_time: 1079.0 - throughput: 926.7840593141798 + inference_time: 1077.0 + throughput: 928.5051067780872 estimated_peak_memory_range: - min: 28672 - max: 36451744 + min: 20480 + max: 34586888 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 143 - job_id: jw5663m65 + job_id: j5q6qyjnp job_status: Passed torchscript_onnx: - inference_time: 1274.0 - throughput: 784.9293563579278 + inference_time: 1143.0 + throughput: 874.8906386701663 estimated_peak_memory_range: - min: 12288 - max: 15713920 + min: 294912 + max: 37222592 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 145 - job_id: jmg9v3qw5 + job_id: jg9lnmvwg job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:33:46Z' + timestamp: '2024-10-15T00:32:33Z' - torchscript_onnx_tflite: - inference_time: 811.0 - throughput: 1233.0456226880394 + inference_time: 743.0 + throughput: 1345.8950201884254 estimated_peak_memory_range: min: 16384 - max: 51495824 + max: 52283360 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ 
models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 84 - job_id: jqpye428g + job_id: j5mnxmjdp job_status: Passed torchscript_onnx_qnn: - inference_time: 790.0 - throughput: 1265.8227848101267 + inference_time: 787.0 + throughput: 1270.6480304955528 estimated_peak_memory_range: - min: 0 - max: 16446528 + min: 299008 + max: 15968768 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 143 - job_id: j1p3k4735 + job_id: jglvmxjj5 job_status: Passed torchscript_onnx: - inference_time: 937.0 - throughput: 1067.2358591248667 + inference_time: 901.0 + throughput: 1109.8779134295228 estimated_peak_memory_range: - min: 0 - max: 53819328 + min: 258048 + max: 55363168 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 145 - job_id: jnp10dm85 + job_id: jp14zj08p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:33:47Z' + timestamp: '2024-10-15T00:32:34Z' - torchscript_onnx_tflite: - inference_time: 1010.0 - throughput: 990.0990099009902 + inference_time: 1013.0 + throughput: 987.1668311944719 estimated_peak_memory_range: - min: 12288 - max: 13186544 + min: 20480 + max: 1369672 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 84 - job_id: j2p0y199g + job_id: jgn6vnjk5 job_status: Passed torchscript_onnx_qnn: - inference_time: 898.0 - throughput: 1113.5857461024498 + inference_time: 899.0 + throughput: 1112.3470522803113 estimated_peak_memory_range: min: 634880 - max: 2319688 + max: 1861080 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 143 - job_id: j1pv31nk5 + job_id: jp3j09y3g job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:33:41Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:32:25Z' - torchscript_onnx_tflite: - inference_time: 1491.0 - throughput: 670.690811535882 + inference_time: 1015.0 + throughput: 985.2216748768473 estimated_peak_memory_range: - min: 20480 - max: 52825136 + min: 28672 + max: 84791208 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 84 - job_id: j1p8o3rkg + job_id: jp0z0jn95 job_status: Passed torchscript_onnx_qnn: - inference_time: 1564.0 - throughput: 639.386189258312 + inference_time: 904.0 + throughput: 1106.1946902654868 estimated_peak_memory_range: - min: 618496 - max: 18373472 + min: 626688 + max: 1913544 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 143 - job_id: jz5wodr3p + job_id: jgjvn7xvg job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:33:45Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:32:29Z' - torchscript_onnx_tflite: - inference_time: 1013.0 - throughput: 987.1668311944719 + inference_time: 1012.0 + throughput: 988.1422924901186 estimated_peak_memory_range: - min: 12288 - max: 4872152 + 
min: 24576 + max: 1484152 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 84 - job_id: jogkzl0wg + job_id: jpy13x98p job_status: Passed torchscript_onnx_qnn: - inference_time: 904.0 - throughput: 1106.1946902654868 + inference_time: 896.0 + throughput: 1116.0714285714287 estimated_peak_memory_range: min: 626688 - max: 1938840 + max: 2228968 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 143 - job_id: j7gjx08vp + job_id: jpv6kd3k5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:33:42Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:32:28Z' - torchscript_onnx_tflite: - inference_time: 1014.0 - throughput: 986.1932938856016 + inference_time: 1011.0 + throughput: 989.1196834817013 estimated_peak_memory_range: - min: 24576 - max: 1423208 + min: 40960 + max: 3671544 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 84 - job_id: jn5q871n5 + job_id: jp2kyw2rp job_status: Passed torchscript_onnx_qnn: - inference_time: 905.0 - throughput: 1104.9723756906078 + inference_time: 897.0 + throughput: 1114.8272017837235 estimated_peak_memory_range: - min: 626688 - max: 2273688 + min: 630784 + max: 1932272 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 143 - job_id: jlpe9rnog + job_id: jgo26rjqp job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:33:43Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:32:27Z' - torchscript_onnx_tflite: - inference_time: 1012.0 - throughput: 988.1422924901186 + inference_time: 1501.0 + throughput: 666.2225183211193 estimated_peak_memory_range: - min: 12288 - max: 4382048 + min: 16384 + max: 53582944 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 84 - job_id: j1gln08jp + job_id: jprv30z0g job_status: Passed torchscript_onnx_qnn: - inference_time: 937.0 - throughput: 1067.2358591248667 + inference_time: 1577.0 + throughput: 634.1154090044388 estimated_peak_memory_range: - min: 643072 - max: 2019512 + min: 618496 + max: 20789424 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 143 - job_id: jygzex0og + job_id: jgz3dmeo5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:32:31Z' + - torchscript_onnx_tflite: + inference_time: 674.0 + throughput: 1483.679525222552 + estimated_peak_memory_range: + min: 8192 + max: 19830480 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 84 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 84 + job_id: jgkex4jwg + job_status: Passed + torchscript_onnx_qnn: + inference_time: 719.0 + throughput: 1390.8205841446454 + estimated_peak_memory_range: + min: 0 + max: 11830000 + primary_compute_unit: NPU + precision: fp16 + 
layer_info: + layers_on_npu: 143 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 143 + job_id: j5we67o35 + job_status: Passed + torchscript_onnx: + inference_time: 846.0 + throughput: 1182.033096926714 + estimated_peak_memory_range: + min: 0 + max: 20493024 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 145 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 145 + job_id: jp4lr1q85 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:33:44Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:32:37Z' - torchscript_onnx_qnn: - inference_time: 1043.0 - throughput: 958.7727708533077 + inference_time: 1060.0 + throughput: 943.3962264150944 estimated_peak_memory_range: min: 602112 max: 602112 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 143 - job_id: jwgoy1wq5 + job_id: j56y47k6p job_status: Passed torchscript_onnx: - inference_time: 1323.0 - throughput: 755.8578987150415 + inference_time: 1334.0 + throughput: 749.6251874062968 estimated_peak_memory_range: - min: 15245312 - max: 15245312 + min: 14516224 + max: 14516224 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 145 - job_id: jvgdwrmr5 + job_id: jgdx13wrp job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:33:48Z' + timestamp: '2024-10-15T00:32:35Z' diff --git a/qai_hub_models/models/googlenet_quantized/README.md b/qai_hub_models/models/googlenet_quantized/README.md index 49c02f9f..bef9fc8f 100644 --- a/qai_hub_models/models/googlenet_quantized/README.md +++ b/qai_hub_models/models/googlenet_quantized/README.md @@ -6,7 +6,7 @@ GoogLeNet is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of GoogLeNetQuantized found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/googlenet.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/googlenet_quantized). @@ -17,11 +17,6 @@ across various devices can be found [here](https://aihub.qualcomm.com/models/g ## Example & Usage -Install the package via pip: -```bash -pip install "qai_hub_models[googlenet_quantized]" -``` - Once installed, run the following simple CLI demo: @@ -44,15 +39,19 @@ python -m qai_hub_models.models.googlenet_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of GoogLeNetQuantized can be found +* The license for the original implementation of GoogLeNetQuantized can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Going Deeper with Convolutions](https://arxiv.org/abs/1409.4842) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/googlenet.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/googlenet_quantized/evaluate.py b/qai_hub_models/models/googlenet_quantized/evaluate.py index 0e8be6d5..57156cc6 100644 --- a/qai_hub_models/models/googlenet_quantized/evaluate.py +++ b/qai_hub_models/models/googlenet_quantized/evaluate.py @@ -13,10 +13,8 @@ from qai_hub_models.models.googlenet_quantized import MODEL_ID, Model from qai_hub_models.utils.args import evaluate_parser, get_hub_device, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.evaluate import evaluate_on_dataset from qai_hub_models.utils.inference import compile_model_from_args -from qai_hub_models.utils.quantization_aimet import AIMETQuantizableMixin SUPPORTED_DATASETS = ["imagenette", "imagenet"] @@ -27,6 +25,7 @@ def main(): model_cls=Model, default_split_size=2500, supported_datasets=SUPPORTED_DATASETS, + is_hub_quantized=True, ) args = parser.parse_args() args.device = None @@ -38,13 +37,7 @@ def main(): MODEL_ID, args, get_model_kwargs(Model, vars(args)) ) hub_device = get_hub_device(None, args.chipset) - - # Use Fp16 model for torch inference - for cls in Model.__mro__: - if issubclass(cls, BaseModel) and not issubclass(cls, AIMETQuantizableMixin): - torch_cls = cls - break - torch_model = torch_cls.from_pretrained(**get_model_kwargs(torch_cls, vars(args))) + torch_model = Model.from_pretrained(**get_model_kwargs(Model, vars(args))) evaluate_on_dataset( hub_model, torch_model, diff --git a/qai_hub_models/models/googlenet_quantized/export.py b/qai_hub_models/models/googlenet_quantized/export.py index 8932c1fe..1a83d465 100644 --- a/qai_hub_models/models/googlenet_quantized/export.py +++ b/qai_hub_models/models/googlenet_quantized/export.py @@ -10,18 +10,20 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.googlenet_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference +from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( print_inference_metrics, print_on_target_demo_cmd, @@ -31,11 +33,14 @@ can_access_qualcomm_ai_hub, export_without_hub_access, ) +from qai_hub_models.utils.quantization import get_calibration_data def export_model( device: str = "Samsung Galaxy S23 (Family)", chipset: Optional[str] = None, + num_calibration_samples: 
int = 100, + skip_compiling: bool = False, skip_profiling: bool = False, skip_inferencing: bool = False, skip_downloading: bool = False, @@ -45,20 +50,19 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + 3. Compiles the model to an asset that can be run on device + 4. Profiles the model performance on a real device + 5. Inferences the model on sample inputs + 6. Downloads the model asset to the local directory + 7. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 5 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -66,6 +70,9 @@ def export_model( Defaults to DEFAULT_DEVICE if not specified. chipset: If set, will choose a random device with this chipset. Overrides the `device` argument. + num_calibration_samples: The number of calibration data samples + to use for quantization. + skip_compiling: If set, skips compiling model to format that can run on device. skip_profiling: If set, skips profiling of compiled model on real devices. skip_inferencing: If set, skips computing on-device outputs from sample data. skip_downloading: If set, skips downloading of compiled model. @@ -80,10 +87,11 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: - * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). + A struct of: + * A CompileJob object containing metadata about the compile job submitted to hub (None if compiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). + * A QuantizeJob object containing metadata about the quantize job submitted to hub """ model_name = "googlenet_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,33 +117,52 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. 
Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) ) # Trace the model - source_model = model.convert_to_hub_source_model( - target_runtime, output_path, input_spec + source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) + + print(f"Quantizing model {model_name} with {num_calibration_samples} samples.") + # 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + onnx_compile_job = hub.submit_compile_job( + model=source_model, + input_specs=input_spec, + device=hub_device, + name=model_name, + options="--target_runtime onnx", + ) + quantize_job = hub.submit_quantize_job( + model=onnx_compile_job.get_target_model(), + calibration_data=get_calibration_data( + input_spec, "imagenette", num_calibration_samples + ), + weights_dtype=model.get_weights_dtype(), + activations_dtype=model.get_activations_dtype(), + name=model_name, + options=model.get_quantize_options(), ) + if skip_compiling: + return ExportResult(quantize_job=quantize_job) - # 2. Compile the model to an on-device asset + # 3. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) print(f"Optimizing model {model_name} to run on-device") submitted_compile_job = hub.submit_compile_job( - model=source_model, + model=quantize_job.get_target_model(), input_specs=input_spec, device=hub_device, name=model_name, - calibration_data=model.get_calibration_data(target_runtime), options=model_compile_options, ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 4. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +177,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 5. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +198,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 6. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 7. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -202,12 +229,17 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + quantize_job=quantize_job, + ) def main(): warnings.filterwarnings("ignore") - parser = export_parser(model_cls=Model) + parser = export_parser(model_cls=Model, is_hub_quantized=True) args = parser.parse_args() export_model(**vars(args)) diff --git a/qai_hub_models/models/googlenet_quantized/model.py b/qai_hub_models/models/googlenet_quantized/model.py index fc0e583d..f2622c3f 100644 --- a/qai_hub_models/models/googlenet_quantized/model.py +++ b/qai_hub_models/models/googlenet_quantized/model.py @@ -4,104 +4,11 @@ # --------------------------------------------------------------------- from __future__ import annotations -# isort: off -# This verifies aimet is installed, and this must be included first. -from qai_hub_models.utils.quantization_aimet import ( - AIMETQuantizableMixin, -) - -# isort: on - -from typing import Optional - -import torch -from aimet_torch.cross_layer_equalization import equalize_model -from aimet_torch.model_preparer import prepare_model -from aimet_torch.quantsim import QuantizationSimModel, load_encodings_to_sim -from qai_hub import Device - -from qai_hub_models.models.common import TargetRuntime from qai_hub_models.models.googlenet.model import GoogLeNet -from qai_hub_models.utils.aimet.config_loader import get_default_aimet_config -from qai_hub_models.utils.asset_loaders import CachedWebModelAsset -from qai_hub_models.utils.quantization_aimet import ( - constrain_quantized_inputs_to_image_range, - tie_observers, -) +from qai_hub_models.utils.quantization import HubQuantizableMixin MODEL_ID = __name__.split(".")[-2] -MODEL_ASSET_VERSION = 4 -DEFAULT_ENCODINGS = "googlenet_quantized_encodings.json" - - -class GoogLeNetQuantizable(AIMETQuantizableMixin, GoogLeNet): - """GoogleNet with post train quantization support. - - Supports only 8 bit weights and activations, and only loads pre-quantized checkpoints. - Support for quantizing using your own weights & data will come at a later date.""" - - def __init__( - self, - sim_model: QuantizationSimModel, - ) -> None: - # Input is already normalized by sim_model. Disable it in the wrapper model. - GoogLeNet.__init__(self, sim_model.model, normalize_input=False) - AIMETQuantizableMixin.__init__( - self, - sim_model, - needs_onnx_direct_aimet_export=True, - ) - - @classmethod - def from_pretrained( - cls, - aimet_encodings: str | None = "DEFAULT", - ) -> "GoogLeNetQuantizable": - """ - Parameters: - aimet_encodings: - if "DEFAULT": Loads the model with aimet encodings calibrated on imagenette. - elif None: Doesn't load any encodings. Used when computing encodings. - else: Interprets as a filepath and loads the encodings stored there. 
- """ - model = GoogLeNet.from_pretrained() - input_shape = cls.get_input_spec()["image_tensor"][0] - - model = prepare_model(model) - equalize_model(model, input_shape) - sim = QuantizationSimModel( - model, - quant_scheme="tf_enhanced", - default_param_bw=8, - default_output_bw=8, - config_file=get_default_aimet_config(), - dummy_input=torch.rand(input_shape), - ) - tie_observers(sim) - constrain_quantized_inputs_to_image_range(sim) - - if aimet_encodings: - if aimet_encodings == "DEFAULT": - aimet_encodings = CachedWebModelAsset.from_asset_store( - MODEL_ID, MODEL_ASSET_VERSION, DEFAULT_ENCODINGS - ).fetch() - load_encodings_to_sim(sim, aimet_encodings) - return cls(sim) - # TODO(12424) remove this once encodings export correctly - def get_hub_compile_options( - self, - target_runtime: TargetRuntime, - other_compile_options: str = "", - device: Optional[Device] = None, - ) -> str: - compile_options = super().get_hub_compile_options( - target_runtime, other_compile_options, device - ) - if target_runtime not in [ - TargetRuntime.ONNX, - TargetRuntime.PRECOMPILED_QNN_ONNX, - ]: - compile_options += " --quantize_full_type int8" - return compile_options +class GoogLeNetQuantizable(HubQuantizableMixin, GoogLeNet): + pass diff --git a/qai_hub_models/models/googlenet_quantized/perf.yaml b/qai_hub_models/models/googlenet_quantized/perf.yaml index 044337e8..01959c73 100644 --- a/qai_hub_models/models/googlenet_quantized/perf.yaml +++ b/qai_hub_models/models/googlenet_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,44 +20,39 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8775P Proxy models: - name: GoogLeNetQuantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 275.0 - throughput: 3636.3636363636365 + inference_time: 284.0 + throughput: 3521.1267605633802 estimated_peak_memory_range: - min: 36864 - max: 1543984 + min: 12288 + max: 2756880 primary_compute_unit: NPU precision: int8 layer_info: @@ -64,37 +60,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 86 - job_id: joprk4x05 + job_id: jgjvd0z8g job_status: Passed torchscript_onnx_qnn: - inference_time: 340.0 - throughput: 2941.176470588235 + inference_time: 342.0 + throughput: 2923.9766081871344 estimated_peak_memory_range: - min: 16384 - max: 12814800 + min: 28672 + max: 10027984 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 86 + layers_on_npu: 143 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 86 - job_id: j1p3k4o35 + total_layers: 143 + job_id: jgn609mm5 job_status: Passed torchscript_onnx: - inference_time: 
1229.0 - throughput: 813.6696501220505 + inference_time: 499.0 + throughput: 2004.0080160320642 estimated_peak_memory_range: - min: 151552 - max: 1582768 + min: 45056 + max: 10383960 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 91 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jvgdwr4r5 + total_layers: 91 + job_id: jgo2z1ndp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -103,13 +99,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:33:09Z' + timestamp: '2024-10-17T17:31:01Z' - torchscript_onnx_tflite: - inference_time: 200.0 - throughput: 5000.0 + inference_time: 209.0 + throughput: 4784.688995215311 estimated_peak_memory_range: min: 12288 - max: 39364304 + max: 39378000 primary_compute_unit: NPU precision: int8 layer_info: @@ -117,37 +113,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 86 - job_id: jep287orp + job_id: jpedore05 job_status: Passed torchscript_onnx_qnn: - inference_time: 251.0 - throughput: 3984.06374501992 + inference_time: 253.0 + throughput: 3952.5691699604745 estimated_peak_memory_range: - min: 0 - max: 13284944 + min: 159744 + max: 15566272 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 86 + layers_on_npu: 143 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 86 - job_id: jwgoy1dq5 + total_layers: 143 + job_id: jprv642eg job_status: Passed torchscript_onnx: - inference_time: 884.0 - throughput: 1131.2217194570135 + inference_time: 461.0 + throughput: 2169.1973969631235 estimated_peak_memory_range: min: 0 - max: 54190736 + max: 59955712 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 91 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jz57zjnvp + total_layers: 91 + job_id: jpv6q1rm5 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -156,13 +152,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:33:10Z' + timestamp: '2024-10-17T17:31:03Z' - torchscript_onnx_tflite: - inference_time: 280.0 - throughput: 3571.4285714285716 + inference_time: 920.0 + throughput: 1086.9565217391305 estimated_peak_memory_range: min: 12288 - max: 112193256 + max: 22360992 primary_compute_unit: NPU precision: int8 layer_info: @@ -170,37 +166,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 86 - job_id: jqpye488g + job_id: jgz32xo65 job_status: Passed torchscript_onnx_qnn: - inference_time: 302.0 - throughput: 3311.2582781456954 + inference_time: 1155.0 + throughput: 865.8008658008658 estimated_peak_memory_range: - min: 180224 - max: 1293824 + min: 12288 + max: 7596560 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 86 + layers_on_npu: 143 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 86 - job_id: j7gjx0yvp + total_layers: 143 + job_id: jp2kx79mp job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:33:02Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-17T17:30:44Z' - torchscript_onnx_tflite: - inference_time: 346.0 - throughput: 2890.173410404624 + inference_time: 5708.0 + throughput: 175.1927119831815 estimated_peak_memory_range: - min: 16384 - max: 39709408 + min: 49152 + max: 2077784 primary_compute_unit: NPU precision: int8 layer_info: @@ -208,14 +204,22 @@ models: layers_on_gpu: 
0 layers_on_cpu: 0 total_layers: 86 - job_id: j2p0y1o9g + job_id: j5wewd2j5 job_status: Passed - torchscript_onnx_qnn: - inference_time: 405.0 - throughput: 2469.135802469136 + reference_device_info: + name: RB5 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8250 Proxy + timestamp: '2024-10-17T17:30:25Z' + - torchscript_onnx_tflite: + inference_time: 289.0 + throughput: 3460.2076124567475 estimated_peak_memory_range: min: 12288 - max: 18102896 + max: 1551016 primary_compute_unit: NPU precision: int8 layer_info: @@ -223,22 +227,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 86 - job_id: jmg9v32w5 + job_id: jg9l03jvg + job_status: Passed + torchscript_onnx_qnn: + inference_time: 299.0 + throughput: 3344.4816053511704 + estimated_peak_memory_range: + min: 200704 + max: 1480456 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 143 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 143 + job_id: jpy1z4j4p job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:33:07Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-17T17:30:46Z' - torchscript_onnx_tflite: - inference_time: 279.0 - throughput: 3584.2293906810037 + inference_time: 278.0 + throughput: 3597.122302158273 estimated_peak_memory_range: - min: 32768 - max: 1440040 + min: 12288 + max: 1378104 primary_compute_unit: NPU precision: int8 layer_info: @@ -246,37 +265,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 86 - job_id: j1p8o3jkg + job_id: jp142dylp job_status: Passed torchscript_onnx_qnn: inference_time: 301.0 throughput: 3322.2591362126245 estimated_peak_memory_range: min: 184320 - max: 1705792 + max: 1514328 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 86 + layers_on_npu: 143 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 86 - job_id: jlpe9rxog + total_layers: 143 + job_id: jp8q23m8p job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:33:03Z' + chipset: SA8255P Proxy + timestamp: '2024-10-17T17:30:50Z' - torchscript_onnx_tflite: - inference_time: 280.0 - throughput: 3571.4285714285716 + inference_time: 290.0 + throughput: 3448.2758620689656 estimated_peak_memory_range: - min: 12288 - max: 1459240 + min: 28672 + max: 32153096 primary_compute_unit: NPU precision: int8 layer_info: @@ -284,22 +303,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 86 - job_id: jogkzl6wg + job_id: jgdxnrelp job_status: Passed torchscript_onnx_qnn: - inference_time: 294.0 - throughput: 3401.360544217687 + inference_time: 306.0 + throughput: 3267.97385620915 estimated_peak_memory_range: - min: 184320 - max: 1430800 + min: 180224 + max: 1566248 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 86 + layers_on_npu: 143 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 86 - job_id: jygzexyog + total_layers: 143 + job_id: jgkevlqog job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -307,14 +326,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:33:05Z' + chipset: SA8775P Proxy + timestamp: '2024-10-17T17:30:51Z' - torchscript_onnx_tflite: - inference_time: 282.0 - 
throughput: 3546.099290780142 + inference_time: 352.0 + throughput: 2840.909090909091 estimated_peak_memory_range: - min: 12288 - max: 112695712 + min: 16384 + max: 39549744 primary_compute_unit: NPU precision: int8 layer_info: @@ -322,37 +341,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 86 - job_id: jn5q874n5 + job_id: j57y2jlr5 job_status: Passed torchscript_onnx_qnn: - inference_time: 301.0 - throughput: 3322.2591362126245 + inference_time: 410.0 + throughput: 2439.0243902439024 estimated_peak_memory_range: - min: 180224 - max: 1450768 + min: 163840 + max: 17548080 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 86 + layers_on_npu: 143 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 86 - job_id: jz5wodz3p + total_layers: 143 + job_id: j5q607rmp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:33:06Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-17T17:30:53Z' - torchscript_onnx_tflite: - inference_time: 906.0 - throughput: 1103.7527593818984 + inference_time: 180.0 + throughput: 5555.555555555556 estimated_peak_memory_range: - min: 12288 - max: 22392944 + min: 8192 + max: 20749520 primary_compute_unit: NPU precision: int8 layer_info: @@ -360,83 +379,75 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 86 - job_id: j1gln0wjp + job_id: jp4lnxdl5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1153.0 - throughput: 867.3026886383348 + inference_time: 255.0 + throughput: 3921.5686274509803 estimated_peak_memory_range: - min: 12288 - max: 8146544 + min: 163840 + max: 12187936 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 86 + layers_on_npu: 143 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 86 - job_id: jnp10d185 + total_layers: 143 + job_id: jglv402l5 job_status: Passed - reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot - os_name: Android - manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T12:33:08Z' - - torchscript_onnx_tflite: - inference_time: 5814.0 - throughput: 171.9986240110079 + torchscript_onnx: + inference_time: 403.0 + throughput: 2481.3895781637716 estimated_peak_memory_range: - min: 28672 - max: 6101632 + min: 0 + max: 27298256 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 86 + layers_on_npu: 91 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 86 - job_id: jw5663o65 + total_layers: 91 + job_id: jpedorw05 job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T12:32:58Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-17T17:31:06Z' - torchscript_onnx_qnn: - inference_time: 420.0 - throughput: 2380.9523809523807 + inference_time: 409.0 + throughput: 2444.987775061125 estimated_peak_memory_range: - min: 499712 - max: 499712 + min: 512000 + max: 512000 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 86 + layers_on_npu: 143 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 86 - job_id: j1pv31mk5 + total_layers: 143 + job_id: jp0z412e5 job_status: Passed torchscript_onnx: - inference_time: 1282.0 - throughput: 780.0312012480499 + inference_time: 549.0 + throughput: 1821.4936247723133 estimated_peak_memory_range: - min: 15044608 
- max: 15044608 + min: 8499200 + max: 8499200 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 91 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jqp4qx48g + total_layers: 91 + job_id: jgjvd028g job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -445,4 +456,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:33:11Z' + timestamp: '2024-10-17T17:31:05Z' diff --git a/qai_hub_models/models/googlenet_quantized/requirements.txt b/qai_hub_models/models/googlenet_quantized/requirements.txt deleted file mode 100644 index de5b80e8..00000000 --- a/qai_hub_models/models/googlenet_quantized/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -aimet-torch==1.32.1.post1; sys_platform == "linux" diff --git a/qai_hub_models/models/googlenet_quantized/test.py b/qai_hub_models/models/googlenet_quantized/test.py deleted file mode 100644 index c116898d..00000000 --- a/qai_hub_models/models/googlenet_quantized/test.py +++ /dev/null @@ -1,29 +0,0 @@ -# --------------------------------------------------------------------- -# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. -# SPDX-License-Identifier: BSD-3-Clause -# --------------------------------------------------------------------- -from qai_hub_models.models._shared.imagenet_classifier.test_utils import ( - run_imagenet_classifier_test, -) -from qai_hub_models.models.googlenet_quantized.demo import main as demo_main -from qai_hub_models.models.googlenet_quantized.model import ( - MODEL_ASSET_VERSION, - MODEL_ID, - GoogLeNetQuantizable, -) - - -def test_task(): - run_imagenet_classifier_test( - GoogLeNetQuantizable.from_pretrained(), - MODEL_ID, - asset_version=MODEL_ASSET_VERSION, - diff_tol=0.005, - rtol=0.02, - atol=0.2, - ) - - -def test_demo(): - # Verify demo does not crash - demo_main(is_test=True) diff --git a/qai_hub_models/models/hrnet_pose/README.md b/qai_hub_models/models/hrnet_pose/README.md index 2fa47408..6c41c41d 100644 --- a/qai_hub_models/models/hrnet_pose/README.md +++ b/qai_hub_models/models/hrnet_pose/README.md @@ -6,7 +6,7 @@ HRNet performs pose estimation in high-resolution representations. This is based on the implementation of HRNetPose found -[here](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/hrnet_posenet). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/hrnet_pose). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.hrnet_pose.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of HRNetPose can be found +* The license for the original implementation of HRNetPose can be found [here](https://github.com/quic/aimet-model-zoo/blob/develop/LICENSE.pdf).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Deep High-Resolution Representation Learning for Human Pose Estimation](https://arxiv.org/abs/1902.09212) * [Source Model Implementation](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/hrnet_posenet) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/hrnet_pose/export.py b/qai_hub_models/models/hrnet_pose/export.py index d77fbba8..5af9642c 100644 --- a/qai_hub_models/models/hrnet_pose/export.py +++ b/qai_hub_models/models/hrnet_pose/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.hrnet_pose import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
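+
+    Example (an illustrative sketch, not emitted by this script; the
+    device name below is an assumption):
+
+        from qai_hub_models.models.hrnet_pose.export import export_model
+
+        result = export_model(device="Samsung Galaxy S23")
+        if not isinstance(result, list):  # an ExportResult on success
+            print(result.compile_job, result.profile_job, result.inference_job)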
""" model_name = "hrnet_pose" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/hrnet_pose/perf.yaml b/qai_hub_models/models/hrnet_pose/perf.yaml index 6146f70d..3b35594d 100644 --- a/qai_hub_models/models/hrnet_pose/perf.yaml +++ b/qai_hub_models/models/hrnet_pose/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: HRNetPose performance_metrics: - torchscript_onnx_tflite: - inference_time: 2886.0 - throughput: 346.5003465003465 + inference_time: 2847.0 + throughput: 351.24692658939233 estimated_peak_memory_range: - min: 237568 - max: 373364816 + min: 16384 + max: 2387888 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 516 - job_id: jn5q87l45 + job_id: jgjvn78xg job_status: Passed torchscript_onnx_qnn: - inference_time: 2964.0 - throughput: 337.38191632928476 + inference_time: 2906.0 + throughput: 344.1156228492774 estimated_peak_memory_range: - min: 606208 - max: 16821840 + min: 16384 + max: 14831168 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 747 - job_id: jlpe9r71g + job_id: jp14zjk8p job_status: Passed torchscript_onnx: - inference_time: 3062.0 - throughput: 326.5839320705421 + inference_time: 2910.0 + throughput: 343.64261168384877 estimated_peak_memory_range: - min: 12288 - max: 59999824 + min: 20480 + max: 710085448 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 749 - job_id: jnp10do85 + job_id: jp0z0j895 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:32:14Z' + timestamp: '2024-10-15T00:30:45Z' - torchscript_onnx_tflite: - inference_time: 2308.0 - throughput: 433.27556325823224 + inference_time: 2292.0 + throughput: 436.3001745200698 estimated_peak_memory_range: - min: 20480 - max: 122914928 + min: 16384 + max: 126576592 primary_compute_unit: NPU precision: fp16 
layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 516 - job_id: j1gln0y8p + job_id: jpedmzn15 job_status: Passed torchscript_onnx_qnn: - inference_time: 2368.0 - throughput: 422.2972972972973 + inference_time: 2517.0 + throughput: 397.29837107667856 estimated_peak_memory_range: min: 606208 - max: 34133200 + max: 37904320 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 747 - job_id: jygzexlkg + job_id: jgdx13yrp job_status: Passed torchscript_onnx: - inference_time: 2463.0 - throughput: 406.00893219650834 + inference_time: 2455.0 + throughput: 407.33197556008145 estimated_peak_memory_range: min: 0 - max: 150875360 + max: 155262800 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 749 - job_id: jvgdwr6r5 + job_id: jgkex4wwg job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:32:15Z' + timestamp: '2024-10-15T00:30:46Z' - torchscript_onnx_tflite: - inference_time: 2816.0 - throughput: 355.1136363636364 + inference_time: 2838.0 + throughput: 352.36081747709653 estimated_peak_memory_range: - min: 20480 - max: 2897008 + min: 0 + max: 2369552 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 516 - job_id: jw5663805 + job_id: jgz3dm0k5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2715.0 - throughput: 368.3241252302026 + inference_time: 2709.0 + throughput: 369.139904023625 estimated_peak_memory_range: - min: 614400 - max: 1971304 + min: 622592 + max: 1763760 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 747 - job_id: jmg9v3ol5 + job_id: jp4lr1685 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:32:08Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:30:37Z' - torchscript_onnx_tflite: - inference_time: 3760.0 - throughput: 265.9574468085106 + inference_time: 2835.0 + throughput: 352.7336860670194 estimated_peak_memory_range: min: 16384 - max: 113222160 + max: 2326424 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 516 - job_id: j1p3k4zl5 + job_id: jgdx13yep job_status: Passed torchscript_onnx_qnn: - inference_time: 3866.0 - throughput: 258.6652871184687 + inference_time: 2717.0 + throughput: 368.052999631947 estimated_peak_memory_range: - min: 356352 - max: 25341744 + min: 663552 + max: 2003840 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 747 - job_id: jmg9v3ow5 + job_id: jgn6vndk5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:32:13Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:30:40Z' - torchscript_onnx_tflite: - inference_time: 2839.0 - throughput: 352.23670306445933 + inference_time: 2815.0 + throughput: 355.23978685612786 estimated_peak_memory_range: min: 16384 - max: 
3214984 + max: 2092832 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 516 - job_id: jwgoy1lx5 + job_id: jp14zjk2p job_status: Passed torchscript_onnx_qnn: - inference_time: 2782.0 - throughput: 359.45363048166786 + inference_time: 2758.0 + throughput: 362.58158085569255 estimated_peak_memory_range: - min: 618496 - max: 1824488 + min: 622592 + max: 1809960 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 747 - job_id: jnp10do25 + job_id: j5mnxm1dp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:32:10Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:30:39Z' - torchscript_onnx_tflite: - inference_time: 2821.0 - throughput: 354.4842254519674 + inference_time: 2843.0 + throughput: 351.74111853675697 estimated_peak_memory_range: - min: 32768 - max: 2156360 + min: 24576 + max: 2200184 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 516 - job_id: j1pv31lj5 + job_id: jg9lnm7lg job_status: Passed torchscript_onnx_qnn: - inference_time: 2733.0 - throughput: 365.89828027808267 + inference_time: 2748.0 + throughput: 363.901018922853 estimated_peak_memory_range: min: 622592 - max: 2259984 + max: 1897176 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 747 - job_id: jvgdwr6e5 + job_id: jpxko4835 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:32:11Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:30:38Z' - torchscript_onnx_tflite: - inference_time: 2834.0 - throughput: 352.85815102328866 + inference_time: 3758.0 + throughput: 266.0989888238425 estimated_peak_memory_range: - min: 28672 - max: 2331352 + min: 20480 + max: 112712896 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 516 - job_id: j7gjx0rxp + job_id: j5we67065 job_status: Passed torchscript_onnx_qnn: - inference_time: 2805.0 - throughput: 356.50623885918003 + inference_time: 3885.0 + throughput: 257.4002574002574 estimated_peak_memory_range: - min: 618496 - max: 2063520 + min: 606208 + max: 28636192 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 747 - job_id: jz5wody3p + job_id: jp2kywqrp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:30:43Z' + - torchscript_onnx_tflite: + inference_time: 1970.0 + throughput: 507.61421319796955 + estimated_peak_memory_range: + min: 12288 + max: 61964880 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 516 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 516 + job_id: jg9lnm7wg + job_status: Passed + torchscript_onnx_qnn: + inference_time: 2035.0 + throughput: 491.4004914004914 + estimated_peak_memory_range: + min: 602112 + max: 35413504 + primary_compute_unit: NPU + precision: 
fp16 + layer_info: + layers_on_npu: 747 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 747 + job_id: jpy13xk8p + job_status: Passed + torchscript_onnx: + inference_time: 1866.0 + throughput: 535.9056806002144 + estimated_peak_memory_range: + min: 0 + max: 75051936 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 749 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 749 + job_id: j56y4796p + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:32:12Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:30:49Z' - torchscript_onnx_qnn: - inference_time: 2959.0 - throughput: 337.95201081446436 + inference_time: 2978.0 + throughput: 335.795836131632 estimated_peak_memory_range: min: 589824 max: 589824 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 747 - job_id: jz5wody6p + job_id: j57yr41v5 job_status: Passed torchscript_onnx: - inference_time: 2986.0 - throughput: 334.8961821835231 + inference_time: 2972.0 + throughput: 336.47375504710635 estimated_peak_memory_range: - min: 58335232 - max: 58335232 + min: 59396096 + max: 59396096 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 749 - job_id: jz57zjovp + job_id: j5q6qyxnp job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:32:16Z' + timestamp: '2024-10-15T00:30:47Z' diff --git a/qai_hub_models/models/hrnet_pose_quantized/README.md b/qai_hub_models/models/hrnet_pose_quantized/README.md index 0e936653..03707c36 100644 --- a/qai_hub_models/models/hrnet_pose_quantized/README.md +++ b/qai_hub_models/models/hrnet_pose_quantized/README.md @@ -6,7 +6,7 @@ HRNet performs pose estimation in high-resolution representations. This is based on the implementation of HRNetPoseQuantized found -[here](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/hrnet_posenet). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/hrnet_pose_quantized). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.hrnet_pose_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of HRNetPoseQuantized can be found +* The license for the original implementation of HRNetPoseQuantized can be found [here](https://github.com/quic/aimet-model-zoo/blob/develop/LICENSE.pdf). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Deep High-Resolution Representation Learning for Human Pose Estimation](https://arxiv.org/abs/1902.09212) * [Source Model Implementation](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/hrnet_posenet) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/hrnet_pose_quantized/export.py b/qai_hub_models/models/hrnet_pose_quantized/export.py index fec68968..6872d705 100644 --- a/qai_hub_models/models/hrnet_pose_quantized/export.py +++ b/qai_hub_models/models/hrnet_pose_quantized/export.py @@ -10,17 +10,17 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.hrnet_pose_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.printing import ( print_inference_metrics, @@ -45,20 +45,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -80,10 +78,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
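+
+    Example (illustrative; the skip flags are the "input options" referenced
+    above, and the device name is an assumption):
+
+        result = export_model(
+            device="Samsung Galaxy S23",
+            skip_profiling=True,
+            skip_inferencing=True,
+        )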
""" model_name = "hrnet_pose_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,7 +107,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( target_runtime, output_path, input_spec ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -135,7 +133,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +148,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +169,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -198,7 +196,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/hrnet_pose_quantized/perf.yaml b/qai_hub_models/models/hrnet_pose_quantized/perf.yaml index 5d09c38d..6c6e5cf2 100644 --- a/qai_hub_models/models/hrnet_pose_quantized/perf.yaml +++ b/qai_hub_models/models/hrnet_pose_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,41 +20,38 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: HRNetPoseQuantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 966.0 - throughput: 1035.1966873706003 + inference_time: 956.0 + throughput: 1046.0251046025105 estimated_peak_memory_range: - min: 12288 - max: 2325608 + min: 16384 + max: 2210536 primary_compute_unit: NPU precision: int8 layer_info: @@ -61,22 +59,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 518 - job_id: jn5q87y45 + job_id: j5q6qy14p job_status: Passed torchscript_onnx_qnn: - inference_time: 1250.0 - throughput: 800.0 + inference_time: 1251.0 + throughput: 799.3605115907275 estimated_peak_memory_range: - min: 16384 - max: 22198416 + min: 167936 + max: 8571056 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 487 + layers_on_npu: 748 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 487 - job_id: jz5wodl6p + total_layers: 748 + job_id: jp14zjm2p job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -85,13 +83,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:31:16Z' + timestamp: '2024-10-15T00:29:37Z' - torchscript_onnx_tflite: - inference_time: 696.0 - throughput: 1436.7816091954023 + inference_time: 792.0 + throughput: 1262.6262626262626 estimated_peak_memory_range: - min: 65536 - max: 109951616 + min: 16384 + max: 113943488 primary_compute_unit: NPU precision: int8 layer_info: @@ -99,22 +97,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 518 - job_id: j1gln0x8p + job_id: jglvmx885 job_status: Passed torchscript_onnx_qnn: - inference_time: 920.0 - throughput: 1086.9565217391305 + inference_time: 1051.0 + throughput: 
951.4747859181732 estimated_peak_memory_range: - min: 172032 - max: 30865488 + min: 0 + max: 34824448 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 487 + layers_on_npu: 748 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 487 - job_id: jmg9v3zl5 + total_layers: 748 + job_id: jgdx13mep job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -123,13 +121,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:31:17Z' + timestamp: '2024-10-15T00:29:38Z' - torchscript_onnx_tflite: - inference_time: 943.0 - throughput: 1060.4453870625662 + inference_time: 3825.0 + throughput: 261.437908496732 estimated_peak_memory_range: - min: 16384 - max: 2080912 + min: 12288 + max: 72031904 primary_compute_unit: NPU precision: int8 layer_info: @@ -137,37 +135,60 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 518 - job_id: jw5663705 + job_id: jgz3dmyk5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1190.0 - throughput: 840.3361344537815 + inference_time: 5523.0 + throughput: 181.0610175629187 estimated_peak_memory_range: - min: 172032 - max: 1393232 + min: 225280 + max: 8107920 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 487 + layers_on_npu: 748 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 487 - job_id: jvgdwrde5 + total_layers: 748 + job_id: jpy13xy7p job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS6490 Proxy + timestamp: '2024-10-15T00:29:47Z' + - torchscript_onnx_tflite: + inference_time: 17117.0 + throughput: 58.421452357305604 + estimated_peak_memory_range: + min: 98304 + max: 4442464 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 518 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 518 + job_id: j5we67r65 + job_status: Passed + reference_device_info: + name: RB5 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:31:19Z' + chipset: QCS8250 Proxy + timestamp: '2024-10-15T00:29:34Z' - torchscript_onnx_tflite: - inference_time: 1179.0 - throughput: 848.1764206955047 + inference_time: 953.0 + throughput: 1049.3179433368311 estimated_peak_memory_range: - min: 45056 - max: 113234096 + min: 12288 + max: 2785544 primary_compute_unit: NPU precision: int8 layer_info: @@ -175,37 +196,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 518 - job_id: j1p3k49l5 + job_id: j56y47m0p job_status: Passed torchscript_onnx_qnn: - inference_time: 1468.0 - throughput: 681.1989100817439 + inference_time: 1199.0 + throughput: 834.0283569641368 estimated_peak_memory_range: - min: 163840 - max: 37739744 + min: 176128 + max: 1413960 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 487 + layers_on_npu: 748 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 487 - job_id: jo5mrw0wg + total_layers: 748 + job_id: jp4lr12v5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:31:24Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:29:40Z' - torchscript_onnx_tflite: - inference_time: 964.0 - throughput: 1037.344398340249 + inference_time: 952.0 + throughput: 1050.420168067227 estimated_peak_memory_range: - min: 36864 - max: 2120904 
+ min: 12288 + max: 2260088 primary_compute_unit: NPU precision: int8 layer_info: @@ -213,37 +234,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 518 - job_id: jwgoy1rx5 + job_id: jgjvn7yxg job_status: Passed torchscript_onnx_qnn: - inference_time: 1205.0 - throughput: 829.8755186721992 + inference_time: 1211.0 + throughput: 825.7638315441784 estimated_peak_memory_range: - min: 172032 - max: 1515664 + min: 180224 + max: 1924760 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 487 + layers_on_npu: 748 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 487 - job_id: jz57zjelp + total_layers: 748 + job_id: jgn6vnwr5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:31:20Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:29:43Z' - torchscript_onnx_tflite: - inference_time: 968.0 - throughput: 1033.0578512396694 + inference_time: 962.0 + throughput: 1039.5010395010395 estimated_peak_memory_range: - min: 24576 - max: 1932768 + min: 12288 + max: 1925872 primary_compute_unit: NPU precision: int8 layer_info: @@ -251,22 +272,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 518 - job_id: j1pv31dj5 + job_id: jpv6kdmj5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1206.0 - throughput: 829.1873963515754 + inference_time: 1221.0 + throughput: 819.000819000819 estimated_peak_memory_range: - min: 184320 - max: 1395928 + min: 180224 + max: 1373992 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 487 + layers_on_npu: 748 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 487 - job_id: jqp4qxyvg + total_layers: 748 + job_id: j5mnxmlwp job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -274,14 +295,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:31:22Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:29:42Z' - torchscript_onnx_tflite: - inference_time: 962.0 - throughput: 1039.5010395010395 + inference_time: 960.0 + throughput: 1041.6666666666667 estimated_peak_memory_range: - min: 16384 - max: 2151952 + min: 12288 + max: 18731344 primary_compute_unit: NPU precision: int8 layer_info: @@ -289,37 +310,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 518 - job_id: j7gjx07xp + job_id: jgo26rwxp job_status: Passed torchscript_onnx_qnn: - inference_time: 1195.0 - throughput: 836.8200836820083 + inference_time: 1212.0 + throughput: 825.0825082508251 estimated_peak_memory_range: - min: 180224 - max: 1688952 + min: 172032 + max: 1445440 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 487 + layers_on_npu: 748 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 487 - job_id: j0pxv7l1g + total_layers: 748 + job_id: jpxko4z15 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:31:23Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:29:41Z' - torchscript_onnx_tflite: - inference_time: 3860.0 - throughput: 259.0673575129534 + inference_time: 1172.0 + throughput: 853.2423208191126 estimated_peak_memory_range: - min: 12288 - max: 71371696 + min: 61440 + max: 114534080 primary_compute_unit: NPU precision: int8 layer_info: @@ -327,37 +348,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 
518 - job_id: jlpe9rz1g + job_id: jp3j097lg job_status: Passed torchscript_onnx_qnn: - inference_time: 5564.0 - throughput: 179.72681524083393 + inference_time: 1524.0 + throughput: 656.1679790026246 estimated_peak_memory_range: - min: 172032 - max: 8591888 + min: 163840 + max: 40149008 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 487 + layers_on_npu: 748 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 487 - job_id: jegn29zrg + total_layers: 748 + job_id: jp2kywz4p job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T12:31:25Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:29:46Z' - torchscript_onnx_tflite: - inference_time: 17160.0 - throughput: 58.27505827505828 + inference_time: 663.0 + throughput: 1508.2956259426849 estimated_peak_memory_range: - min: 561152 - max: 3637416 + min: 8192 + max: 66830176 primary_compute_unit: NPU precision: int8 layer_info: @@ -365,30 +386,45 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 518 - job_id: jygzexmkg + job_id: jg9lnmqlg + job_status: Passed + torchscript_onnx_qnn: + inference_time: 746.0 + throughput: 1340.4825737265417 + estimated_peak_memory_range: + min: 159744 + max: 33374624 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 748 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 748 + job_id: jp0z0jx65 job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T12:31:15Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:29:48Z' - torchscript_onnx_qnn: - inference_time: 1326.0 - throughput: 754.1478129713424 + inference_time: 1343.0 + throughput: 744.6016381236038 estimated_peak_memory_range: - min: 327680 - max: 327680 + min: 446464 + max: 446464 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 487 + layers_on_npu: 748 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 487 - job_id: jnp10dn25 + total_layers: 748 + job_id: j57yr48l5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -397,4 +433,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:31:18Z' + timestamp: '2024-10-15T00:29:39Z' diff --git a/qai_hub_models/models/huggingface_wavlm_base_plus/README.md b/qai_hub_models/models/huggingface_wavlm_base_plus/README.md index 04bf447b..6670fbde 100644 --- a/qai_hub_models/models/huggingface_wavlm_base_plus/README.md +++ b/qai_hub_models/models/huggingface_wavlm_base_plus/README.md @@ -6,7 +6,7 @@ HuggingFaceWavLMBasePlus is a real time speech processing backbone based on Microsoft's WavLM model. This is based on the implementation of HuggingFace-WavLM-Base-Plus found -[here](https://huggingface.co/patrickvonplaten/wavlm-libri-clean-100h-base-plus/tree/main). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/huggingface_wavlm_base_plus). 
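The `inference_time` (microseconds) and `throughput` (inferences per second) fields that change together throughout these perf.yaml diffs are two views of the same measurement. A minimal sketch of that relationship, spot-checked against values from the hunks above (the helper name is ours, not part of the repo):

```python
def throughput_from_latency(inference_time_us: float) -> float:
    """Convert an on-device latency in microseconds to inferences per second."""
    return 1_000_000.0 / inference_time_us


# Values taken from the HRNetPose and HRNetPoseQuantized entries above.
assert abs(throughput_from_latency(2847.0) - 351.24692658939233) < 1e-9
assert abs(throughput_from_latency(956.0) - 1046.0251046025105) < 1e-9
```

So when a hunk lowers `inference_time`, the matching `throughput` line must rise by the corresponding factor; an update that touches one field without the other would be internally inconsistent.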
@@ -44,15 +44,19 @@ python -m qai_hub_models.models.huggingface_wavlm_base_plus.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of HuggingFace-WavLM-Base-Plus can be found +* The license for the original implementation of HuggingFace-WavLM-Base-Plus can be found [here](https://github.com/microsoft/unilm/blob/master/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing](https://arxiv.org/abs/2110.13900) * [Source Model Implementation](https://huggingface.co/patrickvonplaten/wavlm-libri-clean-100h-base-plus/tree/main) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/huggingface_wavlm_base_plus/export.py b/qai_hub_models/models/huggingface_wavlm_base_plus/export.py index 6975b7b5..68b630c1 100644 --- a/qai_hub_models/models/huggingface_wavlm_base_plus/export.py +++ b/qai_hub_models/models/huggingface_wavlm_base_plus/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.huggingface_wavlm_base_plus import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -46,20 +46,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. 
+ Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -81,10 +79,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "huggingface_wavlm_base_plus" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -110,7 +108,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -119,7 +117,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -133,7 +131,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -148,7 +146,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -169,13 +167,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -193,7 +191,11 @@ def export_model( inference_job, inference_result, torch_out, model.get_output_names() ) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/huggingface_wavlm_base_plus/perf.yaml b/qai_hub_models/models/huggingface_wavlm_base_plus/perf.yaml index 1f9e0723..5e2877c2 100644 --- a/qai_hub_models/models/huggingface_wavlm_base_plus/perf.yaml +++ b/qai_hub_models/models/huggingface_wavlm_base_plus/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: HuggingFace-WavLM-Base-Plus performance_metrics: - torchscript_onnx_tflite: - inference_time: 957880.0 - throughput: 1.0439721050653528 + inference_time: 817718.0 + throughput: 1.2229154794195554 estimated_peak_memory_range: - min: 66510848 - max: 69562208 + min: 65708032 + max: 68249568 primary_compute_unit: CPU precision: fp32 layer_info: @@ -58,7 +56,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 871 total_layers: 871 - job_id: jlpe9ry1g + job_id: jp3j09olg job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -67,13 +65,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:28:39Z' + timestamp: '2024-10-15T00:26:50Z' - torchscript_onnx_tflite: - inference_time: 633653.0 - throughput: 1.578150817561031 + inference_time: 631561.0 + throughput: 1.583378327667478 estimated_peak_memory_range: - min: 65622016 - max: 87489936 + min: 66519040 + max: 88097792 primary_compute_unit: CPU precision: fp32 layer_info: @@ -81,7 +79,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 871 total_layers: 871 - job_id: jygzexnkg + job_id: jgo26rdxp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -90,13 +88,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:28:40Z' + timestamp: '2024-10-15T00:26:52Z' - torchscript_onnx_tflite: - inference_time: 886695.0 - throughput: 1.1277835106772904 + inference_time: 849395.0 + throughput: 1.1773085549126143 estimated_peak_memory_range: - min: 65736704 - max: 67874536 + min: 60493824 + max: 631125408 primary_compute_unit: CPU precision: fp32 layer_info: @@ -104,7 +102,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 871 total_layers: 871 - job_id: 
jz5wod76p + job_id: jpv6kd2j5 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -112,14 +110,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:28:41Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:26:53Z' - torchscript_onnx_tflite: - inference_time: 1118324.0 - throughput: 0.8941952421659555 + inference_time: 850762.0 + throughput: 1.175416861589963 estimated_peak_memory_range: - min: 65699840 - max: 92825376 + min: 65540096 + max: 68435096 primary_compute_unit: CPU precision: fp32 layer_info: @@ -127,22 +125,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 871 total_layers: 871 - job_id: jmg9v3ml5 + job_id: j5we67z65 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:28:42Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:26:57Z' - torchscript_onnx_tflite: - inference_time: 749518.0 - throughput: 1.3341907732702885 + inference_time: 846763.0 + throughput: 1.1809679922245067 estimated_peak_memory_range: - min: 65617920 - max: 68774152 + min: 65814528 + max: 68615184 primary_compute_unit: CPU precision: fp32 layer_info: @@ -150,22 +148,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 871 total_layers: 871 - job_id: jnp10dj25 + job_id: jgz3dmzk5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:28:43Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:26:56Z' - torchscript_onnx_tflite: - inference_time: 802634.0 - throughput: 1.2458978812260633 + inference_time: 889799.0 + throughput: 1.1238493187787355 estimated_peak_memory_range: - min: 65568768 - max: 104111960 + min: 65630208 + max: 68682344 primary_compute_unit: CPU precision: fp32 layer_info: @@ -173,22 +171,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 871 total_layers: 871 - job_id: jvgdwr3e5 + job_id: jpedmz615 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:28:44Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:26:55Z' - torchscript_onnx_tflite: - inference_time: 897006.0 - throughput: 1.114819744795464 + inference_time: 1305119.0 + throughput: 0.766213655613013 estimated_peak_memory_range: - min: 65593344 - max: 104803536 + min: 65798144 + max: 93414896 primary_compute_unit: CPU precision: fp32 layer_info: @@ -196,13 +194,36 @@ models: layers_on_gpu: 0 layers_on_cpu: 871 total_layers: 871 - job_id: jz57zj4lp + job_id: jgjvn73xg job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:26:54Z' + - torchscript_onnx_tflite: + inference_time: 568662.0 + throughput: 1.7585138447795 + estimated_peak_memory_range: + min: 65544192 + max: 81468816 + primary_compute_unit: CPU + precision: fp32 + layer_info: + layers_on_npu: 0 + layers_on_gpu: 0 + layers_on_cpu: 871 + total_layers: 871 + job_id: jp14zj12p + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - 
timestamp: '2024-09-25T12:28:45Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:26:59Z' diff --git a/qai_hub_models/models/ibm_granite_3b_code_instruct/README.md b/qai_hub_models/models/ibm_granite_3b_code_instruct/README.md new file mode 100644 index 00000000..4bc8ef0d --- /dev/null +++ b/qai_hub_models/models/ibm_granite_3b_code_instruct/README.md @@ -0,0 +1,58 @@ +[![Qualcomm® AI Hub Models](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/quic-logo.jpg)](../../README.md) + + +# [IBM-Granite-3B-Code-Instruct: State-of-the-art large language model useful on a variety of code understanding and generation tasks](https://aihub.qualcomm.com/models/ibm_granite_3b_code_instruct) + +Granite-3B-Code-Instruct-2K is a 3B parameter model fine tuned from Granite-3B-Code-Base-2K on a combination of permissively licensed instruction data to enhance instruction following capabilities including logical reasoning and problem-solving skills. + +This is based on the implementation of IBM-Granite-3B-Code-Instruct found +[here]({source_repo}). This repository contains scripts for optimized on-device +export suitable to run on Qualcomm® devices. More details on model performance +accross various devices, can be found [here](https://aihub.qualcomm.com/models/ibm_granite_3b_code_instruct). + +[Sign up](https://myaccount.qualcomm.com/signup) to start using Qualcomm AI Hub and run these models on a hosted Qualcomm® device. + + + + + + +## License +* The license for the original implementation of IBM-Granite-3B-Code-Instruct can be found + [here](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md). +* The license for the compiled assets for on-device deployment can be found [here](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md) + + +## References +* [Granite Code Models: A Family of Open Foundation Models for Code Intelligence](https://arxiv.org/abs/2405.04324) +* [Source Model Implementation](https://huggingface.co/ibm-granite/granite-3b-code-instruct-2k) + + + +## Community +* Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. +* For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). 
+ + +## Usage and Limitations + +This model may not be used for or in connection with any of the following applications: + +- Accessing essential private and public services and benefits; +- Administration of justice and democratic processes; +- Assessing or recognizing the emotional state of a person; +- Biometric and biometrics-based systems, including categorization of persons based on sensitive characteristics; +- Education and vocational training; +- Employment and workers management; +- Exploitation of the vulnerabilities of persons resulting in harmful behavior; +- General purpose social scoring; +- Law enforcement; +- Management and operation of critical infrastructure; +- Migration, asylum and border control management; +- Predictive policing; +- Real-time remote biometric identification in public spaces; +- Recommender systems of social media platforms; +- Scraping of facial images (from the internet or otherwise); and/or +- Subliminal manipulation + + diff --git a/qai_hub_models/models/ibm_granite_3b_code_instruct/info.yaml b/qai_hub_models/models/ibm_granite_3b_code_instruct/info.yaml new file mode 100644 index 00000000..bb428616 --- /dev/null +++ b/qai_hub_models/models/ibm_granite_3b_code_instruct/info.yaml @@ -0,0 +1,58 @@ +name: IBM-Granite-3B-Code-Instruct +id: ibm_granite_3b_code_instruct +status: public +headline: State-of-the-art large language model useful on a variety of code + understanding and generation tasks. +domain: Generative AI +description: Granite-3B-Code-Instruct-2K is a 3B parameter model fine tuned from Granite-3B-Code-Base-2K + on a combination of permissively licensed instruction data to enhance instruction following + capabilities including logical reasoning and problem-solving skills. +use_case: Text Generation +tags: + - llm + - generative-ai +research_paper: https://arxiv.org/abs/2405.04324 +research_paper_title: "Granite Code Models: A Family of Open Foundation Models for Code Intelligence" +license: https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md +deploy_license: https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md +source_repo: https://huggingface.co/ibm-granite/granite-3b-code-instruct-2k +model_maker_id: ibm-watsonx +technical_details: + Input sequence length for Prompt Processor: 128 + Context length: 2048 + Number of parameters: 3.48B + Precision: fp16 + Num of key-value heads: 32 + Information about the model parts: Prompt Processor and Token Generator are split into 4 parts each. Each corresponding Prompt Processor and Token Generator part share weights. + Prompt processor model size: 7 GB + Prompt processor input (part1): 128 tokens + Prompt processor output (part1): Embeddings output + Prompt processor input (other parts): 128 tokens + KVCache initialized with pad token + Prompt processor output (other parts): 128 output tokens + KVCache for token generator + Token generator model size: 7 GB + Token generator input (part1): 1 token + Token generator output (part1): Embeddings output + Token generator input (other parts): 1 input token + past KVCache + Token generator output (other parts): 1 output token + KVCache for next iteration + Use: Initiate conversation with prompt-processor and then token generator for subsequent iterations. + Supported natural languages: English + Supported programming languages: The Granite code foundation models support 116 programming languages including Python, Javascript, Java, C++, Go, and Rust. 
+ Minimum QNN SDK version required: 2.27.7 + TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (2048 tokens). + Response Rate: Rate of response generation after the first response token. +applicable_scenarios: + - Coding + - Coding assist +related_models: [] +form_factors: + - Phone + - Tablet +has_static_banner: true +has_animated_banner: false +dataset: [] +model_type_llm: true +restrict_model_sharing: true +license_type: apache-2.0 +deploy_license_type: apache-2.0 +llm_details: + call_to_action: 'contact_for_download' diff --git a/qai_hub_models/models/ibm_granite_3b_code_instruct/perf.yaml b/qai_hub_models/models/ibm_granite_3b_code_instruct/perf.yaml new file mode 100644 index 00000000..338d5d66 --- /dev/null +++ b/qai_hub_models/models/ibm_granite_3b_code_instruct/perf.yaml @@ -0,0 +1,29 @@ +aggregated: + supported_oses: + - Android + supported_devices: + - Samsung Galaxy S24 + - Samsung Galaxy S24 Ultra + - Samsung Galaxy S24+ + - Snapdragon 8 Gen 3 QRD + - Snapdragon 8 Elite QRD + supported_chipsets: + - Snapdragon® 8 Gen 3 + - Snapdragon® 8 Elite +models: + name: 'IBM-Granite-3B-Code' + performance_metrics: + - torchscript_onnx_qnn: + llm_metrics: + time_to_first_token_range: + min: 326200 + max: 5219200 + tokens_per_second: 5.47 + reference_device_info: + name: Samsung Galaxy S24 + os: '14' + form_factor: Phone + os_name: Android + manufacturer: Samsung + chipset: Snapdragon® 8 Gen 3 + timestamp: '2024-10-17T23:51:08Z' diff --git a/qai_hub_models/models/inception_v3/README.md b/qai_hub_models/models/inception_v3/README.md index 81e30641..deeeab5e 100644 --- a/qai_hub_models/models/inception_v3/README.md +++ b/qai_hub_models/models/inception_v3/README.md @@ -6,7 +6,7 @@ InceptionNetV3 is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of Inception-v3 found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/inception.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/inception_v3). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.inception_v3.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Inception-v3 can be found +* The license for the original implementation of Inception-v3 can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Rethinking the Inception Architecture for Computer Vision](http://arxiv.org/abs/1512.00567) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/inception.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/inception_v3/export.py b/qai_hub_models/models/inception_v3/export.py index 53449848..d16a859c 100644 --- a/qai_hub_models/models/inception_v3/export.py +++ b/qai_hub_models/models/inception_v3/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.inception_v3 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
""" model_name = "inception_v3" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/inception_v3/perf.yaml b/qai_hub_models/models/inception_v3/perf.yaml index e01582b0..424426b4 100644 --- a/qai_hub_models/models/inception_v3/perf.yaml +++ b/qai_hub_models/models/inception_v3/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: Inception-v3 performance_metrics: - torchscript_onnx_tflite: - inference_time: 1329.0 - throughput: 752.4454477050414 + inference_time: 1332.0 + throughput: 750.7507507507507 estimated_peak_memory_range: - min: 16384 - max: 2231320 + min: 20480 + max: 2400960 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 129 - job_id: jz5wod46p + job_id: jgkex422g job_status: Passed torchscript_onnx_qnn: - inference_time: 1416.0 - throughput: 706.2146892655368 + inference_time: 1400.0 + throughput: 714.2857142857143 estimated_peak_memory_range: min: 16384 - max: 149319888 + max: 148127296 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 219 - job_id: jo5mrw8wg + job_id: jgz3dmlk5 job_status: Passed torchscript_onnx: - inference_time: 1737.0 - throughput: 575.7052389176741 + inference_time: 1751.0 + throughput: 571.1022272986864 estimated_peak_memory_range: - min: 49152 - max: 51589192 + min: 16384 + max: 51841872 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 221 - job_id: jn5q87v45 + job_id: jprv30x9g job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:28:05Z' + timestamp: '2024-10-15T00:26:16Z' - torchscript_onnx_tflite: inference_time: 1148.0 throughput: 871.0801393728223 estimated_peak_memory_range: min: 16384 - max: 59265056 + max: 60836976 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 129 - 
job_id: jmg9v3dl5 + job_id: j5q6qyl4p job_status: Passed torchscript_onnx_qnn: - inference_time: 1203.0 - throughput: 831.255195344971 + inference_time: 1197.0 + throughput: 835.421888053467 estimated_peak_memory_range: - min: 634880 - max: 20864032 + min: 618496 + max: 17772976 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 219 - job_id: jegn29krg + job_id: j5we67y65 job_status: Passed torchscript_onnx: - inference_time: 1448.0 - throughput: 690.6077348066299 + inference_time: 1419.0 + throughput: 704.7216349541931 estimated_peak_memory_range: min: 0 - max: 58046256 + max: 60074848 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 221 - job_id: j1gln0l8p + job_id: jp2kywo4p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:28:06Z' + timestamp: '2024-10-15T00:26:17Z' - torchscript_onnx_tflite: - inference_time: 1325.0 - throughput: 754.7169811320755 + inference_time: 1327.0 + throughput: 753.5795026375282 estimated_peak_memory_range: - min: 32768 - max: 6272856 + min: 163840 + max: 53492312 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 129 - job_id: jnp10d625 + job_id: jglvmxy85 job_status: Passed torchscript_onnx_qnn: - inference_time: 1423.0 - throughput: 702.7406886858749 + inference_time: 1468.0 + throughput: 681.1989100817439 estimated_peak_memory_range: min: 634880 - max: 1851200 + max: 1868008 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 219 - job_id: jep287e4p + job_id: jp14zjo2p job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:28:00Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:26:08Z' - torchscript_onnx_tflite: - inference_time: 2093.0 - throughput: 477.78308647873865 + inference_time: 1329.0 + throughput: 752.4454477050414 estimated_peak_memory_range: - min: 12288 - max: 60240480 + min: 53248 + max: 2359064 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 129 - job_id: jvgdwr2e5 + job_id: jpv6kdlj5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2176.0 - throughput: 459.55882352941177 + inference_time: 1462.0 + throughput: 683.9945280437756 estimated_peak_memory_range: - min: 618496 - max: 22391728 + min: 634880 + max: 1924592 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 219 - job_id: jogkzl82g + job_id: jp4lr1ev5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:28:04Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:26:12Z' - torchscript_onnx_tflite: inference_time: 1328.0 throughput: 753.0120481927711 estimated_peak_memory_range: - min: 0 - max: 55130360 + min: 28672 + max: 1681584 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 
layers_on_cpu: 0 total_layers: 129 - job_id: jz57zj9lp + job_id: jgo26rlxp job_status: Passed torchscript_onnx_qnn: - inference_time: 1427.0 - throughput: 700.770847932726 + inference_time: 1454.0 + throughput: 687.757909215956 estimated_peak_memory_range: - min: 634880 - max: 1925000 + min: 626688 + max: 2346336 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 219 - job_id: jqpye4m7g + job_id: j57yr4ol5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:28:01Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:26:11Z' - torchscript_onnx_tflite: inference_time: 1331.0 throughput: 751.3148009015778 estimated_peak_memory_range: - min: 24576 - max: 1709592 + min: 16384 + max: 5672656 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 129 - job_id: jqp4qx3vg + job_id: jp3j09zlg job_status: Passed torchscript_onnx_qnn: - inference_time: 1427.0 - throughput: 700.770847932726 + inference_time: 1476.0 + throughput: 677.5067750677507 estimated_peak_memory_range: - min: 634880 - max: 1871288 + min: 651264 + max: 1957064 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 219 - job_id: j2p0y166g + job_id: jgdx136ep job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:28:02Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:26:09Z' - torchscript_onnx_tflite: - inference_time: 1328.0 - throughput: 753.0120481927711 + inference_time: 2100.0 + throughput: 476.1904761904762 estimated_peak_memory_range: - min: 28672 - max: 1909136 + min: 180224 + max: 60784512 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 129 - job_id: j0pxv7x1g + job_id: j56y4780p job_status: Passed torchscript_onnx_qnn: - inference_time: 1420.0 - throughput: 704.2253521126761 + inference_time: 2187.0 + throughput: 457.2473708276177 estimated_peak_memory_range: - min: 626688 - max: 1963880 + min: 0 + max: 22346048 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 219 - job_id: j1p8o31xg + job_id: j5mnxm9wp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:28:03Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:26:14Z' + - torchscript_onnx_tflite: + inference_time: 948.0 + throughput: 1054.8523206751054 + estimated_peak_memory_range: + min: 12288 + max: 23427968 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 129 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 129 + job_id: jpedmz715 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 998.0 + throughput: 1002.0040080160321 + estimated_peak_memory_range: + min: 618496 + max: 16156208 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 219 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 219 + job_id: jgn6vn1r5 + 
job_status: Passed + torchscript_onnx: + inference_time: 1251.0 + throughput: 799.3605115907275 + estimated_peak_memory_range: + min: 0 + max: 25469632 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 221 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 221 + job_id: jp8qyxjxp + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:26:20Z' - torchscript_onnx_qnn: - inference_time: 1476.0 - throughput: 677.5067750677507 + inference_time: 1482.0 + throughput: 674.7638326585695 estimated_peak_memory_range: min: 602112 max: 602112 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 219 - job_id: joprk4w95 + job_id: jg9lnmolg job_status: Passed torchscript_onnx: - inference_time: 1682.0 - throughput: 594.5303210463734 + inference_time: 1687.0 + throughput: 592.7682276229995 estimated_peak_memory_range: - min: 49823744 - max: 49823744 + min: 48934912 + max: 48934912 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 221 - job_id: jw5663w05 + job_id: jpy13x87p job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:28:07Z' + timestamp: '2024-10-15T00:26:18Z' diff --git a/qai_hub_models/models/inception_v3_quantized/README.md b/qai_hub_models/models/inception_v3_quantized/README.md index 87f38606..84df8a1b 100644 --- a/qai_hub_models/models/inception_v3_quantized/README.md +++ b/qai_hub_models/models/inception_v3_quantized/README.md @@ -6,7 +6,7 @@ InceptionNetV3 is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This model is post-training quantized to int8 using samples from Google's open images dataset. This is based on the implementation of Inception-v3-Quantized found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/inception.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/inception_v3_quantized). @@ -17,11 +17,6 @@ accross various devices, can be found [here](https://aihub.qualcomm.com/models/i ## Example & Usage -Install the package via pip: -```bash -pip install "qai_hub_models[inception_v3_quantized]" -``` - Once installed, run the following simple CLI demo: @@ -44,15 +39,19 @@ python -m qai_hub_models.models.inception_v3_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Inception-v3-Quantized can be found +* The license for the original implementation of Inception-v3-Quantized can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Rethinking the Inception Architecture for Computer Vision](http://arxiv.org/abs/1512.00567) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/inception.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/inception_v3_quantized/evaluate.py b/qai_hub_models/models/inception_v3_quantized/evaluate.py index 47341fcd..6547e871 100644 --- a/qai_hub_models/models/inception_v3_quantized/evaluate.py +++ b/qai_hub_models/models/inception_v3_quantized/evaluate.py @@ -13,10 +13,8 @@ from qai_hub_models.models.inception_v3_quantized import MODEL_ID, Model from qai_hub_models.utils.args import evaluate_parser, get_hub_device, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.evaluate import evaluate_on_dataset from qai_hub_models.utils.inference import compile_model_from_args -from qai_hub_models.utils.quantization_aimet import AIMETQuantizableMixin SUPPORTED_DATASETS = ["imagenette", "imagenet"] @@ -27,6 +25,7 @@ def main(): model_cls=Model, default_split_size=2500, supported_datasets=SUPPORTED_DATASETS, + is_hub_quantized=True, ) args = parser.parse_args() args.device = None @@ -38,13 +37,7 @@ def main(): MODEL_ID, args, get_model_kwargs(Model, vars(args)) ) hub_device = get_hub_device(None, args.chipset) - - # Use Fp16 model for torch inference - for cls in Model.__mro__: - if issubclass(cls, BaseModel) and not issubclass(cls, AIMETQuantizableMixin): - torch_cls = cls - break - torch_model = torch_cls.from_pretrained(**get_model_kwargs(torch_cls, vars(args))) + torch_model = Model.from_pretrained(**get_model_kwargs(Model, vars(args))) evaluate_on_dataset( hub_model, torch_model, diff --git a/qai_hub_models/models/inception_v3_quantized/export.py b/qai_hub_models/models/inception_v3_quantized/export.py index f68f8a52..47650ff0 100644 --- a/qai_hub_models/models/inception_v3_quantized/export.py +++ b/qai_hub_models/models/inception_v3_quantized/export.py @@ -10,18 +10,20 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.inception_v3_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference +from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( print_inference_metrics, print_on_target_demo_cmd, @@ -31,11 +33,14 @@ can_access_qualcomm_ai_hub, export_without_hub_access, ) +from qai_hub_models.utils.quantization import get_calibration_data def export_model( device: str = "Samsung Galaxy S23 (Family)", 
chipset: Optional[str] = None, + num_calibration_samples: int = 100, + skip_compiling: bool = False, skip_profiling: bool = False, skip_inferencing: bool = False, skip_downloading: bool = False, @@ -45,20 +50,19 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + 3. Compiles the model to an asset that can be run on device + 4. Profiles the model performance on a real device + 5. Inferences the model on sample inputs + 6. Downloads the model asset to the local directory + 7. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 5 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -66,6 +70,9 @@ def export_model( Defaults to DEFAULT_DEVICE if not specified. chipset: If set, will choose a random device with this chipset. Overrides the `device` argument. + num_calibration_samples: The number of calibration data samples + to use for quantization. + skip_compiling: If set, skips compiling model to format that can run on device. skip_profiling: If set, skips profiling of compiled model on real devices. skip_inferencing: If set, skips computing on-device outputs from sample data. skip_downloading: If set, skips downloading of compiled model. @@ -80,10 +87,11 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: - * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). + A struct of: + * A CompileJob object containing metadata about the compile job submitted to hub (None if compiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). + * A QuantizeJob object containing metadata about the quantize job submitted to hub """ model_name = "inception_v3_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,33 +117,52 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. 
Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) ) # Trace the model - source_model = model.convert_to_hub_source_model( - target_runtime, output_path, input_spec + source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) + + print(f"Quantizing model {model_name} with {num_calibration_samples} samples.") + # 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + onnx_compile_job = hub.submit_compile_job( + model=source_model, + input_specs=input_spec, + device=hub_device, + name=model_name, + options="--target_runtime onnx", + ) + quantize_job = hub.submit_quantize_job( + model=onnx_compile_job.get_target_model(), + calibration_data=get_calibration_data( + input_spec, "imagenette", num_calibration_samples + ), + weights_dtype=model.get_weights_dtype(), + activations_dtype=model.get_activations_dtype(), + name=model_name, + options=model.get_quantize_options(), ) + if skip_compiling: + return ExportResult(quantize_job=quantize_job) - # 2. Compile the model to an on-device asset + # 3. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) print(f"Optimizing model {model_name} to run on-device") submitted_compile_job = hub.submit_compile_job( - model=source_model, + model=quantize_job.get_target_model(), input_specs=input_spec, device=hub_device, name=model_name, - calibration_data=model.get_calibration_data(target_runtime), options=model_compile_options, ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 4. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +177,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 5. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +198,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 6. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 7. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -202,12 +229,17 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + quantize_job=quantize_job, + ) def main(): warnings.filterwarnings("ignore") - parser = export_parser(model_cls=Model) + parser = export_parser(model_cls=Model, is_hub_quantized=True) args = parser.parse_args() export_model(**vars(args)) diff --git a/qai_hub_models/models/inception_v3_quantized/model.py b/qai_hub_models/models/inception_v3_quantized/model.py index c5eaac55..3d95c1aa 100644 --- a/qai_hub_models/models/inception_v3_quantized/model.py +++ b/qai_hub_models/models/inception_v3_quantized/model.py @@ -4,85 +4,11 @@ # --------------------------------------------------------------------- from __future__ import annotations -# isort: off -# This verifies aimet is installed, and this must be included first. -from qai_hub_models.utils.quantization_aimet import ( - AIMETQuantizableMixin, -) - -# isort: on - -import torch -from aimet_torch.cross_layer_equalization import equalize_model -from aimet_torch.model_preparer import prepare_model -from aimet_torch.quantsim import QuantizationSimModel, load_encodings_to_sim - from qai_hub_models.models.inception_v3.model import InceptionNetV3 -from qai_hub_models.utils.aimet.config_loader import get_default_aimet_config -from qai_hub_models.utils.asset_loaders import CachedWebModelAsset -from qai_hub_models.utils.quantization_aimet import ( - constrain_quantized_inputs_to_image_range, - tie_observers, -) +from qai_hub_models.utils.quantization import HubQuantizableMixin MODEL_ID = __name__.split(".")[-2] -MODEL_ASSET_VERSION = 6 -DEFAULT_ENCODINGS = "inception_v3_quantized_encodings.json" - - -class InceptionNetV3Quantizable( - AIMETQuantizableMixin, - InceptionNetV3, -): - """InceptionNetV3 with post train quantization support. - - Supports only 8 bit weights and activations, and only loads pre-quantized checkpoints. - Support for quantizing using your own weights & data will come at a later date.""" - - def __init__( - self, - sim_model: QuantizationSimModel, - ) -> None: - # Input is already normalized by sim_model. Disable it in the wrapper model. - InceptionNetV3.__init__(self, sim_model.model, normalize_input=False) - AIMETQuantizableMixin.__init__( - self, - sim_model, - ) - - @classmethod - def from_pretrained( - cls, - aimet_encodings: str | None = "DEFAULT", - ) -> "InceptionNetV3Quantizable": - """ - Parameters: - aimet_encodings: - if "DEFAULT": Loads the model with aimet encodings calibrated on imagenette. - elif None: Doesn't load any encodings. Used when computing encodings. - else: Interprets as a filepath and loads the encodings stored there. 
- """ - model = InceptionNetV3.from_pretrained() - input_shape = cls.get_input_spec()["image_tensor"][0] - - model = prepare_model(model) - equalize_model(model, input_shape) - sim = QuantizationSimModel( - model, - quant_scheme="tf_enhanced", - default_param_bw=8, - default_output_bw=8, - config_file=get_default_aimet_config(), - dummy_input=torch.rand(input_shape), - ) - tie_observers(sim) - constrain_quantized_inputs_to_image_range(sim) - if aimet_encodings: - if aimet_encodings == "DEFAULT": - aimet_encodings = CachedWebModelAsset.from_asset_store( - MODEL_ID, MODEL_ASSET_VERSION, DEFAULT_ENCODINGS - ).fetch() - load_encodings_to_sim(sim, aimet_encodings) - return cls(sim) +class InceptionNetV3Quantizable(HubQuantizableMixin, InceptionNetV3): + pass diff --git a/qai_hub_models/models/inception_v3_quantized/perf.yaml b/qai_hub_models/models/inception_v3_quantized/perf.yaml index 944f9dc9..6237e43b 100644 --- a/qai_hub_models/models/inception_v3_quantized/perf.yaml +++ b/qai_hub_models/models/inception_v3_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,82 +20,77 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8775P Proxy models: - name: Inception-v3-Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 595.0 - throughput: 1680.672268907563 + inference_time: 590.0 + throughput: 1694.915254237288 estimated_peak_memory_range: min: 12288 - max: 15492704 + max: 1448856 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 142 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jvgdwr8l5 + total_layers: 142 + job_id: jgn6090m5 job_status: Passed torchscript_onnx_qnn: - inference_time: 645.0 - throughput: 1550.3875968992247 + inference_time: 650.0 + throughput: 1538.4615384615386 estimated_peak_memory_range: - min: 16384 - max: 29053320 + min: 20480 + max: 229064024 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 125 + layers_on_npu: 219 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 125 - job_id: jegn297rg + total_layers: 219 + job_id: jgo2z1zdp job_status: Passed torchscript_onnx: - inference_time: 861.0 - throughput: 1161.4401858304298 + inference_time: 873.0 + throughput: 1145.475372279496 estimated_peak_memory_range: - min: 16384 - max: 31125256 + min: 12288 + max: 31093288 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 134 + layers_on_npu: 130 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 134 - job_id: jw5663d05 + total_layers: 130 + job_id: jpxk97n95 job_status: Passed reference_device_info: 
name: Samsung Galaxy S23 @@ -103,51 +99,51 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:27:25Z' + timestamp: '2024-10-17T17:29:35Z' - torchscript_onnx_tflite: - inference_time: 442.0 - throughput: 2262.443438914027 + inference_time: 445.0 + throughput: 2247.191011235955 estimated_peak_memory_range: - min: 20480 - max: 73483536 + min: 12288 + max: 73743552 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 142 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jz5wod16p + total_layers: 142 + job_id: jprv646eg job_status: Passed torchscript_onnx_qnn: - inference_time: 495.0 - throughput: 2020.20202020202 + inference_time: 490.0 + throughput: 2040.8163265306123 estimated_peak_memory_range: - min: 167936 - max: 18911600 + min: 172032 + max: 21424032 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 125 + layers_on_npu: 219 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 125 - job_id: joprk4n95 + total_layers: 219 + job_id: jpv6q1qm5 job_status: Passed torchscript_onnx: - inference_time: 627.0 - throughput: 1594.896331738437 + inference_time: 695.0 + throughput: 1438.8489208633093 estimated_peak_memory_range: min: 0 - max: 99466880 + max: 101539664 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 134 + layers_on_npu: 130 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 134 - job_id: j1p3k4wl5 + total_layers: 130 + job_id: j5mnewqqp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -156,150 +152,173 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:27:26Z' + timestamp: '2024-10-17T17:29:36Z' - torchscript_onnx_tflite: - inference_time: 588.0 - throughput: 1700.6802721088436 + inference_time: 2322.0 + throughput: 430.66322136089576 estimated_peak_memory_range: - min: 36864 - max: 9896376 + min: 16384 + max: 27591120 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 142 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jmg9v3xl5 + total_layers: 142 + job_id: jp2kx7xmp job_status: Passed torchscript_onnx_qnn: - inference_time: 655.0 - throughput: 1526.7175572519084 + inference_time: 2854.0 + throughput: 350.385423966363 estimated_peak_memory_range: - min: 184320 - max: 1426904 + min: 167936 + max: 7752784 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 125 + layers_on_npu: 219 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 125 - job_id: jqpye477g + total_layers: 219 + job_id: jgjvd0d8g job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS6490 Proxy + timestamp: '2024-10-17T17:29:18Z' + - torchscript_onnx_tflite: + inference_time: 7622.0 + throughput: 131.19916032537392 + estimated_peak_memory_range: + min: 16384 + max: 2592552 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 142 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 142 + job_id: jpy1z4z4p + job_status: Passed + reference_device_info: + name: RB5 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:27:19Z' + chipset: QCS8250 Proxy + timestamp: '2024-10-17T17:29:00Z' - torchscript_onnx_tflite: - inference_time: 704.0 - throughput: 1420.4545454545455 + inference_time: 587.0 + throughput: 
1703.5775127768313 estimated_peak_memory_range: min: 16384 - max: 74810784 + max: 1363992 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 142 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jnp10dv25 + total_layers: 142 + job_id: jp0z414e5 job_status: Passed torchscript_onnx_qnn: - inference_time: 761.0 - throughput: 1314.060446780552 + inference_time: 654.0 + throughput: 1529.051987767584 estimated_peak_memory_range: - min: 172032 - max: 20979040 + min: 176128 + max: 1503016 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 125 + layers_on_npu: 219 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 125 - job_id: jn5q87m45 + total_layers: 219 + job_id: jpedoro05 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:27:23Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-17T17:29:20Z' - torchscript_onnx_tflite: - inference_time: 588.0 - throughput: 1700.6802721088436 + inference_time: 592.0 + throughput: 1689.1891891891892 estimated_peak_memory_range: - min: 12288 - max: 19061664 + min: 28672 + max: 2042328 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 142 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jvgdwrze5 + total_layers: 142 + job_id: jp8q2328p job_status: Passed torchscript_onnx_qnn: - inference_time: 657.0 - throughput: 1522.0700152207 + inference_time: 649.0 + throughput: 1540.8320493066255 estimated_peak_memory_range: - min: 176128 - max: 1497000 + min: 196608 + max: 1392784 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 125 + layers_on_npu: 219 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 125 - job_id: j2p0y1v6g + total_layers: 219 + job_id: j5wewdwj5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:27:20Z' + chipset: SA8255P Proxy + timestamp: '2024-10-17T17:29:23Z' - torchscript_onnx_tflite: - inference_time: 591.0 - throughput: 1692.047377326565 + inference_time: 595.0 + throughput: 1680.672268907563 estimated_peak_memory_range: - min: 24576 - max: 28002680 + min: 40960 + max: 211398824 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 142 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jz57zj7lp + total_layers: 142 + job_id: jgkevlvog job_status: Passed torchscript_onnx_qnn: - inference_time: 659.0 - throughput: 1517.4506828528072 + inference_time: 647.0 + throughput: 1545.595054095827 estimated_peak_memory_range: - min: 180224 - max: 1259464 + min: 184320 + max: 1395136 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 125 + layers_on_npu: 219 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 125 - job_id: j1p8o34xg + total_layers: 219 + job_id: jg9l030vg job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -307,136 +326,128 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:27:21Z' + chipset: SA8775P Proxy + timestamp: '2024-10-17T17:29:26Z' - torchscript_onnx_tflite: - inference_time: 589.0 - throughput: 1697.792869269949 + inference_time: 700.0 + throughput: 1428.5714285714287 
estimated_peak_memory_range: min: 12288 - max: 208360352 + max: 74882336 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 142 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jqp4qx9vg + total_layers: 142 + job_id: j5q6070mp job_status: Passed torchscript_onnx_qnn: - inference_time: 655.0 - throughput: 1526.7175572519084 + inference_time: 770.0 + throughput: 1298.7012987012988 estimated_peak_memory_range: - min: 180224 - max: 1692520 + min: 167936 + max: 23876128 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 125 + layers_on_npu: 219 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 125 - job_id: jogkzl92g + total_layers: 219 + job_id: jp142d2lp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:27:22Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-17T17:29:28Z' - torchscript_onnx_tflite: - inference_time: 2312.0 - throughput: 432.52595155709344 + inference_time: 421.0 + throughput: 2375.296912114014 estimated_peak_memory_range: min: 12288 - max: 28043472 + max: 24823056 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 142 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: j0pxv7d1g + total_layers: 142 + job_id: jglv404l5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2944.0 - throughput: 339.67391304347825 + inference_time: 432.0 + throughput: 2314.814814814815 estimated_peak_memory_range: - min: 12288 - max: 8019344 + min: 0 + max: 16446992 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 125 + layers_on_npu: 219 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 125 - job_id: j1gln018p + total_layers: 219 + job_id: jgdxnrnlp job_status: Passed - reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot - os_name: Android - manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T12:27:24Z' - - torchscript_onnx_tflite: - inference_time: 7834.0 - throughput: 127.64871074802144 + torchscript_onnx: + inference_time: 647.0 + throughput: 1545.595054095827 estimated_peak_memory_range: - min: 12288 - max: 3393688 + min: 8192 + max: 34873856 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 130 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jo5mrwdwg + total_layers: 130 + job_id: jprv648eg job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T12:27:15Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-17T17:29:40Z' - torchscript_onnx_qnn: - inference_time: 697.0 - throughput: 1434.7202295552368 + inference_time: 721.0 + throughput: 1386.9625520110958 estimated_peak_memory_range: - min: 475136 - max: 475136 + min: 577536 + max: 577536 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 125 + layers_on_npu: 219 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 125 - job_id: jep287v4p + total_layers: 219 + job_id: jgz32x265 job_status: Passed torchscript_onnx: - inference_time: 793.0 - throughput: 1261.034047919294 + inference_time: 784.0 + throughput: 1275.5102040816328 estimated_peak_memory_range: - min: 28610560 - max: 28610560 + min: 28753920 + 
max: 28753920 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 134 + layers_on_npu: 130 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 134 - job_id: jwgoy14x5 + total_layers: 130 + job_id: jgn609lm5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -445,4 +456,4 @@ os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:27:27Z' + timestamp: '2024-10-17T17:29:38Z' diff --git a/qai_hub_models/models/inception_v3_quantized/requirements.txt b/qai_hub_models/models/inception_v3_quantized/requirements.txt deleted file mode 100644 index de5b80e8..00000000 --- a/qai_hub_models/models/inception_v3_quantized/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -aimet-torch==1.32.1.post1; sys_platform == "linux" diff --git a/qai_hub_models/models/inception_v3_quantized/test.py b/qai_hub_models/models/inception_v3_quantized/test.py deleted file mode 100644 index 486a8cee..00000000 --- a/qai_hub_models/models/inception_v3_quantized/test.py +++ /dev/null @@ -1,29 +0,0 @@ -# --------------------------------------------------------------------- -# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. -# SPDX-License-Identifier: BSD-3-Clause -# --------------------------------------------------------------------- -from qai_hub_models.models._shared.imagenet_classifier.test_utils import ( - run_imagenet_classifier_test, -) -from qai_hub_models.models.inception_v3_quantized.demo import main as demo_main -from qai_hub_models.models.inception_v3_quantized.model import ( - MODEL_ASSET_VERSION, - MODEL_ID, - InceptionNetV3Quantizable, -) - - -def test_task(): - run_imagenet_classifier_test( - InceptionNetV3Quantizable.from_pretrained(), - MODEL_ID, - diff_tol=0.005, - rtol=0.02, - atol=0.2, - asset_version=MODEL_ASSET_VERSION, - ) - - -def test_demo(): - # Verify demo does not crash - demo_main(is_test=True) diff --git a/qai_hub_models/models/indus_1b_quantized/README.md b/qai_hub_models/models/indus_1b_quantized/README.md new file mode 100644 index 00000000..4950ce2e --- /dev/null +++ b/qai_hub_models/models/indus_1b_quantized/README.md @@ -0,0 +1,55 @@ +[![Qualcomm® AI Hub Models](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/quic-logo.jpg)](../../README.md) + + +# [IndusQ-1.1B: State-of-the-art large language model useful on a variety of language understanding and generation tasks](https://aihub.qualcomm.com/models/indus_1b_quantized) + +Indus is today a 1.2 billion parameter model that has been supervised fine-tuned for Hindi and its dialects. + +This is based on the implementation of IndusQ-1.1B found +[here]({source_repo}). This repository contains scripts for optimized on-device +export suitable to run on Qualcomm® devices. More details on model performance +across various devices can be found [here](https://aihub.qualcomm.com/models/indus_1b_quantized). + +[Sign up](https://myaccount.qualcomm.com/signup) to start using Qualcomm AI Hub and run these models on a hosted Qualcomm® device. + +## Deploying IndusQ-1.1B on-device + +Please follow the [LLM on-device deployment](https://github.com/quic/ai-hub-apps/tree/main/tutorials/llm_on_genie) tutorial.
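As background for the prompt-processor / token-generator split that the LLM model cards in this change describe, here is a minimal conceptual sketch of the two-stage on-device inference loop. It is illustrative only: the `prompt_processor` and `token_generator` callables and the pad-token value are hypothetical stand-ins, not APIs from this repository, and the 128-token chunk size comes from the "Input sequence length for Prompt Processor" entries in these model cards.

```python
from typing import Callable, List, Optional, Tuple

PP_SEQ_LEN = 128  # prompt-processor input length, per the model cards
PAD_TOKEN = 0     # hypothetical pad id; the real value is tokenizer-specific

# Each stage maps (tokens, kv_cache) -> (logits, kv_cache).
Stage = Callable[[List[int], Optional[object]], Tuple[List[float], object]]


def generate(prompt: List[int], prompt_processor: Stage,
             token_generator: Stage, max_new_tokens: int) -> List[int]:
    # Stage 1: prefill. The prompt processor consumes the prompt in
    # 128-token chunks, padding the final chunk and building the KV cache.
    kv_cache: Optional[object] = None
    logits: List[float] = []
    for start in range(0, len(prompt), PP_SEQ_LEN):
        chunk = prompt[start:start + PP_SEQ_LEN]
        chunk = chunk + [PAD_TOKEN] * (PP_SEQ_LEN - len(chunk))
        logits, kv_cache = prompt_processor(chunk, kv_cache)

    # Stage 2: decode. The token generator emits one token per call,
    # feeding the updated KV cache back in each iteration.
    out: List[int] = []
    next_tok = max(range(len(logits)), key=logits.__getitem__)
    for _ in range(max_new_tokens):
        out.append(next_tok)
        logits, kv_cache = token_generator([next_tok], kv_cache)
        next_tok = max(range(len(logits)), key=logits.__getitem__)
    return out
```

The TTFT ranges quoted in these cards follow directly from this structure: the lower bound corresponds to a single prompt-processor iteration, the upper bound to prefilling the full context length.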
+ + + + + +## References +* [Project Indus: A Foundational Model for Indian Languages](https://www.techmahindra.com/makers-lab/indus-project/) +* [Source Model Implementation](https://huggingface.co/nickmalhotra/ProjectIndus) + + + +## Community +* Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. +* For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). + + +## Usage and Limitations + +This model may not be used for or in connection with any of the following applications: + +- Accessing essential private and public services and benefits; +- Administration of justice and democratic processes; +- Assessing or recognizing the emotional state of a person; +- Biometric and biometrics-based systems, including categorization of persons based on sensitive characteristics; +- Education and vocational training; +- Employment and workers management; +- Exploitation of the vulnerabilities of persons resulting in harmful behavior; +- General purpose social scoring; +- Law enforcement; +- Management and operation of critical infrastructure; +- Migration, asylum and border control management; +- Predictive policing; +- Real-time remote biometric identification in public spaces; +- Recommender systems of social media platforms; +- Scraping of facial images (from the internet or otherwise); and/or +- Subliminal manipulation + + diff --git a/qai_hub_models/models/indus_1b_quantized/info.yaml b/qai_hub_models/models/indus_1b_quantized/info.yaml new file mode 100644 index 00000000..f4badf4c --- /dev/null +++ b/qai_hub_models/models/indus_1b_quantized/info.yaml @@ -0,0 +1,42 @@ +name: IndusQ-1.1B +id: indus_1b_quantized +status: public +headline: State-of-the-art large language model useful on a variety of language + understanding and generation tasks. +domain: Generative AI +description: Indus is today a 1.2 billion parameter model that has been supervised fine-tuned for Hindi and its dialects. +use_case: Text Generation +tags: + - llm + - generative-ai + - quantized +research_paper: https://www.techmahindra.com/makers-lab/indus-project/ +research_paper_title: "Project Indus: A Foundational Model for Indian Languages" +source_repo: https://huggingface.co/nickmalhotra/ProjectIndus +model_maker_id: tech-mahindra +technical_details: + Input sequence length for Prompt Processor: 128 + Max context length: 1024 + Number of parameters: 1B + Precision: w4a16 + w8a16 (a few layers) + Use: Initiate conversation with prompt-processor and then token generator for subsequent iterations. + Minimum QNN SDK version required: 2.27.7 + Supported languages: Hindi and English. + TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (1024 tokens). + Response Rate: Rate of response generation after the first response token.
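One note on reading the LLM figures in the accompanying perf.yaml files: the `time_to_first_token_range` values appear to be in microseconds. That unit is an assumption inferred from the magnitudes, not something these files document. Under that assumption, the IndusQ-1.1B numbers convert as follows:

```python
# Assumes time_to_first_token_range is reported in microseconds; that
# unit is inferred from the magnitudes, not documented in perf.yaml.
ttft_us = {"min": 28561, "max": 228489}  # IndusQ-1.1B, from perf.yaml below
tokens_per_second = 74.60

print(f"TTFT: {ttft_us['min'] / 1e6:.3f} s (short prompt) "
      f"to {ttft_us['max'] / 1e6:.3f} s (full 1024-token context)")
print(f"Steady-state decode: {1000 / tokens_per_second:.1f} ms per token")
```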
+applicable_scenarios: + - Dialogue + - Content Generation + - Customer Support +related_models: [] +form_factors: + - Phone + - Tablet +has_static_banner: true +has_animated_banner: false +dataset: [] +model_type_llm: true +license_type: 'other' +restrict_model_sharing: true +llm_details: + call_to_action: 'contact_for_purchase' diff --git a/qai_hub_models/models/jais_6p7b_chat_quantized/README.md b/qai_hub_models/models/jais_6p7b_chat_quantized/README.md new file mode 100644 index 00000000..fb207748 --- /dev/null +++ b/qai_hub_models/models/jais_6p7b_chat_quantized/README.md @@ -0,0 +1,55 @@ +[![Qualcomm® AI Hub Models](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/quic-logo.jpg)](../../README.md) + + +# [JAIS-6p7b-Chat: State-of-the-art large language model useful on a variety of language understanding and generation tasks](https://aihub.qualcomm.com/models/jais_6p7b_chat_quantized) + +JAIS 6.7B is a bilingual large language model (LLM) for both Arabic and English, developed by Inception, a G42 company, in partnership with MBZUAI and Cerebras. This is a 6.7 billion parameter LLM, trained on a dataset containing 141 billion Arabic tokens and 339 billion English/code tokens. The model is based on a transformer-based decoder-only (GPT-3) architecture and uses the SwiGLU non-linearity. It implements ALiBi position embeddings, enabling the model to extrapolate to long sequence lengths, providing improved context handling and model precision. The JAIS family of models is a comprehensive series of bilingual English-Arabic LLMs. These models are optimized to excel in Arabic while having strong English capabilities. + +This is based on the implementation of JAIS-6p7b-Chat found +[here]({source_repo}). This repository contains scripts for optimized on-device +export suitable to run on Qualcomm® devices. More details on model performance +across various devices can be found [here](https://aihub.qualcomm.com/models/jais_6p7b_chat_quantized). + +[Sign up](https://myaccount.qualcomm.com/signup) to start using Qualcomm AI Hub and run these models on a hosted Qualcomm® device. + +## Deploying JAIS-6p7b-Chat on-device + +Please follow the [LLM on-device deployment](https://github.com/quic/ai-hub-apps/tree/main/tutorials/llm_on_genie) tutorial. + + + + + +## References +* [Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models](https://arxiv.org/abs/2308.16149) +* [Source Model Implementation](https://huggingface.co/inceptionai/jais-family-6p7b) + + + +## Community +* Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI.
+* For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). + + +## Usage and Limitations + +This model may not be used for or in connection with any of the following applications: + +- Accessing essential private and public services and benefits; +- Administration of justice and democratic processes; +- Assessing or recognizing the emotional state of a person; +- Biometric and biometrics-based systems, including categorization of persons based on sensitive characteristics; +- Education and vocational training; +- Employment and workers management; +- Exploitation of the vulnerabilities of persons resulting in harmful behavior; +- General purpose social scoring; +- Law enforcement; +- Management and operation of critical infrastructure; +- Migration, asylum and border control management; +- Predictive policing; +- Real-time remote biometric identification in public spaces; +- Recommender systems of social media platforms; +- Scraping of facial images (from the internet or otherwise); and/or +- Subliminal manipulation + + diff --git a/qai_hub_models/models/jais_6p7b_chat_quantized/info.yaml b/qai_hub_models/models/jais_6p7b_chat_quantized/info.yaml new file mode 100644 index 00000000..b971e078 --- /dev/null +++ b/qai_hub_models/models/jais_6p7b_chat_quantized/info.yaml @@ -0,0 +1,42 @@ +name: JAIS-6p7b-Chat +id: jais_6p7b_chat_quantized +status: public +headline: State-of-the-art large language model useful on a variety of language + understanding and generation tasks. +domain: Generative AI +description: JAIS 6.7B is a bilingual large language model (LLM) for both Arabic and English developed by Inception, a G42 company in partnership with MBZUAI and Cerebras. This is a 6.7 billion parameter LLM, trained on a dataset containing 141 billion Arabic tokens and 339 billion English/code tokens. The model is based on transformer-based decoder-only (GPT-3) architecture and uses SwiGLU non-linearity. It implements ALiBi position embeddings, enabling the model to extrapolate to long sequence lengths, providing improved context handling and model precision. The JAIS family of models is a comprehensive series of bilingual English-Arabic LLMs. These models are optimized to excel in Arabic while having strong English capabilities. +use_case: Text Generation +tags: + - llm + - generative-ai + - quantized +research_paper: https://arxiv.org/abs/2308.16149 +research_paper_title: "Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models" +source_repo: https://huggingface.co/inceptionai/jais-family-6p7b +model_maker_id: g42 +technical_details: + Input sequence length for Prompt Processor: 128 + Max context length: 2048 + Number of parameters: 6.7B + Precision: w4a16 + w8a16 (a few layers) + Use: Initiate conversation with prompt-processor and then token generator for subsequent iterations. + Supported languages: Arabic (MSA) and English. + Minimum QNN SDK version required: 2.27.7 + TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (2048 tokens). + Response Rate: Rate of response generation after the first response token. 
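The "Use" field above compresses the runtime flow into one line; as a hedged sketch of what "initiate conversation with prompt-processor and then token generator for subsequent iterations" means in practice (all names below are illustrative, not APIs from this repository):

```python
# Illustrative two-stage decode loop; `prompt_processor` and
# `token_generator` stand in for the two compiled model parts.
def generate(prompt_tokens, prompt_processor, token_generator, max_new_tokens=32):
    # One prompt-processor pass consumes the prompt (up to the input
    # sequence length, 128 tokens here) and produces the first output
    # token plus the KV cache.
    token, kv_cache = prompt_processor(prompt_tokens)
    output = [token]
    # The token generator then runs once per additional token, feeding
    # the KV cache back in on each iteration.
    for _ in range(max_new_tokens - 1):
        token, kv_cache = token_generator(token, kv_cache)
        output.append(token)
    return output
```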
+applicable_scenarios: + - Dialogue + - Content Generation + - Customer Support +related_models: [] +form_factors: + - Phone + - Tablet +has_static_banner: true +has_animated_banner: false +dataset: [] +model_type_llm: true +license_type: 'other' +restrict_model_sharing: true +llm_details: + call_to_action: 'contact_for_purchase' diff --git a/qai_hub_models/models/jais_6p7b_chat_quantized/perf.yaml b/qai_hub_models/models/jais_6p7b_chat_quantized/perf.yaml new file mode 100644 index 00000000..7f0a7e35 --- /dev/null +++ b/qai_hub_models/models/jais_6p7b_chat_quantized/perf.yaml @@ -0,0 +1,25 @@ +aggregated: + supported_oses: + - Android + supported_devices: + - Snapdragon 8 Elite QRD + supported_chipsets: + - Snapdragon® 8 Elite +models: + name: 'Jais-6p7b-Chat' + performance_metrics: + - torchscript_onnx_qnn: + llm_metrics: + time_to_first_token_range: + min: 238231 + max: 3811696 + tokens_per_second: 13.33 + evaluation_metrics: null + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-16T00:32:42.210701Z' diff --git a/qai_hub_models/models/lama_dilated/README.md b/qai_hub_models/models/lama_dilated/README.md index 685dab86..b84123a9 100644 --- a/qai_hub_models/models/lama_dilated/README.md +++ b/qai_hub_models/models/lama_dilated/README.md @@ -6,7 +6,7 @@ LaMa-Dilated is a machine learning model that allows to erase and in-paint part of given input image. This is based on the implementation of LaMa-Dilated found -[here](https://github.com/advimman/lama). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/lama_dilated). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.lama_dilated.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of LaMa-Dilated can be found +* The license for the original implementation of LaMa-Dilated can be found [here](https://github.com/advimman/lama/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Resolution-robust Large Mask Inpainting with Fourier Convolutions](https://arxiv.org/abs/2109.07161) * [Source Model Implementation](https://github.com/advimman/lama) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). 
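The export.py diffs that follow replace the old positional 3-tuple return value with an `ExportResult` struct. A minimal usage sketch, assuming only the field names visible in the diff (the device name is illustrative):

```python
# Sketch: consuming the new ExportResult return value of export_model.
from qai_hub_models.models.lama_dilated.export import export_model

result = export_model(device="Samsung Galaxy S23")  # device name illustrative
if not isinstance(result, list):  # export_model may also return List[str]
    print(result.compile_job)                # always populated
    if result.profile_job is not None:       # None when profiling is skipped
        print(result.profile_job)
    if result.inference_job is not None:     # None when inferencing is skipped
        print(result.inference_job)
```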
diff --git a/qai_hub_models/models/lama_dilated/export.py b/qai_hub_models/models/lama_dilated/export.py index 1ded990c..db6c385b 100644 --- a/qai_hub_models/models/lama_dilated/export.py +++ b/qai_hub_models/models/lama_dilated/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.lama_dilated import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "lama_dilated" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. 
Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/lama_dilated/perf.yaml b/qai_hub_models/models/lama_dilated/perf.yaml index 54fe8b30..9bd81190 100644 --- a/qai_hub_models/models/lama_dilated/perf.yaml +++ b/qai_hub_models/models/lama_dilated/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: LaMa-Dilated performance_metrics: - torchscript_onnx_tflite: - inference_time: 74926.0 - throughput: 13.346501881856765 + inference_time: 74881.0 + throughput: 13.354522509047689 estimated_peak_memory_range: - min: 3211264 - max: 137684112 + min: 3252224 + max: 138080320 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 343 - job_id: j0pxv739g + job_id: j5mnxm8qp job_status: Passed 
torchscript_onnx_qnn: - inference_time: 70590.0 - throughput: 14.166312508853945 + inference_time: 70655.0 + throughput: 14.153280022645248 estimated_peak_memory_range: - min: 12288 - max: 42684872 + min: 1724416 + max: 45349520 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,7 +71,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 332 - job_id: jw566vd75 + job_id: jglvmxll5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -82,13 +80,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T23:09:34Z' + timestamp: '2024-10-15T00:23:46Z' - torchscript_onnx_tflite: - inference_time: 56057.0 - throughput: 17.838985318515082 + inference_time: 55681.0 + throughput: 17.95944756739283 estimated_peak_memory_range: - min: 3219456 - max: 250625712 + min: 2813952 + max: 273462720 primary_compute_unit: NPU precision: fp16 layer_info: @@ -96,14 +94,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 343 - job_id: jo5mrwoqg + job_id: jgn6vnkm5 job_status: Passed torchscript_onnx_qnn: - inference_time: 52897.0 - throughput: 18.904663780554664 + inference_time: 52685.0 + throughput: 18.980734554427258 estimated_peak_memory_range: - min: 4268032 - max: 85104720 + min: 4272128 + max: 95572000 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,7 +109,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 332 - job_id: j1p3k8wz5 + job_id: j56y47w7p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -120,13 +118,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T23:09:35Z' + timestamp: '2024-10-15T00:23:47Z' - torchscript_onnx_tflite: - inference_time: 74797.0 - throughput: 13.369520167921173 + inference_time: 74832.0 + throughput: 13.363267051528759 estimated_peak_memory_range: min: 3264512 - max: 138471064 + max: 137993800 primary_compute_unit: NPU precision: fp16 layer_info: @@ -134,14 +132,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 343 - job_id: jegn29omg + job_id: jprv30weg job_status: Passed torchscript_onnx_qnn: - inference_time: 66777.0 - throughput: 14.975216017491052 + inference_time: 70284.0 + throughput: 14.227989300552046 estimated_peak_memory_range: - min: 4349952 - max: 5644752 + min: 4382720 + max: 5517256 primary_compute_unit: NPU precision: fp16 layer_info: @@ -149,7 +147,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 332 - job_id: j1pv349m5 + job_id: jgo26r8dp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -157,14 +155,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T23:09:37Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:23:49Z' - torchscript_onnx_tflite: - inference_time: 105500.0 - throughput: 9.47867298578199 + inference_time: 74735.0 + throughput: 13.380611493945274 estimated_peak_memory_range: - min: 3391488 - max: 158055104 + min: 3284992 + max: 137824744 primary_compute_unit: NPU precision: fp16 layer_info: @@ -172,14 +170,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 343 - job_id: joprk4oe5 + job_id: jp8qyx18p job_status: Passed torchscript_onnx_qnn: - inference_time: 100523.0 - throughput: 9.947972105886215 + inference_time: 70519.0 + throughput: 14.18057544775167 estimated_peak_memory_range: - min: 4235264 - max: 45340112 + min: 4395008 + max: 5606856 primary_compute_unit: NPU precision: fp16 layer_info: @@ -187,22 +185,22 @@ models: 
layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 332 - job_id: jz5wox1jp + job_id: jpedmzy05 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T23:09:40Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:23:53Z' - torchscript_onnx_tflite: - inference_time: 74797.0 - throughput: 13.369520167921173 + inference_time: 74677.0 + throughput: 13.39100392356415 estimated_peak_memory_range: - min: 3166208 - max: 221163520 + min: 6860800 + max: 311096848 primary_compute_unit: NPU precision: fp16 layer_info: @@ -210,14 +208,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 343 - job_id: jep2874mp + job_id: jp0z0j6e5 job_status: Passed torchscript_onnx_qnn: - inference_time: 66962.0 - throughput: 14.933843075176966 + inference_time: 70451.0 + throughput: 14.194262679025138 estimated_peak_memory_range: - min: 4362240 - max: 5961936 + min: 3338240 + max: 6555048 primary_compute_unit: NPU precision: fp16 layer_info: @@ -225,22 +223,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 332 - job_id: j7gjx1w8p + job_id: jgjvn7q8g job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T23:09:37Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:23:52Z' - torchscript_onnx_tflite: - inference_time: 74435.0 - throughput: 13.434540202861557 + inference_time: 74665.0 + throughput: 13.393156097234312 estimated_peak_memory_range: - min: 3268608 - max: 138088632 + min: 3284992 + max: 137949104 primary_compute_unit: NPU precision: fp16 layer_info: @@ -248,14 +246,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 343 - job_id: jqpye4q4g + job_id: jpy13xm4p job_status: Passed torchscript_onnx_qnn: - inference_time: 67059.0 - throughput: 14.912241459013705 + inference_time: 70620.0 + throughput: 14.16029453412631 estimated_peak_memory_range: - min: 4390912 - max: 5581424 + min: 4435968 + max: 5702872 primary_compute_unit: NPU precision: fp16 layer_info: @@ -263,22 +261,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 332 - job_id: jlpe92l0g + job_id: jpv6kd7m5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T23:09:38Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:23:50Z' - torchscript_onnx_tflite: - inference_time: 75016.0 - throughput: 13.330489495574277 + inference_time: 105083.0 + throughput: 9.516287125415149 estimated_peak_memory_range: - min: 3289088 - max: 138110264 + min: 3403776 + max: 168433776 primary_compute_unit: NPU precision: fp16 layer_info: @@ -286,14 +284,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 343 - job_id: j2p0y1deg + job_id: jp2kywemp job_status: Passed torchscript_onnx_qnn: - inference_time: 66302.0 - throughput: 15.08250128201261 + inference_time: 100500.0 + throughput: 9.950248756218905 estimated_peak_memory_range: - min: 4395008 - max: 5975696 + min: 4235264 + max: 46386912 primary_compute_unit: NPU precision: fp16 layer_info: @@ -301,19 +299,57 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 332 - job_id: jygzew46g + job_id: j5we674j5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' 
- form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:23:55Z' + - torchscript_onnx_tflite: + inference_time: 49175.0 + throughput: 20.335536349771225 + estimated_peak_memory_range: + min: 2408448 + max: 169963632 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 343 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 343 + job_id: j5q6qyvmp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 45997.0 + throughput: 21.74054829662804 + estimated_peak_memory_range: + min: 1814528 + max: 92266624 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 332 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 332 + job_id: jg9lnmdvg + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T23:09:39Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:23:56Z' - torchscript_onnx_qnn: - inference_time: 69435.0 - throughput: 14.401958666378627 + inference_time: 71913.0 + throughput: 13.905691599571705 estimated_peak_memory_range: min: 4202496 max: 4202496 @@ -324,7 +360,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 332 - job_id: jwgoym4d5 + job_id: jp3j096zg job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -333,4 +369,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T23:09:36Z' + timestamp: '2024-10-15T00:23:48Z' diff --git a/qai_hub_models/models/litehrnet/README.md b/qai_hub_models/models/litehrnet/README.md index c6f890e7..8084a6c2 100644 --- a/qai_hub_models/models/litehrnet/README.md +++ b/qai_hub_models/models/litehrnet/README.md @@ -6,7 +6,7 @@ LiteHRNet is a machine learning model that detects human pose and returns a location and confidence for each of 17 joints. This is based on the implementation of LiteHRNet found -[here](https://github.com/HRNet/Lite-HRNet). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/litehrnet). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.litehrnet.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of LiteHRNet can be found +* The license for the original implementation of LiteHRNet can be found [here](https://github.com/HRNet/Lite-HRNet/blob/hrnet/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Lite-HRNet: A Lightweight High-Resolution Network](https://arxiv.org/abs/2104.06403) * [Source Model Implementation](https://github.com/HRNet/Lite-HRNet) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/litehrnet/export.py b/qai_hub_models/models/litehrnet/export.py index 2a9e800e..74d3e6a5 100644 --- a/qai_hub_models/models/litehrnet/export.py +++ b/qai_hub_models/models/litehrnet/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.litehrnet import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
""" model_name = "litehrnet" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,16 +195,17 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): warnings.filterwarnings("ignore") parser = export_parser( - model_cls=Model, - supports_qnn=False, - supports_onnx=False, - supports_precompiled_qnn_onnx=False, + model_cls=Model, supports_qnn=False, supports_precompiled_qnn_onnx=False ) args = parser.parse_args() export_model(**vars(args)) diff --git a/qai_hub_models/models/litehrnet/perf.yaml b/qai_hub_models/models/litehrnet/perf.yaml index e6cada4c..9d95b1f9 100644 --- a/qai_hub_models/models/litehrnet/perf.yaml +++ b/qai_hub_models/models/litehrnet/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: LiteHRNet performance_metrics: - torchscript_onnx_tflite: - inference_time: 7904.0 - throughput: 126.51821862348179 + inference_time: 7959.0 + throughput: 125.64392511622063 estimated_peak_memory_range: - min: 249856 - max: 4537216 + min: 253952 + max: 2920944 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,7 +56,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 2 total_layers: 1235 - job_id: jogkzldog + job_id: jpy13x74p + job_status: Passed + torchscript_onnx: + inference_time: 7130.0 + throughput: 140.25245441795232 + estimated_peak_memory_range: + min: 425984 + max: 6890912 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 1254 + layers_on_gpu: 0 + layers_on_cpu: 4 + total_layers: 1258 + job_id: j5we671j5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -67,13 +80,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:25:03Z' + timestamp: '2024-10-15T00:23:00Z' - torchscript_onnx_tflite: - inference_time: 5976.0 - throughput: 167.33601070950468 + inference_time: 4910.0 + throughput: 203.66598778004072 estimated_peak_memory_range: min: 249856 - max: 94933776 + max: 99864736 primary_compute_unit: NPU precision: fp16 layer_info: @@ -81,7 +94,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 2 total_layers: 1235 - job_id: jn5q87wm5 + job_id: jp0z0jve5 + job_status: Passed + torchscript_onnx: + inference_time: 4533.0 + throughput: 220.60445621001546 + 
estimated_peak_memory_range: + min: 606208 + max: 112216896 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 1254 + layers_on_gpu: 0 + layers_on_cpu: 4 + total_layers: 1258 + job_id: jg9lnmxvg job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -90,13 +118,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:25:04Z' + timestamp: '2024-10-15T00:23:01Z' - torchscript_onnx_tflite: - inference_time: 7904.0 - throughput: 126.51821862348179 + inference_time: 7938.0 + throughput: 125.97631645250694 estimated_peak_memory_range: - min: 266240 - max: 2070696 + min: 253952 + max: 2423880 primary_compute_unit: NPU precision: fp16 layer_info: @@ -104,7 +132,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 2 total_layers: 1235 - job_id: j1gln07lp + job_id: jp8qyx48p job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -112,14 +140,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:25:05Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:22:46Z' - torchscript_onnx_tflite: - inference_time: 8607.0 - throughput: 116.18450098756826 + inference_time: 7965.0 + throughput: 125.54927809165098 estimated_peak_memory_range: - min: 249856 - max: 85742576 + min: 245760 + max: 2855640 primary_compute_unit: NPU precision: fp16 layer_info: @@ -127,22 +155,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 2 total_layers: 1235 - job_id: jw5663v75 + job_id: j56y47d7p job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:25:06Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:22:50Z' - torchscript_onnx_tflite: - inference_time: 7908.0 - throughput: 126.45422357106727 + inference_time: 7929.0 + throughput: 126.11930886618741 estimated_peak_memory_range: - min: 270336 - max: 2944712 + min: 225280 + max: 2036536 primary_compute_unit: NPU precision: fp16 layer_info: @@ -150,22 +178,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 2 total_layers: 1235 - job_id: j1p3k48z5 + job_id: jglvmx1l5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:25:07Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:22:49Z' - torchscript_onnx_tflite: - inference_time: 7887.0 - throughput: 126.79092177000126 + inference_time: 7934.0 + throughput: 126.03982858583312 estimated_peak_memory_range: - min: 274432 - max: 2653064 + min: 245760 + max: 3055480 primary_compute_unit: NPU precision: fp16 layer_info: @@ -173,22 +201,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 2 total_layers: 1235 - job_id: jwgoy1md5 + job_id: j5q6qymmp job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:25:08Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:22:48Z' - torchscript_onnx_tflite: - inference_time: 7901.0 - throughput: 126.56625743576762 + inference_time: 8522.0 + throughput: 117.34334663224595 estimated_peak_memory_range: - min: 262144 - max: 2491936 + min: 245760 + max: 88355744 primary_compute_unit: NPU precision: fp16 layer_info: @@ -196,13 +224,74 @@ models: layers_on_gpu: 0 
layers_on_cpu: 2 total_layers: 1235 - job_id: j1pv314m5 + job_id: jgkex49og job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:25:08Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:22:47Z' + - torchscript_onnx_tflite: + inference_time: 5295.0 + throughput: 188.85741265344666 + estimated_peak_memory_range: + min: 221184 + max: 71293792 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 1233 + layers_on_gpu: 0 + layers_on_cpu: 2 + total_layers: 1235 + job_id: jgo26r4dp + job_status: Passed + torchscript_onnx: + inference_time: 4830.0 + throughput: 207.0393374741201 + estimated_peak_memory_range: + min: 1024000 + max: 83432272 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 1254 + layers_on_gpu: 0 + layers_on_cpu: 4 + total_layers: 1258 + job_id: j57yr49r5 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:23:04Z' + - torchscript_onnx: + inference_time: 8063.0 + throughput: 124.0233163834801 + estimated_peak_memory_range: + min: 4661248 + max: 4661248 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 1254 + layers_on_gpu: 0 + layers_on_cpu: 4 + total_layers: 1258 + job_id: jp14zjvlp + job_status: Passed + reference_device_info: + name: Snapdragon X Elite CRD + os: '11' + form_factor: Compute + os_name: Windows + manufacturer: Qualcomm + chipset: Snapdragon® X Elite + timestamp: '2024-10-15T00:23:02Z' diff --git a/qai_hub_models/models/llama_v2_7b_chat_quantized/README.md b/qai_hub_models/models/llama_v2_7b_chat_quantized/README.md index 52871142..0463700d 100644 --- a/qai_hub_models/models/llama_v2_7b_chat_quantized/README.md +++ b/qai_hub_models/models/llama_v2_7b_chat_quantized/README.md @@ -6,7 +6,7 @@ Llama 2 is a family of LLMs. The "Chat" at the end indicates that the model is optimized for chatbot-like dialogue. The model is quantized to w4a16(4-bit weights and 16-bit activations) and part of the model is quantized to w8a16(8-bit weights and 16-bit activations) making it suitable for on-device deployment. For Prompt and output length specified below, the time to first token is Llama-PromptProcessor-Quantized's latency and average time per addition token is Llama-TokenGenerator-KVCache-Quantized's latency. This is based on the implementation of Llama-v2-7B-Chat found -[here](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/llama_v2_7b_chat_quantized). @@ -14,26 +14,7 @@ accross various devices, can be found [here](https://aihub.qualcomm.com/models/l ## Deploying Llama 2 on-device -Large Language Model (LLM) such as [Llama 2](https://llama.meta.com/llama2/) has the following complexities to deploy on-device: -1. Model size is too large to fit in device memory for inference -2. Multi-Head Attention (MHA) has large activations leading to fallback from accelerators -3. High model load and inference time - -We can tackle the above constraints with the following steps: -1. 
Quantize weights to reduce on-disk model size, e.g., int8 or int4 weights -2. Quantize activations to reduce inference time memory pressure -3. Graph transformations to reduce inference time memory pressure, e.g., Multi-Head to Split-Head Attention (MHA -> SHA) -4. Graph transformations to convert or decompose operations into more accelerator friendly operations e.g. Linear to Conv -5. For LLM with 7B or more parameters, above steps are still not good enough on mobile, - hence we go one step further and split model into sub-parts. - -Here, we divide the model into 4 parts in order to -1. Make model exportable with low memory usage -2. Avoid inference time out-of-memory errors - -In order to export Llama 2, please ensure -1. Host machine has >40GB memory (RAM+swap-space) -2. If you don't have enough memory, export.py will dump instructions to increase swap space accordingly. +Please follow the [LLM on-device deployment](https://github.com/quic/ai-hub-apps/tree/main/tutorials/llm_on_genie) tutorial. ## Sample output prompts generated on-device 1. --prompt "what is gravity?" --max-output-tokens 30 @@ -69,44 +50,20 @@ print(fibonacci(5)) -## Example & Usage - -Install the package via pip: -```bash -pip install "qai_hub_models[llama_v2_7b_chat_quantized]" -``` - -Once installed, run the following simple CLI demo: - -```bash -python -m qai_hub_models.models.llama_v2_7b_chat_quantized.demo -``` -More details on the CLI tool can be found with the `--help` option. See -[demo.py](demo.py) for sample usage of the model including pre/post processing -scripts. Please refer to our [general instructions on using -models](../../../#getting-started) for more usage instructions. - -## Export for on-device deployment - -This repository contains export scripts that produce a model optimized for -on-device deployment. This can be run as follows: - -```bash -python -m qai_hub_models.models.llama_v2_7b_chat_quantized.export -``` -Additional options are documented with the `--help` option. Note that the above -script requires access to Deployment instructions for Qualcomm® AI Hub. ## License -- The license for the original implementation of Llama-v2-7B-Chat can be found +* The license for the original implementation of Llama-v2-7B-Chat can be found [here](https://github.com/facebookresearch/llama/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://github.com/facebookresearch/llama/blob/main/LICENSE) +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/facebookresearch/llama/blob/main/LICENSE) + ## References * [LLaMA: Open and Efficient Foundation Language Models](https://arxiv.org/abs/2302.13971) * [Source Model Implementation](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). 
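The export.py diff that follows restructures Llama 2 export so that each prompt-processor/token-generator pair is compiled separately and then joined with a link job, producing one shared-weights asset per model part. A condensed sketch of that flow, using only hub calls that appear in the diff (`load_sub_component` is a hypothetical stand-in for the script's per-part loading and tracing):

```python
import qai_hub as hub

device = hub.Device("Samsung Galaxy S24 (Family)")  # DEFAULT_EXPORT_DEVICE in the diff
compile_jobs = []
for sub_name in ["PromptProcessor_1_Quantized", "TokenGenerator_1_Quantized"]:
    source_model, input_spec = load_sub_component(sub_name)  # hypothetical helper
    compile_jobs.append(
        hub.submit_compile_job(
            model=source_model,
            input_specs=input_spec,
            device=device,
            name=f"llama_v2_7b_chat_quantized_{sub_name}",
        )
    )

# Link the prompt processor and token generator so they share weights on device.
parts = [job.get_target_model() for job in compile_jobs]
link_job = hub.submit_link_job(parts, name="llama_v2_7b_chat_quantized_Llama2_Part1_Quantized")
link_job.get_target_model().download("build/Llama2_Part1_Quantized.bin")
```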
diff --git a/qai_hub_models/models/llama_v2_7b_chat_quantized/export.py b/qai_hub_models/models/llama_v2_7b_chat_quantized/export.py index 061ab003..1fadd4cf 100644 --- a/qai_hub_models/models/llama_v2_7b_chat_quantized/export.py +++ b/qai_hub_models/models/llama_v2_7b_chat_quantized/export.py @@ -31,26 +31,34 @@ ) ALL_COMPONENTS = [ - "PromptProcessor_1_Quantized", - "PromptProcessor_2_Quantized", - "PromptProcessor_3_Quantized", - "PromptProcessor_4_Quantized", - "TokenGenerator_1_Quantized", - "TokenGenerator_2_Quantized", - "TokenGenerator_3_Quantized", - "TokenGenerator_4_Quantized", -] -DEFAULT_COMPONENTS = [ - "PromptProcessor_1_Quantized", - "PromptProcessor_2_Quantized", - "PromptProcessor_3_Quantized", - "PromptProcessor_4_Quantized", - "TokenGenerator_1_Quantized", - "TokenGenerator_2_Quantized", - "TokenGenerator_3_Quantized", - "TokenGenerator_4_Quantized", + "Llama2_Part1_Quantized", + "Llama2_Part2_Quantized", + "Llama2_Part3_Quantized", + "Llama2_Part4_Quantized", ] +DEFAULT_COMPONENTS = ALL_COMPONENTS + +# Each components is two sub-components linked together with shared weights +ALL_SUB_COMPONENTS = { + "Llama2_Part1_Quantized": [ + "PromptProcessor_1_Quantized", + "TokenGenerator_1_Quantized", + ], + "Llama2_Part2_Quantized": [ + "PromptProcessor_2_Quantized", + "TokenGenerator_2_Quantized", + ], + "Llama2_Part3_Quantized": [ + "PromptProcessor_3_Quantized", + "TokenGenerator_3_Quantized", + ], + "Llama2_Part4_Quantized": [ + "PromptProcessor_4_Quantized", + "TokenGenerator_4_Quantized", + ], +} + DEFAULT_EXPORT_DEVICE = "Samsung Galaxy S24 (Family)" @@ -133,142 +141,168 @@ def export_model( # 1. Initialize PyTorch model model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) - compile_jobs: Dict[str, hub.client.CompileJob] = {} - profile_options_per_component: Dict[str, str] = {} + hub_device = hub.Device(device) + compile_jobs: Dict[str, List[hub.client.CompileJob]] = {} + profile_options_per_sub_component: Dict[str, str] = {} + link_jobs: Dict[str, hub.client.LinkJob] = {} hub_device = hub.Device(device) for component_name in components: - # Load model part + compile_jobs[component_name] = [] + for sub_component_name in ALL_SUB_COMPONENTS[component_name]: - component = model.load_model_part(component_name) + # Load model part + component = model.load_model_part(sub_component_name) - input_spec = component.get_input_spec( - **get_input_spec_kwargs(component, additional_model_kwargs) - ) - - source_model = component.convert_to_hub_source_model( - target_runtime, - output_path, - input_spec, - external_onnx_weights=True, - output_names=component.get_output_names(), - ) + input_spec = component.get_input_spec( + **get_input_spec_kwargs(component, additional_model_kwargs) + ) - if target_runtime == TargetRuntime.TFLITE: - quant_calibration_data = None - else: - quant_calibration_data = component.get_calibration_data( - target_runtime, input_spec=input_spec + source_model = component.convert_to_hub_source_model( + target_runtime, + output_path, + input_spec, + external_onnx_weights=True, + output_names=component.get_output_names(), ) - # 2. 
Compile the models to an on-device asset - model_compile_options = component.get_hub_compile_options( - target_runtime, compile_options - ) - print(f"Optimizing model {component_name} to run on-device") - submitted_compile_job = hub.submit_compile_job( - model=source_model, - input_specs=input_spec, - device=hub_device, - name=f"{model_name}_{component_name}", - calibration_data=quant_calibration_data, - options=model_compile_options, - ) + if target_runtime == TargetRuntime.TFLITE: + quant_calibration_data = None + else: + quant_calibration_data = component.get_calibration_data( + target_runtime, input_spec=input_spec + ) - compile_jobs[component_name] = cast( - hub.client.CompileJob, submitted_compile_job - ) - profile_options_per_component[ - component_name - ] = component.get_hub_profile_options(target_runtime, profile_options) + # 2. Compile the models to an on-device asset + model_compile_options = component.get_hub_compile_options( + target_runtime, compile_options + ) + print(f"Optimizing model {sub_component_name} to run on-device") + submitted_compile_job = hub.submit_compile_job( + model=source_model, + input_specs=input_spec, + device=hub_device, + name=f"{model_name}_{sub_component_name}", + calibration_data=quant_calibration_data, + options=model_compile_options, + ) - # Free model part to reduce memory-pressure - del component + profile_options_per_sub_component[ + sub_component_name + ] = component.get_hub_profile_options(target_runtime, profile_options) + + compile_jobs[component_name].append(submitted_compile_job) + # Free model part to reduce memory-pressure + del component + + for component_name, compile_jobs_list in compile_jobs.items(): + models = [] + for compile_job in compile_jobs_list: + if compile_job.get_status().code == "FAILED": + raise RuntimeError( + f"Compile job failed for {component_name}. Please re-run export script for failed component." + ) + models.append(compile_job.get_target_model()) + + # Link Prompt processor and Token generator + link_jobs[component_name] = hub.submit_link_job( + models, name=f"{model_name}_{component_name}" + ) - # 3. Profile the model assets on real devices + # 4. Profile the model assets on real devices profile_jobs: Dict[str, hub.client.ProfileJob] = {} if not skip_profiling: for component_name in components: - profile_options_all = profile_options_per_component[component_name] - print(f"Profiling model {component_name} on a hosted device.") - submitted_profile_job = hub.submit_profile_job( - model=compile_jobs[component_name].get_target_model(), - device=hub_device, - name=f"{model_name}_{component_name}", - options=profile_options_all, - ) - profile_jobs[component_name] = cast( - hub.client.ProfileJob, submitted_profile_job - ) - - # 4. Run inference on-device with sample inputs + hub_model = link_jobs[component_name].get_target_model() + for sub_component_name in ALL_SUB_COMPONENTS[component_name]: + profile_options_all = profile_options_per_sub_component[ + sub_component_name + ] + print(f"Profiling model {component_name} on a hosted device.") + submitted_profile_job = hub.submit_profile_job( + model=hub_model, + device=hub_device, + name=f"{model_name}_{sub_component_name}", + options=profile_options_all, + ) + profile_jobs[sub_component_name] = cast( + hub.client.ProfileJob, submitted_profile_job + ) + + # 5. 
Run inference on-device with sample inputs inference_jobs: Dict[str, hub.client.InferenceJob] = {} - if not skip_inferencing: for component_name in components: - print( - f"Running inference for {component_name} on a hosted device with example inputs." - ) - # Load model with no-AIMET mode - component = model.load_model_part(component_name) - profile_options_all = profile_options_per_component[component_name] - # Load individual model part - sample_inputs = component.sample_inputs() - submitted_inference_job = hub.submit_inference_job( - model=compile_jobs[component_name].get_target_model(), - inputs=sample_inputs, - device=hub_device, - name=f"{model_name}_{component_name}", - options=profile_options_all, - ) - inference_jobs[component_name] = cast( - hub.client.InferenceJob, submitted_inference_job - ) - - # 5. Download the model assets to a local file + for sub_component_name in ALL_SUB_COMPONENTS[component_name]: + print( + f"Running inference for {sub_component_name} on a hosted device with example inputs." + ) + # Load model with no-AIMET mode + component = model.load_model_part(sub_component_name) + profile_options_all = profile_options_per_sub_component[ + sub_component_name + ] + # Load individual model part + sample_inputs = component.sample_inputs() + submitted_inference_job = hub.submit_inference_job( + model=link_jobs[component_name].get_target_model(), + inputs=sample_inputs, + device=hub_device, + name=f"{model_name}_{sub_component_name}", + options=profile_options_all, + ) + inference_jobs[sub_component_name] = cast( + hub.client.InferenceJob, submitted_inference_job + ) + + # 6. Download the model assets to a local file if not skip_downloading: os.makedirs(output_path, exist_ok=True) - for component_name, compile_job in compile_jobs.items(): + for component_name, compile_job in link_jobs.items(): target_model: hub.Model = compile_job.get_target_model() # type: ignore - target_model.download( - str(output_path / f"{model_name}_{component_name}.bin") - ) + target_model.download(str(output_path / f"{component_name}.bin")) - # 6. Summarize the results from profiling and inference + # 7. 
Summarize the results from profiling and inference if not skip_summary and not skip_profiling: for component_name in components: - profile_job = profile_jobs[component_name] - assert profile_job is not None and profile_job.wait().success - profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore - print_profile_metrics_from_job(profile_job, profile_data) + for sub_component_name in ALL_SUB_COMPONENTS[component_name]: + profile_job = profile_jobs[sub_component_name] + assert profile_job is not None and profile_job.wait().success + profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore + print_profile_metrics_from_job(profile_job, profile_data) if not skip_summary and not skip_inferencing: for component_name in components: - inference_job = inference_jobs[component_name] - # Load individual model part - component = model.load_model_part(component_name) - # Get ordered model output names - output_names = component.get_output_names() - sample_inputs = component.sample_inputs() - torch_out = torch_inference(component, sample_inputs) - assert inference_job is not None and inference_job.wait().success - inference_result: hub.client.DatasetEntries = inference_job.download_output_data() # type: ignore - print_inference_metrics( - inference_job, inference_result, torch_out, output_names=output_names - ) + for sub_component_name in ALL_SUB_COMPONENTS[component_name]: + inference_job = inference_jobs[sub_component_name] + # Load individual model part + component = model.load_model_part(sub_component_name) + # Get ordered model output names + output_names = component.get_output_names() + sample_inputs = component.sample_inputs() + torch_out = torch_inference(component, sample_inputs) + assert inference_job is not None and inference_job.wait().success + inference_result: hub.client.DatasetEntries = inference_job.download_output_data() # type: ignore + print_inference_metrics( + inference_job, + inference_result, + torch_out, + output_names=output_names, + ) if not skip_summary: print_on_target_demo_cmd( - compile_jobs.values(), Path(__file__).parent.resolve(), hub_device + link_jobs.values(), Path(__file__).parent.resolve(), hub_device ) return { component_name: ( - compile_jobs[component_name], - profile_jobs.get(component_name, None), - inference_jobs.get(component_name, None), + link_jobs[component_name], + profile_jobs.get(sub_component_name, None), + inference_jobs.get(sub_component_name, None), ) for component_name in components + for sub_component_name in ALL_SUB_COMPONENTS[component_name] } diff --git a/qai_hub_models/models/llama_v2_7b_chat_quantized/info.yaml b/qai_hub_models/models/llama_v2_7b_chat_quantized/info.yaml index 767912c3..0e6d4d30 100644 --- a/qai_hub_models/models/llama_v2_7b_chat_quantized/info.yaml +++ b/qai_hub_models/models/llama_v2_7b_chat_quantized/info.yaml @@ -21,10 +21,11 @@ research_paper_title: "LLaMA: Open and Efficient Foundation Language Models" license: https://github.com/facebookresearch/llama/blob/main/LICENSE source_repo: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf technical_details: + Input sequence length for Prompt Processor: 1024 + Context length: 1024 Number of parameters: 7B Precision: w4a16 + w8a16 (few layers) Model-1 (Prompt Processor): Llama-PromptProcessor-Quantized - Max context length: 1024 Prompt processor model size: 3.6 GB Prompt processor input: 1024 tokens Prompt processor output: 1024 output tokens + KVCache for token generator @@ -32,8 +33,11 @@ technical_details: Token generator model size: 
3.6 GB Token generator input: 1 input token + past KVCache Token generator output: 1 output token + KVCache for next iteration - Decoding length: 1024 (1 output token + 1023 from KVCache) Use: Initiate conversation with prompt-processor and then token generator for subsequent iterations. + Minimum QNN SDK version required: 2.27.0 + Supported languages: English. + TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. For Llama-v2-7B-Chat, both values in the range are the same since prompt length is the full context length (1024 tokens). + Response Rate: Rate of response generation after the first response token. applicable_scenarios: - Dialogue - Content Generation @@ -49,3 +53,7 @@ deploy_license: https://github.com/facebookresearch/llama/blob/main/LICENSE deploy_license_type: llama2 dataset: [] restrict_model_sharing: true +model_type_llm: true +llm_details: + call_to_action: 'view_readme' + genie_compatible: true diff --git a/qai_hub_models/models/llama_v2_7b_chat_quantized/perf.yaml b/qai_hub_models/models/llama_v2_7b_chat_quantized/perf.yaml index 9bb996f7..7627a2f4 100644 --- a/qai_hub_models/models/llama_v2_7b_chat_quantized/perf.yaml +++ b/qai_hub_models/models/llama_v2_7b_chat_quantized/perf.yaml @@ -1,173 +1,72 @@ +aggregated: + supported_devices: + - QCS8550 (Proxy) + - Samsung Galaxy S24 + - Snapdragon X Elite CRD + - Snapdragon 8 Elite QRD + supported_oses: + - Android + supported_chipsets: + - Snapdragon® 8 Gen 3 + - Snapdragon® X Elite + - QCS8550 Proxy + - Snapdragon® 8 Elite models: -- name: Llama2-TokenGenerator-KVCache-Quantized + name: Llama-v2-7B-Chat performance_metrics: - - reference_device_info: - name: QCS8550 (Proxy) - os: '12' - form_factor: Iot - os_name: Android - manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-10-04T23:59:26.162600Z' - torchscript_onnx_qnn: - inference_time: 97732 - throughput: 10.23 - estimated_peak_memory_range: - min: 74272768 - max: 75651480 - layer_info: - layers_on_npu: 35926 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 35926 - precision: uint16 - primary_compute_unit: NPU - job_id: "null" - job_status: Passed - - reference_device_info: + - torchscript_onnx_qnn: + llm_metrics: + time_to_first_token_range: + min: 1495830 + max: 1495830 + tokens_per_second: 12.85 + reference_device_info: name: Samsung Galaxy S24 os: '14' form_factor: Phone os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-07-01T19:11:33.087816Z' - torchscript_onnx_qnn: - inference_time: 88438 - throughput: 11.307 - estimated_peak_memory_range: - min: 95744000 - max: 4468197056 - layer_info: - layers_on_npu: 33818 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 33818 - precision: uint16 - primary_compute_unit: NPU - job_id: "null" - job_status: Passed - - reference_device_info: + timestamp: '2024-10-16T00:32:42.210701Z' + - torchscript_onnx_qnn: + llm_metrics: + time_to_first_token_range: + min: 1919000 + max: 1919000 + tokens_per_second: 11.20 + reference_device_info: name: Snapdragon X Elite CRD os: '11' form_factor: Compute os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-07-01T19:09:26.083951Z' - torchscript_onnx_qnn: - inference_time: 95960 - throughput: 10.421 - estimated_peak_memory_range: - min: 68235264 - max: 68235264 - layer_info: - layers_on_npu: 33818 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 33818 - precision: uint16 - 
primary_compute_unit: NPU - job_id: "null" - job_status: Passed -- name: Llama2-PromptProcessor-Quantized - performance_metrics: - - reference_device_info: + timestamp: '2024-10-16T00:32:42.210701Z' + - torchscript_onnx_qnn: + llm_metrics: + time_to_first_token_range: + min: 1919000 + max: 1919000 + tokens_per_second: 11.20 + reference_device_info: name: QCS8550 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-10-04T23:59:26.162600Z' - torchscript_onnx_qnn: - inference_time: 2020745 - throughput: 506.93 - estimated_peak_memory_range: - min: 11554816 - max: 13002000 - layer_info: - layers_on_npu: 31830 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 31830 - precision: uint16 - primary_compute_unit: NPU - job_id: "null" - job_status: Passed - - reference_device_info: - name: Samsung Galaxy S24 - os: '14' + chipset: QCS8550 Proxy + timestamp: '2024-10-16T00:32:42.210701Z' + - torchscript_onnx_qnn: + llm_metrics: + time_to_first_token_range: + min: 1440000 + max: 1440000 + tokens_per_second: 17.94 + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' form_factor: Phone os_name: Android - manufacturer: Samsung - chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-07-01T20:53:21.204302Z' - torchscript_onnx_qnn: - inference_time: 1484949 - throughput: 689.5859 - estimated_peak_memory_range: - min: 8421376 - max: 1809446256 - layer_info: - layers_on_npu: 31766 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 31766 - precision: uint16 - primary_compute_unit: NPU - job_id: "null" - job_status: Passed - - reference_device_info: - name: Snapdragon X Elite CRD - os: '11' - form_factor: Compute - os_name: Windows manufacturer: Qualcomm - chipset: Snapdragon® X Elite - timestamp: '2024-07-02T00:17:42.777637Z' - torchscript_onnx_qnn: - inference_time: 1889092 - throughput: 542.059 - estimated_peak_memory_range: - min: 10784768 - max: 10784768 - layer_info: - layers_on_npu: 31766 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 31766 - precision: uint16 - primary_compute_unit: NPU - job_id: "null" - job_status: Passed -aggregated: - supported_devices: - - Samsung Galaxy S23 Ultra - - Samsung Galaxy S24 - - Snapdragon X Elite CRD - supported_oses: - - Android - supported_chipsets: - - Snapdragon® 8 Gen 2 - - Snapdragon® 8 Gen 3 - - Snapdragon® X Elite - performance_metrics: - - reference_device_info: - name: Samsung Galaxy S23 Ultra - os: '13' - form_factor: Phone - os_name: Android - manufacturer: Samsung - chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-01-26T00:34:02.549319Z' - torchscript_onnx_qnn: - inference_time: 117423.0 - throughput: 8.5 - estimated_peak_memory_range: - min: 68579328 - max: 73044264 - precision: uint16 - primary_compute_unit: NPU - job_id: "" - job_status: Passed + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-16T00:32:42.210701Z' diff --git a/qai_hub_models/models/llama_v3_1_8b_chat_quantized/README.md b/qai_hub_models/models/llama_v3_1_8b_chat_quantized/README.md new file mode 100644 index 00000000..b49510f6 --- /dev/null +++ b/qai_hub_models/models/llama_v3_1_8b_chat_quantized/README.md @@ -0,0 +1,61 @@ +[![Qualcomm® AI Hub Models](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/quic-logo.jpg)](../../README.md) + + +# [Llama-v3.1-8B-Chat: State-of-the-art large language model useful on a variety of language understanding and generation tasks](https://aihub.qualcomm.com/models/llama_v3_1_8b_chat_quantized) + +Llama 3 is a family of LLMs. 
The "Chat" at the end indicates that the model is optimized for chatbot-like dialogue. The model is quantized to w4a16 (4-bit weights and 16-bit activations) and part of the model is quantized to w8a16 (8-bit weights and 16-bit activations) making it suitable for on-device deployment. For Prompt and output length specified below, the time to first token is Llama-PromptProcessor-Quantized's latency and average time per addition token is Llama-TokenGenerator-Quantized's latency. + +This is based on the implementation of Llama-v3.1-8B-Chat found +[here]({source_repo}). This repository contains scripts for optimized on-device +export suitable to run on Qualcomm® devices. More details on model performance +accross various devices, can be found [here](https://aihub.qualcomm.com/models/llama_v3_1_8b_chat_quantized). + +[Sign up](https://myaccount.qualcomm.com/signup) to start using Qualcomm AI Hub and run these models on a hosted Qualcomm® device. + +## Deploying Llama 3.1 on-device + +Please follow the [LLM on-device deployment](https://github.com/quic/ai-hub-apps/tree/main/tutorials/llm_on_genie) tutorial. + + + + + +## License +* The license for the original implementation of Llama-v3.1-8B-Chat can be found + [here](https://github.com/facebookresearch/llama/blob/main/LICENSE). +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/facebookresearch/llama/blob/main/LICENSE) + + +## References +* [LLaMA: Open and Efficient Foundation Language Models](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_1/) +* [Source Model Implementation](https://github.com/meta-llama/llama3/tree/main) + + + +## Community +* Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. +* For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). + + +## Usage and Limitations + +This model may not be used for or in connection with any of the following applications: + +- Accessing essential private and public services and benefits; +- Administration of justice and democratic processes; +- Assessing or recognizing the emotional state of a person; +- Biometric and biometrics-based systems, including categorization of persons based on sensitive characteristics; +- Education and vocational training; +- Employment and workers management; +- Exploitation of the vulnerabilities of persons resulting in harmful behavior; +- General purpose social scoring; +- Law enforcement; +- Management and operation of critical infrastructure; +- Migration, asylum and border control management; +- Predictive policing; +- Real-time remote biometric identification in public spaces; +- Recommender systems of social media platforms; +- Scraping of facial images (from the internet or otherwise); and/or +- Subliminal manipulation + + diff --git a/qai_hub_models/models/llama_v3_1_8b_chat_quantized/__init__.py b/qai_hub_models/models/llama_v3_1_8b_chat_quantized/__init__.py new file mode 100644 index 00000000..522353c1 --- /dev/null +++ b/qai_hub_models/models/llama_v3_1_8b_chat_quantized/__init__.py @@ -0,0 +1,8 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. 
+# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +from qai_hub_models.models._shared.llama.app import ChatApp as App # noqa: F401 + +from .model import MODEL_ID # noqa: F401 +from .model import Llama3_1_Quantized as Model # noqa: F401 diff --git a/qai_hub_models/models/llama_v3_1_8b_chat_quantized/demo.py b/qai_hub_models/models/llama_v3_1_8b_chat_quantized/demo.py new file mode 100644 index 00000000..09e48f63 --- /dev/null +++ b/qai_hub_models/models/llama_v3_1_8b_chat_quantized/demo.py @@ -0,0 +1,52 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. +# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +from __future__ import annotations + +from typing import List, Type + +from qai_hub_models.models._shared.llama3.demo import llama_chat_demo +from qai_hub_models.models._shared.llama3.model import ( + DEFAULT_USER_PROMPT, + END_TOKENS, + get_input_prompt_with_tags, + get_tokenizer, + prepare_combined_attention_mask, +) +from qai_hub_models.models.llama_v3_1_8b_chat_quantized import MODEL_ID, Model +from qai_hub_models.models.llama_v3_1_8b_chat_quantized.model import ( + HF_REPO_NAME, + HF_REPO_URL, +) +from qai_hub_models.utils.base_model import BaseModel, TargetRuntime + + +def llama_3_1_chat_demo( + model_cls: Type[BaseModel] = Model, + model_id: str = MODEL_ID, + end_tokens: set = END_TOKENS, + hf_repo_name: str = HF_REPO_NAME, + hf_repo_url: str = HF_REPO_URL, + default_prompt: str = DEFAULT_USER_PROMPT, + is_test: bool = False, + available_target_runtimes: List[TargetRuntime] = [TargetRuntime.QNN], +): + llama_chat_demo( + model_cls=model_cls, + model_id=model_id, + get_input_prompt_with_tags=get_input_prompt_with_tags, + prepare_combined_attention_mask=prepare_combined_attention_mask, + tokenizer=get_tokenizer(hf_repo_name), + end_tokens=end_tokens, + hf_repo_name=hf_repo_name, + hf_repo_url=hf_repo_url, + default_prompt=default_prompt, + is_test=is_test, + available_target_runtimes=available_target_runtimes, + bundled_kvcache=False, + ) + + +if __name__ == "__main__": + llama_3_1_chat_demo(model_cls=Model) diff --git a/qai_hub_models/models/llama_v3_1_8b_chat_quantized/export.py b/qai_hub_models/models/llama_v3_1_8b_chat_quantized/export.py new file mode 100644 index 00000000..9b27f1b7 --- /dev/null +++ b/qai_hub_models/models/llama_v3_1_8b_chat_quantized/export.py @@ -0,0 +1,57 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. 
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+
+from __future__ import annotations
+
+import warnings
+
+from qai_hub_models.models._shared.llama3.export import export_model
+from qai_hub_models.models.llama_v3_1_8b_chat_quantized import MODEL_ID, Model
+from qai_hub_models.models.llama_v3_1_8b_chat_quantized.model import (
+    NUM_LAYERS_PER_SPLIT,
+    NUM_SPLITS,
+)
+from qai_hub_models.utils.args import export_parser
+
+DEFAULT_EXPORT_DEVICE = "Snapdragon 8 Elite QRD"
+
+ALL_COMPONENTS = [f"part_{i + 1}_of_{NUM_SPLITS}" for i in range(NUM_SPLITS)]
+
+# Each component is two sub-components linked together with shared weights
+ALL_SUB_COMPONENTS = {
+    f"part_{i + 1}_of_{NUM_SPLITS}": [
+        f"prompt_{i + 1}_of_{NUM_SPLITS}",
+        f"token_{i + 1}_of_{NUM_SPLITS}",
+    ]
+    for i in range(NUM_SPLITS)
+}
+
+
+def main():
+    warnings.filterwarnings("ignore")
+    parser = export_parser(
+        model_cls=Model,
+        supports_tflite=False,
+        supports_precompiled_qnn_onnx=False,
+        default_export_device=DEFAULT_EXPORT_DEVICE,
+    )
+    parser.add_argument(
+        "--synchronous",
+        action="store_true",
+        help="Wait for each command to finish before submitting the next.",
+    )
+    args = parser.parse_args()
+    export_model(
+        model_cls=Model,
+        model_name=MODEL_ID,
+        components=ALL_COMPONENTS,
+        sub_components=ALL_SUB_COMPONENTS,
+        num_layers_per_split=NUM_LAYERS_PER_SPLIT,
+        **vars(args),
+    )
+
+
+if __name__ == "__main__":
+    main()
diff --git a/qai_hub_models/models/llama_v3_1_8b_chat_quantized/info.yaml b/qai_hub_models/models/llama_v3_1_8b_chat_quantized/info.yaml
new file mode 100644
index 00000000..c36e4441
--- /dev/null
+++ b/qai_hub_models/models/llama_v3_1_8b_chat_quantized/info.yaml
@@ -0,0 +1,61 @@
+name: Llama-v3.1-8B-Chat
+id: llama_v3_1_8b_chat_quantized
+status: public
+headline: State-of-the-art large language model useful on a variety of language
+  understanding and generation tasks.
+domain: Generative AI
+description: Llama 3 is a family of LLMs. The "Chat" at the end indicates that
+  the model is optimized for chatbot-like dialogue. The model is quantized to
+  w4a16 (4-bit weights and 16-bit activations) and part of the model is quantized to
+  w8a16 (8-bit weights and 16-bit activations), making it suitable for on-device
+  deployment. For the prompt and output lengths specified below, the time to first token is
+  Llama-PromptProcessor-Quantized's latency and the average time per additional token is
+  Llama-TokenGenerator-Quantized's latency.
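To make the component naming in the export script above concrete, here is what `ALL_COMPONENTS` and `ALL_SUB_COMPONENTS` expand to for the five-way Llama 3.1 split. This is a standalone illustration of the naming scheme, not an additional API:

```python
# Illustration of the naming scheme from export.py above (NUM_SPLITS = 5).
NUM_SPLITS = 5

ALL_COMPONENTS = [f"part_{i + 1}_of_{NUM_SPLITS}" for i in range(NUM_SPLITS)]
ALL_SUB_COMPONENTS = {
    f"part_{i + 1}_of_{NUM_SPLITS}": [
        f"prompt_{i + 1}_of_{NUM_SPLITS}",  # prompt-processor half
        f"token_{i + 1}_of_{NUM_SPLITS}",   # token-generator half
    ]
    for i in range(NUM_SPLITS)
}

print(ALL_COMPONENTS[0])                  # part_1_of_5
print(ALL_SUB_COMPONENTS["part_1_of_5"])  # ['prompt_1_of_5', 'token_1_of_5']
```

Each exported part therefore carries both a prompt-processor graph and a token-generator graph that share weights, which is what the link jobs in the shared export flow tie together.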
+use_case: Text Generation +tags: + - llm + - generative-ai + - quantized +research_paper: https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_1/ +research_paper_title: "LLaMA: Open and Efficient Foundation Language Models" +license: https://github.com/facebookresearch/llama/blob/main/LICENSE +source_repo: https://github.com/meta-llama/llama3/tree/main +technical_details: + Input sequence length for Prompt Processor: 128 + Context length: 4096 + Number of parameters: 8B + Model size: 4.8GB + Precision: w4a16 + w8a16 (few layers) + Num of key-value heads: 8 + Model-1 (Prompt Processor): Llama-PromptProcessor-Quantized + Prompt processor input: 128 tokens + position embeddings + attention mask + KV cache inputs + Prompt processor output: 128 output tokens + KV cache outputs + Model-2 (Token Generator): Llama-TokenGenerator-Quantized + Token generator input: 1 input token + position embeddings + attention mask + KV cache inputs + Token generator output: 1 output token + KV cache outputs + Use: Initiate conversation with prompt-processor and then token generator for subsequent iterations. + Minimum QNN SDK version required: 2.27.7 + Language(s) supported: English. + TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens). + Response Rate: Rate of response generation after the first response token. +applicable_scenarios: + - Dialogue + - Content Generation + - Customer Support +related_models: + - llama_v3_8b_chat_quantized + - llama_v3_2_3b_chat_quantized +form_factors: + - Phone + - Tablet +has_static_banner: true +has_animated_banner: true +license_type: llama3 +deploy_license: https://github.com/facebookresearch/llama/blob/main/LICENSE +deploy_license_type: llama3 +dataset: [] +restrict_model_sharing: true +model_type_llm: true +llm_details: + call_to_action: 'view_readme' + genie_compatible: true diff --git a/qai_hub_models/models/llama_v3_1_8b_chat_quantized/model.py b/qai_hub_models/models/llama_v3_1_8b_chat_quantized/model.py new file mode 100644 index 00000000..e30695f1 --- /dev/null +++ b/qai_hub_models/models/llama_v3_1_8b_chat_quantized/model.py @@ -0,0 +1,110 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. +# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +from __future__ import annotations + +import os + +from qai_hub_models.models._shared.llama3.model import ( + DEFAULT_CONTEXT_LENGTH, + Llama3Base_Quantized, +) +from qai_hub_models.utils.asset_loaders import CachedWebModelAsset +from qai_hub_models.utils.input_spec import InputSpec + +MODEL_ID = __name__.split(".")[-2] +MODEL_ASSET_VERSION = 2 +DEFAULT_ENCODINGS = "llama31.encodings" +DEFAULT_ENCODINGS_ZIP = DEFAULT_ENCODINGS + ".zip" + +NUM_LAYERS = 32 +NUM_SPLITS = 5 +NUM_LAYERS_PER_SPLIT = 9 + +# Hugging face repo name and url +HF_REPO_NAME = "meta-llama/Meta-Llama-3.1-8B-Instruct" +HF_REPO_URL = f"https://huggingface.co/meta-llama/{HF_REPO_NAME}" + +# Minimum memory (RAM+swap) recommended for export. +# TODO: #10762 should reduce once AIMET export consumes less memory during export. TODO!!! 
Not quite correct, since we are not using AIMET +MIN_MEMORY_RECOMMENDED = 40 # TODO: Does this work for Llama 3? + + +class Llama3_1_Quantized(Llama3Base_Quantized): + def __init__(self, huggingface_model_name: str = HF_REPO_NAME, *args, **kwargs): + super().__init__( + huggingface_model_name=huggingface_model_name, + min_memory_recommended=MIN_MEMORY_RECOMMENDED, + *args, + **kwargs, + ) + + @classmethod + def from_pretrained( + cls, + sequence_length: int, + context_length: int = DEFAULT_CONTEXT_LENGTH, + aimet_encodings: str | None = "DEFAULT", + huggingface_model_name: str = HF_REPO_NAME, + ) -> "Llama3_1_Quantized": + """ + Load a pre-trained Llama 3.1 (8B) model from Meta via HuggingFace. + + sequence_length: + Instantiate with this token sequence length input. A longer + sequence length means the model is capable of processing more + tokens at once. This can only be set to greater than one to process + prompts, since responses are auto-regressive in nature and require + this to be 1. + context_length: + Total context length of model. Longer context length means the + model is more capable of making longer connections in the input + prompt. However, it also hurts runtime performance (both time-to- + first-token and tokens-per-second), so this is a tradeoff that may + depend on the use case. + aimet_encodings: + Path to AIMET quantization encodings file. + huggingface_model_name: + Name or URL of the HuggingFace model. Change this if you want to + change the weights. + """ + if aimet_encodings: + if aimet_encodings == "DEFAULT": + aimet_encodings = os.path.join( + CachedWebModelAsset.from_asset_store( + MODEL_ID, MODEL_ASSET_VERSION, DEFAULT_ENCODINGS_ZIP + ).fetch(extract=True), + DEFAULT_ENCODINGS, + ) + + return cls( + aimet_encodings=aimet_encodings, + sequence_length=sequence_length, + context_length=context_length, + huggingface_model_name=huggingface_model_name, + ) + + @staticmethod + def get_output_names(num_hidden_layers: int = NUM_LAYERS): + return Llama3Base_Quantized.get_output_names( + num_hidden_layers=num_hidden_layers + ) + + @staticmethod + def get_input_spec( + num_hidden_layers: int = NUM_LAYERS, + input_seq_length: int = 128, + context_length: int = DEFAULT_CONTEXT_LENGTH, + hidden_size: int = 4096, + num_key_value_heads: int = 8, + num_attention_heads: int = 32, + ) -> InputSpec: + return Llama3Base_Quantized.get_input_spec( + num_hidden_layers=NUM_LAYERS, + input_seq_length=input_seq_length, + context_length=context_length, + hidden_size=hidden_size, + num_key_value_heads=num_key_value_heads, + num_attention_heads=num_attention_heads, + ) diff --git a/qai_hub_models/models/llama_v3_1_8b_chat_quantized/perf.yaml b/qai_hub_models/models/llama_v3_1_8b_chat_quantized/perf.yaml new file mode 100644 index 00000000..a4eb058c --- /dev/null +++ b/qai_hub_models/models/llama_v3_1_8b_chat_quantized/perf.yaml @@ -0,0 +1,25 @@ +aggregated: + supported_devices: + - Snapdragon 8 Elite QRD + supported_oses: + - Android + supported_chipsets: + - Snapdragon® 8 Elite +models: + name: Llama-v3.1-8B-Chat + performance_metrics: + - torchscript_onnx_qnn: + llm_metrics: + time_to_first_token_range: + min: 154517 + max: 4944544 + tokens_per_second: 13.0546 + evaluation_metrics: null + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-16T00:32:42.210701Z' diff --git a/qai_hub_models/models/llama_v3_1_8b_chat_quantized/requirements.txt 
b/qai_hub_models/models/llama_v3_1_8b_chat_quantized/requirements.txt
new file mode 100644
index 00000000..c5deadcc
--- /dev/null
+++ b/qai_hub_models/models/llama_v3_1_8b_chat_quantized/requirements.txt
@@ -0,0 +1,5 @@
+onnx==1.16.2
+transformers==4.45.0
+huggingface_hub==0.23.2
+sentencepiece==0.2.0
+psutil
diff --git a/qai_hub_models/models/llama_v3_1_8b_chat_quantized/test.py b/qai_hub_models/models/llama_v3_1_8b_chat_quantized/test.py
new file mode 100644
index 00000000..e4b557ee
--- /dev/null
+++ b/qai_hub_models/models/llama_v3_1_8b_chat_quantized/test.py
@@ -0,0 +1,14 @@
+# ---------------------------------------------------------------------
+# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+import pytest
+
+from qai_hub_models.models.llama_v3_1_8b_chat_quantized.demo import llama_3_1_chat_demo
+
+
+@pytest.mark.skip("#105 move slow_cloud and slow tests to nightly.")
+@pytest.mark.slow_cloud
+def test_demo():
+    # Run demo and verify it does not crash
+    llama_3_1_chat_demo(is_test=True)
diff --git a/qai_hub_models/models/llama_v3_2_3b_chat_quantized/README.md b/qai_hub_models/models/llama_v3_2_3b_chat_quantized/README.md
new file mode 100644
index 00000000..fcd3721b
--- /dev/null
+++ b/qai_hub_models/models/llama_v3_2_3b_chat_quantized/README.md
@@ -0,0 +1,61 @@
+[![Qualcomm® AI Hub Models](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/quic-logo.jpg)](../../README.md)
+
+
+# [Llama-v3.2-3B-Chat: State-of-the-art large language model useful on a variety of language understanding and generation tasks](https://aihub.qualcomm.com/models/llama_v3_2_3b_chat_quantized)
+
+Llama 3 is a family of LLMs. The "Chat" at the end indicates that the model is optimized for chatbot-like dialogue. The model is quantized to w4a16 (4-bit weights and 16-bit activations) and part of the model is quantized to w8a16 (8-bit weights and 16-bit activations), making it suitable for on-device deployment. For the prompt and output lengths specified below, the time to first token is Llama-PromptProcessor-Quantized's latency and the average time per additional token is Llama-TokenGenerator-Quantized's latency.
+
+This is based on the implementation of Llama-v3.2-3B-Chat found
+[here](https://github.com/meta-llama/llama3/tree/main). This repository contains scripts for optimized on-device
+export suitable to run on Qualcomm® devices. More details on model performance
+across various devices can be found [here](https://aihub.qualcomm.com/models/llama_v3_2_3b_chat_quantized).
+
+[Sign up](https://myaccount.qualcomm.com/signup) to start using Qualcomm AI Hub and run these models on a hosted Qualcomm® device.
+
+## Deploying Llama 3.2 on-device
+
+Please follow the [LLM on-device deployment](https://github.com/quic/ai-hub-apps/tree/main/tutorials/llm_on_genie) tutorial.
+
+
+
+
+## License
+* The license for the original implementation of Llama-v3.2-3B-Chat can be found
+  [here](https://github.com/facebookresearch/llama/blob/main/LICENSE).
+* The license for the compiled assets for on-device deployment can be found [here](https://github.com/facebookresearch/llama/blob/main/LICENSE) + + +## References +* [LLaMA: Open and Efficient Foundation Language Models](https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2/) +* [Source Model Implementation](https://github.com/meta-llama/llama3/tree/main) + + + +## Community +* Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. +* For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). + + +## Usage and Limitations + +This model may not be used for or in connection with any of the following applications: + +- Accessing essential private and public services and benefits; +- Administration of justice and democratic processes; +- Assessing or recognizing the emotional state of a person; +- Biometric and biometrics-based systems, including categorization of persons based on sensitive characteristics; +- Education and vocational training; +- Employment and workers management; +- Exploitation of the vulnerabilities of persons resulting in harmful behavior; +- General purpose social scoring; +- Law enforcement; +- Management and operation of critical infrastructure; +- Migration, asylum and border control management; +- Predictive policing; +- Real-time remote biometric identification in public spaces; +- Recommender systems of social media platforms; +- Scraping of facial images (from the internet or otherwise); and/or +- Subliminal manipulation + + diff --git a/qai_hub_models/models/llama_v3_2_3b_chat_quantized/__init__.py b/qai_hub_models/models/llama_v3_2_3b_chat_quantized/__init__.py new file mode 100644 index 00000000..142d3feb --- /dev/null +++ b/qai_hub_models/models/llama_v3_2_3b_chat_quantized/__init__.py @@ -0,0 +1,8 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. +# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +from qai_hub_models.models._shared.llama.app import ChatApp as App # noqa: F401 + +from .model import MODEL_ID # noqa: F401 +from .model import Llama3_2_Quantized as Model # noqa: F401 diff --git a/qai_hub_models/models/llama_v3_2_3b_chat_quantized/demo.py b/qai_hub_models/models/llama_v3_2_3b_chat_quantized/demo.py new file mode 100644 index 00000000..a76d37f9 --- /dev/null +++ b/qai_hub_models/models/llama_v3_2_3b_chat_quantized/demo.py @@ -0,0 +1,52 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. 
+# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +from __future__ import annotations + +from typing import List, Type + +from qai_hub_models.models._shared.llama3.demo import llama_chat_demo +from qai_hub_models.models._shared.llama3.model import ( + DEFAULT_USER_PROMPT, + END_TOKENS, + get_input_prompt_with_tags, + get_tokenizer, + prepare_combined_attention_mask, +) +from qai_hub_models.models.llama_v3_2_3b_chat_quantized import MODEL_ID, Model +from qai_hub_models.models.llama_v3_2_3b_chat_quantized.model import ( + HF_REPO_NAME, + HF_REPO_URL, +) +from qai_hub_models.utils.base_model import BaseModel, TargetRuntime + + +def llama_3_2_chat_demo( + model_cls: Type[BaseModel] = Model, + model_id: str = MODEL_ID, + end_tokens: set = END_TOKENS, + hf_repo_name: str = HF_REPO_NAME, + hf_repo_url: str = HF_REPO_URL, + default_prompt: str = DEFAULT_USER_PROMPT, + is_test: bool = False, + available_target_runtimes: List[TargetRuntime] = [TargetRuntime.QNN], +): + llama_chat_demo( + model_cls=model_cls, + model_id=model_id, + get_input_prompt_with_tags=get_input_prompt_with_tags, + prepare_combined_attention_mask=prepare_combined_attention_mask, + tokenizer=get_tokenizer(hf_repo_name), + end_tokens=end_tokens, + hf_repo_name=hf_repo_name, + hf_repo_url=hf_repo_url, + default_prompt=default_prompt, + is_test=is_test, + available_target_runtimes=available_target_runtimes, + bundled_kvcache=False, + ) + + +if __name__ == "__main__": + llama_3_2_chat_demo(model_cls=Model) diff --git a/qai_hub_models/models/llama_v3_2_3b_chat_quantized/export.py b/qai_hub_models/models/llama_v3_2_3b_chat_quantized/export.py new file mode 100644 index 00000000..4784b5d9 --- /dev/null +++ b/qai_hub_models/models/llama_v3_2_3b_chat_quantized/export.py @@ -0,0 +1,57 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. 
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+
+from __future__ import annotations
+
+import warnings
+
+from qai_hub_models.models._shared.llama3.export import export_model
+from qai_hub_models.models.llama_v3_2_3b_chat_quantized import MODEL_ID, Model
+from qai_hub_models.models.llama_v3_2_3b_chat_quantized.model import (
+    NUM_LAYERS_PER_SPLIT,
+    NUM_SPLITS,
+)
+from qai_hub_models.utils.args import export_parser
+
+DEFAULT_EXPORT_DEVICE = "Snapdragon 8 Elite QRD"
+
+ALL_COMPONENTS = [f"part_{i + 1}_of_{NUM_SPLITS}" for i in range(NUM_SPLITS)]
+
+# Each component is two sub-components linked together with shared weights
+ALL_SUB_COMPONENTS = {
+    f"part_{i + 1}_of_{NUM_SPLITS}": [
+        f"prompt_{i + 1}_of_{NUM_SPLITS}",
+        f"token_{i + 1}_of_{NUM_SPLITS}",
+    ]
+    for i in range(NUM_SPLITS)
+}
+
+
+def main():
+    warnings.filterwarnings("ignore")
+    parser = export_parser(
+        model_cls=Model,
+        supports_tflite=False,
+        supports_precompiled_qnn_onnx=False,
+        default_export_device=DEFAULT_EXPORT_DEVICE,
+    )
+    parser.add_argument(
+        "--synchronous",
+        action="store_true",
+        help="Wait for each command to finish before submitting the next.",
+    )
+    args = parser.parse_args()
+    export_model(
+        model_cls=Model,
+        model_name=MODEL_ID,
+        components=ALL_COMPONENTS,
+        sub_components=ALL_SUB_COMPONENTS,
+        num_layers_per_split=NUM_LAYERS_PER_SPLIT,
+        **vars(args),
+    )
+
+
+if __name__ == "__main__":
+    main()
diff --git a/qai_hub_models/models/llama_v3_2_3b_chat_quantized/info.yaml b/qai_hub_models/models/llama_v3_2_3b_chat_quantized/info.yaml
new file mode 100644
index 00000000..37416791
--- /dev/null
+++ b/qai_hub_models/models/llama_v3_2_3b_chat_quantized/info.yaml
@@ -0,0 +1,61 @@
+name: Llama-v3.2-3B-Chat
+id: llama_v3_2_3b_chat_quantized
+status: public
+headline: State-of-the-art large language model useful on a variety of language
+  understanding and generation tasks.
+domain: Generative AI
+description: Llama 3 is a family of LLMs. The "Chat" at the end indicates that
+  the model is optimized for chatbot-like dialogue. The model is quantized to
+  w4a16 (4-bit weights and 16-bit activations) and part of the model is quantized to
+  w8a16 (8-bit weights and 16-bit activations), making it suitable for on-device
+  deployment. For the prompt and output lengths specified below, the time to first token is
+  Llama-PromptProcessor-Quantized's latency and the average time per additional token is
+  Llama-TokenGenerator-Quantized's latency.
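The prompt-processor and token-generator roles described above map onto the `from_pretrained` API defined in model.py below. A minimal sketch, assuming the default context length and encodings and access to the gated meta-llama weights on Hugging Face:

```python
from qai_hub_models.models.llama_v3_2_3b_chat_quantized import Model

# Prompt processor: consumes up to 128 input tokens per pass.
prompt_processor = Model.from_pretrained(sequence_length=128)

# Token generator: auto-regressive decoding, so the sequence length must be 1.
token_generator = Model.from_pretrained(sequence_length=1)
```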
+use_case: Text Generation +tags: + - llm + - generative-ai + - quantized +research_paper: https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2/ +research_paper_title: "LLaMA: Open and Efficient Foundation Language Models" +license: https://github.com/facebookresearch/llama/blob/main/LICENSE +source_repo: https://github.com/meta-llama/llama3/tree/main +technical_details: + Input sequence length for Prompt Processor: 128 + Context length: 4096 + Number of parameters: 3B + Model size: 2.4G + Precision: w4a16 + w8a16 (few layers) + Num of key-value heads: 8 + Model-1 (Prompt Processor): Llama-PromptProcessor-Quantized + Prompt processor input: 128 tokens + position embeddings + attention mask + KV cache inputs + Prompt processor output: 128 output tokens + KV cache outputs + Model-2 (Token Generator): Llama-TokenGenerator-Quantized + Token generator input: 1 input token + position embeddings + attention mask + KV cache inputs + Token generator output: 1 output token + KV cache outputs + Use: Initiate conversation with prompt-processor and then token generator for subsequent iterations. + Minimum QNN SDK version required: 2.27.7 + Supported languages: English. + TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens). + Response Rate: Rate of response generation after the first response token. +applicable_scenarios: + - Dialogue + - Content Generation + - Customer Support +related_models: + - llama_v3_8b_chat_quantized + - llama_v3_1_8b_chat_quantized +form_factors: + - Phone + - Tablet +has_static_banner: true +has_animated_banner: true +license_type: llama3 +deploy_license: https://github.com/facebookresearch/llama/blob/main/LICENSE +deploy_license_type: llama3 +dataset: [] +restrict_model_sharing: true +model_type_llm: true +llm_details: + call_to_action: 'view_readme' + genie_compatible: true diff --git a/qai_hub_models/models/llama_v3_2_3b_chat_quantized/model.py b/qai_hub_models/models/llama_v3_2_3b_chat_quantized/model.py new file mode 100644 index 00000000..3fd4b15c --- /dev/null +++ b/qai_hub_models/models/llama_v3_2_3b_chat_quantized/model.py @@ -0,0 +1,110 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. +# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +from __future__ import annotations + +import os + +from qai_hub_models.models._shared.llama3.model import ( + DEFAULT_CONTEXT_LENGTH, + Llama3Base_Quantized, +) +from qai_hub_models.utils.asset_loaders import CachedWebModelAsset +from qai_hub_models.utils.input_spec import InputSpec + +MODEL_ID = __name__.split(".")[-2] +MODEL_ASSET_VERSION = 1 +DEFAULT_ENCODINGS = "llama32.encodings" +DEFAULT_ENCODINGS_ZIP = DEFAULT_ENCODINGS + ".zip" + +NUM_LAYERS = 28 +NUM_SPLITS = 3 +NUM_LAYERS_PER_SPLIT = 14 + +# Hugging face repo name and url +HF_REPO_NAME = "meta-llama/Llama-3.2-3B-Instruct" +HF_REPO_URL = f"https://huggingface.co/meta-llama/{HF_REPO_NAME}" + +# Minimum memory (RAM+swap) recommended for export. +# TODO: #10762 should reduce once AIMET export consumes less memory during export. TODO!!! 
Not quite correct, since we are not using AIMET +MIN_MEMORY_RECOMMENDED = 40 # TODO: Does this work for Llama 3? + + +class Llama3_2_Quantized(Llama3Base_Quantized): + def __init__(self, huggingface_model_name: str = HF_REPO_NAME, *args, **kwargs): + super().__init__( + huggingface_model_name=huggingface_model_name, + min_memory_recommended=MIN_MEMORY_RECOMMENDED, + *args, + **kwargs, + ) + + @classmethod + def from_pretrained( + cls, + sequence_length: int, + context_length: int = DEFAULT_CONTEXT_LENGTH, + aimet_encodings: str | None = "DEFAULT", + huggingface_model_name: str = HF_REPO_NAME, + ) -> "Llama3_2_Quantized": + """ + Load a pre-trained Llama 3.2 (3B) model from Meta via HuggingFace. + + sequence_length: + Instantiate with this token sequence length input. A longer + sequence length means the model is capable of processing more + tokens at once. This can only be set to greater than one to process + prompts, since responses are auto-regressive in nature and require + this to be 1. + context_length: + Total context length of model. Longer context length means the + model is more capable of making longer connections in the input + prompt. However, it also hurts runtime performance (both time-to- + first-token and tokens-per-second), so this is a tradeoff that may + depend on the use case. + aimet_encodings: + Path to AIMET quantization encodings file. + huggingface_model_name: + Name or URL of the HuggingFace model. Change this if you want to + change the weights. + """ + if aimet_encodings: + if aimet_encodings == "DEFAULT": + aimet_encodings = os.path.join( + CachedWebModelAsset.from_asset_store( + MODEL_ID, MODEL_ASSET_VERSION, DEFAULT_ENCODINGS_ZIP + ).fetch(extract=True), + DEFAULT_ENCODINGS, + ) + + return cls( + aimet_encodings=aimet_encodings, + sequence_length=sequence_length, + context_length=context_length, + huggingface_model_name=huggingface_model_name, + ) + + @staticmethod + def get_output_names(num_hidden_layers: int = NUM_LAYERS): + return Llama3Base_Quantized.get_output_names( + num_hidden_layers=num_hidden_layers + ) + + @staticmethod + def get_input_spec( + num_hidden_layers: int = NUM_LAYERS, + input_seq_length: int = 128, + context_length: int = DEFAULT_CONTEXT_LENGTH, + hidden_size: int = 3072, + num_key_value_heads: int = 8, + num_attention_heads: int = 24, + ) -> InputSpec: + return Llama3Base_Quantized.get_input_spec( + num_hidden_layers=NUM_LAYERS, + input_seq_length=input_seq_length, + context_length=context_length, + hidden_size=hidden_size, + num_key_value_heads=num_key_value_heads, + num_attention_heads=num_attention_heads, + ) diff --git a/qai_hub_models/models/llama_v3_2_3b_chat_quantized/perf.yaml b/qai_hub_models/models/llama_v3_2_3b_chat_quantized/perf.yaml new file mode 100644 index 00000000..a8e23cb8 --- /dev/null +++ b/qai_hub_models/models/llama_v3_2_3b_chat_quantized/perf.yaml @@ -0,0 +1,27 @@ +aggregated: + supported_devices: + - Snapdragon 8 Elite QRD + - Snapdragon 8 Gen 3 QRD + supported_oses: + - Android + supported_chipsets: + - Snapdragon® 8 Elite + - Snapdragon® 8 Gen 3 +models: + name: Llama-v3.2-3B-Chat + performance_metrics: + - torchscript_onnx_qnn: + llm_metrics: + time_to_first_token_range: + min: 88195 + max: 2822250 + tokens_per_second: 23.4718 + evaluation_metrics: null + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-16T00:32:42.210701Z' diff --git 
a/qai_hub_models/models/llama_v3_2_3b_chat_quantized/requirements.txt b/qai_hub_models/models/llama_v3_2_3b_chat_quantized/requirements.txt
new file mode 100644
index 00000000..c5deadcc
--- /dev/null
+++ b/qai_hub_models/models/llama_v3_2_3b_chat_quantized/requirements.txt
@@ -0,0 +1,5 @@
+onnx==1.16.2
+transformers==4.45.0
+huggingface_hub==0.23.2
+sentencepiece==0.2.0
+psutil
diff --git a/qai_hub_models/models/llama_v3_2_3b_chat_quantized/test.py b/qai_hub_models/models/llama_v3_2_3b_chat_quantized/test.py
new file mode 100644
index 00000000..44bed787
--- /dev/null
+++ b/qai_hub_models/models/llama_v3_2_3b_chat_quantized/test.py
@@ -0,0 +1,14 @@
+# ---------------------------------------------------------------------
+# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+import pytest
+
+from qai_hub_models.models.llama_v3_2_3b_chat_quantized.demo import llama_3_2_chat_demo
+
+
+@pytest.mark.skip("#105 move slow_cloud and slow tests to nightly.")
+@pytest.mark.slow_cloud
+def test_demo():
+    # Run demo and verify it does not crash
+    llama_3_2_chat_demo(is_test=True)
diff --git a/qai_hub_models/models/llama_v3_8b_chat_quantized/README.md b/qai_hub_models/models/llama_v3_8b_chat_quantized/README.md
index 27678cb2..c2ca1a4b 100644
--- a/qai_hub_models/models/llama_v3_8b_chat_quantized/README.md
+++ b/qai_hub_models/models/llama_v3_8b_chat_quantized/README.md
@@ -3,10 +3,10 @@

 # [Llama-v3-8B-Chat: State-of-the-art large language model useful on a variety of language understanding and generation tasks](https://aihub.qualcomm.com/models/llama_v3_8b_chat_quantized)

-Llama 3 is a family of LLMs. The "Chat" at the end indicates that the model is optimized for chatbot-like dialogue. The model is quantized to w4a16(4-bit weights and 16-bit activations) and part of the model is quantized to w8a16(8-bit weights and 16-bit activations) making it suitable for on-device deployment. For Prompt and output length specified below, the time to first token is Llama-PromptProcessor-Quantized's latency and average time per addition token is Llama-TokenGenerator-KVCache-Quantized's latency.
+Llama 3 is a family of LLMs. The "Chat" at the end indicates that the model is optimized for chatbot-like dialogue. The model is quantized to w4a16 (4-bit weights and 16-bit activations) and part of the model is quantized to w8a16 (8-bit weights and 16-bit activations), making it suitable for on-device deployment. For the prompt and output lengths specified below, the time to first token is Llama-PromptProcessor-Quantized's latency and the average time per additional token is Llama-TokenGenerator-Quantized's latency.

 This is based on the implementation of Llama-v3-8B-Chat found
 [here](https://github.com/meta-llama/llama3/tree/main). This repository contains scripts for optimized on-device
 export suitable to run on Qualcomm® devices. More details on model performance
 accross various devices, can be found [here](https://aihub.qualcomm.com/models/llama_v3_8b_chat_quantized).

@@ -14,88 +14,24 @@ accross various devices, can be found [here](https://aihub.qualcomm.com/models/l

 ## Deploying Llama 3 on-device

-Large Language Model (LLM) such as [Llama 2](https://llama.meta.com/llama3/) has the following complexities to deploy on-device:
-1. Model size is too large to fit in device memory for inference
-2.
Multi-Head Attention (MHA) has large activations leading to fallback from accelerators -3. High model load and inference time +Please follow the [LLM on-device deployment](https://github.com/quic/ai-hub-apps/tree/main/tutorials/llm_on_genie) tutorial. -We can tackle the above constraints with the following steps: -1. Quantize weights to reduce on-disk model size, e.g., int8 or int4 weights -2. Quantize activations to reduce inference time memory pressure -3. Graph transformations to reduce inference time memory pressure, e.g., Multi-Head to Split-Head Attention (MHA -> SHA) -4. Graph transformations to convert or decompose operations into more accelerator friendly operations e.g. Linear to Conv -5. For LLM with 7B or more parameters, above steps are still not good enough on mobile, - hence we go one step further and split model into sub-parts. -Here, we divide the model into 4 parts in order to -1. Make model exportable with low memory usage -2. Avoid inference time out-of-memory errors -In order to export Llama 3, please ensure -1. Host machine has >40GB memory (RAM+swap-space) -2. If you don't have enough memory, export.py will dump instructions to increase swap space accordingly -## Sample output prompts generated on-device -1. --prompt "where is California?" -``` -------- Response Summary -------- -Prompt: where is California? -Response: California is a state located on the West Coast of -``` - -2. --prompt "what is 2+3?" --max-output-tokens 30 -``` --------- Response Summary -------- -Prompt: what is 2+3? -Response: 2 + 3 = 5 -``` - -3. --prompt "what is superposition in Quantum Physics?" --max-output-tokens 30 -``` -Prompt: what is superposition in Quantum Physics? -Response: Superposition is a fundamental concept in quantum mechanics, which is a branch of physics that studies the behavior of matter and energy at a very -``` - - - -## Example & Usage - -Install the package via pip: -```bash -pip install "qai_hub_models[llama_v3_8b_chat_quantized]" -``` - - -Once installed, run the following simple CLI demo: - -```bash -python -m qai_hub_models.models.llama_v3_8b_chat_quantized.demo -``` -More details on the CLI tool can be found with the `--help` option. See -[demo.py](demo.py) for sample usage of the model including pre/post processing -scripts. Please refer to our [general instructions on using -models](../../../#getting-started) for more usage instructions. - -## Export for on-device deployment - -This repository contains export scripts that produce a model optimized for -on-device deployment. This can be run as follows: - -```bash -python -m qai_hub_models.models.llama_v3_8b_chat_quantized.export -``` -Additional options are documented with the `--help` option. Note that the above -script requires access to Deployment instructions for Qualcomm® AI Hub. ## License -- The license for the original implementation of Llama-v3-8B-Chat can be found +* The license for the original implementation of Llama-v3-8B-Chat can be found [here](https://github.com/facebookresearch/llama/blob/main/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://github.com/facebookresearch/llama/blob/main/LICENSE) +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/facebookresearch/llama/blob/main/LICENSE) + ## References * [LLaMA: Open and Efficient Foundation Language Models](https://ai.meta.com/blog/meta-llama-3/) * [Source Model Implementation](https://github.com/meta-llama/llama3/tree/main) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/llama_v3_8b_chat_quantized/demo.py b/qai_hub_models/models/llama_v3_8b_chat_quantized/demo.py index 762a20e0..246c67cd 100644 --- a/qai_hub_models/models/llama_v3_8b_chat_quantized/demo.py +++ b/qai_hub_models/models/llama_v3_8b_chat_quantized/demo.py @@ -6,63 +6,25 @@ from typing import List, Type -from qai_hub_models.models._shared.llama.demo import llama_chat_demo -from qai_hub_models.models.llama_v3_8b_chat_quantized import MODEL_ID, Model -from qai_hub_models.models.llama_v3_8b_chat_quantized.model import ( +from qai_hub_models.models._shared.llama3.demo import llama_chat_demo +from qai_hub_models.models._shared.llama3.model import ( DEFAULT_USER_PROMPT, END_TOKENS, - HF_REPO_NAME, - HF_REPO_URL, - MODEL_SPLIT_MAP, - NUM_KEY_VAL_HEADS, - NUM_SPLITS, - Llama3_PromptProcessor_1_Quantized, - Llama3_PromptProcessor_2_Quantized, - Llama3_PromptProcessor_3_Quantized, - Llama3_PromptProcessor_4_Quantized, - Llama3_PromptProcessor_5_Quantized, - Llama3_TokenGenerator_1_Quantized, - Llama3_TokenGenerator_2_Quantized, - Llama3_TokenGenerator_3_Quantized, - Llama3_TokenGenerator_4_Quantized, - Llama3_TokenGenerator_5_Quantized, get_input_prompt_with_tags, get_tokenizer, prepare_combined_attention_mask, ) +from qai_hub_models.models.llama_v3_8b_chat_quantized import MODEL_ID, Model +from qai_hub_models.models.llama_v3_8b_chat_quantized.model import ( + HF_REPO_NAME, + HF_REPO_URL, +) from qai_hub_models.utils.base_model import BaseModel, TargetRuntime -def _get_model_class(split_part: int, is_token_generator: bool = False): - if split_part < 1 or split_part > 5: - raise RuntimeError( - "Incorrect index provided to request Model split class." - f" Must be within (1-5), provided ({split_part})." 
- ) - - if is_token_generator: - return [ - Llama3_TokenGenerator_1_Quantized, - Llama3_TokenGenerator_2_Quantized, - Llama3_TokenGenerator_3_Quantized, - Llama3_TokenGenerator_4_Quantized, - Llama3_TokenGenerator_5_Quantized, - ][split_part - 1] - return [ - Llama3_PromptProcessor_1_Quantized, - Llama3_PromptProcessor_2_Quantized, - Llama3_PromptProcessor_3_Quantized, - Llama3_PromptProcessor_4_Quantized, - Llama3_PromptProcessor_5_Quantized, - ][split_part - 1] - - def llama_3_chat_demo( model_cls: Type[BaseModel] = Model, model_id: str = MODEL_ID, - num_splits: int = NUM_SPLITS, - num_key_val_heads: int = NUM_KEY_VAL_HEADS, - model_split_map: dict = MODEL_SPLIT_MAP, end_tokens: set = END_TOKENS, hf_repo_name: str = HF_REPO_NAME, hf_repo_url: str = HF_REPO_URL, @@ -73,13 +35,9 @@ def llama_3_chat_demo( llama_chat_demo( model_cls=model_cls, model_id=model_id, - get_model_class=_get_model_class, get_input_prompt_with_tags=get_input_prompt_with_tags, prepare_combined_attention_mask=prepare_combined_attention_mask, - tokenizer=get_tokenizer(), - num_splits=num_splits, - num_key_val_heads=num_key_val_heads, - model_split_map=model_split_map, + tokenizer=get_tokenizer(hf_repo_name), end_tokens=end_tokens, hf_repo_name=hf_repo_name, hf_repo_url=hf_repo_url, @@ -91,4 +49,4 @@ def llama_3_chat_demo( if __name__ == "__main__": - llama_3_chat_demo() + llama_3_chat_demo(model_cls=Model) diff --git a/qai_hub_models/models/llama_v3_8b_chat_quantized/export.py b/qai_hub_models/models/llama_v3_8b_chat_quantized/export.py index fc3e5c20..3ed9600d 100644 --- a/qai_hub_models/models/llama_v3_8b_chat_quantized/export.py +++ b/qai_hub_models/models/llama_v3_8b_chat_quantized/export.py @@ -5,288 +5,52 @@ from __future__ import annotations -import os import warnings -from pathlib import Path -from typing import Any, Dict, List, Mapping, Optional, Tuple, cast -import qai_hub as hub - -from qai_hub_models.models.llama_v3_8b_chat_quantized import Model -from qai_hub_models.utils.args import ( - export_parser, - get_input_spec_kwargs, - get_model_kwargs, -) -from qai_hub_models.utils.base_model import TargetRuntime -from qai_hub_models.utils.compare import torch_inference -from qai_hub_models.utils.printing import ( - print_inference_metrics, - print_on_target_demo_cmd, - print_profile_metrics_from_job, +from qai_hub_models.models._shared.llama3.export import export_model +from qai_hub_models.models.llama_v3_8b_chat_quantized import MODEL_ID, Model +from qai_hub_models.models.llama_v3_8b_chat_quantized.model import ( + NUM_LAYERS_PER_SPLIT, + NUM_SPLITS, ) -from qai_hub_models.utils.qai_hub_helpers import ( - can_access_qualcomm_ai_hub, - export_without_hub_access, -) - -ALL_COMPONENTS = [ - "PromptProcessor_1_Quantized", - "PromptProcessor_2_Quantized", - "PromptProcessor_3_Quantized", - "PromptProcessor_4_Quantized", - "PromptProcessor_5_Quantized", - "TokenGenerator_1_Quantized", - "TokenGenerator_2_Quantized", - "TokenGenerator_3_Quantized", - "TokenGenerator_4_Quantized", - "TokenGenerator_5_Quantized", -] -DEFAULT_COMPONENTS = [ - "PromptProcessor_1_Quantized", - "PromptProcessor_2_Quantized", - "PromptProcessor_3_Quantized", - "PromptProcessor_4_Quantized", - "PromptProcessor_5_Quantized", - "TokenGenerator_1_Quantized", - "TokenGenerator_2_Quantized", - "TokenGenerator_3_Quantized", - "TokenGenerator_4_Quantized", - "TokenGenerator_5_Quantized", -] - -DEFAULT_EXPORT_DEVICE = "Samsung Galaxy S24 (Family)" - - -def export_model( - device: str = DEFAULT_EXPORT_DEVICE, - components: Optional[List[str]] = None, 
- skip_profiling: bool = False, - skip_inferencing: bool = False, - skip_downloading: bool = False, - skip_summary: bool = False, - output_dir: Optional[str] = None, - target_runtime: TargetRuntime = TargetRuntime.QNN, - compile_options: str = "", - profile_options: str = "", - **additional_model_kwargs, -) -> Mapping[ - str, Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] -] | List[str]: - """ - This function accomplishes 6 main tasks: - - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. - - Each of the last four steps can be optionally skipped using the input options. - - Parameters: - device: Device for which to export the model. - Full list of available devices can be found by running `hub.get_devices()`. - Defaults to DEFAULT_DEVICE if not specified. - components: List of sub-components of the model that will be exported. - Each component is compiled and profiled separately. - Defaults to ALL_COMPONENTS if not specified. - skip_profiling: If set, skips profiling of compiled model on real devices. - skip_inferencing: If set, skips computing on-device outputs from sample data. - skip_downloading: If set, skips downloading of compiled model. - skip_summary: If set, skips waiting for and summarizing results - from profiling and inference. - output_dir: Directory to store generated assets (e.g. compiled model). - Defaults to `/build/`. - target_runtime: Which on-device runtime to target. Default is TFLite. - compile_options: Additional options to pass when submitting the compile job. - profile_options: Additional options to pass when submitting the profile job. - **additional_model_kwargs: Additional optional kwargs used to customize - `model_cls.from_pretrained` - - Returns: - A Mapping from component_name to a 3-tuple of: - * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). - * An InferenceJob containing metadata about the inference job (None if inferencing skipped). - """ - model_name = "llama_v3_8b_chat_quantized" - output_path = Path(output_dir or Path.cwd() / "build" / model_name) - component_arg = components - components = components or DEFAULT_COMPONENTS - for component_name in components: - if component_name not in ALL_COMPONENTS: - raise ValueError(f"Invalid component {component_name}.") - if not can_access_qualcomm_ai_hub(): - return export_without_hub_access( - "llama_v3_8b_chat_quantized", - "Llama-v3-7B-Chat", - device, - skip_profiling, - skip_inferencing, - skip_downloading, - skip_summary, - output_path, - target_runtime, - compile_options, - profile_options, - component_arg, - ) - - # 1. 
Initialize PyTorch model - model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) - - compile_jobs: Dict[str, hub.client.CompileJob] = {} - profile_options_per_component: Dict[str, str] = {} - - hub_device = hub.Device(device) - for component_name in components: - # Load model part - component = model.load_model_part(component_name) - - input_spec = component.get_input_spec( - **get_input_spec_kwargs(component, additional_model_kwargs) - ) +from qai_hub_models.utils.args import export_parser - # Trace the model - source_model = component.convert_to_hub_source_model( - target_runtime, - output_path, - input_spec, - external_onnx_weights=True, - output_names=component.get_output_names(), - ) +DEFAULT_EXPORT_DEVICE = "Snapdragon 8 Elite QRD" - if target_runtime == TargetRuntime.TFLITE: - quant_calibration_data = None - else: - quant_calibration_data = component.get_calibration_data( - target_runtime, input_spec=input_spec - ) +ALL_COMPONENTS = [f"part_{i + 1}_of_{NUM_SPLITS}" for i in range(NUM_SPLITS)] - # 2. Compile the models to an on-device asset - model_compile_options = component.get_hub_compile_options( - target_runtime, compile_options - ) - print(f"Optimizing model {component_name} to run on-device") - submitted_compile_job = hub.submit_compile_job( - model=source_model, - input_specs=input_spec, - device=hub_device, - name=f"{model_name}_{component_name}", - calibration_data=quant_calibration_data, - options=model_compile_options, - ) - - compile_jobs[component_name] = cast( - hub.client.CompileJob, submitted_compile_job - ) - profile_options_per_component[ - component_name - ] = component.get_hub_profile_options(target_runtime, profile_options) - - # Free model part to reduce memory-pressure - del component - - # 3. Profile the model assets on real devices - profile_jobs: Dict[str, hub.client.ProfileJob] = {} - if not skip_profiling: - for component_name in components: - profile_options_all = profile_options_per_component[component_name] - print(f"Profiling model {component_name} on a hosted device.") - submitted_profile_job = hub.submit_profile_job( - model=compile_jobs[component_name].get_target_model(), - device=hub_device, - name=f"{model_name}_{component_name}", - options=profile_options_all, - ) - profile_jobs[component_name] = cast( - hub.client.ProfileJob, submitted_profile_job - ) - - # 4. Run inference on-device with sample inputs - inference_jobs: Dict[str, hub.client.InferenceJob] = {} - - if not skip_inferencing: - for component_name in components: - print( - f"Running inference for {component_name} on a hosted device with example inputs." - ) - # Load model with no-AIMET mode - component = model.load_model_part(component_name) - profile_options_all = profile_options_per_component[component_name] - # Load individual model part - sample_inputs = component.sample_inputs() - submitted_inference_job = hub.submit_inference_job( - model=compile_jobs[component_name].get_target_model(), - inputs=sample_inputs, - device=hub_device, - name=f"{model_name}_{component_name}", - options=profile_options_all, - ) - inference_jobs[component_name] = cast( - hub.client.InferenceJob, submitted_inference_job - ) - - # 5. Download the model assets to a local file - if not skip_downloading: - os.makedirs(output_path, exist_ok=True) - for component_name, compile_job in compile_jobs.items(): - target_model: hub.Model = compile_job.get_target_model() # type: ignore - target_model.download( - str(output_path / f"{model_name}_{component_name}.bin") - ) - - # 6. 
Summarize the results from profiling and inference
-    if not skip_summary and not skip_profiling:
-        for component_name in components:
-            profile_job = profile_jobs[component_name]
-            assert profile_job is not None and profile_job.wait().success
-            profile_data: Dict[str, Any] = profile_job.download_profile()  # type: ignore
-            print_profile_metrics_from_job(profile_job, profile_data)
-
-    if not skip_summary and not skip_inferencing:
-        for component_name in components:
-            inference_job = inference_jobs[component_name]
-            # Load individual model part
-            component = model.load_model_part(component_name)
-            # Get ordered model output names
-            output_names = component.get_output_names()
-            sample_inputs = component.sample_inputs()
-            torch_out = torch_inference(component, sample_inputs)
-            assert inference_job is not None and inference_job.wait().success
-            inference_result: hub.client.DatasetEntries = inference_job.download_output_data()  # type: ignore
-            print_inference_metrics(
-                inference_job, inference_result, torch_out, output_names=output_names
-            )
-
-    if not skip_summary:
-        print_on_target_demo_cmd(
-            compile_jobs.values(), Path(__file__).parent.resolve(), hub_device
-        )
-
-    return {
-        component_name: (
-            compile_jobs[component_name],
-            profile_jobs.get(component_name, None),
-            inference_jobs.get(component_name, None),
-        )
-        for component_name in components
-    }
+# Each component is two sub-components linked together with shared weights
+ALL_SUB_COMPONENTS = {
+    f"part_{i + 1}_of_{NUM_SPLITS}": [
+        f"prompt_{i + 1}_of_{NUM_SPLITS}",
+        f"token_{i + 1}_of_{NUM_SPLITS}",
+    ]
+    for i in range(NUM_SPLITS)
+}


 def main():
     warnings.filterwarnings("ignore")
     parser = export_parser(
         model_cls=Model,
-        components=ALL_COMPONENTS,
         supports_tflite=False,
         supports_precompiled_qnn_onnx=False,
         default_export_device=DEFAULT_EXPORT_DEVICE,
     )
+    parser.add_argument(
+        "--synchronous",
+        action="store_true",
+        help="Wait for each command to finish before submitting the next.",
+    )
     args = parser.parse_args()
-    export_model(**vars(args))
+    export_model(
+        model_cls=Model,
+        model_name=MODEL_ID,
+        components=ALL_COMPONENTS,
+        sub_components=ALL_SUB_COMPONENTS,
+        num_layers_per_split=NUM_LAYERS_PER_SPLIT,
+        **vars(args),
+    )


 if __name__ == "__main__":
diff --git a/qai_hub_models/models/llama_v3_8b_chat_quantized/info.yaml b/qai_hub_models/models/llama_v3_8b_chat_quantized/info.yaml
index 3d38e57b..6ef32977 100644
--- a/qai_hub_models/models/llama_v3_8b_chat_quantized/info.yaml
+++ b/qai_hub_models/models/llama_v3_8b_chat_quantized/info.yaml
@@ -6,11 +6,11 @@ headline: State-of-the-art large language model useful on a variety of language
 domain: Generative AI
 description: Llama 3 is a family of LLMs. The "Chat" at the end indicates that
   the model is optimized for chatbot-like dialogue. The model is quantized to
-  w4a16(4-bit weights and 16-bit activations) and part of the model is quantized to
-  w8a16(8-bit weights and 16-bit activations) making it suitable for on-device
+  w4a16 (4-bit weights and 16-bit activations) and part of the model is quantized to
+  w8a16 (8-bit weights and 16-bit activations), making it suitable for on-device
   deployment. For Prompt and output length specified below, the time to first token is
   Llama-PromptProcessor-Quantized's latency and average time per addition token is
-  Llama-TokenGenerator-KVCache-Quantized's latency.
+  Llama-TokenGenerator-Quantized's latency.
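With this refactor, each per-model export.py reduces to a thin wrapper over the shared llama3 `export_model` helper. A hypothetical programmatic equivalent of the Llama-v3-8B export CLI follows; every name is taken from the diff above, but treat the exact keyword set (in particular `device`) as an assumption about the shared helper's signature rather than documented API:

```python
from qai_hub_models.models._shared.llama3.export import export_model
from qai_hub_models.models.llama_v3_8b_chat_quantized import MODEL_ID, Model
from qai_hub_models.models.llama_v3_8b_chat_quantized.export import (
    ALL_COMPONENTS,
    ALL_SUB_COMPONENTS,
)
from qai_hub_models.models.llama_v3_8b_chat_quantized.model import NUM_LAYERS_PER_SPLIT

# Roughly what `python -m qai_hub_models.models.llama_v3_8b_chat_quantized.export` runs.
jobs = export_model(
    model_cls=Model,
    model_name=MODEL_ID,
    components=ALL_COMPONENTS,
    sub_components=ALL_SUB_COMPONENTS,
    num_layers_per_split=NUM_LAYERS_PER_SPLIT,
    device="Snapdragon 8 Elite QRD",  # assumed kwarg; the new default export device
)
```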
diff --git a/qai_hub_models/models/llama_v3_8b_chat_quantized/info.yaml b/qai_hub_models/models/llama_v3_8b_chat_quantized/info.yaml
index 3d38e57b..6ef32977 100644
--- a/qai_hub_models/models/llama_v3_8b_chat_quantized/info.yaml
+++ b/qai_hub_models/models/llama_v3_8b_chat_quantized/info.yaml
@@ -6,11 +6,11 @@ headline: State-of-the-art large language model useful on a variety of language
 domain: Generative AI
 description: Llama 3 is a family of LLMs. The "Chat" at the end indicates that
   the model is optimized for chatbot-like dialogue. The model is quantized to
-  w4a16(4-bit weights and 16-bit activations) and part of the model is quantized to
-  w8a16(8-bit weights and 16-bit activations) making it suitable for on-device
+  w4a16 (4-bit weights and 16-bit activations) and part of the model is quantized to
+  w8a16 (8-bit weights and 16-bit activations) making it suitable for on-device
   deployment. For Prompt and output length specified below, the time to first token is
   Llama-PromptProcessor-Quantized's latency and average time per additional token is
-  Llama-TokenGenerator-KVCache-Quantized's latency.
+  Llama-TokenGenerator-Quantized's latency.
 use_case: Text Generation
 tags:
   - llm
@@ -21,25 +21,30 @@ research_paper_title: "LLaMA: Open and Efficient Foundation Language Models"
 license: https://github.com/facebookresearch/llama/blob/main/LICENSE
 source_repo: https://github.com/meta-llama/llama3/tree/main
 technical_details:
+  Input sequence length for Prompt Processor: 128
+  Context length: 4096
   Number of parameters: 8B
+  Model size: 4.8GB
   Precision: w4a16 + w8a16 (few layers)
   Num of key-value heads: 8
   Model-1 (Prompt Processor): Llama-PromptProcessor-Quantized
-  Max context length: 1024
-  Prompt processor model size: 4.8GB
-  Prompt processor input: 1024 tokens
-  Prompt processor output: 1024 output tokens + KVCache for token generator
-  Model-2 (Token Generator): Llama-TokenGenerator-KVCache-Quantized
-  Token generator model size: 4.8GB
-  Token generator input: 1 input token + past KVCache
-  Token generator output: 1 output token + KVCache for next iteration
-  Decoding length: 1024 (1 output token + 1023 from KVCache)
+  Prompt processor input: 128 tokens + position embeddings + attention mask + KV cache inputs
+  Prompt processor output: 128 output tokens + KV cache outputs
+  Model-2 (Token Generator): Llama-TokenGenerator-Quantized
+  Token generator input: 1 input token + position embeddings + attention mask + KV cache inputs
+  Token generator output: 1 output token + KV cache outputs
   Use: Initiate conversation with prompt-processor and then token generator for
     subsequent iterations.
+  Minimum QNN SDK version required: 2.27.7
+  Supported languages: English.
+  TTFT: Time To First Token is the time it takes to generate the first response token.
+    This is expressed as a range because it varies based on the length of the prompt.
+    The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of
+    the prompt processor) and the upper bound is for a prompt using the full context
+    length (4096 tokens).
+  Response Rate: Rate of response generation after the first response token.
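To make the TTFT / Response Rate entries above concrete, here is a minimal sketch of the two-stage call pattern: the prompt processor consumes the prompt in 128-token chunks to build the KV cache (the TTFT cost), then the token generator produces one token per pass (the Response Rate cost). The `prompt_step` / `token_step` callables are illustrative stand-ins, not APIs from this repo:

    from typing import Callable

    PROMPT_CHUNK = 128      # prompt processor consumes 128 tokens per pass
    CONTEXT_LENGTH = 4096   # KV-cache capacity shared by both stages

    def generate(prompt_tokens: list,
                 prompt_step: Callable,  # one 128-token prompt-processor pass
                 token_step: Callable,   # one single-token token-generator pass
                 max_new_tokens: int = 32) -> list:
        assert len(prompt_tokens) + max_new_tokens <= CONTEXT_LENGTH
        # TTFT window: consume the prompt in 128-token chunks,
        # accumulating the KV cache as we go.
        kv_cache: dict = {}
        logits = None
        for i in range(0, len(prompt_tokens), PROMPT_CHUNK):
            logits, kv_cache = prompt_step(prompt_tokens[i : i + PROMPT_CHUNK], kv_cache)
        # Response-rate window: generate auto-regressively, one token per pass.
        token = int(logits.argmax())  # first response token ends the TTFT window
        out = [token]
        for _ in range(max_new_tokens - 1):
            logits, kv_cache = token_step(token, kv_cache)
            token = int(logits.argmax())
            out.append(token)
        return out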
 applicable_scenarios:
   - Dialogue
   - Content Generation
   - Customer Support
-related_models: []
+related_models:
+  - llama_v3_1_8b_chat_quantized
+  - llama_v3_2_3b_chat_quantized
 form_factors:
   - Phone
   - Tablet
@@ -50,3 +55,7 @@ deploy_license: https://github.com/facebookresearch/llama/blob/main/LICENSE
 deploy_license_type: llama3
 dataset: []
 restrict_model_sharing: true
+model_type_llm: true
+llm_details:
+  call_to_action: 'view_readme'
+  genie_compatible: true
diff --git a/qai_hub_models/models/llama_v3_8b_chat_quantized/model.py b/qai_hub_models/models/llama_v3_8b_chat_quantized/model.py
index bb025332..ffe75e12 100644
--- a/qai_hub_models/models/llama_v3_8b_chat_quantized/model.py
+++ b/qai_hub_models/models/llama_v3_8b_chat_quantized/model.py
@@ -5,1726 +5,106 @@
 from __future__ import annotations

 import os
-from typing import Optional, Tuple

-import torch
-from qai_hub.client import DatasetEntries
-
-from qai_hub_models.models._shared.llama.model import (
-    DEFAULT_INPUT_SEQ_LEN,
-    Llama_QuantizedMixin,
-    RopeEmbedding,
-    get_hidden_layer_range_from_split,
-    get_past_key_names,
-    get_past_keyval_with_shift,
-    load_input_cached_data,
-    make_torch_compatible_past_key_values,
-    save_input_cached_data,
-)
-from qai_hub_models.models.llama_v3_8b_chat_quantized.modeling_llama import (  # RopeEmbedding,
-    LlamaForCausalLM,
-    LlamaModel,
+from qai_hub_models.models._shared.llama3.model import (
+    DEFAULT_CONTEXT_LENGTH,
+    Llama3Base_Quantized,
 )
 from qai_hub_models.utils.asset_loaders import CachedWebModelAsset
-from qai_hub_models.utils.base_model import CollectionModel, TargetRuntime
-from qai_hub_models.utils.huggingface import (
-    ensure_has_required_transformer,
-    has_model_access,
-)
 from qai_hub_models.utils.input_spec import InputSpec
-from qai_hub_models.utils.model_adapters import flatten, suppress_warnings
-from qai_hub_models.utils.system_info import has_recommended_memory
-
-MIN_TRANFORMER_VERSION = "4.40.0"
-
-
-# isort: off
-
-# TODO: 10761 remove transformer version check once AIMET
-# transformer restriction is uplifted.
-ensure_has_required_transformer(MIN_TRANFORMER_VERSION)
-from transformers import AutoConfig, AutoTokenizer  # noqa: E402
-
 MODEL_ID = __name__.split(".")[-2]
-MODEL_ASSET_VERSION = 2
-
-# Configs
-AIMET_ENCODINGS_PREFIX = "config"
-AIMET_CONFIG = "default_config_llama"
+MODEL_ASSET_VERSION = 4
+DEFAULT_ENCODINGS = "llama3.encodings"
+DEFAULT_ENCODINGS_ZIP = DEFAULT_ENCODINGS + ".zip"

-# Model parameters
-MAX_HIDDEN_LAYERS = 32
-MAX_POS_EMBEDDINGS = 1024
-ATTENTION_HIDDEN_DIM = 4096
-POS_EMBED_DIM = 64
-DATA_DIR = "data"
-USE_CACHED_DATA = True
+NUM_LAYERS = 32
 NUM_SPLITS = 5
-NUM_KEY_VAL_HEADS = 8
-
-# Model split map to track DecodeLayer split for each part
-# key (model split number) ->
-#     value Tuple of (start index of decoder Layer, end index of decode layer)
-MODEL_SPLIT_MAP = {
-    1: (0, 4),
-    2: (4, 12),
-    3: (12, 20),
-    4: (20, 28),
-    5: (28, 32),
-}
+NUM_LAYERS_PER_SPLIT = 9

 # Hugging face repo name and url
 HF_REPO_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"
-HF_REPO_URL = "https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct"
+# HF_REPO_NAME already carries the "meta-llama/" org prefix, so it is not
+# repeated here.
+HF_REPO_URL = f"https://huggingface.co/{HF_REPO_NAME}"
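# Illustrative note (an assumption mirroring the from_pretrained docstring
# further below): the prompt processor and token generator are now two
# instantiations of the same class, distinguished only by sequence length, e.g.
#
#     prompt_proc = Llama3_Quantized.from_pretrained(sequence_length=128)
#     token_gen = Llama3_Quantized.from_pretrained(sequence_length=1)
#
# A sequence length greater than 1 is only meaningful for prompt processing;
# token generation is auto-regressive and therefore runs with sequence_length == 1.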
 # Minimum memory (RAM+swap) recommended for export.
-# TODO: #10762 should reduce once AIMET export consumes less memory during export.
-MIN_MEMORY_RECOMMENDED = 40
-
-## Ref: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/
-BEGIN_TEXT = "<|begin_of_text|>"
-END_TEXT = "<|begin_of_text|>"
-START_HEADER = "<|start_header_id|>"
-END_HEADER = "<|end_header_id|>"
-SYSTEM_ID = "system"
-ASSISTANT_ID = "assistant"
-USER_ID = "user"
-EOT_ID = "<|eot_id|>"
-END_TOKENS = {"<|eot_id|>", "<|end_of_text|>"}
-
-DEFAULT_PROMPT_CONTEXT = "You are a helpful AI assistant"
-DEFAULT_USER_PROMPT = "Hi! What is 2+3?"
-
-
-def get_input_prompt_with_tags(
-    previous_history: str = "",
-    system_context_prompt: str = DEFAULT_PROMPT_CONTEXT,
-    user_input_prompt: str = DEFAULT_USER_PROMPT,
-):
-    """
-    Get prompt to set context and initialize prompt-processor
-    """
-    prompt = previous_history
-    prompt += "" if len(previous_history) == 0 else ""
-
-    prompt = f"""{BEGIN_TEXT}{START_HEADER}{SYSTEM_ID}{END_HEADER}
-
-{system_context_prompt}
-{START_HEADER}{USER_ID}{END_HEADER}
-
-{user_input_prompt}{EOT_ID}{START_HEADER}{ASSISTANT_ID}{END_HEADER}
-
-
-"""
-    return prompt
-
-
-def get_tokenizer():
-    """
-    Tokenizer to use for Llama3
-    """
-    tokenizer = AutoTokenizer.from_pretrained(HF_REPO_NAME, is_fast=False)
-    tokenizer.padding_side = "left"
-    tokenizer.pad_token = tokenizer.eos_token
-    tokenizer.pad_token_id = tokenizer.eos_token_id
-    tokenizer.truncation_side = "left"
-    return tokenizer
-
-
-def prepare_combined_attention_mask(
-    attention_mask: torch.Tensor,
-    input_shape: Optional[Tuple] = None,
-    past_key_values_length: int = 0,
-    dtype: torch.dtype = torch.float32,
-):
-    """
-    Creates combined attention_mask from given input attention_mask
-    Input attention_mask: 2d (1, input_seq_len)
-    Output attention_mask: 4d (1, 1, input_seq_length, input_seq_length)
-    """
-    if input_shape is None:
-        input_shape = attention_mask.shape
-    dummy_enbedding = torch.tensor((1.0,)).to(dtype)
-    new_mask = LlamaModel._prepare_decoder_attention_mask(
-        attention_mask, input_shape, dummy_enbedding, past_key_values_length
-    )
-    return new_mask
-
+# TODO: #10762 revisit this figure: it was measured for the old AIMET-based
+# export path, which this port no longer uses, and it has not been
+# re-validated for Llama 3.
+MIN_MEMORY_RECOMMENDED = 40

-class Llama3Wrapper(torch.nn.Module):
-    def __init__(
-        self,
-        max_position_embeddings: int = MAX_POS_EMBEDDINGS,
-        split_part: int = 1,
-        is_token_generator: bool = False,
-    ):
-        super().__init__()
-        model_type = "TokenGenerator" if is_token_generator else "PromptProcessor"
-        self.is_token_generator = is_token_generator
-        print(f"Loading Llama3 {model_type} {split_part}/{NUM_SPLITS}")
-
-        config = AutoConfig.from_pretrained(HF_REPO_NAME, torchscript=True)
-        hidden_layers = 32
-        config.num_hidden_layers = hidden_layers
-        config.max_position_embeddings = max_position_embeddings
-        config.num_attention_heads = 32
-        config.block_size = 4096
-        config.num_key_value_heads = NUM_KEY_VAL_HEADS
-        config.num_logits_to_return = 1
-        config.shift_cache = False
-        config.transposed_key_cache = True
-        config.return_new_key_value_only = True
-        config.return_top_k = 0
-        config.logit_temperature = 1.0
-        config.use_combined_mask_input = True
-        config.use_sha = True
-        config.use_conv = True
-        config.mask_neg = -100
-        config.split_model = split_part
-        if split_part < 1 or split_part > 5:
-            raise RuntimeError(
-                f"Llama3 split_part must be within 1-5 (Provided {split_part})."
- ) - - hidden_layers_start, hidden_layers_end = get_hidden_layer_range_from_split( - split_part, MODEL_SPLIT_MAP - ) - config.hidden_layers_start = hidden_layers_start - config.hidden_layers_end = hidden_layers_end - self.total_hidden_layers = hidden_layers_end - hidden_layers_start - - print("Loading model") - self.model = LlamaForCausalLM.from_pretrained(HF_REPO_NAME, config=config) - self.model.eval() - - if ( - hidden_layers_start < 0 - or hidden_layers_start > MAX_HIDDEN_LAYERS - or hidden_layers_end < 0 - or hidden_layers_end > MAX_HIDDEN_LAYERS - or hidden_layers_start >= hidden_layers_end - ): - raise RuntimeError( - f"Incorrect hidden_layers range provided. Must be within 0-32 (provided {hidden_layers_start}-{hidden_layers_end})." - ) - - # Reduce # of hidden layers as per split - self.model.model.layers = self.model.model.layers[ - hidden_layers_start:hidden_layers_end - ] - - # Apply model conversion - # Convert MHA to SHA - use_sha = config.use_sha - use_conv = config.use_conv - # Convert Linear to 1x1 Conv2D - if use_conv: - for _, module in self.model.named_modules(): - if type(module).__name__ in { - "LlamaMLP", - "LlamaForCausalLM", - "LlamaAttention", - }: - module.prepare_conv() - - if use_sha: - for _, module in self.model.named_modules(): - if type(module).__name__ == "LlamaAttention": - module.prepare_sha() - - def forward( - self, - input_ids, - attention_mask, - position_ids_cos, - position_ids_sin, - *past_key_values, - ): - if self.is_token_generator: - out = self.forward_token_generator( - input_ids, - attention_mask, - position_ids_cos, - position_ids_sin, - *past_key_values, - ) - else: - out = self.forward_prompt_processor( - input_ids, attention_mask, position_ids_cos, position_ids_sin - ) - # Flatten past_key_values - return tuple( - out[:1], - ) + tuple(flatten(out[1])) - - def forward_prompt_processor( - self, input_ids, attention_mask, position_ids_cos, position_ids_sin - ): - return self.model( - input_ids, attention_mask, position_ids=(position_ids_cos, position_ids_sin) - ) - - def forward_token_generator( - self, - input_ids, - attention_mask, - position_ids_cos, - position_ids_sin, - *past_key_values, - ): - past_key_values_tuple = make_torch_compatible_past_key_values( - self.total_hidden_layers, 8, False, *past_key_values +class Llama3_Quantized(Llama3Base_Quantized): + def __init__(self, huggingface_model_name: str = HF_REPO_NAME, *args, **kwargs): + super().__init__( + huggingface_model_name=huggingface_model_name, + min_memory_recommended=MIN_MEMORY_RECOMMENDED, + *args, + **kwargs, ) - return self.model( - input_ids, - attention_mask, - position_ids=(position_ids_cos, position_ids_sin), - past_key_values=past_key_values_tuple, - ) - - -def _get_llama_model_with_split( - max_position_embeddings: int = MAX_POS_EMBEDDINGS, - split_part: int = 1, - is_token_generator: bool = False, -) -> Tuple[torch.nn.Module, str]: - - # Ensure User has access to model, - # otherwise point to instructions to get access and error out. - has_model_access(HF_REPO_NAME, HF_REPO_URL) - - # Ensure User has recommended memory, - # otherwise, provide warning to user and recommend to increase swap-space as a work-around. 
- has_recommended_memory(MIN_MEMORY_RECOMMENDED) - - with suppress_warnings(): - model = Llama3Wrapper( - max_position_embeddings=max_position_embeddings, - split_part=split_part, - is_token_generator=is_token_generator, - ) - model.eval() - - # Download quantization config and pre-computed encodings - model_encoding_tag = "tg" if is_token_generator else "pp" - aimet_encodings = str( - os.path.join( - AIMET_ENCODINGS_PREFIX, - model_encoding_tag, - f"llama3_{model_encoding_tag}_sha_{split_part}.encodings", - ) - ) - aimet_encodings = str( - CachedWebModelAsset.from_asset_store( - MODEL_ID, MODEL_ASSET_VERSION, aimet_encodings - ).fetch() - ) - return model, aimet_encodings - - -class Llama3_Quantized(CollectionModel): - def __init__(self, max_position_embeddings: int) -> None: - super().__init__() - self.max_position_embeddings = max_position_embeddings @classmethod def from_pretrained( - cls, max_position_embeddings: int = MAX_POS_EMBEDDINGS + cls, + sequence_length: int, + context_length: int = DEFAULT_CONTEXT_LENGTH, + aimet_encodings: str | None = "DEFAULT", + huggingface_model_name: str = HF_REPO_NAME, ) -> "Llama3_Quantized": - return Llama3_Quantized(max_position_embeddings=max_position_embeddings) - - def load_model_part(self, split_part): - if split_part == "PromptProcessor_1_Quantized": - return Llama3_PromptProcessor_1_Quantized.from_pretrained( - max_position_embeddings=self.max_position_embeddings - ) - if split_part == "PromptProcessor_2_Quantized": - return Llama3_PromptProcessor_2_Quantized.from_pretrained( - max_position_embeddings=self.max_position_embeddings - ) - if split_part == "PromptProcessor_3_Quantized": - return Llama3_PromptProcessor_3_Quantized.from_pretrained( - max_position_embeddings=self.max_position_embeddings - ) - if split_part == "PromptProcessor_4_Quantized": - return Llama3_PromptProcessor_4_Quantized.from_pretrained( - max_position_embeddings=self.max_position_embeddings - ) - if split_part == "PromptProcessor_5_Quantized": - return Llama3_PromptProcessor_5_Quantized.from_pretrained( - max_position_embeddings=self.max_position_embeddings - ) - if split_part == "TokenGenerator_1_Quantized": - return Llama3_TokenGenerator_1_Quantized.from_pretrained( - max_position_embeddings=self.max_position_embeddings, - ) - if split_part == "TokenGenerator_2_Quantized": - return Llama3_TokenGenerator_2_Quantized.from_pretrained( - max_position_embeddings=self.max_position_embeddings - ) - if split_part == "TokenGenerator_3_Quantized": - return Llama3_TokenGenerator_3_Quantized.from_pretrained( - max_position_embeddings=self.max_position_embeddings - ) - if split_part == "TokenGenerator_4_Quantized": - return Llama3_TokenGenerator_4_Quantized.from_pretrained( - max_position_embeddings=self.max_position_embeddings - ) - if split_part == "TokenGenerator_5_Quantized": - return Llama3_TokenGenerator_5_Quantized.from_pretrained( - max_position_embeddings=self.max_position_embeddings - ) - raise RuntimeError(f"Unsupported split_part {split_part}.") - - -class Llama3_PromptProcessor_1_Quantized(Llama_QuantizedMixin): - def __init__(self, model, encoding_path): - super().__init__(model, encoding_path) - self.model = model - self.split_part = 1 - - def forward( - self, - input_ids: torch.Tensor, - attention_mask: torch.Tensor, - position_ids_cos: torch.Tensor, - position_ids_sin: torch.Tensor, - ): - return self.model(input_ids, attention_mask, position_ids_cos, position_ids_sin) - - @classmethod - def from_pretrained( - cls, max_position_embeddings: int = MAX_POS_EMBEDDINGS 
- ) -> Llama3_PromptProcessor_1_Quantized: - model, encoding_path = _get_llama_model_with_split( - max_position_embeddings, split_part=1 - ) - return cls(model, encoding_path) - - @staticmethod - def get_input_spec( - input_seq_length: int = DEFAULT_INPUT_SEQ_LEN, - ) -> InputSpec: - # Get the input specification ordered (name -> (shape, type)) pairs for this model. - # - # This can be used with the qai_hub python API to declare - # the model input specification upon submitting a compile job. - return { - "input_ids": ((1, input_seq_length), "int32"), - "attention_mask": ((1, 1, input_seq_length, input_seq_length), "float32"), - "position_ids_cos": ((1, 1, input_seq_length, POS_EMBED_DIM), "float32"), - "position_ids_sin": ((1, 1, input_seq_length, POS_EMBED_DIM), "float32"), - } - - @staticmethod - def get_output_names(): - layers_start, layers_end = get_hidden_layer_range_from_split( - split_part=1, model_split_map=MODEL_SPLIT_MAP - ) - return Llama_QuantizedMixin.get_output_names( - start=layers_start, - end=layers_end, - past_key_val_heads=NUM_KEY_VAL_HEADS, - bundled_kvcache=False, - ) - - @staticmethod - def get_model_data(input_seq_len: int = DEFAULT_INPUT_SEQ_LEN): - data = load_input_cached_data( - split_part=1, - data_dir=DATA_DIR, - model_name="llama_v3", - model_id=MODEL_ID, - model_asset_version=MODEL_ASSET_VERSION, - model_type="pp", - input_seq_len=input_seq_len, - ) - if data is not None: - return data - - _, input_seq_len = Llama3_PromptProcessor_1_Quantized.get_input_spec()[ - "input_ids" - ][0] - - tokenizer = get_tokenizer() - prompt = get_input_prompt_with_tags(DEFAULT_USER_PROMPT) - input_tokens = tokenizer( - prompt, return_tensors="pt", padding="max_length", max_length=input_seq_len - ) - - inputs = {} - inputs["input_ids"] = input_tokens["input_ids"].type(torch.int32) - inputs["attention_mask"] = prepare_combined_attention_mask( - input_tokens["attention_mask"], input_tokens["attention_mask"].shape - ).type(torch.float32) - tokens = torch.sum(input_tokens["attention_mask"]).item() - position_ids = [0] * (input_seq_len - tokens) + list(range(0, tokens)) - position_ids = ( - torch.Tensor(position_ids).type(torch.long).reshape(1, input_seq_len) - ) - position_ids_cos, position_ids_sin = RopeEmbedding( - max_length=input_seq_len - ).get_embedding(position_ids) - inputs["position_ids_cos"] = position_ids_cos - inputs["position_ids_sin"] = position_ids_sin - save_input_cached_data( - inputs, - split_part=1, - data_dir=DATA_DIR, - model_name="llama_v3", - model_id=MODEL_ID, - model_asset_version=MODEL_ASSET_VERSION, - input_seq_len=input_seq_len, - ) - return inputs - - def get_calibration_data( - self, - target_runtime: TargetRuntime | None = None, - input_spec: InputSpec | None = None, - ) -> DatasetEntries | None: - """ - Calibration dataset for this model and input spec. 
""" - if input_spec is None: - input_spec = Llama3_PromptProcessor_1_Quantized.get_input_spec() - - _, input_seq_len = input_spec["input_ids"][0] - return Llama3_PromptProcessor_1_Quantized.get_model_data( - input_seq_len=input_seq_len - ) - - -class Llama3_PromptProcessor_2_Quantized(Llama_QuantizedMixin): - def __init__(self, model: torch.nn.Module, encoding_path: str): - super().__init__(model, encoding_path) - self.split_part = 2 - - def forward( - self, - input_ids: torch.Tensor, - attention_mask: torch.Tensor, - position_ids_cos: torch.Tensor, - position_ids_sin: torch.Tensor, - ): - return self.model(input_ids, attention_mask, position_ids_cos, position_ids_sin) - - @classmethod - def from_pretrained( - cls, max_position_embeddings: int = MAX_POS_EMBEDDINGS - ) -> Llama3_PromptProcessor_2_Quantized: - model, encoding_path = _get_llama_model_with_split( - max_position_embeddings, split_part=2 - ) - return cls(model, encoding_path) - - @staticmethod - def get_input_spec( - input_seq_length: int = DEFAULT_INPUT_SEQ_LEN, - ) -> InputSpec: - # Get the input specification ordered (name -> (shape, type)) pairs for this model. - # - # This can be used with the qai_hub python API to declare - # the model input specification upon submitting a compile job. - return { - "input_ids": ((1, input_seq_length, ATTENTION_HIDDEN_DIM), "float32"), - "attention_mask": ((1, 1, input_seq_length, input_seq_length), "float32"), - "position_ids_cos": ((1, 1, input_seq_length, POS_EMBED_DIM), "float32"), - "position_ids_sin": ((1, 1, input_seq_length, POS_EMBED_DIM), "float32"), - } - - @staticmethod - def get_output_names(): - layers_start, layers_end = get_hidden_layer_range_from_split( - split_part=2, model_split_map=MODEL_SPLIT_MAP - ) - return Llama_QuantizedMixin.get_output_names( - start=layers_start, - end=layers_end, - past_key_val_heads=NUM_KEY_VAL_HEADS, - bundled_kvcache=False, - ) - - @staticmethod - def get_model_data(input_seq_len: int = DEFAULT_INPUT_SEQ_LEN): - data = load_input_cached_data( - split_part=2, - data_dir=DATA_DIR, - model_name="llama_v3", - model_id=MODEL_ID, - model_asset_version=MODEL_ASSET_VERSION, - model_type="pp", - input_seq_len=input_seq_len, - ) - if data is not None: - return data - - model = Llama3_PromptProcessor_1_Quantized.from_pretrained() - inputs = Llama3_PromptProcessor_1_Quantized.get_model_data( - input_seq_len=input_seq_len - ) - output = model(*inputs.values()) - del model - - new_inputs = {} - new_inputs["input_ids"] = output[0].detach() - new_inputs["attention_mask"] = inputs["attention_mask"] - new_inputs["position_ids_cos"] = inputs["position_ids_cos"] - new_inputs["position_ids_sin"] = inputs["position_ids_sin"] - save_input_cached_data( - new_inputs, - split_part=2, - data_dir=DATA_DIR, - model_name="llama_v3", - model_id=MODEL_ID, - model_asset_version=MODEL_ASSET_VERSION, - input_seq_len=input_seq_len, - ) - return new_inputs - - def get_calibration_data( - self, - target_runtime: TargetRuntime | None = None, - input_spec: InputSpec | None = None, - ) -> DatasetEntries | None: - """ - Calibration dataset for this model. 
- """ - if input_spec is None: - input_spec = Llama3_PromptProcessor_2_Quantized.get_input_spec() - - input_seq_len = input_spec["attention_mask"][0][-1] - return Llama3_PromptProcessor_2_Quantized.get_model_data( - input_seq_len=input_seq_len - ) - - -class Llama3_PromptProcessor_3_Quantized(Llama_QuantizedMixin): - def __init__(self, model: torch.nn.Module, encoding_path: str): - super().__init__(model, encoding_path) - self.split_part = 3 - - def forward( - self, - input_ids: torch.Tensor, - attention_mask: torch.Tensor, - position_ids_cos: torch.Tensor, - position_ids_sin: torch.Tensor, - ): - return self.model(input_ids, attention_mask, position_ids_cos, position_ids_sin) - - @classmethod - def from_pretrained( - cls, max_position_embeddings: int = MAX_POS_EMBEDDINGS - ) -> Llama3_PromptProcessor_3_Quantized: - model, encoding_path = _get_llama_model_with_split( - max_position_embeddings, split_part=3 - ) - return cls(model, encoding_path) - - @staticmethod - def get_input_spec( - input_seq_length: int = DEFAULT_INPUT_SEQ_LEN, - ) -> InputSpec: - # Get the input specification ordered (name -> (shape, type)) pairs for this model. - # - # This can be used with the qai_hub python API to declare - # the model input specification upon submitting a compile job. - return { - "input_ids": ((1, input_seq_length, ATTENTION_HIDDEN_DIM), "float32"), - "attention_mask": ((1, 1, input_seq_length, input_seq_length), "float32"), - "position_ids_cos": ((1, 1, input_seq_length, POS_EMBED_DIM), "float32"), - "position_ids_sin": ((1, 1, input_seq_length, POS_EMBED_DIM), "float32"), - } - - @staticmethod - def get_output_names(): - layers_start, layers_end = get_hidden_layer_range_from_split( - split_part=3, model_split_map=MODEL_SPLIT_MAP - ) - return Llama_QuantizedMixin.get_output_names( - start=layers_start, - end=layers_end, - past_key_val_heads=NUM_KEY_VAL_HEADS, - bundled_kvcache=False, - ) - - @staticmethod - def get_model_data(input_seq_len: int = DEFAULT_INPUT_SEQ_LEN): - data = load_input_cached_data( - split_part=3, - data_dir=DATA_DIR, - model_name="llama_v3", - model_id=MODEL_ID, - model_asset_version=MODEL_ASSET_VERSION, - model_type="pp", - input_seq_len=input_seq_len, - ) - if data is not None: - return data - - model = Llama3_PromptProcessor_2_Quantized.from_pretrained() - inputs = Llama3_PromptProcessor_2_Quantized.get_model_data( - input_seq_len=input_seq_len - ) - output = model(*inputs.values()) - del model - - new_inputs = {} - new_inputs["input_ids"] = output[0].detach() - new_inputs["attention_mask"] = inputs["attention_mask"] - new_inputs["position_ids_cos"] = inputs["position_ids_cos"] - new_inputs["position_ids_sin"] = inputs["position_ids_sin"] - save_input_cached_data( - new_inputs, - split_part=3, - data_dir=DATA_DIR, - model_name="llama_v3", - model_id=MODEL_ID, - model_asset_version=MODEL_ASSET_VERSION, - input_seq_len=input_seq_len, - ) - return new_inputs - - def get_calibration_data( - self, - target_runtime: TargetRuntime | None = None, - input_spec: InputSpec | None = None, - ) -> DatasetEntries | None: - """ - Calibration dataset for this model. 
- """ - if input_spec is None: - input_spec = Llama3_PromptProcessor_3_Quantized.get_input_spec() - - input_seq_len = input_spec["attention_mask"][0][-1] - return Llama3_PromptProcessor_3_Quantized.get_model_data( - input_seq_len=input_seq_len - ) - - -class Llama3_PromptProcessor_4_Quantized(Llama_QuantizedMixin): - def __init__(self, model: torch.nn.Module, encoding_path: str): - super().__init__(model, encoding_path) - self.split_part = 4 - - def forward( - self, - input_ids: torch.Tensor, - attention_mask: torch.Tensor, - position_ids_cos: torch.Tensor, - position_ids_sin: torch.Tensor, - ): - return self.model(input_ids, attention_mask, position_ids_cos, position_ids_sin) - - @classmethod - def from_pretrained( - cls, max_position_embeddings: int = MAX_POS_EMBEDDINGS - ) -> Llama3_PromptProcessor_4_Quantized: - model, encoding_path = _get_llama_model_with_split( - max_position_embeddings, split_part=4 - ) - return cls(model, encoding_path) - - @staticmethod - def get_input_spec( - input_seq_length: int = DEFAULT_INPUT_SEQ_LEN, - ) -> InputSpec: - # Get the input specification ordered (name -> (shape, type)) pairs for this model. - # - # This can be used with the qai_hub python API to declare - # the model input specification upon submitting a compile job. - return { - "input_ids": ((1, input_seq_length, ATTENTION_HIDDEN_DIM), "float32"), - "attention_mask": ((1, 1, input_seq_length, input_seq_length), "float32"), - "position_ids_cos": ((1, 1, input_seq_length, POS_EMBED_DIM), "float32"), - "position_ids_sin": ((1, 1, input_seq_length, POS_EMBED_DIM), "float32"), - } - - @staticmethod - def get_output_names(): - layers_start, layers_end = get_hidden_layer_range_from_split( - split_part=4, model_split_map=MODEL_SPLIT_MAP - ) - return Llama_QuantizedMixin.get_output_names( - start=layers_start, - end=layers_end, - past_key_val_heads=NUM_KEY_VAL_HEADS, - bundled_kvcache=False, - ) - - @staticmethod - def get_model_data(input_seq_len: int = DEFAULT_INPUT_SEQ_LEN): - data = load_input_cached_data( - split_part=4, - data_dir=DATA_DIR, - model_name="llama_v3", - model_id=MODEL_ID, - model_asset_version=MODEL_ASSET_VERSION, - model_type="pp", - input_seq_len=input_seq_len, - ) - if data is not None: - return data - - model = Llama3_PromptProcessor_3_Quantized.from_pretrained() - inputs = Llama3_PromptProcessor_3_Quantized.get_model_data( - input_seq_len=input_seq_len - ) - output = model(*inputs.values()) - - new_inputs = {} - new_inputs["input_ids"] = output[0].detach() - new_inputs["attention_mask"] = inputs["attention_mask"] - new_inputs["position_ids_cos"] = inputs["position_ids_cos"] - new_inputs["position_ids_sin"] = inputs["position_ids_sin"] - save_input_cached_data( - new_inputs, - split_part=4, - data_dir=DATA_DIR, - model_name="llama_v3", - model_id=MODEL_ID, - model_asset_version=MODEL_ASSET_VERSION, - input_seq_len=input_seq_len, - ) - return new_inputs - - def get_calibration_data( - self, - target_runtime: TargetRuntime | None = None, - input_spec: InputSpec | None = None, - ) -> DatasetEntries | None: - """ - Calibration dataset for this model. 
- """ - if input_spec is None: - input_spec = Llama3_PromptProcessor_4_Quantized.get_input_spec() - - input_seq_len = input_spec["attention_mask"][0][-1] - return Llama3_PromptProcessor_4_Quantized.get_model_data( - input_seq_len=input_seq_len - ) - - -class Llama3_PromptProcessor_5_Quantized(Llama_QuantizedMixin): - def __init__(self, model: torch.nn.Module, encoding_path: str): - super().__init__(model, encoding_path) - self.split_part = 5 - - def forward( - self, - input_ids: torch.Tensor, - attention_mask: torch.Tensor, - position_ids_cos: torch.Tensor, - position_ids_sin: torch.Tensor, - ): - return self.model(input_ids, attention_mask, position_ids_cos, position_ids_sin) - - @classmethod - def from_pretrained( - cls, max_position_embeddings: int = MAX_POS_EMBEDDINGS - ) -> Llama3_PromptProcessor_5_Quantized: - model, encoding_path = _get_llama_model_with_split( - max_position_embeddings, split_part=5 - ) - return cls(model, encoding_path) - - @staticmethod - def get_input_spec( - input_seq_length: int = DEFAULT_INPUT_SEQ_LEN, - ) -> InputSpec: - # Get the input specification ordered (name -> (shape, type)) pairs for this model. - # - # This can be used with the qai_hub python API to declare - # the model input specification upon submitting a compile job. - return { - "input_ids": ((1, input_seq_length, ATTENTION_HIDDEN_DIM), "float32"), - "attention_mask": ((1, 1, input_seq_length, input_seq_length), "float32"), - "position_ids_cos": ((1, 1, input_seq_length, POS_EMBED_DIM), "float32"), - "position_ids_sin": ((1, 1, input_seq_length, POS_EMBED_DIM), "float32"), - } - - @staticmethod - def get_output_names(): - layers_start, layers_end = get_hidden_layer_range_from_split( - split_part=5, model_split_map=MODEL_SPLIT_MAP - ) - return Llama_QuantizedMixin.get_output_names( - start=layers_start, - end=layers_end, - past_key_val_heads=NUM_KEY_VAL_HEADS, - bundled_kvcache=False, - output_name="logits", - ) - - @staticmethod - def get_model_data(input_seq_len: int = DEFAULT_INPUT_SEQ_LEN): - data = load_input_cached_data( - split_part=5, - data_dir=DATA_DIR, - model_name="llama_v3", - model_id=MODEL_ID, - model_asset_version=MODEL_ASSET_VERSION, - model_type="pp", - input_seq_len=input_seq_len, - ) - if data is not None: - return data - - model = Llama3_PromptProcessor_4_Quantized.from_pretrained() - inputs = Llama3_PromptProcessor_4_Quantized.get_model_data( - input_seq_len=input_seq_len - ) - output = model(*inputs.values()) - - new_inputs = {} - new_inputs["input_ids"] = output[0].detach() - new_inputs["attention_mask"] = inputs["attention_mask"] - new_inputs["position_ids_cos"] = inputs["position_ids_cos"] - new_inputs["position_ids_sin"] = inputs["position_ids_sin"] - save_input_cached_data( - new_inputs, - split_part=4, - data_dir=DATA_DIR, - model_name="llama_v3", - model_id=MODEL_ID, - model_asset_version=MODEL_ASSET_VERSION, - input_seq_len=input_seq_len, - ) - return new_inputs - - def get_calibration_data( - self, - target_runtime: TargetRuntime | None = None, - input_spec: InputSpec | None = None, - ) -> DatasetEntries | None: - """ - Calibration dataset for this model. 
- """ - if input_spec is None: - input_spec = Llama3_PromptProcessor_5_Quantized.get_input_spec() - - input_seq_len = input_spec["attention_mask"][0][-1] - return Llama3_PromptProcessor_5_Quantized.get_model_data( - input_seq_len=input_seq_len - ) - - -# -# Token Generators -# - - -class Llama3_TokenGenerator_1_Quantized(Llama_QuantizedMixin): - def __init__(self, model: torch.nn.Module, encoding_path: str): - super().__init__(model, encoding_path, is_token_generator=True) - self.split_part = 1 - - def forward( - self, - input_ids: torch.Tensor, - attention_mask: torch.Tensor, - position_ids_cos: torch.Tensor, - position_ids_sin: torch.Tensor, - *past_key_values, - ): - return self.model( - input_ids, - attention_mask, - position_ids_cos, - position_ids_sin, - *past_key_values, - ) - - @classmethod - def from_pretrained( - cls, max_position_embeddings: int = MAX_POS_EMBEDDINGS - ) -> Llama3_TokenGenerator_1_Quantized: - model, encoding_path = _get_llama_model_with_split( - max_position_embeddings, - split_part=1, - is_token_generator=True, - ) - return cls(model, encoding_path) - - @staticmethod - def get_input_spec( - input_seq_length: int = DEFAULT_INPUT_SEQ_LEN, - ) -> InputSpec: - # Get the input specification ordered (name -> (shape, type)) pairs for this model. - # - # This can be used with the qai_hub python API to declare - # the model input specification upon submitting a compile job. - - input_spec = { - "input_ids": ((1, 1), "int32"), - "attention_mask": ((1, 1, 1, input_seq_length), "float32"), - "position_ids_cos": ((1, 1, 1, POS_EMBED_DIM), "float32"), - "position_ids_sin": ((1, 1, 1, POS_EMBED_DIM), "float32"), - } - - # Collect past_key_values and drop output names - layers_start, layers_end = get_hidden_layer_range_from_split( - split_part=1, model_split_map=MODEL_SPLIT_MAP - ) - past_key_val_names = get_past_key_names( - start=layers_start, - end=layers_end, - num_of_past_key_heads=NUM_KEY_VAL_HEADS, - bundled_kvcache=False, - ) - for past_key_val in past_key_val_names: - if "key" in past_key_val: - input_spec[past_key_val] = ( - (1, 1, 128, input_seq_length - 1), - "float32", - ) - else: - input_spec[past_key_val] = ( - (1, 1, input_seq_length - 1, 128), - "float32", - ) - return input_spec - - @staticmethod - def get_output_names(): - layers_start, layers_end = get_hidden_layer_range_from_split( - split_part=1, model_split_map=MODEL_SPLIT_MAP - ) - return Llama_QuantizedMixin.get_output_names( - start=layers_start, - end=layers_end, - past_key_val_heads=NUM_KEY_VAL_HEADS, - bundled_kvcache=False, - ) - - @staticmethod - def get_model_data(input_seq_len: int = DEFAULT_INPUT_SEQ_LEN): - data = load_input_cached_data( - split_part=1, - data_dir=DATA_DIR, - model_name="llama_v3", - model_id=MODEL_ID, - model_asset_version=MODEL_ASSET_VERSION, - model_type="tg", - input_seq_len=input_seq_len, - ) - if data is not None: - return data - - inputs = Llama3_PromptProcessor_1_Quantized.get_model_data( - input_seq_len=input_seq_len - ) - model = Llama3_PromptProcessor_1_Quantized.from_pretrained() - output = model(*inputs.values()) - del model - - tokenizer = get_tokenizer() - prompt = get_input_prompt_with_tags(DEFAULT_USER_PROMPT) - input_tokens = tokenizer( - prompt, return_tensors="pt", padding="max_length", max_length=input_seq_len - ) - num_tokens = torch.sum(input_tokens["attention_mask"]).item() - - # Get last input id - input_ids = inputs["input_ids"][:, -1].reshape(-1, 1).type(torch.int32) - # Create attention mask with - # [B, 1, Target Seq Len, Source Seq Len] - # where 
Target Seq Len = 1 - padding_size = input_seq_len - num_tokens - - attention_mask = ( - torch.Tensor([0] * padding_size + [1] * (input_seq_len - padding_size)) - .reshape(1, -1) - .type(torch.float32) - ) - - # Get last input id - input_ids = inputs["input_ids"][:, -1].reshape(-1, 1).type(torch.int32) - - # Create attention mask with - # [B, 1, Target Seq Len, Source Seq Len] - # where Target Seq Len = 1 - cm_attention_mask = prepare_combined_attention_mask( - attention_mask=attention_mask, - input_shape=input_ids.shape, - past_key_values_length=input_seq_len - 1, - ) - position_ids = torch.Tensor([padding_size + 1]).reshape(1, -1).type(torch.long) - position_ids_cos, position_ids_sin = RopeEmbedding( - max_length=input_seq_len - ).get_embedding(position_ids) - inputs["position_ids_cos"] = position_ids_cos - inputs["position_ids_sin"] = position_ids_sin - - data = { - "input_ids": input_ids, - "attention_mask": cm_attention_mask, - "position_ids_cos": position_ids_cos, - "position_ids_sin": position_ids_sin, - } - - layers_start, _ = get_hidden_layer_range_from_split( - split_part=1, model_split_map=MODEL_SPLIT_MAP - ) - key_val = get_past_keyval_with_shift( - output[1:], layers_start, NUM_KEY_VAL_HEADS, bundled_kvcache=False - ) - for key, val in key_val.items(): - data[key] = val - - save_input_cached_data( - data, - split_part=1, - data_dir=DATA_DIR, - model_name="llama_v3", - model_id=MODEL_ID, - model_asset_version=MODEL_ASSET_VERSION, - model_type="tg", - input_seq_len=input_seq_len, - ) - return data - - def get_calibration_data( - self, - target_runtime: TargetRuntime | None = None, - input_spec: InputSpec | None = None, - ) -> DatasetEntries | None: - """ - Calibration dataset for this model. - """ - if input_spec is None: - input_spec = Llama3_TokenGenerator_1_Quantized.get_input_spec() - - # Attention mask is of shape [B, 1, TargetSeqLen, SourceSeqLen] - input_seq_len = input_spec["attention_mask"][0][-1] - return Llama3_TokenGenerator_1_Quantized.get_model_data( - input_seq_len=input_seq_len, - ) - - -class Llama3_TokenGenerator_2_Quantized(Llama_QuantizedMixin): - def __init__(self, model: torch.nn.Module, encoding_path: str): - super().__init__(model, encoding_path, is_token_generator=True) - self.split_part = 2 - - def forward( - self, - input_ids: torch.Tensor, - attention_mask: torch.Tensor, - position_ids_cos: torch.Tensor, - position_ids_sin: torch.Tensor, - *past_key_values, - ): - return self.model( - input_ids, - attention_mask, - position_ids_cos, - position_ids_sin, - *past_key_values, - ) - - @classmethod - def from_pretrained( - cls, max_position_embeddings: int = MAX_POS_EMBEDDINGS - ) -> Llama3_TokenGenerator_2_Quantized: - model, encoding_path = _get_llama_model_with_split( - max_position_embeddings, - split_part=2, - is_token_generator=True, - ) - return cls(model, encoding_path) - - @staticmethod - def get_input_spec( - input_seq_length: int = DEFAULT_INPUT_SEQ_LEN, - ) -> InputSpec: - # Get the input specification ordered (name -> (shape, type)) pairs for this model. - # - # This can be used with the qai_hub python API to declare - # the model input specification upon submitting a compile job. 
- - input_spec = { - "input_ids": ((1, 1, ATTENTION_HIDDEN_DIM), "float32"), - "attention_mask": ((1, 1, 1, input_seq_length), "float32"), - "position_ids_cos": ((1, 1, 1, POS_EMBED_DIM), "float32"), - "position_ids_sin": ((1, 1, 1, POS_EMBED_DIM), "float32"), - } - - # Collect past_key_values and drop output names - layers_start, layers_end = get_hidden_layer_range_from_split( - split_part=2, model_split_map=MODEL_SPLIT_MAP - ) - past_key_val_names = get_past_key_names( - start=layers_start, - end=layers_end, - num_of_past_key_heads=8, - bundled_kvcache=False, - ) - for past_key_val in past_key_val_names: - if "key" in past_key_val: - input_spec[past_key_val] = ( - (1, 1, 128, input_seq_length - 1), - "float32", - ) - else: - input_spec[past_key_val] = ( - (1, 1, input_seq_length - 1, 128), - "float32", - ) - return input_spec - - @staticmethod - def get_output_names(): - layers_start, layers_end = get_hidden_layer_range_from_split( - split_part=2, model_split_map=MODEL_SPLIT_MAP - ) - return Llama_QuantizedMixin.get_output_names( - start=layers_start, - end=layers_end, - past_key_val_heads=NUM_KEY_VAL_HEADS, - bundled_kvcache=False, - ) - - @staticmethod - def get_model_data(input_seq_len: int = DEFAULT_INPUT_SEQ_LEN): - data = load_input_cached_data( - split_part=2, - data_dir=DATA_DIR, - model_name="llama_v3", - model_id=MODEL_ID, - model_asset_version=MODEL_ASSET_VERSION, - model_type="tg", - input_seq_len=input_seq_len, - ) - if data is not None: - return data - - inputs = Llama3_PromptProcessor_2_Quantized.get_model_data( - input_seq_len=input_seq_len - ) - model = Llama3_PromptProcessor_2_Quantized.from_pretrained() - output = model(*inputs.values()) - del model - - inputs = Llama3_TokenGenerator_1_Quantized.get_model_data( - input_seq_len=input_seq_len - ) - model = Llama3_TokenGenerator_1_Quantized.from_pretrained() - output_tg = model(*inputs.values()) - del model - - data = { - "input_ids": output_tg[0].detach(), - "attention_mask": inputs["attention_mask"], - "position_ids_cos": inputs["position_ids_cos"], - "position_ids_sin": inputs["position_ids_sin"], - } - - layers_start, _ = get_hidden_layer_range_from_split( - split_part=2, model_split_map=MODEL_SPLIT_MAP - ) - key_val = get_past_keyval_with_shift( - output[1:], layers_start, NUM_KEY_VAL_HEADS, bundled_kvcache=False - ) - for key, val in key_val.items(): - data[key] = val - - save_input_cached_data( - data, - split_part=2, - data_dir=DATA_DIR, - model_name="llama_v3", - model_id=MODEL_ID, - model_asset_version=MODEL_ASSET_VERSION, - model_type="tg", - input_seq_len=input_seq_len, - ) - return data - - def get_calibration_data( - self, - target_runtime: TargetRuntime | None = None, - input_spec: InputSpec | None = None, - ) -> DatasetEntries | None: - """ - Calibration dataset for this model. 
- """ - if input_spec is None: - input_spec = Llama3_TokenGenerator_2_Quantized.get_input_spec() - - input_seq_len = input_spec["attention_mask"][0][-1] - return Llama3_TokenGenerator_2_Quantized.get_model_data( - input_seq_len=input_seq_len, - ) - - -class Llama3_TokenGenerator_3_Quantized(Llama_QuantizedMixin): - def __init__(self, model: torch.nn.Module, encoding_path: str): - super().__init__(model, encoding_path, is_token_generator=True) - self.split_part = 3 - - def forward( - self, - input_ids: torch.Tensor, - attention_mask: torch.Tensor, - position_ids_cos: torch.Tensor, - position_ids_sin: torch.Tensor, - *past_key_values, - ): - return self.model( - input_ids, - attention_mask, - position_ids_cos, - position_ids_sin, - *past_key_values, - ) - - @classmethod - def from_pretrained( - cls, max_position_embeddings: int = MAX_POS_EMBEDDINGS - ) -> Llama3_TokenGenerator_3_Quantized: - model, encoding_path = _get_llama_model_with_split( - max_position_embeddings, - split_part=3, - is_token_generator=True, - ) - return cls(model, encoding_path) - - @staticmethod - def get_input_spec( - input_seq_length: int = DEFAULT_INPUT_SEQ_LEN, - ) -> InputSpec: - # Get the input specification ordered (name -> (shape, type)) pairs for this model. - # - # This can be used with the qai_hub python API to declare - # the model input specification upon submitting a compile job. - - input_spec = { - "input_ids": ((1, 1, ATTENTION_HIDDEN_DIM), "float32"), - "attention_mask": ((1, 1, 1, input_seq_length), "float32"), - "position_ids_cos": ((1, 1, 1, POS_EMBED_DIM), "float32"), - "position_ids_sin": ((1, 1, 1, POS_EMBED_DIM), "float32"), - } - - # Collect past_key_values and drop output names - layers_start, layers_end = get_hidden_layer_range_from_split( - split_part=3, model_split_map=MODEL_SPLIT_MAP - ) - past_key_val_names = get_past_key_names( - start=layers_start, - end=layers_end, - num_of_past_key_heads=8, - bundled_kvcache=False, - ) - for past_key_val in past_key_val_names: - if "key" in past_key_val: - input_spec[past_key_val] = ( - (1, 1, 128, input_seq_length - 1), - "float32", - ) - else: - input_spec[past_key_val] = ( - (1, 1, input_seq_length - 1, 128), - "float32", + Load a pre-trained Llama 3 (8B) model from Meta via HuggingFace. + + sequence_length: + Instantiate with this token sequence length input. A longer + sequence length means the model is capable of processing more + tokens at once. This can only be set to greater than one to process + prompts, since responses are auto-regressive in nature and require + this to be 1. + context_length: + Total context length of model. Longer context length means the + model is more capable of making longer connections in the input + prompt. However, it also hurts runtime performance (both time-to- + first-token and tokens-per-second), so this is a tradeoff that may + depend on the use case. + aimet_encodings: + Path to AIMET quantization encodings file. + huggingface_model_name: + Name or URL of the HuggingFace model. Change this if you want to + change the weights. 
+ """ + if aimet_encodings: + if aimet_encodings == "DEFAULT": + aimet_encodings = os.path.join( + CachedWebModelAsset.from_asset_store( + MODEL_ID, MODEL_ASSET_VERSION, DEFAULT_ENCODINGS_ZIP + ).fetch(extract=True), + DEFAULT_ENCODINGS, ) - return input_spec - @staticmethod - def get_output_names(): - layers_start, layers_end = get_hidden_layer_range_from_split( - split_part=3, model_split_map=MODEL_SPLIT_MAP - ) - return Llama_QuantizedMixin.get_output_names( - start=layers_start, - end=layers_end, - past_key_val_heads=NUM_KEY_VAL_HEADS, - bundled_kvcache=False, + return cls( + aimet_encodings=aimet_encodings, + sequence_length=sequence_length, + context_length=context_length, + huggingface_model_name=huggingface_model_name, ) @staticmethod - def get_model_data(input_seq_len: int = DEFAULT_INPUT_SEQ_LEN): - data = load_input_cached_data( - split_part=3, - data_dir=DATA_DIR, - model_name="llama_v3", - model_id=MODEL_ID, - model_asset_version=MODEL_ASSET_VERSION, - model_type="tg", - input_seq_len=input_seq_len, - ) - if data is not None: - return data - - inputs = Llama3_PromptProcessor_3_Quantized.get_model_data( - input_seq_len=input_seq_len - ) - model = Llama3_PromptProcessor_3_Quantized.from_pretrained() - output = model(*inputs.values()) - del model - - inputs = Llama3_TokenGenerator_2_Quantized.get_model_data( - input_seq_len=input_seq_len - ) - model = Llama3_TokenGenerator_2_Quantized.from_pretrained() - output_tg = model(*inputs.values()) - del model - - data = { - "input_ids": output_tg[0].detach(), - "attention_mask": inputs["attention_mask"], - "position_ids_cos": inputs["position_ids_cos"], - "position_ids_sin": inputs["position_ids_sin"], - } - - layers_start, _ = get_hidden_layer_range_from_split( - split_part=3, model_split_map=MODEL_SPLIT_MAP - ) - key_val = get_past_keyval_with_shift( - output[1:], layers_start, NUM_KEY_VAL_HEADS, bundled_kvcache=False - ) - for key, val in key_val.items(): - data[key] = val - - save_input_cached_data( - data, - split_part=3, - data_dir=DATA_DIR, - model_name="llama_v3", - model_id=MODEL_ID, - model_asset_version=MODEL_ASSET_VERSION, - model_type="tg", - input_seq_len=input_seq_len, - ) - return data - - def get_calibration_data( - self, - target_runtime: TargetRuntime | None = None, - input_spec: InputSpec | None = None, - ) -> DatasetEntries | None: - """ - Calibration dataset for this model. 
- """ - if input_spec is None: - input_spec = Llama3_TokenGenerator_3_Quantized.get_input_spec() - - input_seq_len = input_spec["attention_mask"][0][-1] - return Llama3_TokenGenerator_3_Quantized.get_model_data( - input_seq_len=input_seq_len, - ) - - -class Llama3_TokenGenerator_4_Quantized(Llama_QuantizedMixin): - def __init__(self, model: torch.nn.Module, encoding_path: str): - super().__init__(model, encoding_path, is_token_generator=True) - self.split_part = 4 - - def forward( - self, - input_ids: torch.Tensor, - attention_mask: torch.Tensor, - position_ids_cos: torch.Tensor, - position_ids_sin: torch.Tensor, - *past_key_values, - ): - return self.model( - input_ids, - attention_mask, - position_ids_cos, - position_ids_sin, - *past_key_values, + def get_output_names(num_hidden_layers: int = NUM_LAYERS): + return Llama3Base_Quantized.get_output_names( + num_hidden_layers=num_hidden_layers ) - @classmethod - def from_pretrained( - cls, max_position_embeddings: int = MAX_POS_EMBEDDINGS - ) -> Llama3_TokenGenerator_4_Quantized: - model, encoding_path = _get_llama_model_with_split( - max_position_embeddings, - split_part=4, - is_token_generator=True, - ) - return cls(model, encoding_path) - @staticmethod def get_input_spec( - input_seq_length: int = DEFAULT_INPUT_SEQ_LEN, + num_hidden_layers: int = NUM_LAYERS, + input_seq_length: int = 128, + context_length: int = DEFAULT_CONTEXT_LENGTH, + hidden_size: int = 4096, + num_key_value_heads: int = 8, + num_attention_heads: int = 32, ) -> InputSpec: - # Get the input specification ordered (name -> (shape, type)) pairs for this model. - # - # This can be used with the qai_hub python API to declare - # the model input specification upon submitting a compile job. - - input_spec = { - "input_ids": ((1, 1, ATTENTION_HIDDEN_DIM), "float32"), - "attention_mask": ((1, 1, 1, input_seq_length), "float32"), - "position_ids_cos": ((1, 1, 1, POS_EMBED_DIM), "float32"), - "position_ids_sin": ((1, 1, 1, POS_EMBED_DIM), "float32"), - } - - # Collect past_key_values and drop output names - layers_start, layers_end = get_hidden_layer_range_from_split( - split_part=4, model_split_map=MODEL_SPLIT_MAP - ) - past_key_val_names = get_past_key_names( - start=layers_start, - end=layers_end, - num_of_past_key_heads=8, - bundled_kvcache=False, - ) - for past_key_val in past_key_val_names: - if "key" in past_key_val: - input_spec[past_key_val] = ( - (1, 1, 128, input_seq_length - 1), - "float32", - ) - else: - input_spec[past_key_val] = ( - (1, 1, input_seq_length - 1, 128), - "float32", - ) - return input_spec - - @staticmethod - def get_output_names(): - layers_start, layers_end = get_hidden_layer_range_from_split( - split_part=4, model_split_map=MODEL_SPLIT_MAP - ) - return Llama_QuantizedMixin.get_output_names( - start=layers_start, - end=layers_end, - past_key_val_heads=NUM_KEY_VAL_HEADS, - bundled_kvcache=False, - ) - - @staticmethod - def get_model_data(input_seq_len: int = DEFAULT_INPUT_SEQ_LEN): - data = load_input_cached_data( - split_part=4, - data_dir=DATA_DIR, - model_name="llama_v3", - model_id=MODEL_ID, - model_asset_version=MODEL_ASSET_VERSION, - model_type="tg", - input_seq_len=input_seq_len, - ) - if data is not None: - return data - - inputs = Llama3_PromptProcessor_4_Quantized.get_model_data( - input_seq_len=input_seq_len - ) - model = Llama3_PromptProcessor_4_Quantized.from_pretrained() - output = model(*inputs.values()) - del model - - inputs = Llama3_TokenGenerator_3_Quantized.get_model_data( - input_seq_len=input_seq_len - ) - model = 
Llama3_TokenGenerator_3_Quantized.from_pretrained() - output_tg = model(*inputs.values()) - del model - - data = { - "input_ids": output_tg[0].detach(), - "attention_mask": inputs["attention_mask"], - "position_ids_cos": inputs["position_ids_cos"], - "position_ids_sin": inputs["position_ids_sin"], - } - - layers_start, _ = get_hidden_layer_range_from_split( - split_part=4, model_split_map=MODEL_SPLIT_MAP - ) - key_val = get_past_keyval_with_shift( - output[1:], layers_start, NUM_KEY_VAL_HEADS, bundled_kvcache=False - ) - for key, val in key_val.items(): - data[key] = val - - save_input_cached_data( - data, - split_part=4, - data_dir=DATA_DIR, - model_name="llama_v3", - model_id=MODEL_ID, - model_asset_version=MODEL_ASSET_VERSION, - model_type="tg", - input_seq_len=input_seq_len, - ) - return data - - def get_calibration_data( - self, - target_runtime: TargetRuntime | None = None, - input_spec: InputSpec | None = None, - ) -> DatasetEntries | None: - """ - Calibration dataset for this model. - """ - if input_spec is None: - input_spec = Llama3_TokenGenerator_4_Quantized.get_input_spec() - - input_seq_len = input_spec["attention_mask"][0][-1] - return Llama3_TokenGenerator_4_Quantized.get_model_data( - input_seq_len=input_seq_len - ) - - -class Llama3_TokenGenerator_5_Quantized(Llama_QuantizedMixin): - def __init__(self, model: torch.nn.Module, encoding_path: str): - super().__init__(model, encoding_path, is_token_generator=True) - self.split_part = 5 - - def forward( - self, - input_ids: torch.Tensor, - attention_mask: torch.Tensor, - position_ids_cos: torch.Tensor, - position_ids_sin: torch.Tensor, - *past_key_values, - ): - return self.model( - input_ids, - attention_mask, - position_ids_cos, - position_ids_sin, - *past_key_values, - ) - - @classmethod - def from_pretrained( - cls, max_position_embeddings: int = MAX_POS_EMBEDDINGS - ) -> Llama3_TokenGenerator_5_Quantized: - model, encoding_path = _get_llama_model_with_split( - max_position_embeddings, - split_part=5, - is_token_generator=True, - ) - return cls(model, encoding_path) - - @staticmethod - def get_input_spec( - input_seq_length: int = DEFAULT_INPUT_SEQ_LEN, - ) -> InputSpec: - # Get the input specification ordered (name -> (shape, type)) pairs for this model. - # - # This can be used with the qai_hub python API to declare - # the model input specification upon submitting a compile job. 
- - input_spec = { - "input_ids": ((1, 1, ATTENTION_HIDDEN_DIM), "float32"), - "attention_mask": ((1, 1, 1, input_seq_length), "float32"), - "position_ids_cos": ((1, 1, 1, POS_EMBED_DIM), "float32"), - "position_ids_sin": ((1, 1, 1, POS_EMBED_DIM), "float32"), - } - - # Collect past_key_values and drop output names - layers_start, layers_end = get_hidden_layer_range_from_split( - split_part=5, model_split_map=MODEL_SPLIT_MAP - ) - past_key_val_names = get_past_key_names( - start=layers_start, - end=layers_end, - num_of_past_key_heads=8, - bundled_kvcache=False, - ) - for past_key_val in past_key_val_names: - if "key" in past_key_val: - input_spec[past_key_val] = ( - (1, 1, 128, input_seq_length - 1), - "float32", - ) - else: - input_spec[past_key_val] = ( - (1, 1, input_seq_length - 1, 128), - "float32", - ) - return input_spec - - @staticmethod - def get_output_names(): - layers_start, layers_end = get_hidden_layer_range_from_split( - split_part=5, model_split_map=MODEL_SPLIT_MAP - ) - return Llama_QuantizedMixin.get_output_names( - start=layers_start, - end=layers_end, - past_key_val_heads=NUM_KEY_VAL_HEADS, - bundled_kvcache=False, - output_name="logits", - ) - - @staticmethod - def get_model_data(input_seq_len: int = DEFAULT_INPUT_SEQ_LEN): - data = load_input_cached_data( - split_part=5, - data_dir=DATA_DIR, - model_name="llama_v3", - model_id=MODEL_ID, - model_asset_version=MODEL_ASSET_VERSION, - model_type="tg", - input_seq_len=input_seq_len, - ) - if data is not None: - return data - - inputs = Llama3_PromptProcessor_5_Quantized.get_model_data( - input_seq_len=input_seq_len - ) - model = Llama3_PromptProcessor_5_Quantized.from_pretrained() - output = model(*inputs.values()) - del model - - inputs = Llama3_TokenGenerator_4_Quantized.get_model_data( - input_seq_len=input_seq_len - ) - model = Llama3_TokenGenerator_4_Quantized.from_pretrained() - output_tg = model(*inputs.values()) - del model - - data = { - "input_ids": output_tg[0].detach(), - "attention_mask": inputs["attention_mask"], - "position_ids_cos": inputs["position_ids_cos"], - "position_ids_sin": inputs["position_ids_sin"], - } - - layers_start, _ = get_hidden_layer_range_from_split( - split_part=5, model_split_map=MODEL_SPLIT_MAP - ) - key_val = get_past_keyval_with_shift( - output[1:], layers_start, NUM_KEY_VAL_HEADS, bundled_kvcache=False - ) - for key, val in key_val.items(): - data[key] = val - - save_input_cached_data( - data, - split_part=5, - data_dir=DATA_DIR, - model_name="llama_v3", - model_id=MODEL_ID, - model_asset_version=MODEL_ASSET_VERSION, - model_type="tg", - input_seq_len=input_seq_len, - ) - return data - - def get_calibration_data( - self, - target_runtime: TargetRuntime | None = None, - input_spec: InputSpec | None = None, - ) -> DatasetEntries | None: - """ - Calibration dataset for this model. 
- """ - if input_spec is None: - input_spec = Llama3_TokenGenerator_5_Quantized.get_input_spec() - - input_seq_len = input_spec["attention_mask"][0][-1] - return Llama3_TokenGenerator_5_Quantized.get_model_data( - input_seq_len=input_seq_len + return Llama3Base_Quantized.get_input_spec( + num_hidden_layers=NUM_LAYERS, + input_seq_length=input_seq_length, + context_length=context_length, + hidden_size=hidden_size, + num_key_value_heads=num_key_value_heads, + num_attention_heads=num_attention_heads, ) diff --git a/qai_hub_models/models/llama_v3_8b_chat_quantized/modeling_llama.py b/qai_hub_models/models/llama_v3_8b_chat_quantized/modeling_llama.py deleted file mode 100644 index 676a232e..00000000 --- a/qai_hub_models/models/llama_v3_8b_chat_quantized/modeling_llama.py +++ /dev/null @@ -1,1436 +0,0 @@ -# --------------------------------------------------------------------- -# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. -# SPDX-License-Identifier: BSD-3-Clause -# --------------------------------------------------------------------- -# coding=utf-8 -# Copyright 2022 EleutherAI and the HuggingFace Inc. team. All rights reserved. -# -# This code is based on EleutherAI's GPT-NeoX library and the GPT-NeoX -# and OPT implementations in this library. It has been modified from its -# original forms to accommodate minor architectural differences compared -# to GPT-NeoX and OPT used by the Meta AI team that trained the model. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -""" PyTorch LLaMA model.""" -from __future__ import annotations - -import math -from typing import List, Optional, Tuple, Union - -import torch -import torch.utils.checkpoint -from torch import nn -from torch.nn import CrossEntropyLoss -from transformers.activations import ACT2FN -from transformers.modeling_outputs import ( - BaseModelOutputWithPast, - CausalLMOutputWithPast, -) -from transformers.modeling_utils import PreTrainedModel -from transformers.models.llama.configuration_llama import LlamaConfig -from transformers.utils import ( - add_start_docstrings, - add_start_docstrings_to_model_forward, - logging, - replace_return_docstrings, -) - -logger = logging.get_logger(__name__) - -_CONFIG_FOR_DOC = "LlamaConfig" - - -# Copied from transformers.models.bart.modeling_bart._make_causal_mask -def _make_causal_mask( - input_ids_shape: torch.Size, - dtype: torch.dtype, - device: torch.device, - past_key_values_length: int = 0, - mask_neg: float = -100.0, -): - """ - Make causal mask used for bi-directional self-attention. 
- """ - bsz, tgt_len = input_ids_shape - # mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min, device=device), device=device) - mask = torch.full( - (tgt_len, tgt_len), torch.tensor(mask_neg, device=device), device=device - ) - mask_cond = torch.arange(mask.size(-1), device=device) - mask.masked_fill_(mask_cond < (mask_cond + 1).view(mask.size(-1), 1), 0) - mask = mask.to(dtype) - - if past_key_values_length > 0: - mask = torch.cat( - [ - torch.zeros( - tgt_len, past_key_values_length, dtype=dtype, device=device - ), - mask, - ], - dim=-1, - ) - return mask[None, None, :, :].expand( - bsz, 1, tgt_len, tgt_len + past_key_values_length - ) - - -# Copied from transformers.models.bart.modeling_bart._expand_mask -def _expand_mask( - mask: torch.Tensor, - dtype: torch.dtype, - mask_neg: float = -100.0, - tgt_len: Optional[int] = None, -): - """ - Expands attention_mask from `[bsz, seq_len]` to `[bsz, 1, tgt_seq_len, src_seq_len]`. - """ - bsz, src_len = mask.size() - tgt_len = tgt_len if tgt_len is not None else src_len - - expanded_mask = mask[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype) - - inverted_mask = 1.0 - expanded_mask - - # return inverted_mask.masked_fill(inverted_mask.to(torch.bool), torch.finfo(dtype).min) - return inverted_mask.masked_fill(inverted_mask.to(torch.bool), mask_neg) - - -class LlamaRMSNorm(nn.Module): - def __init__(self, hidden_size, eps=1e-6): - """ - LlamaRMSNorm is equivalent to T5LayerNorm - """ - super().__init__() - self.weight = nn.Parameter(torch.ones(hidden_size)) - self.variance_epsilon = eps - - def forward(self, hidden_states): - input_dtype = hidden_states.dtype - variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True) - hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon) - - return (self.weight * hidden_states).to(input_dtype) - - -class LlamaRotaryEmbedding(torch.nn.Module): - def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None): - super().__init__() - inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float().to(device) / dim)) - self.register_buffer("inv_freq", inv_freq) - - # Build here to make `torch.jit.trace` work. - self.max_seq_len_cached = max_position_embeddings - t = torch.arange( - self.max_seq_len_cached, - device=self.inv_freq.device, - dtype=self.inv_freq.dtype, - ) - freqs = torch.einsum("i,j->ij", t, self.inv_freq) - # Different from paper, but it uses a different permutation in order to obtain the same calculation - emb = torch.cat((freqs, freqs), dim=-1) - self.register_buffer( - "cos_cached", emb.cos()[None, None, :, :], persistent=False - ) - self.register_buffer( - "sin_cached", emb.sin()[None, None, :, :], persistent=False - ) - - def forward(self, x, seq_len=None): - # x: [bs, num_attention_heads, seq_len, head_size] - # This `if` block is unlikely to be run after we build sin/cos in `__init__`. Keep the logic here just in case. 
- if seq_len > self.max_seq_len_cached: - self.max_seq_len_cached = seq_len - t = torch.arange( - self.max_seq_len_cached, device=x.device, dtype=self.inv_freq.dtype - ) - freqs = torch.einsum("i,j->ij", t, self.inv_freq) - # Different from paper, but it uses a different permutation in order to obtain the same calculation - emb = torch.cat((freqs, freqs), dim=-1).to(x.device) - self.register_buffer( - "cos_cached", emb.cos()[None, None, :, :], persistent=False - ) - self.register_buffer( - "sin_cached", emb.sin()[None, None, :, :], persistent=False - ) - return ( - self.cos_cached[:, :, :seq_len, ...].to(dtype=x.dtype), - self.sin_cached[:, :, :seq_len, ...].to(dtype=x.dtype), - ) - - -def rotate_half(x): - """Rotates half the hidden dims of the input.""" - x1 = x[..., : x.shape[-1] // 2] - x2 = x[..., x.shape[-1] // 2 :] - return torch.cat((-x2, x1), dim=-1) - - -def apply_rotary_pos_emb(q, k, cos, sin, position_ids): - # The first two dimensions of cos and sin are always 1, so we can `squeeze` them. - cos = cos[0, 0, :, :] # [seq_len, dim] - sin = sin[0, 0, :, :] # [seq_len, dim] - cos = cos[position_ids].unsqueeze(1) # [bs, 1, seq_len, dim] - sin = sin[position_ids].unsqueeze(1) # [bs, 1, seq_len, dim] - q_embed = (q * cos) + (rotate_half(q) * sin) - k_embed = (k * cos) + (rotate_half(k) * sin) - return q_embed, k_embed - - -def apply_rotary_pos_emb_single(x, cos, sin, position_ids): - # The first two dimensions of cos and sin are always 1, so we can `squeeze` them. - cos = cos[0, 0, :, :] # [seq_len, dim] - sin = sin[0, 0, :, :] # [seq_len, dim] - cos = cos[position_ids].unsqueeze(1) # [bs, 1, seq_len, dim] - sin = sin[position_ids].unsqueeze(1) # [bs, 1, seq_len, dim] - x_embed = (x * cos) + (rotate_half(x) * sin) - return x_embed - - -def apply_rope_single(x, rope_vals: Tuple[torch.Tensor, torch.Tensor]): - """ - Based on FacebookResearch's llama, provided by Carl - """ - rope_real = rope_vals[0] # shape should be 1, 1, seqlen, head_dim/2 - rope_im = rope_vals[1] # shape should be 1, 1, seqlen, head_dim/2 - - # TODO: Why HF uses different coordinates from the paper - x_real = x[:, :, :, : x.shape[-1] // 2] # extract first half elements - x_im = x[:, :, :, x.shape[-1] // 2 :] # extract second half elements - - x_prod_real = x_real * rope_real - x_im * rope_im - x_prod_im = x_real * rope_im + x_im * rope_real - - # TODO: HF needs to use different interleaving - x = torch.cat((x_prod_real, x_prod_im), dim=3).view(*x.shape) - return x - - -class LlamaMLP(nn.Module): - def __init__( - self, - hidden_size: int, - intermediate_size: int, - hidden_act: str, - ): - super().__init__() - self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False) - self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False) - self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False) - self.act_fn = ACT2FN[hidden_act] - self.hidden_size = hidden_size - self.intermediate_size = intermediate_size - - def prepare_conv(self): - if not hasattr(self, "forward_linear"): - self.gate_proj_conv = nn.Conv2d( - self.hidden_size, self.intermediate_size, 1, bias=False - ) - self.down_proj_conv = nn.Conv2d( - self.intermediate_size, self.hidden_size, 1, bias=False - ) - self.up_proj_conv = nn.Conv2d( - self.hidden_size, self.intermediate_size, 1, bias=False - ) - self.forward_linear = self.forward - self.forward = self.forward_conv - - self.gate_proj_conv.weight.data.copy_( - self.gate_proj.weight[:, :, None, None] - ) - self.down_proj_conv.weight.data.copy_( - self.down_proj.weight[:, :, None,
None] - ) - self.up_proj_conv.weight.data.copy_(self.up_proj.weight[:, :, None, None]) - - del self.gate_proj - del self.down_proj - del self.up_proj - - def forward_conv(self, x): - bsz, _, _ = x.size() - - x = torch.reshape(x, (bsz, -1, 1, self.hidden_size)) - x = x.transpose(1, 3) # Transpose right before and after Conv - x = self.down_proj_conv( - self.act_fn(self.gate_proj_conv(x)) * self.up_proj_conv(x) - ) - x = x.transpose(1, 3) - x = torch.reshape(x, (bsz, -1, self.hidden_size)) - - return x - - def forward(self, x): - return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x)) - - -# Copied from transformers.models.llama.modeling_llama.repeat_kv -def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor: - """ - This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep). The hidden states go from (batch, - num_key_value_heads, seqlen, head_dim) to (batch, num_attention_heads, seqlen, head_dim) - """ - if isinstance(hidden_states, list): - return [head for head in hidden_states for _ in range(n_rep)] - - batch, num_key_value_heads, slen, head_dim = hidden_states.shape - if n_rep == 1: - return hidden_states - hidden_states = hidden_states[:, :, None, :, :].expand( - batch, num_key_value_heads, n_rep, slen, head_dim - ) - return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim) - - -class LlamaAttention(nn.Module): - """Multi-headed attention from 'Attention Is All You Need' paper""" - - def __init__(self, config: LlamaConfig): - super().__init__() - self.config = config - self.hidden_size = config.hidden_size - self.num_heads = config.num_attention_heads - self.num_key_value_heads = ( - config.num_key_value_heads - if hasattr(config, "num_key_value_heads") - else self.num_heads - ) - self.num_key_value_groups = self.num_heads // self.num_key_value_heads - self.head_dim = self.hidden_size // self.num_heads - self.max_position_embeddings = config.max_position_embeddings - - if (self.head_dim * self.num_heads) != self.hidden_size: - raise ValueError( - f"hidden_size must be divisible by num_heads (got `hidden_size`: {self.hidden_size}" - f" and `num_heads`: {self.num_heads})." 
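In this model, eight key/value heads serve the 32 query heads (grouped-query attention), so `repeat_kv` expands each cached K/V head 4x before the attention matmul; the list branch does the same for the split-head path by duplicating list entries. A sketch of the expand-and-reshape on the tensor path, with sizes assumed from the 8B configuration:

```python
import torch

batch, kv_heads, seqlen, head_dim = 1, 8, 16, 128
n_rep = 32 // kv_heads  # num_attention_heads // num_key_value_heads

k = torch.randn(batch, kv_heads, seqlen, head_dim)
k_rep = k[:, :, None, :, :].expand(batch, kv_heads, n_rep, seqlen, head_dim)
k_rep = k_rep.reshape(batch, kv_heads * n_rep, seqlen, head_dim)

assert k_rep.shape == (1, 32, 16, 128)
# Matches the torch.repeat_interleave equivalence noted in the docstring.
assert torch.equal(k_rep, torch.repeat_interleave(k, n_rep, dim=1))
```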
- ) - self.q_proj = nn.Linear( - self.hidden_size, self.num_heads * self.head_dim, bias=False - ) - self.k_proj = nn.Linear( - self.hidden_size, self.num_key_value_heads * self.head_dim, bias=False - ) - self.v_proj = nn.Linear( - self.hidden_size, self.num_key_value_heads * self.head_dim, bias=False - ) - self.o_proj = nn.Linear( - self.num_heads * self.head_dim, self.hidden_size, bias=False - ) - self.rotary_emb = LlamaRotaryEmbedding( - self.head_dim, - max_position_embeddings=self.max_position_embeddings, - base=getattr(config, "rope_theta", 10000.0), - ) - self.mask_neg = config.mask_neg - self.return_new_key_value_only = ( - config.return_new_key_value_only - if hasattr(config, "return_new_key_value_only") - else False - ) - - def _shape(self, tensor: torch.Tensor, seq_len: int, bsz: int): - return ( - tensor.view(bsz, seq_len, self.num_heads, self.head_dim) - .transpose(1, 2) - .contiguous() - ) - - def prepare_conv(self): - if not hasattr(self, "forward_no_conv"): - self.q_proj_conv = nn.Conv2d( - self.hidden_size, self.num_heads * self.head_dim, 1, bias=False - ) - self.k_proj_conv = nn.Conv2d( - self.hidden_size, - self.num_key_value_heads * self.head_dim, - 1, - bias=False, - ) - self.v_proj_conv = nn.Conv2d( - self.hidden_size, - self.num_key_value_heads * self.head_dim, - 1, - bias=False, - ) - self.o_proj_conv = nn.Conv2d( - self.num_heads * self.head_dim, self.hidden_size, 1, bias=False - ) - - self.forward_no_conv = self.forward - self.forward = self.forward_conv - - self.q_proj_conv.weight.data.copy_(self.q_proj.weight[:, :, None, None]) - self.k_proj_conv.weight.data.copy_(self.k_proj.weight[:, :, None, None]) - self.v_proj_conv.weight.data.copy_(self.v_proj.weight[:, :, None, None]) - self.o_proj_conv.weight.data.copy_(self.o_proj.weight[:, :, None, None]) - - del self.q_proj - del self.k_proj - del self.v_proj - del self.o_proj - - def prepare_sha(self): - if not hasattr(self, "forward_mha"): - self.q_proj_sha = nn.ModuleList( - [ - nn.Conv2d(self.hidden_size, self.head_dim, 1, bias=False) - for _ in range(self.num_heads) - ] - ) - self.k_proj_sha = nn.ModuleList( - [ - nn.Conv2d(self.hidden_size, self.head_dim, 1, bias=False) - for _ in range(self.num_key_value_heads) - ] - ) - self.v_proj_sha = nn.ModuleList( - [ - nn.Conv2d(self.hidden_size, self.head_dim, 1, bias=False) - for _ in range(self.num_key_value_heads) - ] - ) - if not hasattr(self, "o_proj_conv"): - self.o_proj_conv = nn.Conv2d( - self.num_heads * self.head_dim, self.hidden_size, 1, bias=False - ) - self.o_proj_conv.weight.data.copy_(self.o_proj.weight[:, :, None, None]) - del self.o_proj - - self.forward_mha = self.forward - self.forward = self.forward_sha - - for i in range(self.num_heads): - self.q_proj_sha[i].weight.data.copy_( - self.q_proj_conv.weight[i * self.head_dim : (i + 1) * self.head_dim, :] - ) - - for i in range(self.num_key_value_heads): - self.k_proj_sha[i].weight.data.copy_( - self.k_proj_conv.weight[i * self.head_dim : (i + 1) * self.head_dim, :] - ) - self.v_proj_sha[i].weight.data.copy_( - self.v_proj_conv.weight[i * self.head_dim : (i + 1) * self.head_dim, :] - ) - - del self.q_proj_conv - del self.k_proj_conv - del self.v_proj_conv - - def forward_sha( - self, - hidden_states: torch.Tensor, - attention_mask: Optional[torch.Tensor] = None, - position_ids: Optional[torch.LongTensor] = None, - past_key_value: Optional[Tuple[torch.Tensor]] = None, - output_attentions: bool = False, - use_cache: bool = False, - ) -> Tuple[torch.Tensor, Optional[torch.Tensor], 
Optional[Tuple[torch.Tensor]]]: - - bsz, q_len, _ = hidden_states.size() - - hidden_states = torch.reshape(hidden_states, (bsz, -1, 1, self.hidden_size)) - hidden_states = hidden_states.transpose(1, 3) - - query_states = [ - q_proj(hidden_states).permute(0, 2, 3, 1) for q_proj in self.q_proj_sha - ] - key_states = [ - k_proj(hidden_states).permute(0, 2, 3, 1) for k_proj in self.k_proj_sha - ] - value_states = [ - v_proj(hidden_states).permute(0, 2, 3, 1) for v_proj in self.v_proj_sha - ] - - kv_seq_len = value_states[0].shape[-2] - if past_key_value is not None: - kv_seq_len += past_key_value[1][0].shape[-2] - - if isinstance(position_ids, (tuple, list)): - rope_embedding = position_ids - query_states = [apply_rope_single(q, rope_embedding) for q in query_states] - key_states = [apply_rope_single(k, rope_embedding) for k in key_states] - else: - cos, sin = self.rotary_emb(value_states[0], kv_seq_len) - - query_states = [ - apply_rotary_pos_emb_single(q, cos, sin, position_ids) - for q in query_states - ] - key_states = [ - apply_rotary_pos_emb_single(k, cos, sin, position_ids) - for k in key_states - ] - - key_states = [k.transpose(2, 3) for k in key_states] - if self.return_new_key_value_only: - present_key_value = ( - (tuple(key_states), tuple(value_states)) if use_cache else None - ) - - if past_key_value is not None: - # reuse k, v, self_attention - past_key, past_value = past_key_value - key_states = [ - torch.cat([pk, k], dim=3) for pk, k in zip(past_key, key_states) - ] - value_states = [ - torch.cat([pv, v], dim=2) for pv, v in zip(past_value, value_states) - ] - - if not self.return_new_key_value_only: - present_key_value = ( - (tuple(key_states), tuple(value_states)) if use_cache else None - ) - - key_states = repeat_kv(key_states, self.num_key_value_groups) - value_states = repeat_kv(value_states, self.num_key_value_groups) - - attn_weights = [ - torch.matmul(q, k) / math.sqrt(self.head_dim) - for q, k in zip(query_states, key_states) - ] - if attn_weights[0].size() != (bsz, 1, q_len, kv_seq_len): - raise ValueError( - f"Attention weights should be of size {(bsz, 1, q_len, kv_seq_len)}, but is" - f" {attn_weights[0].size()}" - ) - - if attention_mask is not None: - if attention_mask.size() != (bsz, 1, q_len, kv_seq_len): - raise ValueError( - f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.size()}" - ) - attn_weights = [aw + attention_mask for aw in attn_weights] - - # upcast attention to fp32 - attn_weights = [ - nn.functional.softmax(aw, dim=-1, dtype=torch.float32).to( - query_states[0].dtype - ) - for aw in attn_weights - ] - attn_output = [torch.matmul(aw, v) for aw, v in zip(attn_weights, value_states)] - - if attn_output[0].size() != (bsz, 1, q_len, self.head_dim): - raise ValueError( - f"`attn_output` should be of size {(bsz, 1, q_len, self.head_dim)}, but is" - f" {attn_output[0].size()}" - ) - - attn_output = torch.cat(attn_output, dim=3) - attn_output = attn_output.permute(0, 3, 1, 2) - attn_output = self.o_proj_conv(attn_output) - attn_output = attn_output.transpose(1, 3) - attn_output = attn_output.reshape(bsz, q_len, self.hidden_size) - - if not output_attentions: - attn_weights = None - - return attn_output, attn_weights, present_key_value - - def forward_conv( - self, - hidden_states: torch.Tensor, - attention_mask: Optional[torch.Tensor] = None, - position_ids: Optional[torch.LongTensor] = None, - past_key_value: Optional[Tuple[torch.Tensor]] = None, - output_attentions: bool = False, - use_cache: bool = False, - ) -> 
Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]: - bsz, q_len, _ = hidden_states.size() - - hidden_states = torch.reshape( - hidden_states, (bsz, q_len, 1, self.hidden_size) - ).transpose(1, 3) - - query_states = self.q_proj_conv(hidden_states) - key_states = self.k_proj_conv(hidden_states) - value_states = self.v_proj_conv(hidden_states) - - query_states = query_states.reshape( - bsz, self.num_heads, self.head_dim, q_len - ).transpose(2, 3) - key_states = key_states.reshape( - bsz, self.num_key_value_heads, self.head_dim, q_len - ).transpose(2, 3) - value_states = value_states.reshape( - bsz, self.num_key_value_heads, self.head_dim, q_len - ).transpose(2, 3) - - kv_seq_len = key_states.shape[-2] - if past_key_value is not None: - dim = 3 if self.config.transposed_key_cache else 2 - kv_seq_len += past_key_value[0].shape[dim] - - if isinstance(position_ids, (tuple, list)): - rope_embedding = position_ids - query_states = apply_rope_single(query_states, rope_embedding) - key_states = apply_rope_single(key_states, rope_embedding) - else: - cos, sin = self.rotary_emb(value_states, kv_seq_len) - query_states, key_states = apply_rotary_pos_emb( - query_states, key_states, cos, sin, position_ids - ) - # [bsz, nh, t, hd] - - if self.config.transposed_key_cache: - key_states = key_states.transpose(2, 3) - - if self.return_new_key_value_only: - present_key_value = (key_states, value_states) if use_cache else None - - if past_key_value is not None: - # reuse k, v, self_attention - dim = 3 if self.config.transposed_key_cache else 2 - key_states = torch.cat([past_key_value[0], key_states], dim=dim) - value_states = torch.cat([past_key_value[1], value_states], dim=2) - - if not self.return_new_key_value_only: - present_key_value = (key_states, value_states) if use_cache else None - - key_states = repeat_kv(key_states, self.num_key_value_groups) - value_states = repeat_kv(value_states, self.num_key_value_groups) - - if self.config.transposed_key_cache: - attn_weights = torch.matmul(query_states, key_states) / math.sqrt( - self.head_dim - ) - else: - attn_weights = torch.matmul( - query_states, key_states.transpose(2, 3) - ) / math.sqrt(self.head_dim) - - if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len): - raise ValueError( - f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is" - f" {attn_weights.size()}" - ) - - if attention_mask is not None: - if attention_mask.size() != (bsz, 1, q_len, kv_seq_len): - raise ValueError( - f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.size()}" - ) - attn_weights = attn_weights + attention_mask - - # upcast attention to fp32 - attn_weights = nn.functional.softmax( - attn_weights, dim=-1, dtype=torch.float32 - ).to(query_states.dtype) - attn_output = torch.matmul(attn_weights, value_states) - - if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim): - raise ValueError( - f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is" - f" {attn_output.size()}" - ) - - attn_output = attn_output.transpose(1, 2) - attn_output = attn_output.reshape(bsz, q_len, 1, self.hidden_size) - attn_output = attn_output.transpose(1, 3) - attn_output = self.o_proj_conv(attn_output) - attn_output = attn_output.transpose(1, 3) - attn_output = attn_output.reshape(bsz, q_len, self.hidden_size) - - if not output_attentions: - attn_weights = None - - return attn_output, attn_weights, present_key_value - - def forward( - self, - 
hidden_states: torch.Tensor, - attention_mask: Optional[torch.Tensor] = None, - position_ids: Optional[torch.LongTensor] = None, - past_key_value: Optional[Tuple[torch.Tensor]] = None, - output_attentions: bool = False, - use_cache: bool = False, - ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]: - bsz, q_len, _ = hidden_states.size() - - query_states = self.q_proj(hidden_states) - key_states = self.k_proj(hidden_states) - value_states = self.v_proj(hidden_states) - - query_states = query_states.view( - bsz, q_len, self.num_heads, self.head_dim - ).transpose(1, 2) - key_states = key_states.view( - bsz, q_len, self.num_key_value_heads, self.head_dim - ).transpose(1, 2) - value_states = value_states.view( - bsz, q_len, self.num_key_value_heads, self.head_dim - ).transpose(1, 2) - - kv_seq_len = key_states.shape[-2] - if past_key_value is not None: - kv_seq_len += past_key_value[1].shape[-2] - - if isinstance(position_ids, (tuple, list)): - rope_embedding = position_ids - query_states = apply_rope_single(query_states, rope_embedding) - key_states = apply_rope_single(key_states, rope_embedding) - else: - cos, sin = self.rotary_emb(value_states, kv_seq_len) - query_states, key_states = apply_rotary_pos_emb( - query_states, key_states, cos, sin, position_ids - ) - # [bsz, nh, t, hd] - - if self.config.transposed_key_cache: - key_states = key_states.transpose(2, 3) - - if self.return_new_key_value_only: - present_key_value = (key_states, value_states) if use_cache else None - - if past_key_value is not None: - # reuse k, v, self_attention - dim = 3 if self.config.transposed_key_cache else 2 - key_states = torch.cat([past_key_value[0], key_states], dim=dim) - value_states = torch.cat([past_key_value[1], value_states], dim=2) - - if not self.return_new_key_value_only: - present_key_value = (key_states, value_states) if use_cache else None - - key_states = repeat_kv(key_states, self.num_key_value_groups) - value_states = repeat_kv(value_states, self.num_key_value_groups) - - if self.config.transposed_key_cache: - attn_weights = torch.matmul(query_states, key_states) / math.sqrt( - self.head_dim - ) - else: - attn_weights = torch.matmul( - query_states, key_states.transpose(2, 3) - ) / math.sqrt(self.head_dim) - - if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len): - raise ValueError( - f"Attention weights should be of size {(bsz, self.num_heads, q_len, kv_seq_len)}, but is" - f" {attn_weights.size()}" - ) - - if attention_mask is not None: - if attention_mask.size() != (bsz, 1, q_len, kv_seq_len): - raise ValueError( - f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.size()}" - ) - attn_weights = attn_weights + attention_mask - - # upcast attention to fp32 - attn_weights = nn.functional.softmax( - attn_weights, dim=-1, dtype=torch.float32 - ).to(query_states.dtype) - attn_output = torch.matmul(attn_weights, value_states) - - if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim): - raise ValueError( - f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is" - f" {attn_output.size()}" - ) - - attn_output = attn_output.transpose(1, 2) - - attn_output = attn_output.reshape(bsz, q_len, self.hidden_size) - attn_output = self.o_proj(attn_output) - - if not output_attentions: - attn_weights = None - - return attn_output, attn_weights, present_key_value - - -class LlamaDecoderLayer(nn.Module): - def __init__(self, config: LlamaConfig): - super().__init__() - self.hidden_size 
= config.hidden_size - self.self_attn = LlamaAttention(config=config) - self.mlp = LlamaMLP( - hidden_size=self.hidden_size, - intermediate_size=config.intermediate_size, - hidden_act=config.hidden_act, - ) - self.input_layernorm = LlamaRMSNorm(config.hidden_size, eps=config.rms_norm_eps) - self.post_attention_layernorm = LlamaRMSNorm( - config.hidden_size, eps=config.rms_norm_eps - ) - - def forward( - self, - hidden_states: torch.Tensor, - attention_mask: Optional[torch.Tensor] = None, - position_ids: Optional[torch.LongTensor] = None, - past_key_value: Optional[Tuple[torch.Tensor]] = None, - output_attentions: Optional[bool] = False, - use_cache: Optional[bool] = False, - ) -> Tuple[ - torch.FloatTensor, Optional[Tuple[torch.FloatTensor, torch.FloatTensor]] - ]: - """ - Args: - hidden_states (`torch.FloatTensor`): input to the layer of shape `(batch, seq_len, embed_dim)` - attention_mask (`torch.FloatTensor`, *optional*): attention mask of size - `(batch, 1, tgt_len, src_len)` where padding elements are indicated by very large negative values. - output_attentions (`bool`, *optional*): - Whether or not to return the attentions tensors of all attention layers. See `attentions` under - returned tensors for more detail. - use_cache (`bool`, *optional*): - If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding - (see `past_key_values`). - past_key_value (`Tuple(torch.FloatTensor)`, *optional*): cached past key and value projection states - """ - - residual = hidden_states - - hidden_states = self.input_layernorm(hidden_states) - - # Self Attention - hidden_states, self_attn_weights, present_key_value = self.self_attn( - hidden_states=hidden_states, - attention_mask=attention_mask, - position_ids=position_ids, - past_key_value=past_key_value, - output_attentions=output_attentions, - use_cache=use_cache, - ) - hidden_states = residual + hidden_states - - # Fully Connected - residual = hidden_states - hidden_states = self.post_attention_layernorm(hidden_states) - hidden_states = self.mlp(hidden_states) - hidden_states = residual + hidden_states - - outputs = (hidden_states,) - - if output_attentions: - outputs += (self_attn_weights,) - - if use_cache: - outputs += (present_key_value,) - - return outputs - - -LLAMA_START_DOCSTRING = r""" - This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the - library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, - etc.) - This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. - Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage - and behavior. - Parameters: - config ([`LlamaConfig`]): - Model configuration class with all the parameters of the model. Initializing with a config file does not - load the weights associated with the model, only the configuration. Check out the - [`~PreTrainedModel.from_pretrained`] method to load the model weights.
-""" - - -@add_start_docstrings( - "The bare LLaMA Model outputting raw hidden-states without any specific head on top.", - LLAMA_START_DOCSTRING, -) -class LlamaPreTrainedModel(PreTrainedModel): - config_class = LlamaConfig - base_model_prefix = "model" - supports_gradient_checkpointing = True - _no_split_modules = ["LlamaDecoderLayer"] - _skip_keys_device_placement = "past_key_values" - _keys_to_ignore_on_load_unexpected = [r"decoder\.version"] - - def _init_weights(self, module): - std = self.config.initializer_range - if isinstance(module, nn.Linear): - module.weight.data.normal_(mean=0.0, std=std) - if module.bias is not None: - module.bias.data.zero_() - elif isinstance(module, nn.Embedding): - module.weight.data.normal_(mean=0.0, std=std) - if module.padding_idx is not None: - module.weight.data[module.padding_idx].zero_() - - def _set_gradient_checkpointing(self, module, value=False): - if isinstance(module, LlamaModel): - module.gradient_checkpointing = value - - -LLAMA_INPUTS_DOCSTRING = r""" - Args: - input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`): - Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide - it. - Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and - [`PreTrainedTokenizer.__call__`] for details. - [What are input IDs?](../glossary#input-ids) - attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*): - Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: - - 1 for tokens that are **not masked**, - - 0 for tokens that are **masked**. - [What are attention masks?](../glossary#attention-mask) - Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and - [`PreTrainedTokenizer.__call__`] for details. - If `past_key_values` is used, optionally only the last `decoder_input_ids` have to be input (see - `past_key_values`). - If you want to change padding behavior, you should read [`modeling_opt._prepare_decoder_attention_mask`] - and modify to your needs. See diagram 1 in [the paper](https://arxiv.org/abs/1910.13461) for more - information on the default strategy. - - 1 indicates the head is **not masked**, - - 0 indicates the head is **masked**. - position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*): - Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0, - config.n_positions - 1]`. - [What are position IDs?](../glossary#position-ids) - past_key_values (`tuple(tuple(torch.FloatTensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`): - Tuple of `tuple(torch.FloatTensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape - `(batch_size, num_heads, sequence_length, embed_size_per_head)`) and 2 additional tensors of shape - `(batch_size, num_heads, encoder_sequence_length, embed_size_per_head)`. - Contains pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention - blocks) that can be used (see `past_key_values` input) to speed up sequential decoding. - If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that - don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all - `decoder_input_ids` of shape `(batch_size, sequence_length)`. 
- inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*): - Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This - is useful if you want more control over how to convert `input_ids` indices into associated vectors than the - model's internal embedding lookup matrix. - use_cache (`bool`, *optional*): - If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see - `past_key_values`). - output_attentions (`bool`, *optional*): - Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned - tensors for more detail. - output_hidden_states (`bool`, *optional*): - Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for - more detail. - return_dict (`bool`, *optional*): - Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple. -""" - - -@add_start_docstrings( - "The bare LLaMA Model outputting raw hidden-states without any specific head on top.", - LLAMA_START_DOCSTRING, -) -class LlamaModel(LlamaPreTrainedModel): - """ - Transformer decoder consisting of *config.num_hidden_layers* layers. Each layer is a [`LlamaDecoderLayer`] - Args: - config: LlamaConfig - """ - - def __init__(self, config: LlamaConfig): - super().__init__(config) - self.padding_idx = config.pad_token_id - self.vocab_size = config.vocab_size - - self.embed_tokens = nn.Embedding( - config.vocab_size, config.hidden_size, self.padding_idx - ) - # self.layers = nn.ModuleList([LlamaDecoderLayer(config) for _ in range(config.num_hidden_layers)]) - ### ------- QCOM EDITS STARTS ------- ### - self.layers = nn.ModuleList( - [ - LlamaDecoderLayer(config) - if config.hidden_layers_start <= i < config.hidden_layers_end - else nn.Identity() - for i in range(config.num_hidden_layers) - ] - ) - ### ------- QCOM EDITS ENDS ------- ### - self.norm = LlamaRMSNorm(config.hidden_size, eps=config.rms_norm_eps) - - self.gradient_checkpointing = False - # Initialize weights and apply final processing - self.post_init() - self.mask_neg = config.mask_neg - - def get_input_embeddings(self): - return self.embed_tokens - - def set_input_embeddings(self, value): - self.embed_tokens = value - - # Copied from transformers.models.bart.modeling_bart.BartDecoder._prepare_decoder_attention_mask - @staticmethod - def _prepare_decoder_attention_mask( - attention_mask, - input_shape, - inputs_embeds, - past_key_values_length, - mask_neg=-100.0, - ): - # create causal mask - # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len] - combined_attention_mask = None - if input_shape[-1] > 1: - combined_attention_mask = _make_causal_mask( - input_shape, - inputs_embeds.dtype, - device=inputs_embeds.device, - past_key_values_length=past_key_values_length, - mask_neg=mask_neg, - ) - - if attention_mask is not None: - # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len] - expanded_attn_mask = _expand_mask( - attention_mask, - inputs_embeds.dtype, - tgt_len=input_shape[-1], - mask_neg=mask_neg, - ).to(inputs_embeds.device) - combined_attention_mask = ( - expanded_attn_mask - if combined_attention_mask is None - else expanded_attn_mask + combined_attention_mask - ) - - return combined_attention_mask - - @add_start_docstrings_to_model_forward(LLAMA_INPUTS_DOCSTRING) - def forward( - self, - input_ids: torch.LongTensor = None, - attention_mask: Optional[torch.Tensor] = None, - position_ids: Optional[torch.LongTensor] = 
None, - past_key_values: Optional[List[torch.FloatTensor]] = None, - inputs_embeds: Optional[torch.FloatTensor] = None, - use_cache: Optional[bool] = None, - output_attentions: Optional[bool] = None, - output_hidden_states: Optional[bool] = None, - return_dict: Optional[bool] = None, - ) -> Union[Tuple, BaseModelOutputWithPast]: - output_attentions = ( - output_attentions - if output_attentions is not None - else self.config.output_attentions - ) - output_hidden_states = ( - output_hidden_states - if output_hidden_states is not None - else self.config.output_hidden_states - ) - use_cache = use_cache if use_cache is not None else self.config.use_cache - use_combined_mask_input = self.config.use_combined_mask_input - - return_dict = ( - return_dict if return_dict is not None else self.config.use_return_dict - ) - - # retrieve input_ids and inputs_embeds - # if input_ids is not None and inputs_embeds is not None: - # raise ValueError("You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time") - # elif input_ids is not None: - # batch_size, seq_length = input_ids.shape - # elif inputs_embeds is not None: - # batch_size, seq_length, _ = inputs_embeds.shape - # else: - # raise ValueError("You have to specify either decoder_input_ids or decoder_inputs_embeds") - # retrieve input_ids and inputs_embeds - if input_ids is not None and inputs_embeds is not None: - raise ValueError( - "You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time" - ) - - ### ------- QCOM EDITS STARTS ------- ### - # Combined attention mask expand attention mask to rank-4 - # [ bsz, 1, tgt_seq_len, src_seq_len ] - # check attention mask shape and fetch sequence length correctly. - elif attention_mask is not None: - attention_shape = attention_mask.shape - batch_size = attention_shape[0] - seq_length = ( - attention_shape[-2] - if len(attention_shape) == 4 - else attention_shape[-1] - ) - - ### ------- QCOM EDITS ENDS ------- ### - elif inputs_embeds is not None: - batch_size, seq_length, _ = inputs_embeds.shape - else: - raise ValueError( - "You have to specify either decoder_input_ids or decoder_inputs_embeds" - ) - - seq_length_with_past = seq_length - past_key_values_length = 0 - - if past_key_values is not None: - past_key_values_length = past_key_values[0][1][0].shape[-2] - seq_length_with_past = seq_length_with_past + past_key_values_length - - if position_ids is None: - device = input_ids.device if input_ids is not None else inputs_embeds.device - position_ids = torch.arange( - past_key_values_length, - seq_length + past_key_values_length, - dtype=torch.long, - device=device, - ) - position_ids = position_ids.unsqueeze(0).view(-1, seq_length) - elif isinstance(position_ids, (tuple, list)): - # don't make position_ids - pass - else: - position_ids = position_ids.view(-1, seq_length).long() - - ### ------- QCOM EDITS STARTS ------- ### - if self.config.split_model is None or self.config.split_model == 1: - if inputs_embeds is None: - inputs_embeds = self.embed_tokens(input_ids) - # embed positions - ### ------- QCOM EDITS ENDS ------- ### - - # if use_combined_mask_input, then attention mask is prepared outside the model - if not use_combined_mask_input: - if attention_mask is None: - attention_mask = torch.ones( - (batch_size, seq_length_with_past), - dtype=torch.bool, - device=inputs_embeds.device, - ) - attention_mask = self._prepare_decoder_attention_mask( - attention_mask, - (batch_size, seq_length), - inputs_embeds, - past_key_values_length, - self.mask_neg, 
- ) - - ### ------- QCOM EDITS STARTS ------- ### - if self.config.split_model is None or self.config.split_model == 1: - hidden_states = inputs_embeds - else: - hidden_states = input_ids - ### ------- QCOM EDITS ENDS ------- ### - - if self.gradient_checkpointing and self.training: - if use_cache: - logger.warning_once( - "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..." - ) - use_cache = False - - # decoder layers - all_hidden_states = () if output_hidden_states else None - all_self_attns = () if output_attentions else None - next_decoder_cache = () if use_cache else None - - for idx, decoder_layer in enumerate(self.layers): - if output_hidden_states: - all_hidden_states += (hidden_states,) - - past_key_value = ( - past_key_values[idx] if past_key_values is not None else None - ) - - if self.gradient_checkpointing and self.training: - - def create_custom_forward(module): - def custom_forward(*inputs): - # None for past_key_value - return module(*inputs, output_attentions, None) - - return custom_forward - - layer_outputs = torch.utils.checkpoint.checkpoint( - create_custom_forward(decoder_layer), - hidden_states, - attention_mask, - position_ids, - None, - ) - else: - layer_outputs = decoder_layer( - hidden_states, - attention_mask=attention_mask, - position_ids=position_ids, - past_key_value=past_key_value, - output_attentions=output_attentions, - use_cache=use_cache, - ) - - hidden_states = layer_outputs[0] - - if use_cache: - next_decoder_cache += (layer_outputs[2 if output_attentions else 1],) - - if output_attentions: - all_self_attns += (layer_outputs[1],) - - ### ------- QCOM EDITS STARTS ------- ### - if self.config.split_model is None or self.config.split_model == 5: - hidden_states = self.norm(hidden_states) - ### ------- QCOM EDITS ENDS ------- ### - - # add hidden states from the last decoder layer - if output_hidden_states: - all_hidden_states += (hidden_states,) - - next_cache = next_decoder_cache if use_cache else None - if not return_dict: - return tuple( - v - for v in [hidden_states, next_cache, all_hidden_states, all_self_attns] - if v is not None - ) - return BaseModelOutputWithPast( - last_hidden_state=hidden_states, - past_key_values=next_cache, - hidden_states=all_hidden_states, - attentions=all_self_attns, - ) - - -class CustomLogitWarper(nn.Module): - """ - Customized transformers.TopKLogitsWarper: Temperature + Topk + Softmax - """ - - def __init__(self, top_k, temperature, filter_value=-float("inf")): - super().__init__() - self.top_k = top_k - self.temperature = temperature - self.filter_value = filter_value - - def forward(self, logits): - top_logits, indices = torch.topk(logits, self.top_k) - indices_to_remove = logits < top_logits[..., -1, None] - logits = logits / self.temperature - logits = logits.masked_fill(indices_to_remove, self.filter_value) - probs = nn.functional.softmax(logits, dim=-1) - return probs, indices - - -class LlamaForCausalLM(LlamaPreTrainedModel): - def __init__(self, config): - super().__init__(config) - self.model = LlamaModel(config) - - self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False) - - # Initialize weights and apply final processing - self.post_init() - - self.num_logits_to_return = config.num_logits_to_return - self.return_top_k = config.return_top_k - if self.return_top_k > 0: - self.logit_warper = CustomLogitWarper( - self.return_top_k, - config.logit_temperature, - filter_value=config.mask_neg, - ) - - def prepare_conv(self): - if not hasattr(self, 
"lm_head_conv"): - self.lm_head_conv = nn.Conv2d( - self.config.hidden_size, self.config.vocab_size, 1, bias=False - ) - self.lm_head_conv.weight.data.copy_(self.lm_head.weight[:, :, None, None]) - - del self.lm_head - - def lm_head_conv_forward(self, x): - bsz, _, _ = x.size() - x = torch.reshape(x, (bsz, -1, 1, self.config.hidden_size)) - x = x.transpose(1, 3) # Transpose right before and after Conv - x = self.lm_head_conv(x) - x = x.transpose(1, 3) - x = torch.reshape(x, (bsz, -1, self.config.vocab_size)) - return x - - def get_input_embeddings(self): - return self.model.embed_tokens - - def set_input_embeddings(self, value): - self.model.embed_tokens = value - - def get_output_embeddings(self): - return self.lm_head - - def set_output_embeddings(self, new_embeddings): - self.lm_head = new_embeddings - - def set_decoder(self, decoder): - self.model = decoder - - def get_decoder(self): - return self.model - - @add_start_docstrings_to_model_forward(LLAMA_INPUTS_DOCSTRING) - @replace_return_docstrings( - output_type=CausalLMOutputWithPast, config_class=_CONFIG_FOR_DOC - ) - def forward( - self, - input_ids: torch.LongTensor = None, - attention_mask: Optional[torch.Tensor] = None, - position_ids: Optional[torch.LongTensor] = None, - past_key_values: Optional[List[torch.FloatTensor]] = None, - inputs_embeds: Optional[torch.FloatTensor] = None, - labels: Optional[torch.LongTensor] = None, - use_cache: Optional[bool] = None, - output_attentions: Optional[bool] = None, - output_hidden_states: Optional[bool] = None, - return_dict: Optional[bool] = None, - ) -> Union[Tuple, CausalLMOutputWithPast]: - r""" - Args: - labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*): - Labels for computing the masked language modeling loss. Indices should either be in `[0, ..., - config.vocab_size]` or -100 (see `input_ids` docstring). Tokens with indices set to `-100` are ignored - (masked), the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`. - Returns: - Example: - ```python - >>> from transformers import AutoTokenizer, LlamaForCausalLM - >>> model = LlamaForCausalLM.from_pretrained(PATH_TO_CONVERTED_WEIGHTS) - >>> tokenizer = AutoTokenizer.from_pretrained(PATH_TO_CONVERTED_TOKENIZER) - >>> prompt = "Hey, are you consciours? Can you talk to me?" - >>> inputs = tokenizer(prompt, return_tensors="pt") - >>> # Generate - >>> generate_ids = model.generate(inputs.input_ids, max_length=30) - >>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0] - "Hey, are you consciours? Can you talk to me?\nI'm not consciours, but I can talk to you." 
- ```""" - - output_attentions = ( - output_attentions - if output_attentions is not None - else self.config.output_attentions - ) - output_hidden_states = ( - output_hidden_states - if output_hidden_states is not None - else self.config.output_hidden_states - ) - return_dict = ( - return_dict if return_dict is not None else self.config.use_return_dict - ) - - # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn) - outputs = self.model( - input_ids=input_ids, - attention_mask=attention_mask, - position_ids=position_ids, - past_key_values=past_key_values, - inputs_embeds=inputs_embeds, - use_cache=use_cache, - output_attentions=output_attentions, - output_hidden_states=output_hidden_states, - return_dict=return_dict, - ) - - hidden_states = outputs[0] - - ### ------- QCOM EDITS STARTS ------- ### - loss = None - if self.config.split_model is None or self.config.split_model == 5: - if self.num_logits_to_return == 0: - # return all logits by default - logits = ( - self.lm_head_conv_forward(hidden_states) - if self.config.use_conv - else self.lm_head(hidden_states) - ) - else: - # only return num_logits_to_return logits for memory efficiency - last_hidden_states = hidden_states[ - :, -self.num_logits_to_return :, : - ].contiguous() - logits = ( - self.lm_head_conv_forward(last_hidden_states) - if self.config.use_conv - else self.lm_head(last_hidden_states) - ) - - if labels is not None: - # Shift so that tokens < n predict n - all_logits = self.lm_head(hidden_states) - shift_logits = all_logits[..., :-1, :].contiguous() - shift_labels = labels[..., 1:].contiguous() - # Flatten the tokens - loss_fct = CrossEntropyLoss() - shift_logits = shift_logits.view(-1, self.config.vocab_size) - shift_labels = shift_labels.view(-1) - # Enable model parallelism - shift_labels = shift_labels.to(shift_logits.device) - loss = loss_fct(shift_logits, shift_labels) - - if self.return_top_k > 0: - probs, indices = self.logit_warper(logits) - output = (probs, indices) + outputs[1:] - return ((loss,) + output) if loss is not None else output - else: - logits = hidden_states - ### ------- QCOM EDITS ENDS ------- ### - - if not return_dict: - output = (logits,) + outputs[1:] - return (loss,) + output if loss is not None else output - - return CausalLMOutputWithPast( - loss=loss, - logits=logits, - past_key_values=outputs.past_key_values, - hidden_states=outputs.hidden_states, - attentions=outputs.attentions, - ) - - def prepare_inputs_for_generation( - self, - input_ids, - past_key_values=None, - attention_mask=None, - inputs_embeds=None, - **kwargs, - ): - if past_key_values: - input_ids = input_ids[:, -1:] - - position_ids = kwargs.get("position_ids", None) - if attention_mask is not None and position_ids is None: - # create position_ids on the fly for batch generation - position_ids = attention_mask.long().cumsum(-1) - 1 - position_ids.masked_fill_(attention_mask == 0, 1) - if past_key_values: - position_ids = position_ids[:, -1].unsqueeze(-1) - - # if `inputs_embeds` are passed, we only want to use them in the 1st generation step - if inputs_embeds is not None and past_key_values is None: - model_inputs = {"inputs_embeds": inputs_embeds} - else: - model_inputs = {"input_ids": input_ids} - - model_inputs.update( - { - "position_ids": position_ids, - "past_key_values": past_key_values, - "use_cache": kwargs.get("use_cache"), - "attention_mask": attention_mask, - } - ) - return model_inputs - - @staticmethod - def _reorder_cache(past_key_values, beam_idx): - reordered_past = () - for 
layer_past in past_key_values: - reordered_past += ( - tuple( - past_state.index_select(0, beam_idx) for past_state in layer_past - ), - ) - return reordered_past diff --git a/qai_hub_models/models/llama_v3_8b_chat_quantized/perf.yaml b/qai_hub_models/models/llama_v3_8b_chat_quantized/perf.yaml index 24c2c6f5..6a13d6da 100644 --- a/qai_hub_models/models/llama_v3_8b_chat_quantized/perf.yaml +++ b/qai_hub_models/models/llama_v3_8b_chat_quantized/perf.yaml @@ -1,173 +1,42 @@ +aggregated: + supported_devices: + - Snapdragon 8 Elite QRD + - Snapdragon X Elite CRD + supported_oses: + - Android + supported_chipsets: + - Snapdragon® 8 Elite + - Snapdragon® X Elite models: -- name: Llama3-TokenGenerator-KVCache-Quantized + name: Llama-v3-8B-Chat performance_metrics: - - reference_device_info: - name: QCS8550 (Proxy) - os: '12' - form_factor: Iot - os_name: Android - manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-10-05T00:20:11.634769Z' - torchscript_onnx_qnn: - inference_time: 99315 - throughput: 10.07 - estimated_peak_memory_range: - min: 34553856 - max: 36402280 - layer_info: - layers_on_npu: 21272 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 21272 - precision: uint16 - primary_compute_unit: NPU - job_id: 'null' - job_status: Passed - - reference_device_info: - name: Samsung Galaxy S24 - os: '14' + - torchscript_onnx_qnn: + llm_metrics: + time_to_first_token_range: + min: 159383 + max: 5100256 + tokens_per_second: 12.9262 + evaluation_metrics: null + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' form_factor: Phone os_name: Android - manufacturer: Samsung - chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-06-11T00:34:02.549319Z' - torchscript_onnx_qnn: - inference_time: 72856 - throughput: 13.72 - estimated_peak_memory_range: - min: 950272 - max: 1322707920 - layer_info: - layers_on_npu: 20765 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 20765 - precision: uint16 - primary_compute_unit: NPU - job_id: "null" - job_status: Passed - - reference_device_info: - name: Snapdragon X Elite CRD - os: '11' - form_factor: Compute - os_name: Windows manufacturer: Qualcomm - chipset: Snapdragon® X Elite - timestamp: '2024-06-12T00:34:02.549319Z' - torchscript_onnx_qnn: - inference_time: 79170 - throughput: 12.63 - estimated_peak_memory_range: - min: 17051648 - max: 17051648 - layer_info: - layers_on_npu: 20765 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 20765 - precision: uint16 - primary_compute_unit: NPU - job_id: "null" - job_status: Passed -- name: Llama3-PromptProcessor-Quantized - performance_metrics: - - reference_device_info: - name: QCS8550 (Proxy) - os: '12' - form_factor: Iot - os_name: Android - manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-10-05T00:17:58.707236Z' - torchscript_onnx_qnn: - inference_time: 1807176 - throughput: 566.63 - estimated_peak_memory_range: - min: 11788288 - max: 13357640 - layer_info: - layers_on_npu: 20248 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 20248 - precision: uint16 - primary_compute_unit: NPU - job_id: 'null' - job_status: Passed - - reference_device_info: - name: Samsung Galaxy S24 - os: '14' - form_factor: Phone - os_name: Android - manufacturer: Samsung - chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-06-11T00:34:02.549319Z' - torchscript_onnx_qnn: - inference_time: 1316502 - throughput: 781.67 - estimated_peak_memory_range: - min: 12288 - max: 1026895408 - layer_info: - layers_on_npu: 20248 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 20248 - precision: 
uint16 - primary_compute_unit: NPU - job_id: "null" - job_status: Passed - - reference_device_info: name: Snapdragon X Elite CRD - os: '11' - form_factor: Compute - os_name: Windows - manufacturer: Qualcomm - chipset: Snapdragon® X Elite - timestamp: '2024-06-12T00:34:02.549319Z' - torchscript_onnx_qnn: - inference_time: 1668294 - throughput: 613.83 - estimated_peak_memory_range: - min: 10801152 - max: 10801152 - layer_info: - layers_on_npu: 20248 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 20248 - precision: uint16 - primary_compute_unit: NPU - job_id: "null" - job_status: Passed -aggregated: - supported_devices: - - Samsung Galaxy S23 Ultra - - Samsung Galaxy S24 - - Snapdragon X Elite CRD - supported_oses: - - Android - supported_chipsets: - - Snapdragon® 8 Gen 2 - - Snapdragon® 8 Gen 3 - - Snapdragon® X Elite - performance_metrics: - - reference_device_info: - name: Samsung Galaxy S23 Ultra - os: '13' - form_factor: Phone - os_name: Android - manufacturer: Samsung - chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-01-26T00:34:02.549319Z' - torchscript_onnx_qnn: - inference_time: 117423.0 - throughput: 8.5 - estimated_peak_memory_range: - min: 68579328 - max: 73044264 - precision: uint16 - primary_compute_unit: NPU - job_id: "" - job_status: Passed + timestamp: '2024-10-16T00:32:42.210701Z' diff --git a/qai_hub_models/models/llama_v3_8b_chat_quantized/requirements.txt b/qai_hub_models/models/llama_v3_8b_chat_quantized/requirements.txt index 10e857b7..c5deadcc 100644 --- a/qai_hub_models/models/llama_v3_8b_chat_quantized/requirements.txt +++ b/qai_hub_models/models/llama_v3_8b_chat_quantized/requirements.txt @@ -1,3 +1,5 @@ -transformers==4.40.0 +onnx==1.16.2 +transformers==4.45.0 +huggingface_hub==0.23.2 sentencepiece==0.2.0 psutil diff --git a/qai_hub_models/models/mediapipe_face/README.md b/qai_hub_models/models/mediapipe_face/README.md index 1aeb2a40..9d82ada2 100644 --- a/qai_hub_models/models/mediapipe_face/README.md +++ b/qai_hub_models/models/mediapipe_face/README.md @@ -6,7 +6,7 @@ Designed for sub-millisecond processing, this model predicts bounding boxes and pose skeletons (left eye, right eye, nose tip, mouth, left eye tragion, and right eye tragion) of faces in an image. This is based on the implementation of MediaPipe-Face-Detection found -[here](https://github.com/zmurez/MediaPipePyTorch/). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/mediapipe_face). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.mediapipe_face.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of MediaPipe-Face-Detection can be found
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs](https://arxiv.org/abs/1907.05047) * [Source Model Implementation](https://github.com/zmurez/MediaPipePyTorch/) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/mediapipe_face/export.py b/qai_hub_models/models/mediapipe_face/export.py index 3ed353ab..e9be9d54 100644 --- a/qai_hub_models/models/mediapipe_face/export.py +++ b/qai_hub_models/models/mediapipe_face/export.py @@ -10,14 +10,15 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Mapping, Optional, Tuple, cast +from typing import Any, Dict, List, Mapping, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.mediapipe_face import Model from qai_hub_models.utils.args import export_parser, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel, TargetRuntime +from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -45,20 +46,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Mapping[ - str, Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] -] | List[str]: +) -> Mapping[str, ExportResult] | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -83,10 +82,10 @@ def export_model( `model_cls.from_pretrained` Returns: - A Mapping from component_name to a 3-tuple of: + A Mapping from component_name to a struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). 
+ * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "mediapipe_face" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -118,7 +117,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) components_dict: Dict[str, BaseModel] = {} if "MediaPipeFaceDetector" in components: @@ -135,7 +134,7 @@ def export_model( component.to("cpu"), make_torch_inputs(input_spec) ) - # 2. Compile the models to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = component.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -151,7 +150,7 @@ def export_model( hub.client.CompileJob, submitted_compile_job ) - # 3. Profile the model assets on real devices + # 3. Profiles the model performance on a real device profile_jobs: Dict[str, hub.client.ProfileJob] = {} if not skip_profiling: for component_name in components: @@ -169,7 +168,7 @@ def export_model( hub.client.ProfileJob, submitted_profile_job ) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_jobs: Dict[str, hub.client.InferenceJob] = {} if not skip_inferencing: for component_name in components: @@ -193,14 +192,14 @@ def export_model( hub.client.InferenceJob, submitted_inference_job ) - # 5. Download the model assets to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) for component_name, compile_job in compile_jobs.items(): target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / component_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: for component_name in components: profile_job = profile_jobs[component_name] @@ -225,10 +224,10 @@ def export_model( ) return { - component_name: ( - compile_jobs[component_name], - profile_jobs.get(component_name, None), - inference_jobs.get(component_name, None), + component_name: ExportResult( + compile_job=compile_jobs[component_name], + inference_job=inference_jobs.get(component_name, None), + profile_job=profile_jobs.get(component_name, None), ) for component_name in components } diff --git a/qai_hub_models/models/mediapipe_face/perf.yaml b/qai_hub_models/models/mediapipe_face/perf.yaml index ec65f38f..626156cb 100644 --- a/qai_hub_models/models/mediapipe_face/perf.yaml +++ b/qai_hub_models/models/mediapipe_face/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: MediaPipeFaceDetector performance_metrics: - torchscript_onnx_tflite: - inference_time: 552.0 - throughput: 1811.5942028985507 + inference_time: 549.0 + throughput: 1821.4936247723133 estimated_peak_memory_range: - min: 12288 - max: 1453472 + min: 24576 + max: 1451224 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 111 - job_id: jogkzllog + job_id: jg9lnm9qg job_status: Passed torchscript_onnx_qnn: - inference_time: 622.0 - throughput: 1607.717041800643 + inference_time: 626.0 + throughput: 1597.444089456869 estimated_peak_memory_range: - min: 28672 - max: 5194512 + min: 806912 + max: 5865936 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 146 - job_id: jz57zjvrp + job_id: jgo26rm4p job_status: Passed torchscript_onnx: - inference_time: 1042.0 - throughput: 959.6928982725528 + inference_time: 1003.0 + throughput: 997.0089730807578 estimated_peak_memory_range: - min: 393216 - max: 3493720 + min: 12288 + max: 77735904 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: j1pv31zm5 + job_id: jp0z0jde5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:19:38Z' + timestamp: '2024-10-15T00:17:42Z' - torchscript_onnx_tflite: - inference_time: 440.0 - throughput: 2272.7272727272725 + inference_time: 450.0 + throughput: 2222.222222222222 estimated_peak_memory_range: min: 12288 - max: 34769872 + max: 35502640 
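The export.py change above replaces the old 3-tuple return value with a mapping from component name to `ExportResult`. A minimal usage sketch, assuming `qai_hub_models` is installed and AI Hub credentials are configured; the `device` and `skip_*` keyword arguments follow the parameter names in the docstring and function body above:

```python
# Minimal sketch of driving the updated export flow. The device name is
# only an example; skip_downloading mirrors the flag used in export.py.
from qai_hub_models.models.mediapipe_face.export import export_model

results = export_model(device="Samsung Galaxy S23", skip_downloading=True)
if not isinstance(results, list):  # per the annotation, a List[str] may be returned instead
    for component_name, result in results.items():
        print(component_name, "compile job:", result.compile_job.job_id)
        if result.profile_job is not None:
            print(component_name, "profile job:", result.profile_job.job_id)
```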
primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 111 - job_id: j1gln00lp + job_id: jgdx137kp job_status: Passed torchscript_onnx_qnn: - inference_time: 549.0 - throughput: 1821.4936247723133 + inference_time: 502.0 + throughput: 1992.03187250996 estimated_peak_memory_range: min: 802816 - max: 17389312 + max: 15520064 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 146 - job_id: j0pxv7e9g + job_id: jgjvn717g job_status: Passed torchscript_onnx: - inference_time: 836.0 - throughput: 1196.1722488038276 + inference_time: 810.0 + throughput: 1234.567901234568 estimated_peak_memory_range: min: 0 - max: 39429984 + max: 40477728 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: jlpe9r40g + job_id: jgkex4oog job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:19:39Z' + timestamp: '2024-10-15T00:17:44Z' - torchscript_onnx_tflite: - inference_time: 542.0 - throughput: 1845.018450184502 + inference_time: 546.0 + throughput: 1831.5018315018315 estimated_peak_memory_range: min: 12288 - max: 1257848 + max: 75495088 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 111 - job_id: j1p3k44z5 + job_id: jp4lr1jq5 job_status: Passed torchscript_onnx_qnn: - inference_time: 593.0 - throughput: 1686.3406408094436 + inference_time: 599.0 + throughput: 1669.449081803005 estimated_peak_memory_range: - min: 811008 - max: 2063984 + min: 819200 + max: 2153864 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 146 - job_id: jep2873mp + job_id: jg9lnm8qg job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:19:29Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:17:29Z' - torchscript_onnx_tflite: - inference_time: 749.0 - throughput: 1335.1134846461948 + inference_time: 549.0 + throughput: 1821.4936247723133 estimated_peak_memory_range: min: 12288 - max: 31840544 + max: 75878936 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 111 - job_id: j1pv311m5 + job_id: jp8qyx8zp job_status: Passed torchscript_onnx_qnn: - inference_time: 825.0 - throughput: 1212.121212121212 + inference_time: 602.0 + throughput: 1661.1295681063123 estimated_peak_memory_range: - min: 802816 - max: 17311184 + min: 827392 + max: 2353288 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 146 - job_id: j1p3k4qz5 + job_id: jgdx130lp job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:19:36Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:17:34Z' - torchscript_onnx_tflite: - inference_time: 550.0 - throughput: 1818.1818181818182 + inference_time: 549.0 + throughput: 1821.4936247723133 
estimated_peak_memory_range: - min: 24576 - max: 1340576 + min: 20480 + max: 1445560 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 111 - job_id: jlpe9rr0g + job_id: jpy13xnrp job_status: Passed torchscript_onnx_qnn: - inference_time: 602.0 - throughput: 1661.1295681063123 + inference_time: 612.0 + throughput: 1633.986928104575 estimated_peak_memory_range: - min: 831488 - max: 2071336 + min: 823296 + max: 2118152 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 146 - job_id: j2p0y1eeg + job_id: jg9lnm8vg job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:19:30Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:17:33Z' - torchscript_onnx_tflite: inference_time: 547.0 throughput: 1828.1535648994516 estimated_peak_memory_range: - min: 12288 - max: 4834208 + min: 65536 + max: 1466768 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 111 - job_id: jz5woddjp + job_id: jprv309vg job_status: Passed torchscript_onnx_qnn: - inference_time: 604.0 - throughput: 1655.6291390728477 + inference_time: 613.0 + throughput: 1631.3213703099511 estimated_peak_memory_range: min: 819200 - max: 2065576 + max: 2122200 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 146 - job_id: jogkzlrog + job_id: jgdx130kp job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:19:32Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:17:31Z' - torchscript_onnx_tflite: - inference_time: 552.0 - throughput: 1811.5942028985507 + inference_time: 763.0 + throughput: 1310.615989515072 estimated_peak_memory_range: - min: 20480 - max: 1332120 + min: 77824 + max: 32675344 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 111 - job_id: jnp10ddl5 + job_id: j5mnxmvyp job_status: Passed torchscript_onnx_qnn: - inference_time: 608.0 - throughput: 1644.7368421052631 + inference_time: 830.0 + throughput: 1204.8192771084337 estimated_peak_memory_range: - min: 827392 - max: 2271272 + min: 802816 + max: 17538896 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 146 - job_id: j1gln0elp + job_id: jgn6vnom5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:17:38Z' + - torchscript_onnx_tflite: + inference_time: 410.0 + throughput: 2439.0243902439024 + estimated_peak_memory_range: + min: 8192 + max: 25089424 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 111 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 111 + job_id: j56y47vvp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 380.0 + throughput: 2631.5789473684213 + estimated_peak_memory_range: + min: 798720 + max: 13748800 + primary_compute_unit: NPU + precision: fp16 + 
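A note for readers scanning the perf.yaml numbers in this diff: `inference_time` is reported in microseconds, and `throughput` is its reciprocal expressed in inferences per second. The value pairs above confirm the relation:

```python
# Relation between the inference_time (microseconds) and throughput
# (inferences/second) fields; the values are copied from the entries above.
def throughput(inference_time_us: float) -> float:
    return 1_000_000 / inference_time_us

assert abs(throughput(549.0) - 1821.4936247723133) < 1e-6
assert abs(throughput(450.0) - 2222.222222222222) < 1e-6
```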
layer_info: + layers_on_npu: 146 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 146 + job_id: jp2kyw4mp + job_status: Passed + torchscript_onnx: + inference_time: 752.0 + throughput: 1329.787234042553 + estimated_peak_memory_range: + min: 0 + max: 29157136 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 147 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 147 + job_id: jpv6kdem5 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:19:34Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:17:49Z' - torchscript_onnx_qnn: - inference_time: 766.0 - throughput: 1305.4830287206266 + inference_time: 804.0 + throughput: 1243.7810945273632 estimated_peak_memory_range: min: 786432 max: 786432 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 146 - job_id: jegn29rmg + job_id: jgz3dmwz5 job_status: Passed torchscript_onnx: - inference_time: 1076.0 - throughput: 929.368029739777 + inference_time: 1040.0 + throughput: 961.5384615384615 estimated_peak_memory_range: - min: 1908736 - max: 1908736 + min: 2031616 + max: 2031616 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: jmg9v39v5 + job_id: jglvmxol5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,15 +429,15 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:19:41Z' + timestamp: '2024-10-15T00:17:46Z' - name: MediaPipeFaceLandmarkDetector performance_metrics: - torchscript_onnx_tflite: - inference_time: 193.0 - throughput: 5181.347150259067 + inference_time: 190.0 + throughput: 5263.1578947368425 estimated_peak_memory_range: - min: 36864 - max: 4354488 + min: 20480 + max: 1544456 primary_compute_unit: NPU precision: fp16 layer_info: @@ -394,14 +445,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 100 - job_id: jn5q877m5 + job_id: jp14zjqkp job_status: Passed torchscript_onnx_qnn: - inference_time: 277.0 - throughput: 3610.1083032490974 + inference_time: 279.0 + throughput: 3584.2293906810037 estimated_peak_memory_range: - min: 475136 - max: 8373224 + min: 458752 + max: 7823360 primary_compute_unit: NPU precision: fp16 layer_info: @@ -409,14 +460,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 105 - job_id: jqp4qxjlg + job_id: jpv6kd475 job_status: Passed torchscript_onnx: - inference_time: 503.0 - throughput: 1988.0715705765408 + inference_time: 506.0 + throughput: 1976.2845849802372 estimated_peak_memory_range: - min: 24576 - max: 1592672 + min: 12288 + max: 1609632 primary_compute_unit: NPU precision: fp16 layer_info: @@ -424,7 +475,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 106 - job_id: j7gjx0k8p + job_id: jp8qyx68p job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -433,13 +484,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:19:38Z' + timestamp: '2024-10-15T00:17:42Z' - torchscript_onnx_tflite: - inference_time: 153.0 - throughput: 6535.9477124183 + inference_time: 144.0 + throughput: 6944.444444444444 estimated_peak_memory_range: - min: 12288 - max: 30067152 + min: 16384 + max: 30559200 primary_compute_unit: NPU precision: fp16 layer_info: @@ -447,14 +498,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 100 - 
job_id: jw5663375 + job_id: j57yr4vq5 job_status: Passed torchscript_onnx_qnn: - inference_time: 216.0 - throughput: 4629.62962962963 + inference_time: 213.0 + throughput: 4694.835680751174 estimated_peak_memory_range: - min: 0 - max: 10297152 + min: 458752 + max: 12063568 primary_compute_unit: NPU precision: fp16 layer_info: @@ -462,14 +513,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 105 - job_id: jo5mrwvqg + job_id: jpedmz275 job_status: Passed torchscript_onnx: - inference_time: 401.0 - throughput: 2493.7655860349128 + inference_time: 402.0 + throughput: 2487.5621890547263 estimated_peak_memory_range: min: 0 - max: 33171120 + max: 33128640 primary_compute_unit: NPU precision: fp16 layer_info: @@ -477,7 +528,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 106 - job_id: jygzexv6g + job_id: j5q6qyzmp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -486,13 +537,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:19:40Z' + timestamp: '2024-10-15T00:17:44Z' - torchscript_onnx_tflite: - inference_time: 192.0 - throughput: 5208.333333333333 + inference_time: 190.0 + throughput: 5263.1578947368425 estimated_peak_memory_range: - min: 32768 - max: 8564216 + min: 12288 + max: 1368704 primary_compute_unit: NPU precision: fp16 layer_info: @@ -500,14 +551,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 100 - job_id: jwgoy11d5 + job_id: jpxko4ej5 job_status: Passed torchscript_onnx_qnn: - inference_time: 275.0 - throughput: 3636.3636363636365 + inference_time: 274.0 + throughput: 3649.6350364963505 estimated_peak_memory_range: - min: 516096 - max: 1751168 + min: 0 + max: 1780184 primary_compute_unit: NPU precision: fp16 layer_info: @@ -515,7 +566,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 105 - job_id: jqpye4v4g + job_id: jp14zj3kp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -523,14 +574,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:19:29Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:17:29Z' - torchscript_onnx_tflite: - inference_time: 279.0 - throughput: 3584.2293906810037 + inference_time: 193.0 + throughput: 5181.347150259067 estimated_peak_memory_range: min: 20480 - max: 30064256 + max: 17894512 primary_compute_unit: NPU precision: fp16 layer_info: @@ -538,14 +589,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 100 - job_id: j7gjx008p + job_id: jgkex4dyg job_status: Passed torchscript_onnx_qnn: - inference_time: 377.0 - throughput: 2652.5198938992044 + inference_time: 275.0 + throughput: 3636.3636363636365 estimated_peak_memory_range: - min: 458752 - max: 14833504 + min: 475136 + max: 1729048 primary_compute_unit: NPU precision: fp16 layer_info: @@ -553,22 +604,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 105 - job_id: jwgoy1ed5 + job_id: j57yr4kr5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:19:36Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:17:35Z' - torchscript_onnx_tflite: - inference_time: 189.0 - throughput: 5291.005291005291 + inference_time: 194.0 + throughput: 5154.639175257732 estimated_peak_memory_range: - min: 32768 - max: 1780696 + min: 16384 + max: 1366016 primary_compute_unit: NPU precision: fp16 layer_info: @@ 
-576,14 +627,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 100 - job_id: jygzexx6g + job_id: jp0z0jk25 job_status: Passed torchscript_onnx_qnn: - inference_time: 276.0 - throughput: 3623.1884057971015 + inference_time: 277.0 + throughput: 3610.1083032490974 estimated_peak_memory_range: - min: 471040 - max: 1620408 + min: 466944 + max: 1599728 primary_compute_unit: NPU precision: fp16 layer_info: @@ -591,22 +642,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 105 - job_id: j1p8o3w8g + job_id: jp14zj3lp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:19:31Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:17:33Z' - torchscript_onnx_tflite: - inference_time: 192.0 - throughput: 5208.333333333333 + inference_time: 194.0 + throughput: 5154.639175257732 estimated_peak_memory_range: - min: 32768 - max: 12650104 + min: 28672 + max: 77986736 primary_compute_unit: NPU precision: fp16 layer_info: @@ -614,14 +665,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 100 - job_id: jmg9v33v5 + job_id: jp2kywjxp job_status: Passed torchscript_onnx_qnn: - inference_time: 276.0 - throughput: 3623.1884057971015 + inference_time: 273.0 + throughput: 3663.003663003663 estimated_peak_memory_range: - min: 483328 - max: 2200040 + min: 466944 + max: 1825032 primary_compute_unit: NPU precision: fp16 layer_info: @@ -629,22 +680,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 105 - job_id: jn5q879m5 + job_id: j5we67xj5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:19:33Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:17:31Z' - torchscript_onnx_tflite: - inference_time: 191.0 - throughput: 5235.602094240838 + inference_time: 283.0 + throughput: 3533.5689045936397 estimated_peak_memory_range: - min: 28672 - max: 9401336 + min: 20480 + max: 30559248 primary_compute_unit: NPU precision: fp16 layer_info: @@ -652,14 +703,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 100 - job_id: jvgdwrrl5 + job_id: jgn6vnxv5 job_status: Passed torchscript_onnx_qnn: - inference_time: 281.0 - throughput: 3558.7188612099644 + inference_time: 382.0 + throughput: 2617.801047120419 estimated_peak_memory_range: - min: 471040 - max: 2095608 + min: 0 + max: 14615040 primary_compute_unit: NPU precision: fp16 layer_info: @@ -667,19 +718,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 105 - job_id: jw5663q75 + job_id: jprv30oeg job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:17:39Z' + - torchscript_onnx_tflite: + inference_time: 122.0 + throughput: 8196.72131147541 + estimated_peak_memory_range: + min: 0 + max: 19055568 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 100 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 100 + job_id: jp3j098xg + job_status: Passed + torchscript_onnx_qnn: + inference_time: 207.0 + throughput: 4830.917874396136 + estimated_peak_memory_range: + min: 458752 + max: 10560912 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 105 + layers_on_gpu: 0 + layers_on_cpu: 0 + 
total_layers: 105 + job_id: jpy13xq4p + job_status: Passed + torchscript_onnx: + inference_time: 409.0 + throughput: 2444.987775061125 + estimated_peak_memory_range: + min: 0 + max: 19005648 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 106 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 106 + job_id: jgjvn7o8g + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:19:35Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:17:49Z' - torchscript_onnx_qnn: - inference_time: 376.0 - throughput: 2659.574468085106 + inference_time: 383.0 + throughput: 2610.9660574412533 estimated_peak_memory_range: min: 786432 max: 786432 @@ -690,14 +794,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 105 - job_id: joprk41e5 + job_id: j5we67xz5 job_status: Passed torchscript_onnx: - inference_time: 509.0 - throughput: 1964.6365422396857 + inference_time: 512.0 + throughput: 1953.125 estimated_peak_memory_range: - min: 1847296 - max: 1847296 + min: 1884160 + max: 1884160 primary_compute_unit: NPU precision: fp16 layer_info: @@ -705,7 +809,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 106 - job_id: jnp10dql5 + job_id: j56y47r7p job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -714,4 +818,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:19:41Z' + timestamp: '2024-10-15T00:17:46Z' diff --git a/qai_hub_models/models/mediapipe_face_quantized/README.md b/qai_hub_models/models/mediapipe_face_quantized/README.md index 72472470..46972b73 100644 --- a/qai_hub_models/models/mediapipe_face_quantized/README.md +++ b/qai_hub_models/models/mediapipe_face_quantized/README.md @@ -6,7 +6,7 @@ Designed for sub-millisecond processing, this model predicts bounding boxes and pose skeletons (left eye, right eye, nose tip, mouth, left eye tragion, and right eye tragion) of faces in an image. This is based on the implementation of MediaPipe-Face-Detection-Quantized found -[here](https://github.com/zmurez/MediaPipePyTorch/). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/mediapipe_face_quantized). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.mediapipe_face_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of MediaPipe-Face-Detection-Quantized can be found +* The license for the original implementation of MediaPipe-Face-Detection-Quantized can be found [here](https://github.com/zmurez/MediaPipePyTorch/blob/master/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs](https://arxiv.org/abs/1907.05047) * [Source Model Implementation](https://github.com/zmurez/MediaPipePyTorch/) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/mediapipe_face_quantized/export.py b/qai_hub_models/models/mediapipe_face_quantized/export.py index 58c80292..a60c31fc 100644 --- a/qai_hub_models/models/mediapipe_face_quantized/export.py +++ b/qai_hub_models/models/mediapipe_face_quantized/export.py @@ -10,13 +10,14 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Mapping, Optional, Tuple, cast +from typing import Any, Dict, List, Mapping, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.mediapipe_face_quantized import Model from qai_hub_models.utils.args import export_parser, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel, TargetRuntime +from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.printing import ( print_inference_metrics, @@ -43,20 +44,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Mapping[ - str, Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] -] | List[str]: +) -> Mapping[str, ExportResult] | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -81,10 +80,10 @@ def export_model( `model_cls.from_pretrained` Returns: - A Mapping from component_name to a 3-tuple of: + A Mapping from component_name to a struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). 
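`ExportResult` itself is imported from `qai_hub_models.models.common` and its definition is not part of this diff. Judging only from the keyword arguments passed to it below (and the ProfileJob bullet that completes this list), it plausibly looks like the following dataclass sketch; the real class may differ:

```python
# Hedged sketch of ExportResult, inferred from the fields this diff uses;
# not the actual definition in qai_hub_models.models.common.
from dataclasses import dataclass
from typing import Optional

import qai_hub as hub


@dataclass
class ExportResult:
    compile_job: hub.CompileJob
    inference_job: Optional[hub.InferenceJob] = None
    profile_job: Optional[hub.ProfileJob] = None
```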
+ * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "mediapipe_face_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -116,7 +115,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) components_dict: Dict[str, BaseModel] = {} if "MediaPipeFaceDetector" in components: @@ -133,7 +132,7 @@ def export_model( target_runtime, output_path, input_spec ) - # 2. Compile the models to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = component.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -150,7 +149,7 @@ def export_model( hub.client.CompileJob, submitted_compile_job ) - # 3. Profile the model assets on real devices + # 3. Profiles the model performance on a real device profile_jobs: Dict[str, hub.client.ProfileJob] = {} if not skip_profiling: for component_name in components: @@ -168,7 +167,7 @@ def export_model( hub.client.ProfileJob, submitted_profile_job ) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_jobs: Dict[str, hub.client.InferenceJob] = {} if not skip_inferencing: for component_name in components: @@ -192,14 +191,14 @@ def export_model( hub.client.InferenceJob, submitted_inference_job ) - # 5. Download the model assets to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) for component_name, compile_job in compile_jobs.items(): target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / component_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: for component_name in components: profile_job = profile_jobs[component_name] @@ -224,10 +223,10 @@ def export_model( ) return { - component_name: ( - compile_jobs[component_name], - profile_jobs.get(component_name, None), - inference_jobs.get(component_name, None), + component_name: ExportResult( + compile_job=compile_jobs[component_name], + inference_job=inference_jobs.get(component_name, None), + profile_job=profile_jobs.get(component_name, None), ) for component_name in components } diff --git a/qai_hub_models/models/mediapipe_face_quantized/perf.yaml b/qai_hub_models/models/mediapipe_face_quantized/perf.yaml index 7f813609..236a7a48 100644 --- a/qai_hub_models/models/mediapipe_face_quantized/perf.yaml +++ b/qai_hub_models/models/mediapipe_face_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,44 +20,41 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: MediaPipeFaceDetector performance_metrics: - torchscript_onnx_tflite: - inference_time: 274.0 - throughput: 3649.6350364963505 + inference_time: 275.0 + throughput: 3636.3636363636365 estimated_peak_memory_range: - min: 12288 - max: 1420480 + min: 36864 + max: 1335848 primary_compute_unit: NPU precision: fp16 layer_info: @@ -64,22 +62,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 121 - job_id: j1p3k4ex5 + job_id: j5mnxm3yp job_status: Passed torchscript_onnx_qnn: - inference_time: 300.0 - throughput: 3333.3333333333335 + inference_time: 297.0 + throughput: 3367.003367003367 estimated_peak_memory_range: - min: 28672 - max: 5606824 + min: 16384 + max: 76721656 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 118 + layers_on_npu: 151 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 118 - job_id: j2p0y1r2g + total_layers: 151 + job_id: j5mnxmzyp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -88,13 +86,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:18:26Z' + timestamp: '2024-10-15T00:16:10Z' - torchscript_onnx_tflite: - inference_time: 196.0 - throughput: 5102.040816326531 + inference_time: 182.0 + throughput: 5494.505494505494 estimated_peak_memory_range: min: 12288 - max: 33583568 + max: 33570912 primary_compute_unit: NPU precision: fp16 layer_info: @@ -102,22 +100,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 121 - job_id: j1pv31v75 + job_id: jprv30yvg job_status: Passed 
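For consuming these perf.yaml files programmatically rather than in diff form, a short sketch (assuming PyYAML is installed) that follows the key structure visible throughout this file:

```python
# Sketch: flatten a perf.yaml like the one above into one line per
# device/runtime pair. Key names mirror the structure shown in this diff.
import yaml  # PyYAML, an assumed dependency

RUNTIMES = ("torchscript_onnx_tflite", "torchscript_onnx_qnn", "torchscript_onnx")

with open("qai_hub_models/models/mediapipe_face_quantized/perf.yaml") as f:
    perf = yaml.safe_load(f)

for model in perf["models"]:
    for entry in model["performance_metrics"]:
        device = entry["reference_device_info"]["name"]
        for runtime in RUNTIMES:
            metrics = entry.get(runtime)
            if metrics:  # not every runtime is measured on every device
                print(model["name"], device, runtime, metrics["inference_time"], "us")
```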
torchscript_onnx_qnn: - inference_time: 211.0 - throughput: 4739.336492890995 + inference_time: 236.0 + throughput: 4237.28813559322 estimated_peak_memory_range: - min: 204800 - max: 18012192 + min: 208896 + max: 20242528 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 118 + layers_on_npu: 151 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 118 - job_id: jogkzlyyg + total_layers: 151 + job_id: jp2kyw7xp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -126,13 +124,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:18:28Z' + timestamp: '2024-10-15T00:16:12Z' - torchscript_onnx_tflite: - inference_time: 273.0 - throughput: 3663.003663003663 + inference_time: 681.0 + throughput: 1468.4287812041116 estimated_peak_memory_range: min: 12288 - max: 1265400 + max: 26145872 primary_compute_unit: NPU precision: fp16 layer_info: @@ -140,37 +138,60 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 121 - job_id: jlpe9rk7g + job_id: jg9lnm4qg job_status: Passed torchscript_onnx_qnn: - inference_time: 303.0 - throughput: 3300.3300330033003 + inference_time: 762.0 + throughput: 1312.3359580052493 estimated_peak_memory_range: - min: 278528 - max: 1444976 + min: 12288 + max: 8048944 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 118 + layers_on_npu: 151 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 118 - job_id: j1p3k4mx5 + total_layers: 151 + job_id: jp14zjdkp job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:18:32Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-15T00:16:28Z' - torchscript_onnx_tflite: - inference_time: 322.0 - throughput: 3105.590062111801 + inference_time: 5031.0 + throughput: 198.76764062810574 + estimated_peak_memory_range: + min: 28672 + max: 5636744 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 121 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 121 + job_id: jgdx13vkp + job_status: Passed + reference_device_info: + name: RB5 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8250 Proxy + timestamp: '2024-10-15T00:16:05Z' + - torchscript_onnx_tflite: + inference_time: 273.0 + throughput: 3663.003663003663 estimated_peak_memory_range: min: 12288 - max: 33420464 + max: 1499376 primary_compute_unit: NPU precision: fp16 layer_info: @@ -178,37 +199,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 121 - job_id: jz5wodqzp + job_id: jp0z0jr25 job_status: Passed torchscript_onnx_qnn: - inference_time: 352.0 - throughput: 2840.909090909091 + inference_time: 301.0 + throughput: 3322.2591362126245 estimated_peak_memory_range: - min: 208896 - max: 20119952 + min: 229376 + max: 1570912 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 118 + layers_on_npu: 151 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 118 - job_id: jnp10d8k5 + total_layers: 151 + job_id: jgkex4lyg job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:18:39Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:16:17Z' - torchscript_onnx_tflite: - inference_time: 276.0 - throughput: 3623.1884057971015 + inference_time: 279.0 + 
throughput: 3584.2293906810037 estimated_peak_memory_range: min: 12288 - max: 1522664 + max: 73232952 primary_compute_unit: NPU precision: fp16 layer_info: @@ -216,37 +237,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 121 - job_id: jnp10dek5 + job_id: jpv6kdw75 job_status: Passed torchscript_onnx_qnn: - inference_time: 305.0 - throughput: 3278.688524590164 + inference_time: 303.0 + throughput: 3300.3300330033003 estimated_peak_memory_range: - min: 221184 - max: 1508064 + min: 237568 + max: 1460392 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 118 + layers_on_npu: 151 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 118 - job_id: j1pv31w75 + total_layers: 151 + job_id: jpv6kd175 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:18:34Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:16:23Z' - torchscript_onnx_tflite: - inference_time: 273.0 - throughput: 3663.003663003663 + inference_time: 274.0 + throughput: 3649.6350364963505 estimated_peak_memory_range: min: 12288 - max: 1533688 + max: 1308920 primary_compute_unit: NPU precision: fp16 layer_info: @@ -254,22 +275,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 121 - job_id: jz57zjxqp + job_id: jp3j09mxg job_status: Passed torchscript_onnx_qnn: - inference_time: 302.0 - throughput: 3311.2582781456954 + inference_time: 303.0 + throughput: 3300.3300330033003 estimated_peak_memory_range: - min: 225280 - max: 1933896 + min: 229376 + max: 1455680 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 118 + layers_on_npu: 151 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 118 - job_id: jlpe9rv7g + total_layers: 151 + job_id: jp3j094xg job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -277,14 +298,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:18:36Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:16:21Z' - torchscript_onnx_tflite: - inference_time: 278.0 - throughput: 3597.122302158273 + inference_time: 272.0 + throughput: 3676.470588235294 estimated_peak_memory_range: min: 12288 - max: 1478072 + max: 2358944 primary_compute_unit: NPU precision: fp16 layer_info: @@ -292,37 +313,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 121 - job_id: j0pxv7yjg + job_id: jglvmxke5 job_status: Passed torchscript_onnx_qnn: - inference_time: 303.0 - throughput: 3300.3300330033003 + inference_time: 301.0 + throughput: 3322.2591362126245 estimated_peak_memory_range: - min: 225280 - max: 1466272 + min: 229376 + max: 1461496 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 118 + layers_on_npu: 151 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 118 - job_id: jz5wod9zp + total_layers: 151 + job_id: jglvmx0e5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:18:37Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:16:19Z' - torchscript_onnx_tflite: - inference_time: 789.0 - throughput: 1267.427122940431 + inference_time: 324.0 + throughput: 3086.41975308642 estimated_peak_memory_range: min: 24576 - max: 26053904 + max: 34618272 primary_compute_unit: NPU precision: fp16 layer_info: @@ -330,37 +351,37 @@ models: layers_on_gpu: 0 
layers_on_cpu: 0 total_layers: 121 - job_id: jegn29evg + job_id: jgkex4yyg job_status: Passed torchscript_onnx_qnn: - inference_time: 747.0 - throughput: 1338.6880856760374 + inference_time: 350.0 + throughput: 2857.1428571428573 estimated_peak_memory_range: - min: 12288 - max: 8071440 + min: 208896 + max: 19975968 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 118 + layers_on_npu: 151 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 118 - job_id: jz5wod9jp + total_layers: 151 + job_id: j5we67dz5 job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T12:18:41Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:16:26Z' - torchscript_onnx_tflite: - inference_time: 4996.0 - throughput: 200.160128102482 + inference_time: 202.0 + throughput: 4950.495049504951 estimated_peak_memory_range: - min: 40960 - max: 6796896 + min: 8192 + max: 24408816 primary_compute_unit: NPU precision: fp16 layer_info: @@ -368,30 +389,45 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 121 - job_id: jep287mxp + job_id: jp4lr1wq5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 207.0 + throughput: 4830.917874396136 + estimated_peak_memory_range: + min: 208896 + max: 15580480 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 151 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 151 + job_id: j57yr4jq5 job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T12:18:24Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:16:30Z' - torchscript_onnx_qnn: - inference_time: 430.0 - throughput: 2325.5813953488373 + inference_time: 427.0 + throughput: 2341.92037470726 estimated_peak_memory_range: - min: 475136 - max: 475136 + min: 552960 + max: 552960 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 118 + layers_on_npu: 151 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 118 - job_id: j1gln0kep + total_layers: 151 + job_id: jp0z0j125 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -400,15 +436,15 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:18:30Z' + timestamp: '2024-10-15T00:16:15Z' - name: MediaPipeFaceLandmarkDetector performance_metrics: - torchscript_onnx_tflite: - inference_time: 184.0 - throughput: 5434.782608695652 + inference_time: 180.0 + throughput: 5555.555555555556 estimated_peak_memory_range: - min: 12288 - max: 16939576 + min: 20480 + max: 71074632 primary_compute_unit: NPU precision: fp16 layer_info: @@ -416,14 +452,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 117 - job_id: jwgoy1345 + job_id: jgn6vnev5 job_status: Passed torchscript_onnx_qnn: - inference_time: 219.0 - throughput: 4566.2100456621 + inference_time: 226.0 + throughput: 4424.778761061947 estimated_peak_memory_range: - min: 139264 - max: 3531368 + min: 24576 + max: 3226816 primary_compute_unit: NPU precision: fp16 layer_info: @@ -431,7 +467,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 112 - job_id: j1p8o37zg + job_id: jgn6vn9v5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -440,13 +476,13 @@ models: os_name: Android 
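One invariant holds across every `layer_info` block in this diff: all layers run on the NPU (`layers_on_gpu` and `layers_on_cpu` are 0, and `layers_on_npu` equals `total_layers`). A small helper to flag any fallback, assuming the same field names:

```python
# Sketch: detect CPU/GPU fallback in a perf.yaml layer_info block.
def fully_on_npu(layer_info: dict) -> bool:
    return (
        layer_info["layers_on_gpu"] == 0
        and layer_info["layers_on_cpu"] == 0
        and layer_info["layers_on_npu"] == layer_info["total_layers"]
    )

# Values from a MediaPipeFaceLandmarkDetector entry above.
assert fully_on_npu(
    {"layers_on_npu": 112, "layers_on_gpu": 0, "layers_on_cpu": 0, "total_layers": 112}
)
```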
manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:18:26Z' + timestamp: '2024-10-15T00:16:10Z' - torchscript_onnx_tflite: - inference_time: 127.0 - throughput: 7874.0157480314965 + inference_time: 142.0 + throughput: 7042.2535211267605 estimated_peak_memory_range: min: 12288 - max: 28082720 + max: 27584640 primary_compute_unit: NPU precision: fp16 layer_info: @@ -454,14 +490,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 117 - job_id: j7gjx0e7p + job_id: jp2kywmxp job_status: Passed torchscript_onnx_qnn: - inference_time: 163.0 - throughput: 6134.9693251533745 + inference_time: 166.0 + throughput: 6024.096385542169 estimated_peak_memory_range: - min: 0 - max: 14810512 + min: 126976 + max: 12796320 primary_compute_unit: NPU precision: fp16 layer_info: @@ -469,7 +505,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 112 - job_id: jn5q87275 + job_id: jpy13x4rp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -478,13 +514,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:18:28Z' + timestamp: '2024-10-15T00:16:13Z' - torchscript_onnx_tflite: - inference_time: 185.0 - throughput: 5405.405405405405 + inference_time: 395.0 + throughput: 2531.6455696202534 estimated_peak_memory_range: - min: 24576 - max: 1349144 + min: 12288 + max: 19759872 primary_compute_unit: NPU precision: fp16 layer_info: @@ -492,14 +528,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 117 - job_id: jygzexrzg + job_id: jp14zj8kp job_status: Passed torchscript_onnx_qnn: - inference_time: 215.0 - throughput: 4651.162790697675 + inference_time: 490.0 + throughput: 2040.8163265306123 estimated_peak_memory_range: - min: 143360 - max: 1461008 + min: 16384 + max: 8039360 primary_compute_unit: NPU precision: fp16 layer_info: @@ -507,22 +543,45 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 112 - job_id: jwgoy1v45 + job_id: jgdx13rkp job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:18:32Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-15T00:16:29Z' - torchscript_onnx_tflite: - inference_time: 226.0 - throughput: 4424.778761061947 + inference_time: 2921.0 + throughput: 342.3485107839781 + estimated_peak_memory_range: + min: 12288 + max: 6971816 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 117 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 117 + job_id: j57yr4dq5 + job_status: Passed + reference_device_info: + name: RB5 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8250 Proxy + timestamp: '2024-10-15T00:16:06Z' + - torchscript_onnx_tflite: + inference_time: 182.0 + throughput: 5494.505494505494 estimated_peak_memory_range: min: 12288 - max: 28702592 + max: 3054504 primary_compute_unit: NPU precision: fp16 layer_info: @@ -530,14 +589,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 117 - job_id: jmg9v3wq5 + job_id: jp8qyx7zp job_status: Passed torchscript_onnx_qnn: - inference_time: 260.0 - throughput: 3846.153846153846 + inference_time: 212.0 + throughput: 4716.981132075472 estimated_peak_memory_range: - min: 126976 - max: 14818800 + min: 0 + max: 1741232 primary_compute_unit: NPU precision: fp16 layer_info: @@ -545,22 +604,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 112 - job_id: jvgdwrvk5 + job_id: 
j5q6qy77p job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:18:40Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:16:17Z' - torchscript_onnx_tflite: - inference_time: 190.0 - throughput: 5263.1578947368425 + inference_time: 185.0 + throughput: 5405.405405405405 estimated_peak_memory_range: - min: 45056 - max: 58132840 + min: 28672 + max: 1437576 primary_compute_unit: NPU precision: fp16 layer_info: @@ -568,14 +627,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 117 - job_id: jvgdwrok5 + job_id: jgjvn7l7g job_status: Passed torchscript_onnx_qnn: - inference_time: 221.0 - throughput: 4524.886877828054 + inference_time: 216.0 + throughput: 4629.62962962963 estimated_peak_memory_range: - min: 135168 - max: 1401824 + min: 188416 + max: 1447600 primary_compute_unit: NPU precision: fp16 layer_info: @@ -583,22 +642,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 112 - job_id: j7gjx0l7p + job_id: jgjvn707g job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:18:34Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:16:23Z' - torchscript_onnx_tflite: - inference_time: 181.0 - throughput: 5524.861878453039 + inference_time: 185.0 + throughput: 5405.405405405405 estimated_peak_memory_range: - min: 24576 - max: 1432552 + min: 12288 + max: 3185936 primary_compute_unit: NPU precision: fp16 layer_info: @@ -606,14 +665,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 117 - job_id: jqp4qxvqg + job_id: jgo26rv4p job_status: Passed torchscript_onnx_qnn: - inference_time: 218.0 - throughput: 4587.155963302752 + inference_time: 214.0 + throughput: 4672.897196261682 estimated_peak_memory_range: - min: 16384 - max: 1567608 + min: 143360 + max: 1484912 primary_compute_unit: NPU precision: fp16 layer_info: @@ -621,7 +680,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 112 - job_id: jygzex7zg + job_id: jgo26r14p job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -629,14 +688,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:18:36Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:16:21Z' - torchscript_onnx_tflite: - inference_time: 183.0 - throughput: 5464.48087431694 + inference_time: 182.0 + throughput: 5494.505494505494 estimated_peak_memory_range: - min: 49152 - max: 1431880 + min: 16384 + max: 1846152 primary_compute_unit: NPU precision: fp16 layer_info: @@ -644,14 +703,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 117 - job_id: jo5mrw3yg + job_id: j56y471vp job_status: Passed torchscript_onnx_qnn: - inference_time: 221.0 - throughput: 4524.886877828054 + inference_time: 219.0 + throughput: 4566.2100456621 estimated_peak_memory_range: - min: 135168 - max: 1379384 + min: 0 + max: 1474192 primary_compute_unit: NPU precision: fp16 layer_info: @@ -659,22 +718,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 112 - job_id: jmg9v34q5 + job_id: j56y473vp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:18:38Z' + chipset: SA8650P Proxy + 
timestamp: '2024-10-15T00:16:19Z' - torchscript_onnx_tflite: - inference_time: 404.0 - throughput: 2475.2475247524753 + inference_time: 214.0 + throughput: 4672.897196261682 estimated_peak_memory_range: min: 12288 - max: 19660752 + max: 29404864 primary_compute_unit: NPU precision: fp16 layer_info: @@ -682,14 +741,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 117 - job_id: joprk4yv5 + job_id: j5q6qy27p job_status: Passed torchscript_onnx_qnn: - inference_time: 490.0 - throughput: 2040.8163265306123 + inference_time: 269.0 + throughput: 3717.472118959108 estimated_peak_memory_range: - min: 131072 - max: 7896480 + min: 126976 + max: 15028640 primary_compute_unit: NPU precision: fp16 layer_info: @@ -697,22 +756,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 112 - job_id: jmg9v34v5 + job_id: jg9lnm3qg job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T12:18:41Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:16:27Z' - torchscript_onnx_tflite: - inference_time: 2886.0 - throughput: 346.5003465003465 + inference_time: 120.0 + throughput: 8333.333333333334 estimated_peak_memory_range: - min: 16384 - max: 3187016 + min: 8192 + max: 18867616 primary_compute_unit: NPU precision: fp16 layer_info: @@ -720,22 +779,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 117 - job_id: jqpye4drg + job_id: jpxko41j5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 177.0 + throughput: 5649.717514124294 + estimated_peak_memory_range: + min: 0 + max: 10204336 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 112 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 112 + job_id: jp4lr1xq5 job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T12:18:24Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:16:30Z' - torchscript_onnx_qnn: - inference_time: 333.0 - throughput: 3003.003003003003 + inference_time: 343.0 + throughput: 2915.451895043732 estimated_peak_memory_range: - min: 585728 - max: 585728 + min: 667648 + max: 667648 primary_compute_unit: NPU precision: fp16 layer_info: @@ -743,7 +817,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 112 - job_id: jw56631v5 + job_id: jp8qyx3zp job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -752,4 +826,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:18:30Z' + timestamp: '2024-10-15T00:16:15Z' diff --git a/qai_hub_models/models/mediapipe_hand/README.md b/qai_hub_models/models/mediapipe_hand/README.md index 7c170f6a..b5e9c832 100644 --- a/qai_hub_models/models/mediapipe_hand/README.md +++ b/qai_hub_models/models/mediapipe_hand/README.md @@ -6,7 +6,7 @@ The MediaPipe Hand Landmark Detector is a machine learning pipeline that predicts bounding boxes and pose skeletons of hands in an image. This is based on the implementation of MediaPipe-Hand-Detection found -[here](https://github.com/zmurez/MediaPipePyTorch/). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. 
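Like the face model, the hand model exports as a multi-component pipeline; the export.py diff below checks for a "MediaPipeHandDetector" component. A sketch of exporting only that component, assuming `export_model` accepts a `components` list and the `skip_*` flags its function body references:

```python
# Sketch: export a single component of the MediaPipe Hand pipeline. The
# component name matches the check in export.py below; the components and
# skip_* keyword arguments are assumptions based on how they are used there.
from qai_hub_models.models.mediapipe_hand.export import export_model

results = export_model(
    device="Samsung Galaxy S24",
    components=["MediaPipeHandDetector"],
    skip_inferencing=True,
    skip_downloading=True,
)
```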
More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/mediapipe_hand). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.mediapipe_hand.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of MediaPipe-Hand-Detection can be found +* The license for the original implementation of MediaPipe-Hand-Detection can be found [here](https://github.com/zmurez/MediaPipePyTorch/blob/master/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [MediaPipe Hands: On-device Real-time Hand Tracking](https://arxiv.org/abs/2006.10214) * [Source Model Implementation](https://github.com/zmurez/MediaPipePyTorch/) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/mediapipe_hand/export.py b/qai_hub_models/models/mediapipe_hand/export.py index 111f81bd..3ab65b6f 100644 --- a/qai_hub_models/models/mediapipe_hand/export.py +++ b/qai_hub_models/models/mediapipe_hand/export.py @@ -10,14 +10,15 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Mapping, Optional, Tuple, cast +from typing import Any, Dict, List, Mapping, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.mediapipe_hand import Model from qai_hub_models.utils.args import export_parser, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel, TargetRuntime +from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -45,20 +46,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Mapping[ - str, Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] -] | List[str]: +) -> Mapping[str, ExportResult] | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6.
Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -83,10 +82,10 @@ def export_model( `model_cls.from_pretrained` Returns: - A Mapping from component_name to a 3-tuple of: + A Mapping from component_name to a struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "mediapipe_hand" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -118,7 +117,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) components_dict: Dict[str, BaseModel] = {} if "MediaPipeHandDetector" in components: @@ -135,7 +134,7 @@ def export_model( component.to("cpu"), make_torch_inputs(input_spec) ) - # 2. Compile the models to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = component.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -151,7 +150,7 @@ def export_model( hub.client.CompileJob, submitted_compile_job ) - # 3. Profile the model assets on real devices + # 3. Profiles the model performance on a real device profile_jobs: Dict[str, hub.client.ProfileJob] = {} if not skip_profiling: for component_name in components: @@ -169,7 +168,7 @@ def export_model( hub.client.ProfileJob, submitted_profile_job ) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_jobs: Dict[str, hub.client.InferenceJob] = {} if not skip_inferencing: for component_name in components: @@ -193,14 +192,14 @@ def export_model( hub.client.InferenceJob, submitted_inference_job ) - # 5. Download the model assets to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) for component_name, compile_job in compile_jobs.items(): target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / component_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: for component_name in components: profile_job = profile_jobs[component_name] @@ -225,10 +224,10 @@ def export_model( ) return { - component_name: ( - compile_jobs[component_name], - profile_jobs.get(component_name, None), - inference_jobs.get(component_name, None), + component_name: ExportResult( + compile_job=compile_jobs[component_name], + inference_job=inference_jobs.get(component_name, None), + profile_job=profile_jobs.get(component_name, None), ) for component_name in components } diff --git a/qai_hub_models/models/mediapipe_hand/perf.yaml b/qai_hub_models/models/mediapipe_hand/perf.yaml index 10d59228..2cc0d1fb 100644 --- a/qai_hub_models/models/mediapipe_hand/perf.yaml +++ b/qai_hub_models/models/mediapipe_hand/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: MediaPipeHandDetector performance_metrics: - torchscript_onnx_tflite: - inference_time: 714.0 - throughput: 1400.5602240896358 + inference_time: 704.0 + throughput: 1420.4545454545455 estimated_peak_memory_range: - min: 12288 - max: 5003216 + min: 20480 + max: 3734688 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,29 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 149 - job_id: jlpe9re7g - job_status: Passed - torchscript_onnx_qnn: - inference_time: 791.0 - throughput: 1264.2225031605562 - estimated_peak_memory_range: - min: 716800 - max: 20735568 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 195 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 195 - job_id: j2p0y122g + job_id: jp14zjynp job_status: Passed torchscript_onnx: - inference_time: 1150.0 - throughput: 869.5652173913044 + inference_time: 1160.0 + throughput: 862.0689655172414 estimated_peak_memory_range: - min: 32768 - max: 6079328 + min: 20480 + max: 18222304 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +71,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 196 - job_id: jz57zjlqp + job_id: jglvmx325 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +80,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:17:29Z' + timestamp: '2024-10-15T00:15:06Z' - torchscript_onnx_tflite: - inference_time: 565.0 - throughput: 1769.9115044247787 + inference_time: 612.0 + throughput: 1633.986928104575 estimated_peak_memory_range: min: 12288 - max: 59678496 + max: 61765328 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,29 +94,14 @@ models: layers_on_gpu: 
0 layers_on_cpu: 0 total_layers: 149 - job_id: jz5wod2zp - job_status: Passed - torchscript_onnx_qnn: - inference_time: 622.0 - throughput: 1607.717041800643 - estimated_peak_memory_range: - min: 806912 - max: 18903824 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 195 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 195 - job_id: jogkzlqyg + job_id: j57yr40n5 job_status: Passed torchscript_onnx: - inference_time: 949.0 - throughput: 1053.740779768177 + inference_time: 903.0 + throughput: 1107.4197120708748 estimated_peak_memory_range: - min: 307200 - max: 68754544 + min: 0 + max: 70548400 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +109,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 196 - job_id: j0pxv76jg + job_id: jp3j09emg job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +118,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:17:31Z' + timestamp: '2024-10-15T00:15:08Z' - torchscript_onnx_tflite: - inference_time: 720.0 - throughput: 1388.888888888889 + inference_time: 706.0 + throughput: 1416.4305949008499 estimated_peak_memory_range: - min: 28672 - max: 4251000 + min: 12288 + max: 118955440 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,22 +132,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 149 - job_id: jnp10dyk5 - job_status: Passed - torchscript_onnx_qnn: - inference_time: 769.0 - throughput: 1300.3901170351105 - estimated_peak_memory_range: - min: 0 - max: 1765648 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 195 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 195 - job_id: j1p3k41x5 + job_id: jpxko4n85 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +140,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:17:20Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:14:35Z' - torchscript_onnx_tflite: - inference_time: 1290.0 - throughput: 775.1937984496124 + inference_time: 711.0 + throughput: 1406.4697609001407 estimated_peak_memory_range: - min: 16384 - max: 54753632 + min: 28672 + max: 63810304 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,37 +155,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 149 - job_id: jz57zj0qp - job_status: Passed - torchscript_onnx_qnn: - inference_time: 1401.0 - throughput: 713.7758743754462 - estimated_peak_memory_range: - min: 802816 - max: 17189520 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 195 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 195 - job_id: jnp10dwk5 + job_id: jglvmx225 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:17:27Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:14:43Z' - torchscript_onnx_tflite: - inference_time: 710.0 - throughput: 1408.4507042253522 + inference_time: 706.0 + throughput: 1416.4305949008499 estimated_peak_memory_range: - min: 28672 - max: 5703480 + min: 12288 + max: 3533936 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,22 +178,30 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 149 - job_id: j0pxv7njg + job_id: jp0z0j205 job_status: Passed - torchscript_onnx_qnn: - inference_time: 789.0 - throughput: 
1267.427122940431 + reference_device_info: + name: SA8775 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:14:41Z' + - torchscript_onnx_tflite: + inference_time: 708.0 + throughput: 1412.4293785310736 estimated_peak_memory_range: - min: 864256 - max: 2525072 + min: 24576 + max: 3593208 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 195 + layers_on_npu: 149 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 195 - job_id: j1pv31r75 + total_layers: 149 + job_id: jp2kyw06p job_status: Passed reference_device_info: name: SA8650 (Proxy) @@ -263,14 +209,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:17:22Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:14:39Z' - torchscript_onnx_tflite: - inference_time: 713.0 - throughput: 1402.5245441795232 + inference_time: 1321.0 + throughput: 757.002271006813 estimated_peak_memory_range: - min: 28672 - max: 4408696 + min: 12288 + max: 55013504 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,37 +224,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 149 - job_id: jegn29mvg - job_status: Passed - torchscript_onnx_qnn: - inference_time: 790.0 - throughput: 1265.8227848101267 - estimated_peak_memory_range: - min: 819200 - max: 2107504 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 195 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 195 - job_id: jlpe9rw7g + job_id: jgn6vnlj5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:17:24Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:14:37Z' - torchscript_onnx_tflite: - inference_time: 710.0 - throughput: 1408.4507042253522 + inference_time: 529.0 + throughput: 1890.359168241966 estimated_peak_memory_range: - min: 28672 - max: 3690112 + min: 8192 + max: 29453520 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,52 +247,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 149 - job_id: jep2879xp + job_id: jpv6kdrz5 job_status: Passed - torchscript_onnx_qnn: - inference_time: 798.0 - throughput: 1253.1328320802006 + torchscript_onnx: + inference_time: 878.0 + throughput: 1138.9521640091116 estimated_peak_memory_range: - min: 823296 - max: 2057064 + min: 0 + max: 34077424 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 195 + layers_on_npu: 196 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 195 - job_id: jz5wod3zp + total_layers: 196 + job_id: j5we67q45 job_status: Passed reference_device_info: - name: SA8255 (Proxy) - os: '13' - form_factor: Auto + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:17:25Z' - - torchscript_onnx_qnn: - inference_time: 925.0 - throughput: 1081.081081081081 - estimated_peak_memory_range: - min: 786432 - max: 786432 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 195 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 195 - job_id: j1gln02ep - job_status: Passed - torchscript_onnx: - inference_time: 1178.0 - throughput: 848.8964346349745 + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:15:13Z' + - torchscript_onnx: + inference_time: 1204.0 + throughput: 830.5647840531561 
estimated_peak_memory_range: - min: 4263936 - max: 4263936 + min: 5832704 + max: 5832704 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +285,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 196 - job_id: jegn293vg + job_id: jpv6kdvz5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,15 +294,15 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:17:32Z' + timestamp: '2024-10-15T00:15:10Z' - name: MediaPipeHandLandmarkDetector performance_metrics: - torchscript_onnx_tflite: - inference_time: 1048.0 - throughput: 954.1984732824427 + inference_time: 1030.0 + throughput: 970.8737864077669 estimated_peak_memory_range: min: 12288 - max: 57929352 + max: 1495504 primary_compute_unit: NPU precision: fp16 layer_info: @@ -394,29 +310,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 158 - job_id: jygzexozg - job_status: Passed - torchscript_onnx_qnn: - inference_time: 1109.0 - throughput: 901.7132551848512 - estimated_peak_memory_range: - min: 1626112 - max: 41405288 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 208 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 208 - job_id: j1p8o3mzg + job_id: jgdx13e6p job_status: Passed torchscript_onnx: - inference_time: 1575.0 - throughput: 634.9206349206349 + inference_time: 1552.0 + throughput: 644.3298969072165 estimated_peak_memory_range: min: 12288 - max: 7777872 + max: 8154736 primary_compute_unit: NPU precision: fp16 layer_info: @@ -424,7 +325,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 209 - job_id: jqp4qxdqg + job_id: j56y47nnp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -433,13 +334,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:17:29Z' + timestamp: '2024-10-15T00:15:06Z' - torchscript_onnx_tflite: - inference_time: 907.0 - throughput: 1102.5358324145534 + inference_time: 848.0 + throughput: 1179.245283018868 estimated_peak_memory_range: min: 12288 - max: 64177952 + max: 64923696 primary_compute_unit: NPU precision: fp16 layer_info: @@ -447,29 +348,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 158 - job_id: jmg9v3jq5 - job_status: Passed - torchscript_onnx_qnn: - inference_time: 943.0 - throughput: 1060.4453870625662 - estimated_peak_memory_range: - min: 802816 - max: 20160416 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 208 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 208 - job_id: jn5q87r75 + job_id: jp4lr1k25 job_status: Passed torchscript_onnx: - inference_time: 1255.0 - throughput: 796.8127490039841 + inference_time: 1213.0 + throughput: 824.4023083264633 estimated_peak_memory_range: - min: 0 - max: 66155712 + min: 327680 + max: 68004160 primary_compute_unit: NPU precision: fp16 layer_info: @@ -477,7 +363,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 209 - job_id: jo5mrw6yg + job_id: jgo26r31p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -486,13 +372,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:17:31Z' + timestamp: '2024-10-15T00:15:08Z' - torchscript_onnx_tflite: - inference_time: 1005.0 - throughput: 995.0248756218906 + inference_time: 1003.0 + throughput: 997.0089730807578 estimated_peak_memory_range: - min: 32768 - max: 14970560 + min: 53248 + max: 179286624 primary_compute_unit: NPU precision: fp16 layer_info: @@ 
-500,22 +386,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 158 - job_id: jvgdwrek5 - job_status: Passed - torchscript_onnx_qnn: - inference_time: 1090.0 - throughput: 917.4311926605504 - estimated_peak_memory_range: - min: 819200 - max: 1921360 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 208 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 208 - job_id: jwgoy1n45 + job_id: j5mnxmq7p job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -523,14 +394,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:17:21Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:14:35Z' - torchscript_onnx_tflite: - inference_time: 2570.0 - throughput: 389.10505836575874 + inference_time: 1008.0 + throughput: 992.063492063492 estimated_peak_memory_range: - min: 12288 - max: 57590704 + min: 49152 + max: 1565352 primary_compute_unit: NPU precision: fp16 layer_info: @@ -538,22 +409,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 158 - job_id: jqp4qxkqg + job_id: j56y47znp job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:17:09Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:14:43Z' - torchscript_onnx_tflite: - inference_time: 1015.0 - throughput: 985.2216748768473 + inference_time: 1004.0 + throughput: 996.01593625498 estimated_peak_memory_range: - min: 24576 - max: 1455056 + min: 12288 + max: 1431104 primary_compute_unit: NPU precision: fp16 layer_info: @@ -561,22 +432,30 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 158 - job_id: jo5mrwqyg + job_id: jp8qyxmqp job_status: Passed - torchscript_onnx_qnn: - inference_time: 1091.0 - throughput: 916.5902841429881 + reference_device_info: + name: SA8775 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:14:41Z' + - torchscript_onnx_tflite: + inference_time: 1035.0 + throughput: 966.1835748792271 estimated_peak_memory_range: - min: 868352 - max: 2128472 + min: 20480 + max: 1344472 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 208 + layers_on_npu: 158 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 208 - job_id: j7gjx027p + total_layers: 158 + job_id: jpy13xr0p job_status: Passed reference_device_info: name: SA8650 (Proxy) @@ -584,14 +463,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:17:22Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:14:39Z' - torchscript_onnx_tflite: - inference_time: 999.0 - throughput: 1001.001001001001 + inference_time: 2590.0 + throughput: 386.1003861003861 estimated_peak_memory_range: - min: 28672 - max: 1513784 + min: 12288 + max: 58059440 primary_compute_unit: NPU precision: fp16 layer_info: @@ -599,37 +478,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 158 - job_id: joprk42v5 - job_status: Passed - torchscript_onnx_qnn: - inference_time: 1148.0 - throughput: 871.0801393728223 - estimated_peak_memory_range: - min: 823296 - max: 2426232 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 208 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 208 - job_id: jygzexjzg + job_id: jprv308kg job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: 
QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:17:24Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:14:38Z' - torchscript_onnx_tflite: - inference_time: 1053.0 - throughput: 949.667616334283 + inference_time: 585.0 + throughput: 1709.4017094017095 estimated_peak_memory_range: - min: 20480 - max: 1470912 + min: 8192 + max: 33374032 primary_compute_unit: NPU precision: fp16 layer_info: @@ -637,52 +501,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 158 - job_id: jqpye4jrg + job_id: jgjvn721g job_status: Passed - torchscript_onnx_qnn: - inference_time: 1110.0 - throughput: 900.9009009009009 + torchscript_onnx: + inference_time: 1068.0 + throughput: 936.3295880149813 estimated_peak_memory_range: - min: 831488 - max: 2170544 + min: 0 + max: 39068848 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 208 + layers_on_npu: 209 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 208 - job_id: jmg9v3yq5 + total_layers: 209 + job_id: jg9lnmwmg job_status: Passed reference_device_info: - name: SA8255 (Proxy) - os: '13' - form_factor: Auto + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:17:26Z' - - torchscript_onnx_qnn: - inference_time: 1339.0 - throughput: 746.8259895444362 - estimated_peak_memory_range: - min: 786432 - max: 786432 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 208 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 208 - job_id: jw5663zv5 - job_status: Passed - torchscript_onnx: - inference_time: 1619.0 - throughput: 617.6652254478073 + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:15:14Z' + - torchscript_onnx: + inference_time: 1641.0 + throughput: 609.3845216331505 estimated_peak_memory_range: - min: 6717440 - max: 6717440 + min: 8015872 + max: 8015872 primary_compute_unit: NPU precision: fp16 layer_info: @@ -690,7 +539,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 209 - job_id: joprk4ev5 + job_id: jgjvn7e1g job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -699,4 +548,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:17:33Z' + timestamp: '2024-10-15T00:15:10Z' diff --git a/qai_hub_models/models/mediapipe_pose/README.md b/qai_hub_models/models/mediapipe_pose/README.md index 4df97c19..02ceb20e 100644 --- a/qai_hub_models/models/mediapipe_pose/README.md +++ b/qai_hub_models/models/mediapipe_pose/README.md @@ -6,7 +6,7 @@ The MediaPipe Pose Landmark Detector is a machine learning pipeline that predicts bounding boxes and pose skeletons of poses in an image. This is based on the implementation of MediaPipe-Pose-Estimation found -[here](https://github.com/zmurez/MediaPipePyTorch/). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/mediapipe_pose). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.mediapipe_pose.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. 
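Beyond the CLI invocation above, the reworked export script (diffed below for this model) can also be driven from Python. A minimal sketch of consuming the new `Mapping[str, ExportResult]` return type — the device name and the `isinstance` check on the skip-mode return are assumptions, and running this requires configured AI Hub access:

```python
# Hypothetical driver, not verbatim repo code: submit the export pipeline
# and read the per-component ExportResult mapping it now returns.
from qai_hub_models.models.mediapipe_pose.export import export_model

results = export_model(device="Samsung Galaxy S24")
if not isinstance(results, list):  # a List[str] comes back in some skip modes
    for component, result in results.items():
        print(component, result.compile_job)    # compile job always submitted
        print(component, result.profile_job)    # None when skip_profiling=True
        print(component, result.inference_job)  # None when skip_inferencing=True
```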
+ ## License -- The license for the original implementation of MediaPipe-Pose-Estimation can be found +* The license for the original implementation of MediaPipe-Pose-Estimation can be found [here](https://github.com/zmurez/MediaPipePyTorch/blob/master/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [BlazePose: On-device Real-time Body Pose tracking](https://arxiv.org/abs/2006.10204) * [Source Model Implementation](https://github.com/zmurez/MediaPipePyTorch/) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/mediapipe_pose/export.py b/qai_hub_models/models/mediapipe_pose/export.py index dd844ad8..fae2a609 100644 --- a/qai_hub_models/models/mediapipe_pose/export.py +++ b/qai_hub_models/models/mediapipe_pose/export.py @@ -10,14 +10,15 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Mapping, Optional, Tuple, cast +from typing import Any, Dict, List, Mapping, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.mediapipe_pose import Model from qai_hub_models.utils.args import export_parser, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel, TargetRuntime +from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -45,20 +46,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Mapping[ - str, Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] -] | List[str]: +) -> Mapping[str, ExportResult] | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. 
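`ExportResult` is imported from `qai_hub_models.models.common` but never defined in this diff; here is a plausible reconstruction from the keyword arguments used to build it at the bottom of this file (the dataclass decorator and the `None` defaults are assumptions, not the repo's actual definition):

```python
# Assumed shape of qai_hub_models.models.common.ExportResult; field names
# come from the construction site in export.py, everything else is a guess.
from dataclasses import dataclass
from typing import Optional

import qai_hub as hub


@dataclass
class ExportResult:
    compile_job: hub.CompileJob
    inference_job: Optional[hub.InferenceJob] = None
    profile_job: Optional[hub.ProfileJob] = None
```

This replaces the old positional 3-tuple, so callers access jobs by field name rather than by index, and the reordered Returns bullets no longer imply a positional contract.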
@@ -83,10 +82,10 @@ def export_model( `model_cls.from_pretrained` Returns: - A Mapping from component_name to a 3-tuple of: + A Mapping from component_name to a struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "mediapipe_pose" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -118,7 +117,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) components_dict: Dict[str, BaseModel] = {} if "MediaPipePoseDetector" in components: @@ -135,7 +134,7 @@ def export_model( component.to("cpu"), make_torch_inputs(input_spec) ) - # 2. Compile the models to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = component.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -151,7 +150,7 @@ def export_model( hub.client.CompileJob, submitted_compile_job ) - # 3. Profile the model assets on real devices + # 3. Profiles the model performance on a real device profile_jobs: Dict[str, hub.client.ProfileJob] = {} if not skip_profiling: for component_name in components: @@ -169,7 +168,7 @@ def export_model( hub.client.ProfileJob, submitted_profile_job ) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_jobs: Dict[str, hub.client.InferenceJob] = {} if not skip_inferencing: for component_name in components: @@ -193,14 +192,14 @@ def export_model( hub.client.InferenceJob, submitted_inference_job ) - # 5. Download the model assets to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) for component_name, compile_job in compile_jobs.items(): target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / component_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: for component_name in components: profile_job = profile_jobs[component_name] @@ -225,10 +224,10 @@ def export_model( ) return { - component_name: ( - compile_jobs[component_name], - profile_jobs.get(component_name, None), - inference_jobs.get(component_name, None), + component_name: ExportResult( + compile_job=compile_jobs[component_name], + inference_job=inference_jobs.get(component_name, None), + profile_job=profile_jobs.get(component_name, None), ) for component_name in components } diff --git a/qai_hub_models/models/mediapipe_pose/perf.yaml b/qai_hub_models/models/mediapipe_pose/perf.yaml index a7530740..1099c765 100644 --- a/qai_hub_models/models/mediapipe_pose/perf.yaml +++ b/qai_hub_models/models/mediapipe_pose/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,29 +20,26 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: MediaPipePoseDetector performance_metrics: @@ -49,8 +47,8 @@ models: inference_time: 774.0 throughput: 1291.9896640826873 estimated_peak_memory_range: - min: 69632 - max: 1600768 + min: 28672 + max: 5196752 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,29 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 106 - job_id: j0pxv7k8g - job_status: Passed - torchscript_onnx_qnn: - inference_time: 838.0 - throughput: 1193.3174224343675 - estimated_peak_memory_range: - min: 12288 - max: 6020592 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 138 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 138 - job_id: j1pv31qz5 + job_id: jg9lnml8g job_status: Passed torchscript_onnx: - inference_time: 1013.0 - throughput: 987.1668311944719 + inference_time: 1009.0 + throughput: 991.0802775024777 estimated_peak_memory_range: - min: 221184 - max: 1711312 + min: 16384 + max: 4350520 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +71,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 139 - job_id: jegn29lvg + job_id: jp2kywx6p job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +80,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:16:30Z' + timestamp: '2024-10-15T00:13:57Z' - torchscript_onnx_tflite: inference_time: 669.0 throughput: 1494.7683109118086 estimated_peak_memory_range: - min: 61440 - max: 47617888 + min: 16384 + max: 49169056 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,29 +94,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 106 - job_id: jegn296jg - job_status: Passed - torchscript_onnx_qnn: - 
inference_time: 720.0 - throughput: 1388.888888888889 - estimated_peak_memory_range: - min: 208896 - max: 16873328 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 138 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 138 - job_id: jlpe9ro8g + job_id: jgdx13xzp job_status: Passed torchscript_onnx: - inference_time: 869.0 - throughput: 1150.7479861910242 + inference_time: 898.0 + throughput: 1113.5857461024498 estimated_peak_memory_range: min: 0 - max: 50983296 + max: 52554240 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +109,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 139 - job_id: jep2870xp + job_id: jp0z0j305 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +118,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:16:31Z' + timestamp: '2024-10-15T00:13:58Z' - torchscript_onnx_tflite: - inference_time: 770.0 - throughput: 1298.7012987012988 + inference_time: 774.0 + throughput: 1291.9896640826873 estimated_peak_memory_range: - min: 28672 - max: 15094632 + min: 53248 + max: 1412544 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,22 +132,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 106 - job_id: jep287k6p - job_status: Passed - torchscript_onnx_qnn: - inference_time: 816.0 - throughput: 1225.4901960784314 - estimated_peak_memory_range: - min: 237568 - max: 1474664 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 138 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 138 - job_id: jnp10d2n5 + job_id: jg9lnmlmg job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +140,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:16:21Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:13:27Z' - torchscript_onnx_tflite: - inference_time: 1898.0 - throughput: 526.8703898840885 + inference_time: 777.0 + throughput: 1287.001287001287 estimated_peak_memory_range: - min: 61440 - max: 42858912 + min: 86016 + max: 1434184 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,37 +155,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 106 - job_id: j2p0y140g - job_status: Passed - torchscript_onnx_qnn: - inference_time: 1990.0 - throughput: 502.51256281407035 - estimated_peak_memory_range: - min: 208896 - max: 14890608 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 138 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 138 - job_id: j0pxv79jg + job_id: jp2kywk6p job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:16:28Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:13:34Z' - torchscript_onnx_tflite: - inference_time: 772.0 - throughput: 1295.3367875647668 + inference_time: 779.0 + throughput: 1283.6970474967907 estimated_peak_memory_range: min: 28672 - max: 1446720 + max: 1387472 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,22 +178,30 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 106 - job_id: jogkzlvvg + job_id: jgn6vn6j5 job_status: Passed - torchscript_onnx_qnn: - inference_time: 819.0 - throughput: 1221.001221001221 + reference_device_info: + name: SA8775 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + 
chipset: SA8775P Proxy + timestamp: '2024-10-15T00:13:32Z' + - torchscript_onnx_tflite: + inference_time: 774.0 + throughput: 1291.9896640826873 estimated_peak_memory_range: - min: 233472 - max: 1953384 + min: 65536 + max: 1705552 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 138 + layers_on_npu: 106 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 138 - job_id: jz5wodwzp + total_layers: 106 + job_id: jp4lr1l25 job_status: Passed reference_device_info: name: SA8650 (Proxy) @@ -263,14 +209,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:16:22Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:13:30Z' - torchscript_onnx_tflite: - inference_time: 775.0 - throughput: 1290.3225806451612 + inference_time: 1892.0 + throughput: 528.5412262156448 estimated_peak_memory_range: - min: 16384 - max: 1545736 + min: 12288 + max: 43618656 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,37 +224,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 106 - job_id: j1gln042p - job_status: Passed - torchscript_onnx_qnn: - inference_time: 818.0 - throughput: 1222.4938875305625 - estimated_peak_memory_range: - min: 258048 - max: 1546952 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 138 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 138 - job_id: jnp10d2k5 + job_id: jgdx13x6p job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:16:24Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:13:29Z' - torchscript_onnx_tflite: - inference_time: 778.0 - throughput: 1285.3470437017995 + inference_time: 457.0 + throughput: 2188.183807439825 estimated_peak_memory_range: - min: 28672 - max: 1582800 + min: 12288 + max: 24848112 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,52 +247,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 106 - job_id: j1p3k4nm5 + job_id: jgkex4vvg job_status: Passed - torchscript_onnx_qnn: - inference_time: 820.0 - throughput: 1219.5121951219512 + torchscript_onnx: + inference_time: 755.0 + throughput: 1324.5033112582782 estimated_peak_memory_range: - min: 212992 - max: 1943736 + min: 0 + max: 27046240 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 138 + layers_on_npu: 139 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 138 - job_id: jz57zj2qp + total_layers: 139 + job_id: jp3j09vmg job_status: Passed reference_device_info: - name: SA8255 (Proxy) - os: '13' - form_factor: Auto + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:16:26Z' - - torchscript_onnx_qnn: - inference_time: 977.0 - throughput: 1023.5414534288639 - estimated_peak_memory_range: - min: 466944 - max: 466944 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 138 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 138 - job_id: jz5wodw4p - job_status: Passed - torchscript_onnx: - inference_time: 1053.0 - throughput: 949.667616334283 + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:14:04Z' + - torchscript_onnx: + inference_time: 1057.0 + throughput: 946.073793755913 estimated_peak_memory_range: - min: 2973696 - max: 2973696 + min: 3051520 + max: 3051520 primary_compute_unit: NPU precision: fp16 layer_info: 
@@ -369,7 +285,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 139 - job_id: j2p0y132g + job_id: jgkex47vg job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,15 +294,15 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:16:33Z' + timestamp: '2024-10-15T00:14:00Z' - name: MediaPipePoseLandmarkDetector performance_metrics: - torchscript_onnx_tflite: - inference_time: 832.0 - throughput: 1201.923076923077 + inference_time: 831.0 + throughput: 1203.3694344163657 estimated_peak_memory_range: - min: 36864 - max: 2373544 + min: 12288 + max: 6408528 primary_compute_unit: NPU precision: fp16 layer_info: @@ -394,29 +310,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 219 - job_id: jo5mrwn7g - job_status: Passed - torchscript_onnx_qnn: - inference_time: 915.0 - throughput: 1092.896174863388 - estimated_peak_memory_range: - min: 12288 - max: 39718592 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 290 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 290 - job_id: j7gjx0d1p + job_id: jp14zj47p job_status: Passed torchscript_onnx: - inference_time: 1333.0 - throughput: 750.1875468867216 + inference_time: 1315.0 + throughput: 760.4562737642585 estimated_peak_memory_range: - min: 12288 - max: 9216760 + min: 28672 + max: 9434736 primary_compute_unit: NPU precision: fp16 layer_info: @@ -424,7 +325,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 291 - job_id: joprk48v5 + job_id: jpy13xz0p job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -433,13 +334,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:16:30Z' + timestamp: '2024-10-15T00:13:57Z' - torchscript_onnx_tflite: - inference_time: 665.0 - throughput: 1503.7593984962407 + inference_time: 705.0 + throughput: 1418.4397163120568 estimated_peak_memory_range: - min: 16384 - max: 93101840 + min: 12288 + max: 94875760 primary_compute_unit: NPU precision: fp16 layer_info: @@ -447,29 +348,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 219 - job_id: joprk4vk5 - job_status: Passed - torchscript_onnx_qnn: - inference_time: 724.0 - throughput: 1381.2154696132598 - estimated_peak_memory_range: - min: 0 - max: 20107360 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 290 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 290 - job_id: jygzex24g + job_id: j5we67e45 job_status: Passed torchscript_onnx: - inference_time: 1052.0 - throughput: 950.5703422053232 + inference_time: 1012.0 + throughput: 988.1422924901186 estimated_peak_memory_range: min: 0 - max: 96993728 + max: 100694720 primary_compute_unit: NPU precision: fp16 layer_info: @@ -477,7 +363,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 291 - job_id: jqpye4rrg + job_id: jp8qyx0qp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -486,13 +372,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:16:32Z' + timestamp: '2024-10-15T00:13:59Z' - torchscript_onnx_tflite: - inference_time: 819.0 - throughput: 1221.001221001221 + inference_time: 818.0 + throughput: 1222.4938875305625 estimated_peak_memory_range: min: 12288 - max: 1485816 + max: 1410992 primary_compute_unit: NPU precision: fp16 layer_info: @@ -500,22 +386,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 219 - job_id: jqpye410g - job_status: Passed - torchscript_onnx_qnn: - 
inference_time: 919.0 - throughput: 1088.139281828074 - estimated_peak_memory_range: - min: 811008 - max: 2001472 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 290 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 290 - job_id: jvgdwrn65 + job_id: jp14zj4np job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -523,14 +394,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:16:21Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:13:27Z' - torchscript_onnx_tflite: - inference_time: 1820.0 - throughput: 549.4505494505495 + inference_time: 841.0 + throughput: 1189.0606420927468 estimated_peak_memory_range: - min: 12288 - max: 82623056 + min: 24576 + max: 8275072 primary_compute_unit: NPU precision: fp16 layer_info: @@ -538,22 +409,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 219 - job_id: j1p8o32qg + job_id: jpy13x10p job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:16:09Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:13:34Z' - torchscript_onnx_tflite: - inference_time: 831.0 - throughput: 1203.3694344163657 + inference_time: 826.0 + throughput: 1210.6537530266344 estimated_peak_memory_range: - min: 12288 - max: 1682648 + min: 16384 + max: 2589320 primary_compute_unit: NPU precision: fp16 layer_info: @@ -561,22 +432,30 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 219 - job_id: jn5q870e5 + job_id: jprv30vkg job_status: Passed - torchscript_onnx_qnn: - inference_time: 904.0 - throughput: 1106.1946902654868 + reference_device_info: + name: SA8775 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:13:32Z' + - torchscript_onnx_tflite: + inference_time: 846.0 + throughput: 1182.033096926714 estimated_peak_memory_range: - min: 827392 - max: 2088096 + min: 20480 + max: 5743568 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 290 + layers_on_npu: 219 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 290 - job_id: jmg9v30q5 + total_layers: 219 + job_id: jpxko4k85 job_status: Passed reference_device_info: name: SA8650 (Proxy) @@ -584,14 +463,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:16:23Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:13:31Z' - torchscript_onnx_tflite: - inference_time: 821.0 - throughput: 1218.026796589525 + inference_time: 1814.0 + throughput: 551.2679162072767 estimated_peak_memory_range: min: 12288 - max: 43989096 + max: 82768992 primary_compute_unit: NPU precision: fp16 layer_info: @@ -599,37 +478,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 219 - job_id: jw56632n5 - job_status: Passed - torchscript_onnx_qnn: - inference_time: 901.0 - throughput: 1109.8779134295228 - estimated_peak_memory_range: - min: 811008 - max: 2141312 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 290 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 290 - job_id: jvgdwrnk5 + job_id: j57yr4yn5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:16:25Z' + 
chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:13:29Z' - torchscript_onnx_tflite: - inference_time: 826.0 - throughput: 1210.6537530266344 + inference_time: 549.0 + throughput: 1821.4936247723133 estimated_peak_memory_range: - min: 12288 - max: 1473008 + min: 8192 + max: 36712976 primary_compute_unit: NPU precision: fp16 layer_info: @@ -637,52 +501,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 219 - job_id: jwgoy1z15 + job_id: j5q6qy0ep job_status: Passed - torchscript_onnx_qnn: - inference_time: 893.0 - throughput: 1119.8208286674133 + torchscript_onnx: + inference_time: 916.0 + throughput: 1091.703056768559 estimated_peak_memory_range: - min: 819200 - max: 2073072 + min: 245760 + max: 44747168 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 290 + layers_on_npu: 291 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 290 - job_id: jqp4qxnqg + total_layers: 291 + job_id: jgo26rk1p job_status: Passed reference_device_info: - name: SA8255 (Proxy) - os: '13' - form_factor: Auto + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:16:27Z' - - torchscript_onnx_qnn: - inference_time: 1121.0 - throughput: 892.0606601248885 - estimated_peak_memory_range: - min: 786432 - max: 786432 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 290 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 290 - job_id: jmg9v30m5 - job_status: Passed - torchscript_onnx: - inference_time: 1404.0 - throughput: 712.2507122507122 + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:14:04Z' + - torchscript_onnx: + inference_time: 1382.0 + throughput: 723.589001447178 estimated_peak_memory_range: - min: 8105984 - max: 8105984 + min: 8028160 + max: 8028160 primary_compute_unit: NPU precision: fp16 layer_info: @@ -690,7 +539,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 291 - job_id: j1p8o30zg + job_id: j5q6qyeep job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -699,4 +548,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:16:34Z' + timestamp: '2024-10-15T00:14:01Z' diff --git a/qai_hub_models/models/mediapipe_selfie/README.md b/qai_hub_models/models/mediapipe_selfie/README.md index 49115f4e..e24f4220 100644 --- a/qai_hub_models/models/mediapipe_selfie/README.md +++ b/qai_hub_models/models/mediapipe_selfie/README.md @@ -6,7 +6,7 @@ Light-weight model that segments a person from the background in square or landscape selfie and video conference imagery. This is based on the implementation of MediaPipe-Selfie-Segmentation found -[here](https://github.com/google/mediapipe/tree/master/mediapipe/modules/selfie_segmentation). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/mediapipe_selfie). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.mediapipe_selfie.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. 
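For orientation, step 1 of the recipe in the selfie export script (diffed below) condenses to a few lines; this sketch drops the kwargs plumbing (`get_model_kwargs`/`get_input_spec_kwargs`) that the real script threads through:

```python
# Condensed sketch of step 1: load the pretrained selfie-segmentation model
# and trace it to TorchScript on CPU, as export.py below does.
import torch

from qai_hub_models.models.mediapipe_selfie import Model
from qai_hub_models.utils.input_spec import make_torch_inputs

model = Model.from_pretrained()
input_spec = model.get_input_spec()
traced = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec))
```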
+ ## License -- The license for the original implementation of MediaPipe-Selfie-Segmentation can be found +* The license for the original implementation of MediaPipe-Selfie-Segmentation can be found [here](https://github.com/google/mediapipe/blob/master/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Image segmentation guide](https://developers.google.com/mediapipe/solutions/vision/image_segmenter/) * [Source Model Implementation](https://github.com/google/mediapipe/tree/master/mediapipe/modules/selfie_segmentation) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/mediapipe_selfie/export.py b/qai_hub_models/models/mediapipe_selfie/export.py index fb867c80..21e3ae07 100644 --- a/qai_hub_models/models/mediapipe_selfie/export.py +++ b/qai_hub_models/models/mediapipe_selfie/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.mediapipe_selfie import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. 
- * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "mediapipe_selfie" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/mediapipe_selfie/perf.yaml b/qai_hub_models/models/mediapipe_selfie/perf.yaml index 6e1606bd..0ffd5a08 100644 --- a/qai_hub_models/models/mediapipe_selfie/perf.yaml +++ b/qai_hub_models/models/mediapipe_selfie/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: MediaPipe-Selfie-Segmentation performance_metrics: - torchscript_onnx_tflite: - inference_time: 699.0 - throughput: 1430.615164520744 + inference_time: 698.0 + throughput: 1432.6647564469913 estimated_peak_memory_range: - min: 12288 - max: 1574920 + min: 290816 + max: 2000184 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 118 - job_id: jegn29vjg + job_id: jgjvn7neg job_status: Passed torchscript_onnx_qnn: - inference_time: 775.0 - throughput: 1290.3225806451612 + inference_time: 774.0 + throughput: 1291.9896640826873 estimated_peak_memory_range: - min: 811008 - max: 4052192 + min: 806912 + max: 26386216 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 138 - job_id: jn5q876e5 + job_id: jpxko4ol5 job_status: Passed torchscript_onnx: - inference_time: 1336.0 - throughput: 748.502994011976 + inference_time: 1320.0 + throughput: 757.5757575757576 estimated_peak_memory_range: - min: 589824 - max: 15605624 + min: 32768 + max: 3657536 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 140 - job_id: jygzex34g + job_id: jglvmxvm5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:15:36Z' + timestamp: '2024-10-15T00:12:53Z' - torchscript_onnx_tflite: - inference_time: 471.0 - throughput: 2123.1422505307855 + inference_time: 472.0 + throughput: 2118.64406779661 estimated_peak_memory_range: min: 12288 - max: 29493584 + max: 30256864 primary_compute_unit: NPU 
precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 118 - job_id: joprk43k5 + job_id: jpedmzmv5 job_status: Passed torchscript_onnx_qnn: - inference_time: 527.0 - throughput: 1897.5332068311195 + inference_time: 525.0 + throughput: 1904.7619047619048 estimated_peak_memory_range: - min: 806912 - max: 15650160 + min: 0 + max: 13296544 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 138 - job_id: j1gln0v2p + job_id: j5mnxmx9p job_status: Passed torchscript_onnx: - inference_time: 905.0 - throughput: 1104.9723756906078 + inference_time: 899.0 + throughput: 1112.3470522803113 estimated_peak_memory_range: min: 0 - max: 32653568 + max: 33673888 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 140 - job_id: jz5wode4p + job_id: j56y47yyp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:15:37Z' + timestamp: '2024-10-15T00:12:55Z' - torchscript_onnx_tflite: - inference_time: 702.0 - throughput: 1424.5014245014245 + inference_time: 696.0 + throughput: 1436.7816091954023 estimated_peak_memory_range: - min: 16384 - max: 4487320 + min: 12288 + max: 1477832 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 118 - job_id: jep287y6p + job_id: jgz3dmdx5 job_status: Passed torchscript_onnx_qnn: - inference_time: 761.0 - throughput: 1314.060446780552 + inference_time: 756.0 + throughput: 1322.7513227513227 estimated_peak_memory_range: - min: 819200 - max: 2749112 + min: 815104 + max: 2186448 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 138 - job_id: j1p3k4jm5 + job_id: jprv3037g job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:15:31Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:12:46Z' - torchscript_onnx_tflite: - inference_time: 931.0 - throughput: 1074.1138560687432 + inference_time: 698.0 + throughput: 1432.6647564469913 estimated_peak_memory_range: - min: 12288 - max: 28976208 + min: 28672 + max: 1636016 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 118 - job_id: jqpye430g + job_id: jgdx131zp job_status: Passed torchscript_onnx_qnn: - inference_time: 997.0 - throughput: 1003.0090270812437 + inference_time: 755.0 + throughput: 1324.5033112582782 estimated_peak_memory_range: - min: 802816 - max: 16742848 + min: 811008 + max: 2166472 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 138 - job_id: jlpe9rd8g + job_id: jp0z0j0n5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:15:35Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:12:49Z' - torchscript_onnx_tflite: - inference_time: 702.0 - throughput: 1424.5014245014245 + inference_time: 697.0 + throughput: 1434.7202295552368 
estimated_peak_memory_range: min: 12288 - max: 1762976 + max: 71148648 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 118 - job_id: j2p0y1z0g + job_id: jp14zjz7p job_status: Passed torchscript_onnx_qnn: - inference_time: 766.0 - throughput: 1305.4830287206266 + inference_time: 760.0 + throughput: 1315.7894736842106 estimated_peak_memory_range: - min: 827392 - max: 2140080 + min: 823296 + max: 2206160 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 138 - job_id: jwgoy1215 + job_id: jpy13x3lp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:15:32Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:12:48Z' - torchscript_onnx_tflite: - inference_time: 702.0 - throughput: 1424.5014245014245 + inference_time: 704.0 + throughput: 1420.4545454545455 estimated_peak_memory_range: - min: 20480 - max: 1562888 + min: 28672 + max: 1979288 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 118 - job_id: j1p8o3qqg + job_id: jg9lnmn8g job_status: Passed torchscript_onnx_qnn: - inference_time: 761.0 - throughput: 1314.060446780552 + inference_time: 763.0 + throughput: 1310.615989515072 estimated_peak_memory_range: min: 819200 - max: 2150504 + max: 2181288 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 138 - job_id: j1pv316z5 + job_id: jp2kywyqp job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:15:33Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:12:47Z' - torchscript_onnx_tflite: - inference_time: 708.0 - throughput: 1412.4293785310736 + inference_time: 934.0 + throughput: 1070.6638115631692 estimated_peak_memory_range: - min: 24576 - max: 2987208 + min: 16384 + max: 29442032 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 118 - job_id: jogkzlevg + job_id: j5we676m5 job_status: Passed torchscript_onnx_qnn: - inference_time: 775.0 - throughput: 1290.3225806451612 + inference_time: 995.0 + throughput: 1005.0251256281407 estimated_peak_memory_range: - min: 819200 - max: 2135888 + min: 802816 + max: 16557232 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 138 - job_id: j7gjx0v1p + job_id: jgkex4xng job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:12:51Z' + - torchscript_onnx_tflite: + inference_time: 367.0 + throughput: 2724.7956403269754 + estimated_peak_memory_range: + min: 8192 + max: 19071632 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 118 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 118 + job_id: jp4lr1r15 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 518.0 + throughput: 1930.5019305019305 + estimated_peak_memory_range: + min: 0 + max: 10835744 + 
primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 138 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 138 + job_id: j5q6qyqop + job_status: Passed + torchscript_onnx: + inference_time: 871.0 + throughput: 1148.105625717566 + estimated_peak_memory_range: + min: 0 + max: 23945696 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 140 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 140 + job_id: jpv6kd6r5 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:15:34Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:12:57Z' - torchscript_onnx_qnn: - inference_time: 915.0 - throughput: 1092.896174863388 + inference_time: 908.0 + throughput: 1101.3215859030836 estimated_peak_memory_range: min: 786432 max: 786432 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 138 - job_id: jw5663yn5 + job_id: jgn6vnvq5 job_status: Passed torchscript_onnx: - inference_time: 1374.0 - throughput: 727.802037845706 + inference_time: 1367.0 + throughput: 731.528895391368 estimated_peak_memory_range: - min: 1953792 - max: 1953792 + min: 1912832 + max: 1912832 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 140 - job_id: jmg9v3lm5 + job_id: jp3j09jng job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:15:38Z' + timestamp: '2024-10-15T00:12:56Z' diff --git a/qai_hub_models/models/midas/README.md b/qai_hub_models/models/midas/README.md index d8e6479e..2295b940 100644 --- a/qai_hub_models/models/midas/README.md +++ b/qai_hub_models/models/midas/README.md @@ -6,7 +6,7 @@ Midas is designed for estimating depth at each point in an image. This is based on the implementation of Midas-V2 found -[here](https://github.com/isl-org/MiDaS). This repository contains scripts for optimized on-device +[here](https://github.com/isl-org/MiDaS). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/midas). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.midas.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Midas-V2 can be found +* The license for the original implementation of Midas-V2 can be found [here](https://github.com/isl-org/MiDaS/blob/master/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer](https://arxiv.org/abs/1907.01341v3) * [Source Model Implementation](https://github.com/isl-org/MiDaS) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/midas/export.py b/qai_hub_models/models/midas/export.py index b5091219..fe5f4cbf 100644 --- a/qai_hub_models/models/midas/export.py +++ b/qai_hub_models/models/midas/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.midas import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
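The docstring change above replaces `export_model`'s positional 3-tuple with a named `ExportResult`, so callers reach jobs by field instead of by index. Below is a minimal sketch of the new calling pattern, not taken from the diff itself: the device name is illustrative, and the only job methods used are the ones exercised elsewhere in this patch (`get_target_model`, `wait`, `download_profile`).

```python
from qai_hub_models.models.midas.export import export_model

result = export_model(device="Samsung Galaxy S23", skip_inferencing=True)
if not isinstance(result, list):  # the signature also allows a List[str] return
    # Jobs are now accessed by name rather than by tuple position.
    target_model = result.compile_job.get_target_model()
    if result.profile_job is not None and result.profile_job.wait().success:
        profile_data = result.profile_job.download_profile()
```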
""" model_name = "midas" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -122,7 +120,7 @@ def export_model( model.to("cpu"), make_torch_inputs(input_spec), check_trace=False ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -136,7 +134,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -151,7 +149,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -172,13 +170,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -199,7 +197,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/midas/perf.yaml b/qai_hub_models/models/midas/perf.yaml index d9dd811e..dee34bc5 100644 --- a/qai_hub_models/models/midas/perf.yaml +++ b/qai_hub_models/models/midas/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: Midas-V2 performance_metrics: - torchscript_onnx_tflite: - inference_time: 3254.0 - throughput: 307.3140749846343 + inference_time: 3240.0 + throughput: 308.641975308642 estimated_peak_memory_range: - min: 16384 - max: 2301536 + min: 24576 + max: 1960272 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 138 - job_id: jegn2e8jg + job_id: jglvmlrm5 job_status: Passed torchscript_onnx_qnn: - inference_time: 3305.0 - throughput: 302.571860816944 + inference_time: 3278.0 + throughput: 305.0640634533252 estimated_peak_memory_range: - min: 245760 - max: 109842840 + min: 286720 + max: 105600504 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 197 - job_id: jn5q87qe5 + job_id: jg9lnde8g job_status: Passed torchscript_onnx: - inference_time: 3394.0 - throughput: 294.6375957572186 + inference_time: 3303.0 + throughput: 302.7550711474417 estimated_peak_memory_range: - min: 806912 - max: 2627096 + min: 16384 + max: 43420992 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 199 - job_id: jygzexd4g + job_id: jp0z067n5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:14:58Z' + timestamp: '2024-10-15T00:12:08Z' - torchscript_onnx_tflite: - inference_time: 2419.0 - throughput: 413.39396444811905 + inference_time: 2841.0 + throughput: 351.98873636043646 estimated_peak_memory_range: min: 12288 - max: 89353488 + max: 91484384 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: 
layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 138 - job_id: joprkyjk5 + job_id: j56y4wlyp job_status: Passed torchscript_onnx_qnn: - inference_time: 2865.0 - throughput: 349.04013961605585 + inference_time: 2462.0 + throughput: 406.17384240454913 estimated_peak_memory_range: - min: 0 - max: 23754928 + min: 802816 + max: 28420720 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 197 - job_id: j1gln0m2p + job_id: jp14z6x7p job_status: Passed torchscript_onnx: - inference_time: 2949.0 - throughput: 339.097999321804 + inference_time: 2550.0 + throughput: 392.15686274509807 estimated_peak_memory_range: min: 0 - max: 90842160 + max: 94982112 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 199 - job_id: jz5wod64p + job_id: jp8qy1vop job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:14:59Z' + timestamp: '2024-10-15T00:12:09Z' - torchscript_onnx_tflite: - inference_time: 3196.0 - throughput: 312.89111389236547 + inference_time: 3213.0 + throughput: 311.2356053532524 estimated_peak_memory_range: - min: 24576 - max: 44829872 + min: 12288 + max: 4925056 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 138 - job_id: jep28mn6p + job_id: jp3j062ng job_status: Passed torchscript_onnx_qnn: inference_time: 3087.0 throughput: 323.9390994493035 estimated_peak_memory_range: - min: 823296 - max: 2617712 + min: 819200 + max: 2066160 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 197 - job_id: j1p3k40m5 + job_id: jp4lr3015 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:14:53Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:12:00Z' - torchscript_onnx_tflite: - inference_time: 4755.0 - throughput: 210.3049421661409 + inference_time: 3222.0 + throughput: 310.36623215394167 estimated_peak_memory_range: - min: 278528 - max: 94961424 + min: 24576 + max: 2070128 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 138 - job_id: jqpyed00g + job_id: jpedmy3v5 job_status: Passed torchscript_onnx_qnn: - inference_time: 4923.0 - throughput: 203.12817387771685 + inference_time: 3045.0 + throughput: 328.4072249589491 estimated_peak_memory_range: - min: 802816 - max: 26186432 + min: 815104 + max: 2149376 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 197 - job_id: jlpe9rm8g + job_id: jgn6vk8q5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:14:57Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:12:04Z' - torchscript_onnx_tflite: - inference_time: 3234.0 - throughput: 309.2145949288806 + inference_time: 3228.0 + throughput: 309.7893432465923 estimated_peak_memory_range: - min: 94208 - max: 2470544 + min: 16384 + max: 2079536 primary_compute_unit: NPU 
precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 138 - job_id: j2p0y100g + job_id: jgjvnq4eg job_status: Passed torchscript_onnx_qnn: - inference_time: 3101.0 - throughput: 322.4766204450177 + inference_time: 3049.0 + throughput: 327.97638570022957 estimated_peak_memory_range: min: 819200 - max: 2107176 + max: 2135880 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 197 - job_id: jwgoy1615 + job_id: j5mnx8y9p job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:14:54Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:12:03Z' - torchscript_onnx_tflite: - inference_time: 3268.0 - throughput: 305.99755201958385 + inference_time: 3228.0 + throughput: 309.7893432465923 estimated_peak_memory_range: - min: 20480 - max: 2166808 + min: 28672 + max: 2099536 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 138 - job_id: j1p8o3yqg + job_id: jpv6k7xr5 job_status: Passed torchscript_onnx_qnn: - inference_time: 3103.0 - throughput: 322.26877215597807 + inference_time: 3049.0 + throughput: 327.97638570022957 estimated_peak_memory_range: - min: 843776 - max: 2385512 + min: 827392 + max: 2248040 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 197 - job_id: j1pv31kz5 + job_id: jpxkox2l5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:14:55Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:12:01Z' - torchscript_onnx_tflite: - inference_time: 3234.0 - throughput: 309.2145949288806 + inference_time: 4752.0 + throughput: 210.43771043771045 estimated_peak_memory_range: - min: 20480 - max: 1971784 + min: 16384 + max: 95709024 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 138 - job_id: jogkzlxvg + job_id: jgo268qkp job_status: Passed torchscript_onnx_qnn: - inference_time: 3121.0 - throughput: 320.41012495994875 + inference_time: 4887.0 + throughput: 204.62451401677922 estimated_peak_memory_range: - min: 847872 - max: 2181760 + min: 802816 + max: 28784448 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 197 - job_id: j7gjx0n1p + job_id: jp2kyenqp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:12:06Z' + - torchscript_onnx_tflite: + inference_time: 2133.0 + throughput: 468.8232536333802 + estimated_peak_memory_range: + min: 20480 + max: 40145520 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 138 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 138 + job_id: j5we64nm5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 2164.0 + throughput: 462.1072088724584 + estimated_peak_memory_range: + min: 0 + max: 23509904 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 197 + 
layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 197 + job_id: jpy13m0lp + job_status: Passed + torchscript_onnx: + inference_time: 2218.0 + throughput: 450.8566275924256 + estimated_peak_memory_range: + min: 0 + max: 44280544 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 199 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 199 + job_id: jglvmxmm5 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:14:56Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:12:12Z' - torchscript_onnx_qnn: - inference_time: 3281.0 - throughput: 304.7851264858275 + inference_time: 3256.0 + throughput: 307.12530712530713 estimated_peak_memory_range: min: 786432 max: 786432 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 197 - job_id: jw56634n5 + job_id: j57yr9395 job_status: Passed torchscript_onnx: - inference_time: 3348.0 - throughput: 298.6857825567503 + inference_time: 3378.0 + throughput: 296.0331557134399 estimated_peak_memory_range: - min: 37814272 - max: 37814272 + min: 37810176 + max: 37810176 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 199 - job_id: jnp10dzn5 + job_id: jgkex8mng job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:15:00Z' + timestamp: '2024-10-15T00:12:10Z' diff --git a/qai_hub_models/models/midas_quantized/README.md b/qai_hub_models/models/midas_quantized/README.md index c2a4db5a..c61ad7c9 100644 --- a/qai_hub_models/models/midas_quantized/README.md +++ b/qai_hub_models/models/midas_quantized/README.md @@ -6,7 +6,7 @@ Midas is designed for estimating depth at each point in an image. This is based on the implementation of Midas-V2-Quantized found -[here](https://github.com/isl-org/MiDaS). This repository contains scripts for optimized on-device +[here](https://github.com/isl-org/MiDaS). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/midas_quantized). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.midas_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Midas-V2-Quantized can be found +* The license for the original implementation of Midas-V2-Quantized can be found [here](https://github.com/isl-org/MiDaS/blob/master/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer](https://arxiv.org/abs/1907.01341v3) * [Source Model Implementation](https://github.com/isl-org/MiDaS) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/midas_quantized/export.py b/qai_hub_models/models/midas_quantized/export.py index f80a1efa..09180c7f 100644 --- a/qai_hub_models/models/midas_quantized/export.py +++ b/qai_hub_models/models/midas_quantized/export.py @@ -10,17 +10,17 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.midas_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.printing import ( print_inference_metrics, @@ -45,20 +45,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -80,10 +78,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
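`ExportResult` itself is imported from `qai_hub_models.models.common` but never defined in this patch. Judging from the keyword construction `ExportResult(compile_job=..., inference_job=..., profile_job=...)` and the docstring above, it is presumably a small dataclass along these lines (a hypothetical sketch, not the actual source):

```python
from dataclasses import dataclass
from typing import Optional

import qai_hub as hub


@dataclass
class ExportResult:
    # Field names and optionality are inferred from the export scripts in
    # this diff; the real definition in qai_hub_models.models.common may differ.
    compile_job: hub.CompileJob
    inference_job: Optional[hub.InferenceJob] = None
    profile_job: Optional[hub.ProfileJob] = None
```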
""" model_name = "midas_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,7 +107,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( target_runtime, output_path, input_spec, check_trace=False ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -135,7 +133,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +148,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +169,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -198,7 +196,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/midas_quantized/perf.yaml b/qai_hub_models/models/midas_quantized/perf.yaml index af1fa03f..d8ebc38d 100644 --- a/qai_hub_models/models/midas_quantized/perf.yaml +++ b/qai_hub_models/models/midas_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,41 +20,38 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: Midas-V2-Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 1094.0 - throughput: 914.0767824497258 + inference_time: 1112.0 + throughput: 899.2805755395683 estimated_peak_memory_range: min: 12288 - max: 3108640 + max: 8668496 primary_compute_unit: NPU precision: int8 layer_info: @@ -61,22 +59,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 145 - job_id: jegn2eyjg + job_id: jp2kye8qp job_status: Passed torchscript_onnx_qnn: - inference_time: 1435.0 - throughput: 696.8641114982578 + inference_time: 1434.0 + throughput: 697.350069735007 estimated_peak_memory_range: - min: 16384 - max: 315875608 + min: 24576 + max: 54561184 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 203 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jw5661ln5 + total_layers: 203 + job_id: jgjvnqmeg job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -85,13 +83,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:14:11Z' + timestamp: '2024-10-15T00:11:09Z' - torchscript_onnx_tflite: - inference_time: 764.0 - throughput: 1308.9005235602094 + inference_time: 774.0 + throughput: 1291.9896640826873 estimated_peak_memory_range: min: 12288 - max: 91793680 + max: 93655872 primary_compute_unit: NPU precision: int8 layer_info: @@ -99,22 +97,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 145 - job_id: joprkyqk5 + job_id: jpy13melp job_status: Passed torchscript_onnx_qnn: - inference_time: 1018.0 - throughput: 982.3182711198428 + inference_time: 1013.0 + throughput: 987.1668311944719 estimated_peak_memory_range: min: 
208896 - max: 25454496 + max: 22425392 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 203 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: j1p3km2m5 + total_layers: 203 + job_id: jpedmy1v5 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -123,13 +121,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:14:12Z' + timestamp: '2024-10-15T00:11:10Z' - torchscript_onnx_tflite: - inference_time: 1080.0 - throughput: 925.925925925926 + inference_time: 3827.0 + throughput: 261.30128037627384 estimated_peak_memory_range: - min: 12288 - max: 1462496 + min: 81920 + max: 52640544 primary_compute_unit: NPU precision: int8 layer_info: @@ -137,37 +135,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 145 - job_id: jep28m66p + job_id: jp3j063ng job_status: Passed torchscript_onnx_qnn: - inference_time: 1310.0 - throughput: 763.3587786259542 + inference_time: 6190.0 + throughput: 161.55088852988692 estimated_peak_memory_range: - min: 229376 - max: 1931200 + min: 241664 + max: 8642432 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 203 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: j1pv3wxz5 + total_layers: 203 + job_id: jpxkoxjl5 job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:14:14Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-15T00:11:19Z' - torchscript_onnx_tflite: - inference_time: 1451.0 - throughput: 689.1798759476223 + inference_time: 15542.0 + throughput: 64.34178355424012 estimated_peak_memory_range: - min: 81920 - max: 90190768 + min: 102400 + max: 6053224 primary_compute_unit: NPU precision: int8 layer_info: @@ -175,37 +173,60 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 145 - job_id: jqpyedw0g + job_id: jgo2680kp + job_status: Passed + reference_device_info: + name: RB5 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8250 Proxy + timestamp: '2024-10-15T00:11:07Z' + - torchscript_onnx_tflite: + inference_time: 1083.0 + throughput: 923.3610341643582 + estimated_peak_memory_range: + min: 16384 + max: 232524664 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 145 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 145 + job_id: jp0z06yn5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1767.0 - throughput: 565.9309564233164 + inference_time: 1306.0 + throughput: 765.6967840735069 estimated_peak_memory_range: - min: 217088 - max: 27273760 + min: 233472 + max: 1537024 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 203 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jz5wo9n4p + total_layers: 203 + job_id: j5we64vm5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:14:18Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:11:12Z' - torchscript_onnx_tflite: - inference_time: 1090.0 - throughput: 917.4311926605504 + inference_time: 1092.0 + throughput: 915.7509157509157 estimated_peak_memory_range: - min: 16384 - max: 140063064 + min: 36864 + max: 5718496 primary_compute_unit: NPU 
precision: int8 layer_info: @@ -213,37 +234,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 145 - job_id: j2p0yr70g + job_id: jglvmlzm5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1329.0 - throughput: 752.4454477050414 + inference_time: 1320.0 + throughput: 757.5757575757576 estimated_peak_memory_range: - min: 229376 - max: 1546896 + min: 225280 + max: 1551736 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 203 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: j7gjxl41p + total_layers: 203 + job_id: jgdx129zp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:14:15Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:11:16Z' - torchscript_onnx_tflite: - inference_time: 1074.0 - throughput: 931.0986964618249 + inference_time: 1089.0 + throughput: 918.2736455463728 estimated_peak_memory_range: - min: 16384 - max: 11851488 + min: 61440 + max: 1662920 primary_compute_unit: NPU precision: int8 layer_info: @@ -251,22 +272,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 145 - job_id: j1p8o7vqg + job_id: j5q6qv8op job_status: Passed torchscript_onnx_qnn: - inference_time: 1307.0 - throughput: 765.1109410864575 + inference_time: 1315.0 + throughput: 760.4562737642585 estimated_peak_memory_range: - min: 233472 - max: 1941720 + min: 229376 + max: 1785760 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 203 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jlpe9v38g + total_layers: 203 + job_id: jp14z6l7p job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -274,14 +295,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:14:16Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:11:15Z' - torchscript_onnx_tflite: - inference_time: 1090.0 - throughput: 917.4311926605504 + inference_time: 1092.0 + throughput: 915.7509157509157 estimated_peak_memory_range: - min: 24576 - max: 247403960 + min: 16384 + max: 1594440 primary_compute_unit: NPU precision: int8 layer_info: @@ -289,37 +310,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 145 - job_id: jogkzymvg + job_id: jgkex8zng job_status: Passed torchscript_onnx_qnn: - inference_time: 1344.0 - throughput: 744.047619047619 + inference_time: 1319.0 + throughput: 758.1501137225171 estimated_peak_memory_range: - min: 225280 - max: 1583016 + min: 229376 + max: 1625368 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 203 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jygze7k4g + total_layers: 203 + job_id: jg9lnd18g job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:14:17Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:11:13Z' - torchscript_onnx_tflite: - inference_time: 3787.0 - throughput: 264.06126221283336 + inference_time: 1431.0 + throughput: 698.8120195667366 estimated_peak_memory_range: - min: 40960 - max: 51857952 + min: 81920 + max: 91993600 primary_compute_unit: NPU precision: int8 layer_info: @@ -327,37 +348,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 145 - job_id: jn5q82oe5 + job_id: jp8qy1oop 
job_status: Passed torchscript_onnx_qnn: - inference_time: 5935.0 - throughput: 168.49199663016006 + inference_time: 1772.0 + throughput: 564.3340857787811 estimated_peak_memory_range: - min: 212992 - max: 7695632 + min: 208896 + max: 25817216 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 203 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jmg9v4em5 + total_layers: 203 + job_id: jp4lr3o15 job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T12:14:19Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:11:18Z' - torchscript_onnx_tflite: - inference_time: 15696.0 - throughput: 63.710499490316 + inference_time: 731.0 + throughput: 1367.9890560875513 estimated_peak_memory_range: - min: 114688 - max: 2008800 + min: 8192 + max: 49238192 primary_compute_unit: NPU precision: int8 layer_info: @@ -365,30 +386,45 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 145 - job_id: j1glnkr2p + job_id: jpv6k7or5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 1001.0 + throughput: 999.000999000999 + estimated_peak_memory_range: + min: 0 + max: 22022448 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 203 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 203 + job_id: j5mnx829p job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T12:14:10Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:11:20Z' - torchscript_onnx_qnn: - inference_time: 1480.0 - throughput: 675.6756756756756 + inference_time: 1461.0 + throughput: 684.4626967830253 estimated_peak_memory_range: - min: 344064 - max: 344064 + min: 442368 + max: 442368 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 203 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jwgoyvq15 + total_layers: 203 + job_id: jgz3dn9x5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -397,4 +433,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:14:13Z' + timestamp: '2024-10-15T00:11:11Z' diff --git a/qai_hub_models/models/mistral_3b_quantized/README.md b/qai_hub_models/models/mistral_3b_quantized/README.md new file mode 100644 index 00000000..a8500d9f --- /dev/null +++ b/qai_hub_models/models/mistral_3b_quantized/README.md @@ -0,0 +1,55 @@ +[![Qualcomm® AI Hub Models](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/quic-logo.jpg)](../../README.md) + + +# [Mistral-3B: State-of-the-art large language model useful on a variety of language understanding and generation tasks](https://aihub.qualcomm.com/models/mistral_3b_quantized) + +Mistral 3B is Mistral AI's first-generation edge model, optimized for high performance on Snapdragon platforms. + +This is based on the implementation of Mistral-3B found +[here](https://github.com/mistralai/mistral-inference). This repository contains scripts for optimized on-device +export suitable to run on Qualcomm® devices. More details on model performance +across various devices can be found [here](https://aihub.qualcomm.com/models/mistral_3b_quantized).
+ +[Sign up](https://myaccount.qualcomm.com/signup) to start using Qualcomm AI Hub and run these models on a hosted Qualcomm® device. + +## Deploying Mistral 3B on-device + +Please follow the [LLM on-device deployment](https://github.com/quic/ai-hub-apps/tree/main/tutorials/llm_on_genie) tutorial. + + + + + +## References +* [Source Model Implementation](https://github.com/mistralai/mistral-inference) + + + +## Community +* Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. +* For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). + + +## Usage and Limitations + +This model may not be used for or in connection with any of the following applications: + +- Accessing essential private and public services and benefits; +- Administration of justice and democratic processes; +- Assessing or recognizing the emotional state of a person; +- Biometric and biometrics-based systems, including categorization of persons based on sensitive characteristics; +- Education and vocational training; +- Employment and workers management; +- Exploitation of the vulnerabilities of persons resulting in harmful behavior; +- General purpose social scoring; +- Law enforcement; +- Management and operation of critical infrastructure; +- Migration, asylum and border control management; +- Predictive policing; +- Real-time remote biometric identification in public spaces; +- Recommender systems of social media platforms; +- Scraping of facial images (from the internet or otherwise); and/or +- Subliminal manipulation + + diff --git a/qai_hub_models/models/mistral_3b_quantized/info.yaml b/qai_hub_models/models/mistral_3b_quantized/info.yaml new file mode 100644 index 00000000..fd7ec018 --- /dev/null +++ b/qai_hub_models/models/mistral_3b_quantized/info.yaml @@ -0,0 +1,41 @@ +name: Mistral-3B +id: mistral_3b_quantized +status: public +headline: State-of-the-art large language model useful on a variety of language + understanding and generation tasks. +domain: Generative AI +description: Mistral 3B is Mistral AI's first-generation edge model, optimized for high performance on Snapdragon platforms. +use_case: Text Generation +tags: + - llm + - generative-ai + - quantized +source_repo: https://github.com/mistralai/mistral-inference +model_maker_id: mistral-ai +technical_details: + Input sequence length for Prompt Processor: 128 + Max context length: 4096 + Num of key-value heads: 8 + Number of parameters: 3B + Precision: w4a16 + w8a16 (few layers) + Use: Initiate conversation with prompt-processor and then token generator for subsequent iterations. + Minimum QNN SDK version required: 2.27.7 + Supported languages: English. + TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens). + Response Rate: Rate of response generation after the first response token.
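To make the TTFT range concrete: the `mistral_3b_quantized` perf.yaml below reports a `time_to_first_token_range` of 92289 to 2953273.6. Assuming these values are in microseconds, like the `inference_time` fields elsewhere in these perf.yaml files, the conversion is a one-liner:

```python
# TTFT endpoints from the perf.yaml below; units assumed to be microseconds.
ttft_min_us, ttft_max_us = 92_289, 2_953_273.6

print(f"{ttft_min_us / 1e6:.3f} s")  # ~0.092 s for a short (128-token) prompt
print(f"{ttft_max_us / 1e6:.2f} s")  # ~2.95 s for the full 4096-token context

# The max/min ratio is ~32, i.e. 4096 / 128, consistent with TTFT scaling
# with the number of 128-token prompt-processor iterations.
ratio = ttft_max_us / ttft_min_us
```

After the first token, generation proceeds at the reported `tokens_per_second` (about 21 tok/s for Mistral-3B on the Snapdragon 8 Elite QRD).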
+applicable_scenarios: + - Dialogue + - Content Generation + - Customer Support +related_models: [] +form_factors: + - Phone + - Tablet +has_static_banner: true +has_animated_banner: false +license_type: "other" +dataset: [] +model_type_llm: true +restrict_model_sharing: true +llm_details: + call_to_action: 'contact_for_purchase' diff --git a/qai_hub_models/models/mistral_3b_quantized/perf.yaml b/qai_hub_models/models/mistral_3b_quantized/perf.yaml new file mode 100644 index 00000000..2a0c06be --- /dev/null +++ b/qai_hub_models/models/mistral_3b_quantized/perf.yaml @@ -0,0 +1,25 @@ +aggregated: + supported_oses: + - Android + supported_devices: + - Snapdragon 8 Elite QRD + supported_chipsets: + - Snapdragon® 8 Elite +models: + name: 'Mistral-3B' + performance_metrics: + - torchscript_onnx_qnn: + llm_metrics: + time_to_first_token_range: + min: 92289 + max: 2953273.6 + tokens_per_second: 21.05 + evaluation_metrics: null + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-16T00:32:42.210701Z' diff --git a/qai_hub_models/models/mistral_7b_instruct_v0_3_quantized/README.md b/qai_hub_models/models/mistral_7b_instruct_v0_3_quantized/README.md new file mode 100644 index 00000000..c93301ee --- /dev/null +++ b/qai_hub_models/models/mistral_7b_instruct_v0_3_quantized/README.md @@ -0,0 +1,61 @@ +[![Qualcomm® AI Hub Models](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/quic-logo.jpg)](../../README.md) + + +# [Mistral-7B-Instruct-v0.3: State-of-the-art large language model useful on a variety of language understanding and generation tasks](https://aihub.qualcomm.com/models/mistral_7b_instruct_v0_3_quantized) + +Mistral AI's first open-source dense model, released in September 2023. Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruction fine-tuned version of Mistral-7B-v0.3. It has an extended vocabulary and supports the v3 Tokenizer, enhancing language understanding and generation. Additionally, function calling is enabled. + +This is based on the implementation of Mistral-7B-Instruct-v0.3 found +[here](https://github.com/mistralai/mistral-inference). This repository contains scripts for optimized on-device +export suitable to run on Qualcomm® devices. More details on model performance +across various devices can be found [here](https://aihub.qualcomm.com/models/mistral_7b_instruct_v0_3_quantized). + +[Sign up](https://myaccount.qualcomm.com/signup) to start using Qualcomm AI Hub and run these models on a hosted Qualcomm® device. + +## Deploying Mistral 7B Instruct v0.3 on-device + +Please follow the [LLM on-device deployment](https://github.com/quic/ai-hub-apps/tree/main/tutorials/llm_on_genie) tutorial. + + + + + +## License +* The license for the original implementation of Mistral-7B-Instruct-v0.3 can be found + [here](https://github.com/mistralai/mistral-inference/blob/main/LICENSE). +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/mistralai/mistral-inference/blob/main/LICENSE) + + +## References +* [Mistral 7B](https://arxiv.org/abs/2310.06825) +* [Source Model Implementation](https://github.com/mistralai/mistral-inference) + + + +## Community +* Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. +* For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
+ + +## Usage and Limitations + +This model may not be used for or in connection with any of the following applications: + +- Accessing essential private and public services and benefits; +- Administration of justice and democratic processes; +- Assessing or recognizing the emotional state of a person; +- Biometric and biometrics-based systems, including categorization of persons based on sensitive characteristics; +- Education and vocational training; +- Employment and workers management; +- Exploitation of the vulnerabilities of persons resulting in harmful behavior; +- General purpose social scoring; +- Law enforcement; +- Management and operation of critical infrastructure; +- Migration, asylum and border control management; +- Predictive policing; +- Real-time remote biometric identification in public spaces; +- Recommender systems of social media platforms; +- Scraping of facial images (from the internet or otherwise); and/or +- Subliminal manipulation + + diff --git a/qai_hub_models/models/mistral_7b_instruct_v0_3_quantized/info.yaml b/qai_hub_models/models/mistral_7b_instruct_v0_3_quantized/info.yaml new file mode 100644 index 00000000..c3f6d769 --- /dev/null +++ b/qai_hub_models/models/mistral_7b_instruct_v0_3_quantized/info.yaml @@ -0,0 +1,56 @@ +name: Mistral-7B-Instruct-v0.3 +id: mistral_7b_instruct_v0_3_quantized +status: public +headline: State-of-the-art large language model useful on a variety of language + understanding and generation tasks. +domain: Generative AI +description: Mistral AI's first open-source dense model, released in September 2023. Mistral-7B-Instruct-v0.3 Large Language Model (LLM) is an instruction fine-tuned version of Mistral-7B-v0.3. It has an extended vocabulary and supports the v3 Tokenizer, enhancing language understanding and generation. Additionally, function calling is enabled. +use_case: Text Generation +tags: + - llm + - generative-ai + - quantized +research_paper: https://arxiv.org/abs/2310.06825 +research_paper_title: "Mistral 7B" +model_maker_id: mistral-ai +license: https://github.com/mistralai/mistral-inference/blob/main/LICENSE +deploy_license: https://github.com/mistralai/mistral-inference/blob/main/LICENSE +source_repo: https://github.com/mistralai/mistral-inference +technical_details: + Input sequence length for Prompt Processor: 128 + Context length: 4096 + Number of parameters: 7.3B + Precision: w4a16 + w8a16 (few layers) + Num of key-value heads: 8 + Information about the model parts: Prompt Processor and Token Generator are split into 4 parts each. Corresponding Prompt Processor and Token Generator parts share weights. + Prompt processor model size: 4.17 GB + Prompt processor input: 128 tokens + KVCache initialized with pad token + Prompt processor output: 128 output tokens + KVCache for token generator + Token generator model size: 4.17 GB + Token generator input: 1 input token + past KVCache + Token generator output: 1 output token + KVCache for next iteration + Use: Initiate conversation with prompt-processor and then token generator for subsequent iterations. + Minimum QNN SDK version required: 2.27.7 + Supported languages: English. + TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens).
+  Response Rate: Rate of response generation after the first response token.
+applicable_scenarios:
+  - Dialogue
+  - Content Generation
+  - Customer Support
+related_models: []
+form_factors:
+  - Phone
+  - Tablet
+has_static_banner: true
+has_animated_banner: false
+license_type: apache-2.0
+deploy_license_type: apache-2.0
+dataset: []
+model_type_llm: true
+llm_details:
+  call_to_action: 'download'
+  genie_compatible: true
+  Snapdragon 8 Elite QRD:
+    torchscript_onnx_qnn:
+      model_download_url: v2/snapdragon_8_elite/models.zip
diff --git a/qai_hub_models/models/mistral_7b_instruct_v0_3_quantized/perf.yaml b/qai_hub_models/models/mistral_7b_instruct_v0_3_quantized/perf.yaml
new file mode 100644
index 00000000..517fabe5
--- /dev/null
+++ b/qai_hub_models/models/mistral_7b_instruct_v0_3_quantized/perf.yaml
@@ -0,0 +1,25 @@
+aggregated:
+  supported_oses:
+  - Android
+  supported_devices:
+  - Snapdragon 8 Elite QRD
+  supported_chipsets:
+  - Snapdragon® 8 Elite
+models:
+  name: 'Mistral-7B-Instruct-v0.3'
+  performance_metrics:
+  - torchscript_onnx_qnn:
+      llm_metrics:
+        time_to_first_token_range:
+          min: 165650
+          max: 5300800
+        tokens_per_second: 12.56
+      evaluation_metrics: null
+    reference_device_info:
+      name: Snapdragon 8 Elite QRD
+      os: '15'
+      form_factor: Phone
+      os_name: Android
+      manufacturer: Qualcomm
+      chipset: Snapdragon® 8 Elite
+    timestamp: '2024-10-16T00:32:42.210701Z'
diff --git a/qai_hub_models/models/mnasnet05/README.md b/qai_hub_models/models/mnasnet05/README.md
index 6f322636..adc79e3c 100644
--- a/qai_hub_models/models/mnasnet05/README.md
+++ b/qai_hub_models/models/mnasnet05/README.md
@@ -6,7 +6,7 @@
 MNASNet05 is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases.
 
 This is based on the implementation of MNASNet05 found
-[here](https://github.com/pytorch/vision/blob/main/torchvision/models/mnasnet.py). This repository contains scripts for optimized on-device
+[here]({source_repo}). This repository contains scripts for optimized on-device
 export suitable to run on Qualcomm® devices. More details on model performance
 accross various devices, can be found [here](https://aihub.qualcomm.com/models/mnasnet05).
@@ -39,15 +39,19 @@ python -m qai_hub_models.models.mnasnet05.export
 
 Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub.
 
+
 ## License
-- The license for the original implementation of MNASNet05 can be found
+* The license for the original implementation of MNASNet05 can be found
   [here](https://github.com/pytorch/vision/blob/main/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf)
+* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf)
+
 ## References
 * [MnasNet: Platform-Aware Neural Architecture Search for Mobile](https://arxiv.org/abs/1807.11626)
 * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/mnasnet.py)
+
+
 ## Community
 * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI.
 * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
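One practical consequence of the TTFT definition in the `technical_details` added above: the lower bound corresponds to a single 128-token prompt-processor iteration and the upper bound to the full 4096-token context, so an end-to-end latency estimate can be interpolated between the two bounds and combined with the decode rate from the perf.yaml. A rough back-of-envelope sketch, not part of this diff, under stated assumptions (that the TTFT values are microseconds and scale roughly linearly with the number of prompt-processor iterations; neither is claimed by the YAML itself):

```python
import math

# Values reported above for Mistral-7B-Instruct-v0.3 on Snapdragon 8 Elite QRD.
TTFT_MIN = 165650    # assumed microseconds: one prompt-processor iteration (<= 128 tokens)
TTFT_MAX = 5300800   # assumed microseconds: full 4096-token context
TOKENS_PER_SECOND = 12.56
CONTEXT, CHUNK = 4096, 128

def estimate_latency_s(prompt_tokens: int, output_tokens: int) -> float:
    """Back-of-envelope end-to-end latency: interpolated TTFT plus decode time."""
    iters = max(1, math.ceil(min(prompt_tokens, CONTEXT) / CHUNK))
    frac = (iters - 1) / (CONTEXT // CHUNK - 1)   # 0.0 at 1 iteration, 1.0 at 32
    ttft_s = (TTFT_MIN + frac * (TTFT_MAX - TTFT_MIN)) / 1e6
    # After the first token, remaining tokens arrive at the reported decode rate.
    return ttft_s + max(output_tokens - 1, 0) / TOKENS_PER_SECOND

print(f"{estimate_latency_s(512, 256):.1f} s")    # roughly 21 s for 512 in / 256 out
```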
diff --git a/qai_hub_models/models/mnasnet05/export.py b/qai_hub_models/models/mnasnet05/export.py index 939e4572..d2a47929 100644 --- a/qai_hub_models/models/mnasnet05/export.py +++ b/qai_hub_models/models/mnasnet05/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.mnasnet05 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "mnasnet05" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. 
Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/mnasnet05/perf.yaml b/qai_hub_models/models/mnasnet05/perf.yaml index f15bfade..66f1a7cd 100644 --- a/qai_hub_models/models/mnasnet05/perf.yaml +++ b/qai_hub_models/models/mnasnet05/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: MNASNet05 performance_metrics: - torchscript_onnx_tflite: - inference_time: 755.0 - throughput: 1324.5033112582782 + inference_time: 759.0 + throughput: 1317.5230566534915 estimated_peak_memory_range: - min: 24576 - max: 13756000 + min: 28672 + max: 1984968 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 71 - job_id: j1p8o7oog + job_id: jgn6vkjk5 job_status: Passed torchscript_onnx_qnn: - 
inference_time: 821.0 - throughput: 1218.026796589525 + inference_time: 825.0 + throughput: 1212.121212121212 estimated_peak_memory_range: - min: 16384 - max: 172103328 + min: 12288 + max: 22109120 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 103 - job_id: j1pv3wor5 + job_id: j56y4wk6p job_status: Passed torchscript_onnx: - inference_time: 780.0 - throughput: 1282.051282051282 + inference_time: 751.0 + throughput: 1331.5579227696405 estimated_peak_memory_range: - min: 49152 - max: 7152088 + min: 245760 + max: 5644488 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 104 - job_id: jz5wo9v4p + job_id: j5we64om5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:13:08Z' + timestamp: '2024-10-15T00:10:00Z' - torchscript_onnx_tflite: - inference_time: 531.0 - throughput: 1883.2391713747645 + inference_time: 533.0 + throughput: 1876.172607879925 estimated_peak_memory_range: min: 16384 - max: 52090288 + max: 52102112 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 71 - job_id: jogkzyzng + job_id: jprv3wz0g job_status: Passed torchscript_onnx_qnn: - inference_time: 583.0 - throughput: 1715.2658662092624 + inference_time: 581.0 + throughput: 1721.170395869191 estimated_peak_memory_range: min: 618496 - max: 14605200 + max: 18174944 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 103 - job_id: j7gjxlmep + job_id: jp3j06y3g job_status: Passed torchscript_onnx: - inference_time: 671.0 - throughput: 1490.312965722802 + inference_time: 572.0 + throughput: 1748.2517482517483 estimated_peak_memory_range: min: 0 - max: 55460784 + max: 55921184 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 104 - job_id: jmg9v41m5 + job_id: jg9lndv8g job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:13:09Z' + timestamp: '2024-10-15T00:10:01Z' - torchscript_onnx_tflite: - inference_time: 758.0 - throughput: 1319.2612137203166 + inference_time: 755.0 + throughput: 1324.5033112582782 estimated_peak_memory_range: - min: 24576 - max: 3049560 + min: 12288 + max: 3595384 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 71 - job_id: jn5q828o5 + job_id: jp2kye2rp job_status: Passed torchscript_onnx_qnn: - inference_time: 762.0 - throughput: 1312.3359580052493 + inference_time: 757.0 + throughput: 1321.003963011889 estimated_peak_memory_range: - min: 651264 - max: 1968008 + min: 667648 + max: 1782736 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 103 - job_id: jygze79xg + job_id: jpv6k73k5 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,52 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:13:03Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:09:53Z' - torchscript_onnx_tflite: - 
inference_time: 1027.0 - throughput: 973.7098344693281 + inference_time: 756.0 + throughput: 1322.7513227513227 + estimated_peak_memory_range: + min: 28672 + max: 75716312 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 71 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 71 + job_id: jgkex8jwg + job_status: Passed + torchscript_onnx_qnn: + inference_time: 763.0 + throughput: 1310.615989515072 + estimated_peak_memory_range: + min: 634880 + max: 2103512 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 103 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 103 + job_id: jgz3dneo5 + job_status: Passed + reference_device_info: + name: SA8255 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:09:56Z' + - torchscript_onnx_tflite: + inference_time: 752.0 + throughput: 1329.787234042553 estimated_peak_memory_range: min: 16384 - max: 52887776 + max: 25221072 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 71 - job_id: j1glnkzmp + job_id: jp8qy1lkp job_status: Passed torchscript_onnx_qnn: - inference_time: 1120.0 - throughput: 892.8571428571429 + inference_time: 764.0 + throughput: 1308.9005235602094 estimated_peak_memory_range: - min: 622592 - max: 16481008 + min: 638976 + max: 1832664 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 103 - job_id: jvgdwv9z5 + job_id: jpedmy9o5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8775 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:13:07Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:09:55Z' - torchscript_onnx_tflite: inference_time: 759.0 throughput: 1317.5230566534915 estimated_peak_memory_range: - min: 45056 - max: 143981296 + min: 28672 + max: 1379768 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 71 - job_id: jw5661jy5 + job_id: jp0z06n95 job_status: Passed torchscript_onnx_qnn: - inference_time: 765.0 - throughput: 1307.18954248366 + inference_time: 766.0 + throughput: 1305.4830287206266 estimated_peak_memory_range: - min: 634880 - max: 1953144 + min: 630784 + max: 2174552 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,7 +291,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 103 - job_id: jz5wo9vmp + job_id: jgjvnqxvg job_status: Passed reference_device_info: name: SA8650 (Proxy) @@ -263,14 +299,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:13:04Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:09:54Z' - torchscript_onnx_tflite: - inference_time: 753.0 - throughput: 1328.0212483399735 + inference_time: 1026.0 + throughput: 974.6588693957115 estimated_peak_memory_range: min: 16384 - max: 29055920 + max: 54140992 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 71 - job_id: j1p3km3n5 + job_id: jpy13m98p job_status: Passed torchscript_onnx_qnn: - inference_time: 776.0 - throughput: 1288.659793814433 + inference_time: 1120.0 + throughput: 892.8571428571429 estimated_peak_memory_range: - min: 679936 - max: 2252104 + min: 0 + max: 
16380944 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +329,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 103 - job_id: jmg9v4185 + job_id: jp14z608p job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:13:05Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:09:58Z' - torchscript_onnx_tflite: - inference_time: 758.0 - throughput: 1319.2612137203166 + inference_time: 507.0 + throughput: 1972.3865877712033 estimated_peak_memory_range: - min: 16384 - max: 10073128 + min: 12288 + max: 22764432 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +352,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 71 - job_id: jwgoyv0k5 + job_id: jglvmljj5 job_status: Passed torchscript_onnx_qnn: - inference_time: 766.0 - throughput: 1305.4830287206266 + inference_time: 560.0 + throughput: 1785.7142857142858 estimated_peak_memory_range: - min: 634880 - max: 2038648 + min: 614400 + max: 12920832 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +367,34 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 103 - job_id: jnp108l75 + job_id: jgdx12wrp + job_status: Passed + torchscript_onnx: + inference_time: 573.0 + throughput: 1745.2006980802792 + estimated_peak_memory_range: + min: 0 + max: 24190256 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 104 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 104 + job_id: j57yr9z95 job_status: Passed reference_device_info: - name: SA8255 (Proxy) - os: '13' - form_factor: Auto + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:13:06Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:10:04Z' - torchscript_onnx_qnn: - inference_time: 906.0 - throughput: 1103.7527593818984 + inference_time: 932.0 + throughput: 1072.961373390558 estimated_peak_memory_range: min: 602112 max: 602112 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 103 - job_id: jlpe9v1vg + job_id: jgo268jqp job_status: Passed torchscript_onnx: - inference_time: 799.0 - throughput: 1251.5644555694619 + inference_time: 824.0 + throughput: 1213.5922330097087 estimated_peak_memory_range: - min: 5668864 - max: 5668864 + min: 7053312 + max: 7053312 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 104 - job_id: jnp108ln5 + job_id: jp14z607p job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:13:10Z' + timestamp: '2024-10-15T00:10:02Z' diff --git a/qai_hub_models/models/mobilenet_v2/README.md b/qai_hub_models/models/mobilenet_v2/README.md index bf6d9dca..62926fb4 100644 --- a/qai_hub_models/models/mobilenet_v2/README.md +++ b/qai_hub_models/models/mobilenet_v2/README.md @@ -6,7 +6,7 @@ MobileNetV2 is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of MobileNet-v2 found -[here](https://github.com/tonylins/pytorch-mobilenet-v2/tree/master). 
This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/mobilenet_v2). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.mobilenet_v2.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of MobileNet-v2 can be found +* The license for the original implementation of MobileNet-v2 can be found [here](https://github.com/tonylins/pytorch-mobilenet-v2/blob/master/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381) * [Source Model Implementation](https://github.com/tonylins/pytorch-mobilenet-v2/tree/master) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/mobilenet_v2/export.py b/qai_hub_models/models/mobilenet_v2/export.py index 06e19094..dc4abe1a 100644 --- a/qai_hub_models/models/mobilenet_v2/export.py +++ b/qai_hub_models/models/mobilenet_v2/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.mobilenet_v2 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. 
Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "mobilenet_v2" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/mobilenet_v2/perf.yaml b/qai_hub_models/models/mobilenet_v2/perf.yaml index e7bb6554..326017fd 100644 --- a/qai_hub_models/models/mobilenet_v2/perf.yaml +++ b/qai_hub_models/models/mobilenet_v2/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: MobileNet-v2 performance_metrics: - torchscript_onnx_tflite: - inference_time: 906.0 - throughput: 1103.7527593818984 + inference_time: 905.0 + throughput: 1104.9723756906078 estimated_peak_memory_range: - min: 12288 - max: 185155568 + min: 20480 + max: 184020992 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 72 - job_id: jn5q82jo5 + job_id: j57yr9mv5 job_status: Passed torchscript_onnx_qnn: inference_time: 1253.0 throughput: 798.0845969672786 estimated_peak_memory_range: - min: 266240 - max: 51335624 + min: 28672 + max: 39276912 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 105 - job_id: jlpe9v9vg + job_id: jp8qy1nkp job_status: Passed torchscript_onnx: - inference_time: 954.0 - throughput: 1048.2180293501049 + inference_time: 919.0 + throughput: 1088.139281828074 estimated_peak_memory_range: min: 12288 - max: 14648184 + max: 1628616 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 105 - job_id: j0pxv1vlg + job_id: jgz3dn1o5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:12:30Z' + timestamp: '2024-10-15T00:09:15Z' - torchscript_onnx_tflite: inference_time: 623.0 throughput: 1605.1364365971108 estimated_peak_memory_range: min: 16384 - max: 63367312 + max: 64676784 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 72 - job_id: j1glnknmp + job_id: jp4lr3785 job_status: Passed 
torchscript_onnx_qnn: - inference_time: 860.0 - throughput: 1162.7906976744187 + inference_time: 862.0 + throughput: 1160.092807424594 estimated_peak_memory_range: min: 618496 - max: 14968944 + max: 16158048 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 105 - job_id: jygze7exg + job_id: jgkex81wg job_status: Passed torchscript_onnx: - inference_time: 703.0 - throughput: 1422.475106685633 + inference_time: 678.0 + throughput: 1474.9262536873157 estimated_peak_memory_range: - min: 0 - max: 65651824 + min: 512000 + max: 68785456 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 105 - job_id: jo5mrzr9g + job_id: j5we64j35 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:12:31Z' + timestamp: '2024-10-15T00:09:16Z' - torchscript_onnx_tflite: - inference_time: 905.0 - throughput: 1104.9723756906078 + inference_time: 902.0 + throughput: 1108.6474501108648 estimated_peak_memory_range: - min: 28672 - max: 1431080 + min: 12288 + max: 1340448 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 72 - job_id: jw56616y5 + job_id: jpxkoxq35 job_status: Passed torchscript_onnx_qnn: inference_time: 1188.0 throughput: 841.7508417508418 estimated_peak_memory_range: - min: 634880 - max: 2261792 + min: 626688 + max: 1791448 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 105 - job_id: jmg9v4v85 + job_id: jglvmldj5 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:12:25Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:09:07Z' - torchscript_onnx_tflite: - inference_time: 1093.0 - throughput: 914.9130832570905 + inference_time: 906.0 + throughput: 1103.7527593818984 estimated_peak_memory_range: - min: 16384 - max: 65363808 + min: 28672 + max: 8275216 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 72 - job_id: j1p3kmkn5 + job_id: jp2kye1rp job_status: Passed torchscript_onnx_qnn: - inference_time: 1439.0 - throughput: 694.9270326615705 + inference_time: 1188.0 + throughput: 841.7508417508418 estimated_peak_memory_range: - min: 618496 - max: 17563664 + min: 638976 + max: 1769016 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 105 - job_id: jqp4qwq1g + job_id: jgo268xqp job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:12:29Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:09:10Z' - torchscript_onnx_tflite: - inference_time: 907.0 - throughput: 1102.5358324145534 + inference_time: 901.0 + throughput: 1109.8779134295228 estimated_peak_memory_range: - min: 16384 - max: 1391776 + min: 24576 + max: 1600056 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 72 - job_id: 
jwgoyvyk5 + job_id: jprv3wr0g job_status: Passed torchscript_onnx_qnn: - inference_time: 1183.0 - throughput: 845.30853761623 + inference_time: 1189.0 + throughput: 841.0428931875525 estimated_peak_memory_range: - min: 634880 - max: 1990832 + min: 630784 + max: 1956072 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 105 - job_id: jnp108075 + job_id: jp3j06d3g job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:12:26Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:09:09Z' - torchscript_onnx_tflite: - inference_time: 907.0 - throughput: 1102.5358324145534 + inference_time: 902.0 + throughput: 1108.6474501108648 estimated_peak_memory_range: - min: 20480 - max: 1935496 + min: 12288 + max: 24504288 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 72 - job_id: j1pv3w3r5 + job_id: jgn6vk4k5 job_status: Passed torchscript_onnx_qnn: inference_time: 1191.0 throughput: 839.6305625524769 estimated_peak_memory_range: - min: 638976 - max: 2290136 + min: 630784 + max: 1901720 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 105 - job_id: jvgdwvwz5 + job_id: j56y4wx6p job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:12:27Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:09:08Z' - torchscript_onnx_tflite: - inference_time: 906.0 - throughput: 1103.7527593818984 + inference_time: 1083.0 + throughput: 923.3610341643582 estimated_peak_memory_range: - min: 12288 - max: 6087368 + min: 16384 + max: 66463520 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 72 - job_id: j7gjxlxep + job_id: j5mnx87dp job_status: Passed torchscript_onnx_qnn: - inference_time: 1201.0 - throughput: 832.6394671107411 + inference_time: 1430.0 + throughput: 699.3006993006993 estimated_peak_memory_range: - min: 638976 - max: 1895296 + min: 618496 + max: 19699392 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 105 - job_id: jz57zdz9p + job_id: jgjvnqjvg job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:09:13Z' + - torchscript_onnx_tflite: + inference_time: 503.0 + throughput: 1988.0715705765408 + estimated_peak_memory_range: + min: 8192 + max: 25112736 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 72 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 72 + job_id: jp0z06w95 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 848.0 + throughput: 1179.245283018868 + estimated_peak_memory_range: + min: 0 + max: 14676272 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 105 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 105 + job_id: jpedmyjo5 + job_status: Passed + torchscript_onnx: + inference_time: 681.0 + throughput: 1468.4287812041116 + 
estimated_peak_memory_range: + min: 0 + max: 25528400 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 105 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 105 + job_id: jgdx12jrp + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:12:28Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:09:19Z' - torchscript_onnx_qnn: - inference_time: 1375.0 - throughput: 727.2727272727273 + inference_time: 1348.0 + throughput: 741.839762611276 estimated_peak_memory_range: min: 602112 max: 602112 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 105 - job_id: jz5wo9omp + job_id: j5q6qvnnp job_status: Passed torchscript_onnx: - inference_time: 961.0 - throughput: 1040.5827263267429 + inference_time: 971.0 + throughput: 1029.8661174047375 estimated_peak_memory_range: - min: 8085504 - max: 8085504 + min: 9220096 + max: 9220096 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 105 - job_id: jegn2e2qg + job_id: jg9lnd6wg job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:12:32Z' + timestamp: '2024-10-15T00:09:17Z' diff --git a/qai_hub_models/models/mobilenet_v2_quantized/README.md b/qai_hub_models/models/mobilenet_v2_quantized/README.md index 9a8a7c06..378f950e 100644 --- a/qai_hub_models/models/mobilenet_v2_quantized/README.md +++ b/qai_hub_models/models/mobilenet_v2_quantized/README.md @@ -6,7 +6,7 @@ MobileNetV2 is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of MobileNet-v2-Quantized found -[here](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/mobilenetv2). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/mobilenet_v2_quantized). @@ -17,11 +17,6 @@ accross various devices, can be found [here](https://aihub.qualcomm.com/models/m ## Example & Usage -Install the package via pip: -```bash -pip install "qai_hub_models[mobilenet_v2_quantized]" -``` - Once installed, run the following simple CLI demo: @@ -44,15 +39,19 @@ python -m qai_hub_models.models.mobilenet_v2_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of MobileNet-v2-Quantized can be found +* The license for the original implementation of MobileNet-v2-Quantized can be found [here](https://github.com/quic/aimet-model-zoo/blob/develop/LICENSE.pdf). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381) * [Source Model Implementation](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/mobilenetv2) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/mobilenet_v2_quantized/conftest.py b/qai_hub_models/models/mobilenet_v2_quantized/conftest.py index 56084dea..220651da 100644 --- a/qai_hub_models/models/mobilenet_v2_quantized/conftest.py +++ b/qai_hub_models/models/mobilenet_v2_quantized/conftest.py @@ -9,7 +9,6 @@ import pytest from qai_hub_models.models.mobilenet_v2_quantized import Model -from qai_hub_models.utils.testing import skip_clone_repo_check # Instantiate the model only once for all tests. @@ -22,7 +21,6 @@ def cached_from_pretrained(): from_pretrained = Model.from_pretrained sig = inspect.signature(from_pretrained) - @skip_clone_repo_check def _cached_from_pretrained(*args, **kwargs): cache_key = str(args) + str(kwargs) model = pretrained_cache.get(cache_key, None) diff --git a/qai_hub_models/models/mobilenet_v2_quantized/evaluate.py b/qai_hub_models/models/mobilenet_v2_quantized/evaluate.py index 76dd0581..55e4c66d 100644 --- a/qai_hub_models/models/mobilenet_v2_quantized/evaluate.py +++ b/qai_hub_models/models/mobilenet_v2_quantized/evaluate.py @@ -13,10 +13,8 @@ from qai_hub_models.models.mobilenet_v2_quantized import MODEL_ID, Model from qai_hub_models.utils.args import evaluate_parser, get_hub_device, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.evaluate import evaluate_on_dataset from qai_hub_models.utils.inference import compile_model_from_args -from qai_hub_models.utils.quantization_aimet import AIMETQuantizableMixin SUPPORTED_DATASETS = ["imagenette", "imagenet"] @@ -27,6 +25,7 @@ def main(): model_cls=Model, default_split_size=2500, supported_datasets=SUPPORTED_DATASETS, + is_hub_quantized=True, ) args = parser.parse_args() args.device = None @@ -38,13 +37,7 @@ def main(): MODEL_ID, args, get_model_kwargs(Model, vars(args)) ) hub_device = get_hub_device(None, args.chipset) - - # Use Fp16 model for torch inference - for cls in Model.__mro__: - if issubclass(cls, BaseModel) and not issubclass(cls, AIMETQuantizableMixin): - torch_cls = cls - break - torch_model = torch_cls.from_pretrained(**get_model_kwargs(torch_cls, vars(args))) + torch_model = Model.from_pretrained(**get_model_kwargs(Model, vars(args))) evaluate_on_dataset( hub_model, torch_model, diff --git a/qai_hub_models/models/mobilenet_v2_quantized/export.py b/qai_hub_models/models/mobilenet_v2_quantized/export.py index 47149272..f5b541e9 100644 --- a/qai_hub_models/models/mobilenet_v2_quantized/export.py +++ b/qai_hub_models/models/mobilenet_v2_quantized/export.py @@ -10,18 +10,20 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, 
Dict, List, Optional, cast import qai_hub as hub +import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.mobilenet_v2_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference +from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( print_inference_metrics, print_on_target_demo_cmd, @@ -31,11 +33,14 @@ can_access_qualcomm_ai_hub, export_without_hub_access, ) +from qai_hub_models.utils.quantization import get_calibration_data def export_model( device: str = "Samsung Galaxy S23 (Family)", chipset: Optional[str] = None, + num_calibration_samples: int = 100, + skip_compiling: bool = False, skip_profiling: bool = False, skip_inferencing: bool = False, skip_downloading: bool = False, @@ -45,20 +50,19 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + 3. Compiles the model to an asset that can be run on device + 4. Profiles the model performance on a real device + 5. Inferences the model on sample inputs + 6. Downloads the model asset to the local directory + 7. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 5 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -66,6 +70,9 @@ def export_model( Defaults to DEFAULT_DEVICE if not specified. chipset: If set, will choose a random device with this chipset. Overrides the `device` argument. + num_calibration_samples: The number of calibration data samples + to use for quantization. + skip_compiling: If set, skips compiling model to format that can run on device. skip_profiling: If set, skips profiling of compiled model on real devices. skip_inferencing: If set, skips computing on-device outputs from sample data. skip_downloading: If set, skips downloading of compiled model. @@ -80,10 +87,11 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: - * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). + A struct of: + * A CompileJob object containing metadata about the compile job submitted to hub (None if compiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
+ * A QuantizeJob object containing metadata about the quantize job submitted to hub """ model_name = "mobilenet_v2_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,33 +117,52 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) ) # Trace the model - source_model = model.convert_to_hub_source_model( - target_runtime, output_path, input_spec + source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) + + print(f"Quantizing model {model_name} with {num_calibration_samples} samples.") + # 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + onnx_compile_job = hub.submit_compile_job( + model=source_model, + input_specs=input_spec, + device=hub_device, + name=model_name, + options="--target_runtime onnx", + ) + quantize_job = hub.submit_quantize_job( + model=onnx_compile_job.get_target_model(), + calibration_data=get_calibration_data( + input_spec, "imagenette", num_calibration_samples + ), + weights_dtype=model.get_weights_dtype(), + activations_dtype=model.get_activations_dtype(), + name=model_name, + options=model.get_quantize_options(), ) + if skip_compiling: + return ExportResult(quantize_job=quantize_job) - # 2. Compile the model to an on-device asset + # 3. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) print(f"Optimizing model {model_name} to run on-device") submitted_compile_job = hub.submit_compile_job( - model=source_model, + model=quantize_job.get_target_model(), input_specs=input_spec, device=hub_device, name=model_name, - calibration_data=model.get_calibration_data(target_runtime), options=model_compile_options, ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 4. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +177,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 5. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +198,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 6. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 7. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -202,12 +229,17 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + quantize_job=quantize_job, + ) def main(): warnings.filterwarnings("ignore") - parser = export_parser(model_cls=Model) + parser = export_parser(model_cls=Model, is_hub_quantized=True) args = parser.parse_args() export_model(**vars(args)) diff --git a/qai_hub_models/models/mobilenet_v2_quantized/model.py b/qai_hub_models/models/mobilenet_v2_quantized/model.py index d884a6c7..70e84a3c 100644 --- a/qai_hub_models/models/mobilenet_v2_quantized/model.py +++ b/qai_hub_models/models/mobilenet_v2_quantized/model.py @@ -4,86 +4,11 @@ # --------------------------------------------------------------------- from __future__ import annotations -# isort: off -# This verifies aimet is installed, and this must be included first. -from qai_hub_models.utils.quantization_aimet import ( - AIMETQuantizableMixin, -) - -# isort: on - -import torch -from aimet_torch.cross_layer_equalization import equalize_model -from aimet_torch.model_preparer import prepare_model -from aimet_torch.quantsim import QuantizationSimModel, load_encodings_to_sim - from qai_hub_models.models.mobilenet_v2.model import MobileNetV2 -from qai_hub_models.utils.aimet.config_loader import get_default_aimet_config -from qai_hub_models.utils.asset_loaders import CachedWebModelAsset -from qai_hub_models.utils.quantization_aimet import ( - constrain_quantized_inputs_to_image_range, - convert_all_depthwise_to_per_tensor, -) +from qai_hub_models.utils.quantization import HubQuantizableMixin MODEL_ID = __name__.split(".")[-2] -MODEL_ASSET_VERSION = 5 - -# Weights downloaded from https://github.com/quic/aimet-model-zoo/releases/download/phase_2_january_artifacts/torch_mobilenetv2_w8a8_state_dict.pth -QUANTIZED_WEIGHTS = "torch_mobilenetv2_w8a8_state_dict.pth" -DEFAULT_ENCODINGS = "mobilenet_v2_quantized_encodings.json" - - -class MobileNetV2Quantizable(AIMETQuantizableMixin, MobileNetV2): - """MobileNetV2 with post train quantization support.""" - - def __init__( - self, - quant_sim_model: QuantizationSimModel, - ) -> None: - # Input is already normalized by sim_model. Disable it in the wrapper model. - MobileNetV2.__init__(self, quant_sim_model.model, normalize_input=False) - AIMETQuantizableMixin.__init__( - self, - quant_sim_model, - ) - - @classmethod - def from_pretrained( - cls, - aimet_encodings: str | None = "DEFAULT", - ) -> "MobileNetV2Quantizable": - """ - Parameters: - aimet_encodings: - if "DEFAULT": Loads the model with aimet encodings calibrated on imagenette. - elif None: Doesn't load any encodings. Used when computing encodings. - else: Interprets as a filepath and loads the encodings stored there. 
- """ - # Load Model - model = MobileNetV2.from_pretrained() - input_shape = cls.get_input_spec()["image_tensor"][0] - # Following - # https://github.com/quic/aimet-model-zoo/blob/develop/aimet_zoo_torch/mobilenetv2/model/model_definition.py#L64 - model = prepare_model(model) - equalize_model(model, input_shape) - - aimet_config = get_default_aimet_config() - sim = QuantizationSimModel( - model, - quant_scheme="tf_enhanced", - default_param_bw=8, - default_output_bw=8, - config_file=aimet_config, - dummy_input=torch.rand(input_shape), - ) - convert_all_depthwise_to_per_tensor(sim.model) - constrain_quantized_inputs_to_image_range(sim) - if aimet_encodings: - if aimet_encodings == "DEFAULT": - aimet_encodings = CachedWebModelAsset.from_asset_store( - MODEL_ID, MODEL_ASSET_VERSION, DEFAULT_ENCODINGS - ).fetch() - load_encodings_to_sim(sim, aimet_encodings) - return cls(sim) +class MobileNetV2Quantizable(HubQuantizableMixin, MobileNetV2): + pass diff --git a/qai_hub_models/models/mobilenet_v2_quantized/perf.yaml b/qai_hub_models/models/mobilenet_v2_quantized/perf.yaml index f76dbb7f..c1386822 100644 --- a/qai_hub_models/models/mobilenet_v2_quantized/perf.yaml +++ b/qai_hub_models/models/mobilenet_v2_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,82 +20,62 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8775P Proxy models: - name: MobileNet-v2-Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 279.0 - throughput: 3584.2293906810037 + inference_time: 434.0 + throughput: 2304.147465437788 estimated_peak_memory_range: min: 12288 - max: 1617296 + max: 9924072 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 74 + layers_on_npu: 109 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 74 - job_id: jogkzy1ng + total_layers: 109 + job_id: jgz32xdz5 job_status: Passed torchscript_onnx_qnn: - inference_time: 657.0 - throughput: 1522.0700152207 + inference_time: 665.0 + throughput: 1503.7593984962407 estimated_peak_memory_range: min: 16384 - max: 9679232 - primary_compute_unit: NPU - precision: int8 - layer_info: - layers_on_npu: 71 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 71 - job_id: jygze71xg - job_status: Passed - torchscript_onnx: - inference_time: 567.0 - throughput: 1763.668430335097 - estimated_peak_memory_range: - min: 12288 - max: 6765824 + max: 10005680 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 74 + layers_on_npu: 106 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 74 - job_id: jegn2ejqg + total_layers: 106 + job_id: jp2kx7kxp 
job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -103,51 +84,36 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:11:52Z' + timestamp: '2024-10-17T17:28:07Z' - torchscript_onnx_tflite: - inference_time: 209.0 - throughput: 4784.688995215311 + inference_time: 306.0 + throughput: 3267.97385620915 estimated_peak_memory_range: min: 12288 - max: 41895024 + max: 45111632 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 74 + layers_on_npu: 109 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 74 - job_id: jn5q82no5 + total_layers: 109 + job_id: j5wewd6z5 job_status: Passed torchscript_onnx_qnn: - inference_time: 482.0 - throughput: 2074.688796680498 + inference_time: 487.0 + throughput: 2053.388090349076 estimated_peak_memory_range: min: 159744 - max: 17433984 - primary_compute_unit: NPU - precision: int8 - layer_info: - layers_on_npu: 71 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 71 - job_id: jz5wo9jmp - job_status: Passed - torchscript_onnx: - inference_time: 417.0 - throughput: 2398.0815347721823 - estimated_peak_memory_range: - min: 0 - max: 62311824 + max: 16437872 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 74 + layers_on_npu: 106 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 74 - job_id: joprkyz75 + total_layers: 106 + job_id: jpy1z41rp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -156,287 +122,272 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:11:53Z' + timestamp: '2024-10-17T17:28:09Z' - torchscript_onnx_tflite: - inference_time: 286.0 - throughput: 3496.5034965034965 + inference_time: 1067.0 + throughput: 937.207122774133 estimated_peak_memory_range: - min: 12288 - max: 2184456 + min: 16384 + max: 28807200 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 74 + layers_on_npu: 109 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 74 - job_id: j1glnkjmp + total_layers: 109 + job_id: jg9l03nqg job_status: Passed torchscript_onnx_qnn: - inference_time: 602.0 - throughput: 1661.1295681063123 + inference_time: 1490.0 + throughput: 671.1409395973154 estimated_peak_memory_range: - min: 184320 - max: 1872464 + min: 12288 + max: 7521664 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 71 + layers_on_npu: 106 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 71 - job_id: jnp108r75 + total_layers: 106 + job_id: jp0z41z25 job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:11:46Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-17T17:27:50Z' - torchscript_onnx_tflite: - inference_time: 329.0 - throughput: 3039.51367781155 - estimated_peak_memory_range: - min: 12288 - max: 42901040 - primary_compute_unit: NPU - precision: int8 - layer_info: - layers_on_npu: 74 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 74 - job_id: jw5661ky5 - job_status: Passed - torchscript_onnx_qnn: - inference_time: 724.0 - throughput: 1381.2154696132598 + inference_time: 12534.0 + throughput: 79.78299026647518 estimated_peak_memory_range: - min: 0 - max: 17851680 + min: 28672 + max: 6489896 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 71 - layers_on_gpu: 0 + layers_on_npu: 107 + layers_on_gpu: 2 layers_on_cpu: 0 - total_layers: 71 - job_id: j0pxv1wlg + total_layers: 109 + 
job_id: jp142dzkp job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: RB5 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:11:50Z' + chipset: QCS8250 Proxy + timestamp: '2024-10-17T17:27:32Z' - torchscript_onnx_tflite: - inference_time: 284.0 - throughput: 3521.1267605633802 + inference_time: 433.0 + throughput: 2309.4688221709007 estimated_peak_memory_range: min: 12288 - max: 1401424 + max: 4784296 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 74 + layers_on_npu: 109 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 74 - job_id: j1p3kmyn5 + total_layers: 109 + job_id: jgdxnr1kp job_status: Passed torchscript_onnx_qnn: - inference_time: 610.0 - throughput: 1639.344262295082 + inference_time: 609.0 + throughput: 1642.0361247947455 estimated_peak_memory_range: min: 184320 - max: 1483200 + max: 1352704 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 71 + layers_on_npu: 106 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 71 - job_id: jvgdwvjz5 + total_layers: 106 + job_id: jp8q23qzp job_status: Passed reference_device_info: - name: SA8650 (Proxy) - os: '13' - form_factor: Auto + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:11:47Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-17T17:27:52Z' - torchscript_onnx_tflite: - inference_time: 287.0 - throughput: 3484.320557491289 + inference_time: 437.0 + throughput: 2288.329519450801 estimated_peak_memory_range: - min: 32768 - max: 4324136 + min: 12288 + max: 1342408 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 74 + layers_on_npu: 109 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 74 - job_id: jwgoyvjk5 + total_layers: 109 + job_id: j57y2jrq5 job_status: Passed torchscript_onnx_qnn: - inference_time: 626.0 - throughput: 1597.444089456869 + inference_time: 617.0 + throughput: 1620.7455429497568 estimated_peak_memory_range: - min: 221184 - max: 1462848 + min: 184320 + max: 1510768 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 71 + layers_on_npu: 106 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 71 - job_id: jz57zdq9p + total_layers: 106 + job_id: j5q60767p job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:11:48Z' + chipset: SA8255P Proxy + timestamp: '2024-10-17T17:27:56Z' - torchscript_onnx_tflite: - inference_time: 283.0 - throughput: 3533.5689045936397 + inference_time: 433.0 + throughput: 2309.4688221709007 estimated_peak_memory_range: min: 12288 - max: 1426808 + max: 1327560 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 74 + layers_on_npu: 109 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 74 - job_id: j1pv3wjr5 + total_layers: 109 + job_id: jp4lnxrq5 job_status: Passed torchscript_onnx_qnn: - inference_time: 602.0 - throughput: 1661.1295681063123 + inference_time: 625.0 + throughput: 1600.0 estimated_peak_memory_range: - min: 180224 - max: 1801976 + min: 184320 + max: 1393752 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 71 + layers_on_npu: 106 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 71 - job_id: jqp4qwz1g + total_layers: 106 + job_id: jglv40ve5 job_status: Passed reference_device_info: - name: 
SA8255 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:11:49Z' + chipset: SA8775P Proxy + timestamp: '2024-10-17T17:27:58Z' - torchscript_onnx_tflite: - inference_time: 830.0 - throughput: 1204.8192771084337 + inference_time: 486.0 + throughput: 2057.61316872428 estimated_peak_memory_range: min: 12288 - max: 27623808 + max: 44831232 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 74 + layers_on_npu: 109 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 74 - job_id: j7gjxljep + total_layers: 109 + job_id: jpxk97oj5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1429.0 - throughput: 699.7900629811056 + inference_time: 725.0 + throughput: 1379.3103448275863 estimated_peak_memory_range: - min: 12288 - max: 8095952 + min: 159744 + max: 20340720 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 71 + layers_on_npu: 106 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 71 - job_id: jo5mrzj9g + total_layers: 106 + job_id: j56y23yvp job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T12:11:51Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-17T17:28:00Z' - torchscript_onnx_tflite: - inference_time: 7669.0 - throughput: 130.39509714434737 + inference_time: 297.0 + throughput: 3367.003367003367 estimated_peak_memory_range: - min: 12288 - max: 10982704 + min: 8192 + max: 28543344 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 72 - layers_on_gpu: 2 + layers_on_npu: 109 + layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 74 - job_id: jlpe9vjvg + total_layers: 109 + job_id: j5mnewxyp job_status: Passed - reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot - os_name: Android - manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T12:11:42Z' - - torchscript_onnx_qnn: - inference_time: 724.0 - throughput: 1381.2154696132598 + torchscript_onnx_qnn: + inference_time: 406.0 + throughput: 2463.054187192118 estimated_peak_memory_range: - min: 577536 - max: 577536 + min: 8192 + max: 14825008 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 71 + layers_on_npu: 106 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 71 - job_id: jmg9v4685 + total_layers: 106 + job_id: jp3jn4jxg job_status: Passed - torchscript_onnx: - inference_time: 577.0 - throughput: 1733.102253032929 + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-17T17:28:12Z' + - torchscript_onnx_qnn: + inference_time: 757.0 + throughput: 1321.003963011889 estimated_peak_memory_range: - min: 5672960 - max: 5672960 + min: 630784 + max: 630784 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 74 + layers_on_npu: 106 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 74 - job_id: jep28m2qp + total_layers: 106 + job_id: jgkevleyg job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -445,4 +396,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:11:54Z' + timestamp: '2024-10-17T17:28:11Z' diff --git a/qai_hub_models/models/mobilenet_v2_quantized/requirements.txt 
b/qai_hub_models/models/mobilenet_v2_quantized/requirements.txt deleted file mode 100644 index de5b80e8..00000000 --- a/qai_hub_models/models/mobilenet_v2_quantized/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -aimet-torch==1.32.1.post1; sys_platform == "linux" diff --git a/qai_hub_models/models/mobilenet_v2_quantized/test.py b/qai_hub_models/models/mobilenet_v2_quantized/test.py deleted file mode 100644 index 9837761a..00000000 --- a/qai_hub_models/models/mobilenet_v2_quantized/test.py +++ /dev/null @@ -1,31 +0,0 @@ -# --------------------------------------------------------------------- -# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. -# SPDX-License-Identifier: BSD-3-Clause -# --------------------------------------------------------------------- -from qai_hub_models.models._shared.imagenet_classifier.test_utils import ( - run_imagenet_classifier_test, ) -from qai_hub_models.models.mobilenet_v2_quantized.demo import main as demo_main -from qai_hub_models.models.mobilenet_v2_quantized.model import ( - MODEL_ASSET_VERSION, - MODEL_ID, - MobileNetV2Quantizable, ) -from qai_hub_models.utils.testing import skip_clone_repo_check - - -@skip_clone_repo_check -def test_task(): - run_imagenet_classifier_test( - MobileNetV2Quantizable.from_pretrained(), - MODEL_ID, - asset_version=MODEL_ASSET_VERSION, - probability_threshold=0.56, - diff_tol=0.06, - ) - - -@skip_clone_repo_check -def test_demo(): - # Verify demo does not crash - demo_main(is_test=True) diff --git a/qai_hub_models/models/mobilenet_v3_large/README.md b/qai_hub_models/models/mobilenet_v3_large/README.md index cbc69327..f2e5c92e 100644 --- a/qai_hub_models/models/mobilenet_v3_large/README.md +++ b/qai_hub_models/models/mobilenet_v3_large/README.md @@ -6,7 +6,7 @@ MobileNet-v3-Large is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of MobileNet-v3-Large found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/mobilenetv3.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/mobilenet_v3_large). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.mobilenet_v3_large.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of MobileNet-v3-Large can be found +* The license for the original implementation of MobileNet-v3-Large can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/mobilenetv3.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/mobilenet_v3_large/export.py b/qai_hub_models/models/mobilenet_v3_large/export.py index 5ff2c2ad..0ab299ba 100644 --- a/qai_hub_models/models/mobilenet_v3_large/export.py +++ b/qai_hub_models/models/mobilenet_v3_large/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.mobilenet_v3_large import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
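As a hypothetical usage illustration (not from this patch), a caller with Qualcomm® AI Hub access configured might drive this entry point and consume the returned struct; the profile-handling calls mirror the ones `export_model` uses internally:

```python
# Illustrative driver. Without AI Hub access, export_model returns a
# List[str] of instructions instead of an ExportResult.
from qai_hub_models.models.mobilenet_v3_large.export import export_model

result = export_model(device="Samsung Galaxy S23 (Family)")
if not isinstance(result, list):
    # Same pattern the summary step uses: wait for the job, then fetch data.
    assert result.profile_job is not None and result.profile_job.wait().success
    profile_data = result.profile_job.download_profile()
    print(sorted(profile_data.keys()))
```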
""" model_name = "mobilenet_v3_large" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/mobilenet_v3_large/perf.yaml b/qai_hub_models/models/mobilenet_v3_large/perf.yaml index 62227cc1..ae27a257 100644 --- a/qai_hub_models/models/mobilenet_v3_large/perf.yaml +++ b/qai_hub_models/models/mobilenet_v3_large/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: MobileNet-v3-Large performance_metrics: - torchscript_onnx_tflite: - inference_time: 996.0 - throughput: 1004.0160642570281 + inference_time: 987.0 + throughput: 1013.1712259371834 estimated_peak_memory_range: min: 12288 - max: 2218408 + max: 38384416 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 128 - job_id: j1glnkdmp + job_id: jgjvnqyxg job_status: Passed torchscript_onnx_qnn: - inference_time: 1044.0 - throughput: 957.8544061302682 + inference_time: 1051.0 + throughput: 951.4747859181732 estimated_peak_memory_range: - min: 626688 - max: 286895088 + min: 622592 + max: 5759816 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 144 - job_id: jygze76xg + job_id: jp14z6m8p job_status: Passed torchscript_onnx: - inference_time: 1036.0 - throughput: 965.2509652509652 + inference_time: 993.0 + throughput: 1007.0493454179255 estimated_peak_memory_range: - min: 0 - max: 276525096 + min: 618496 + max: 2189336 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 146 - job_id: jo5mrz79g + job_id: jp0z06x95 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:11:02Z' + timestamp: '2024-10-15T00:07:28Z' - torchscript_onnx_tflite: - inference_time: 692.0 - throughput: 1445.086705202312 + inference_time: 698.0 + throughput: 1432.6647564469913 estimated_peak_memory_range: min: 16384 - max: 67093360 + max: 68672320 primary_compute_unit: NPU precision: fp16 
layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 128 - job_id: jw5661xy5 + job_id: jpedmyx15 job_status: Passed torchscript_onnx_qnn: - inference_time: 731.0 - throughput: 1367.9890560875513 + inference_time: 732.0 + throughput: 1366.120218579235 estimated_peak_memory_range: min: 618496 - max: 18611664 + max: 19697296 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 144 - job_id: jz5wo9kmp + job_id: jgdx12mrp job_status: Passed torchscript_onnx: - inference_time: 756.0 - throughput: 1322.7513227513227 + inference_time: 722.0 + throughput: 1385.0415512465374 estimated_peak_memory_range: min: 0 - max: 67472384 + max: 68762576 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 146 - job_id: jegn2e4qg + job_id: jp8qy1kkp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:11:03Z' + timestamp: '2024-10-15T00:07:29Z' - torchscript_onnx_tflite: inference_time: 987.0 throughput: 1013.1712259371834 estimated_peak_memory_range: - min: 24576 - max: 1867088 + min: 20480 + max: 1464160 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 128 - job_id: j1p3kmdn5 + job_id: jgz3dnyk5 job_status: Passed torchscript_onnx_qnn: - inference_time: 996.0 - throughput: 1004.0160642570281 + inference_time: 1001.0 + throughput: 999.000999000999 estimated_peak_memory_range: - min: 630784 - max: 1775008 + min: 638976 + max: 1908496 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 144 - job_id: jnp108975 + job_id: jp4lr3285 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:10:58Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:07:20Z' - torchscript_onnx_tflite: - inference_time: 1392.0 - throughput: 718.3908045977012 + inference_time: 986.0 + throughput: 1014.1987829614604 estimated_peak_memory_range: - min: 12288 - max: 68139712 + min: 24576 + max: 253750768 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,37 +200,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 128 - job_id: jwgoyvxk5 + job_id: jgdx12mep job_status: Passed torchscript_onnx_qnn: - inference_time: 1470.0 - throughput: 680.2721088435375 + inference_time: 1003.0 + throughput: 997.0089730807578 estimated_peak_memory_range: - min: 622592 - max: 21538816 + min: 634880 + max: 1943496 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 146 + layers_on_npu: 144 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: j0pxv1qlg + total_layers: 144 + job_id: jgn6vkwk5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:11:02Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:07:24Z' - torchscript_onnx_tflite: - inference_time: 997.0 - throughput: 1003.0090270812437 + inference_time: 991.0 + throughput: 1009.0817356205853 estimated_peak_memory_range: - min: 57344 - max: 1584984 + 
min: 20480 + max: 1590376 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 128 - job_id: j1pv3w8r5 + job_id: jp14z6m2p job_status: Passed torchscript_onnx_qnn: - inference_time: 998.0 - throughput: 1002.0040080160321 + inference_time: 994.0 + throughput: 1006.0362173038229 estimated_peak_memory_range: - min: 634880 - max: 1977464 + min: 638976 + max: 1794568 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 144 - job_id: jvgdwvkz5 + job_id: j5mnx8ldp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:10:59Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:07:23Z' - torchscript_onnx_tflite: - inference_time: 997.0 - throughput: 1003.0090270812437 + inference_time: 989.0 + throughput: 1011.1223458038422 estimated_peak_memory_range: - min: 45056 - max: 2527128 + min: 28672 + max: 257923856 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 128 - job_id: j7gjxl9ep + job_id: jg9lndqlg job_status: Passed torchscript_onnx_qnn: - inference_time: 999.0 - throughput: 1001.001001001001 + inference_time: 1000.0 + throughput: 1000.0 estimated_peak_memory_range: - min: 663552 - max: 2271392 + min: 622592 + max: 1873896 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,60 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 144 - job_id: jz57zdm9p + job_id: jpxkoxz35 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:11:00Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:07:22Z' - torchscript_onnx_tflite: - inference_time: 991.0 - throughput: 1009.0817356205853 + inference_time: 1385.0 + throughput: 722.0216606498195 estimated_peak_memory_range: - min: 24576 - max: 1530320 + min: 20480 + max: 69762448 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 128 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 128 + job_id: j5we64r65 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 1473.0 + throughput: 678.8866259334691 + estimated_peak_memory_range: + min: 618496 + max: 23621792 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 146 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 146 + job_id: jp2kyezrp + job_status: Passed + reference_device_info: + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:07:26Z' + - torchscript_onnx_tflite: + inference_time: 679.0 + throughput: 1472.7540500736377 + estimated_peak_memory_range: + min: 12288 + max: 26004720 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +352,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 128 - job_id: jlpe9vqvg + job_id: jg9lndqwg job_status: Passed torchscript_onnx_qnn: - inference_time: 1002.0 - throughput: 998.003992015968 + inference_time: 710.0 + throughput: 1408.4507042253522 estimated_peak_memory_range: - min: 630784 - max: 1848168 + min: 614400 + max: 15213248 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +367,34 @@ models: 
layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 144 - job_id: jqp4qw71g + job_id: jpy13my8p + job_status: Passed + torchscript_onnx: + inference_time: 734.0 + throughput: 1362.3978201634877 + estimated_peak_memory_range: + min: 0 + max: 28075408 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 146 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 146 + job_id: jglvmlqj5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) - os: '13' - form_factor: Auto + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:11:01Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:07:32Z' - torchscript_onnx_qnn: - inference_time: 1173.0 - throughput: 852.5149190110827 + inference_time: 1166.0 + throughput: 857.6329331046312 estimated_peak_memory_range: min: 602112 max: 602112 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 144 - job_id: jmg9v4r85 + job_id: j57yr98v5 job_status: Passed torchscript_onnx: - inference_time: 1040.0 - throughput: 961.5384615384615 + inference_time: 1072.0 + throughput: 932.8358208955224 estimated_peak_memory_range: - min: 14659584 - max: 14659584 + min: 13778944 + max: 13778944 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 146 - job_id: joprkyr75 + job_id: jgkex8kwg job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:11:04Z' + timestamp: '2024-10-15T00:07:30Z' diff --git a/qai_hub_models/models/mobilenet_v3_large_quantized/README.md b/qai_hub_models/models/mobilenet_v3_large_quantized/README.md index f1d80ca8..e799b0e9 100644 --- a/qai_hub_models/models/mobilenet_v3_large_quantized/README.md +++ b/qai_hub_models/models/mobilenet_v3_large_quantized/README.md @@ -6,7 +6,7 @@ MobileNet-v3-Large is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of MobileNet-v3-Large-Quantized found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/mobilenetv3.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/mobilenet_v3_large_quantized). @@ -17,11 +17,6 @@ across various devices can be found [here](https://aihub.qualcomm.com/models/m ## Example & Usage -Install the package via pip: -```bash -pip install "qai_hub_models[mobilenet_v3_large_quantized]" -``` - Once installed, run the following simple CLI demo: @@ -44,15 +39,19 @@ python -m qai_hub_models.models.mobilenet_v3_large_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of MobileNet-v3-Large-Quantized can be found +* The license for the original implementation of MobileNet-v3-Large-Quantized can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/mobilenetv3.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/mobilenet_v3_large_quantized/evaluate.py b/qai_hub_models/models/mobilenet_v3_large_quantized/evaluate.py index 39314070..6716f23c 100644 --- a/qai_hub_models/models/mobilenet_v3_large_quantized/evaluate.py +++ b/qai_hub_models/models/mobilenet_v3_large_quantized/evaluate.py @@ -13,10 +13,8 @@ from qai_hub_models.models.mobilenet_v3_large_quantized import MODEL_ID, Model from qai_hub_models.utils.args import evaluate_parser, get_hub_device, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.evaluate import evaluate_on_dataset from qai_hub_models.utils.inference import compile_model_from_args -from qai_hub_models.utils.quantization_aimet import AIMETQuantizableMixin SUPPORTED_DATASETS = ["imagenette", "imagenet"] @@ -27,6 +25,7 @@ def main(): model_cls=Model, default_split_size=2500, supported_datasets=SUPPORTED_DATASETS, + is_hub_quantized=True, ) args = parser.parse_args() args.device = None @@ -38,13 +37,7 @@ def main(): MODEL_ID, args, get_model_kwargs(Model, vars(args)) ) hub_device = get_hub_device(None, args.chipset) - - # Use Fp16 model for torch inference - for cls in Model.__mro__: - if issubclass(cls, BaseModel) and not issubclass(cls, AIMETQuantizableMixin): - torch_cls = cls - break - torch_model = torch_cls.from_pretrained(**get_model_kwargs(torch_cls, vars(args))) + torch_model = Model.from_pretrained(**get_model_kwargs(Model, vars(args))) evaluate_on_dataset( hub_model, torch_model, diff --git a/qai_hub_models/models/mobilenet_v3_large_quantized/export.py b/qai_hub_models/models/mobilenet_v3_large_quantized/export.py index 0b04571e..81d02d7a 100644 --- a/qai_hub_models/models/mobilenet_v3_large_quantized/export.py +++ b/qai_hub_models/models/mobilenet_v3_large_quantized/export.py @@ -10,18 +10,20 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.mobilenet_v3_large_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference +from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( print_inference_metrics, print_on_target_demo_cmd, @@ -31,11 +33,14 @@ can_access_qualcomm_ai_hub, export_without_hub_access, ) +from qai_hub_models.utils.quantization import get_calibration_data def export_model( device: str = 
"Samsung Galaxy S23 (Family)", chipset: Optional[str] = None, + num_calibration_samples: int = 100, + skip_compiling: bool = False, skip_profiling: bool = False, skip_inferencing: bool = False, skip_downloading: bool = False, @@ -45,20 +50,19 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + 3. Compiles the model to an asset that can be run on device + 4. Profiles the model performance on a real device + 5. Inferences the model on sample inputs + 6. Downloads the model asset to the local directory + 7. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 5 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -66,6 +70,9 @@ def export_model( Defaults to DEFAULT_DEVICE if not specified. chipset: If set, will choose a random device with this chipset. Overrides the `device` argument. + num_calibration_samples: The number of calibration data samples + to use for quantization. + skip_compiling: If set, skips compiling model to format that can run on device. skip_profiling: If set, skips profiling of compiled model on real devices. skip_inferencing: If set, skips computing on-device outputs from sample data. skip_downloading: If set, skips downloading of compiled model. @@ -80,10 +87,11 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: - * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). + A struct of: + * A CompileJob object containing metadata about the compile job submitted to hub (None if compiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). + * A QuantizeJob object containing metadata about the quantize job submitted to hub """ model_name = "mobilenet_v3_large_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,33 +117,52 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. 
Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) ) # Trace the model - source_model = model.convert_to_hub_source_model( - target_runtime, output_path, input_spec + source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) + + print(f"Quantizing model {model_name} with {num_calibration_samples} samples.") + # 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + onnx_compile_job = hub.submit_compile_job( + model=source_model, + input_specs=input_spec, + device=hub_device, + name=model_name, + options="--target_runtime onnx", + ) + quantize_job = hub.submit_quantize_job( + model=onnx_compile_job.get_target_model(), + calibration_data=get_calibration_data( + input_spec, "imagenette", num_calibration_samples + ), + weights_dtype=model.get_weights_dtype(), + activations_dtype=model.get_activations_dtype(), + name=model_name, + options=model.get_quantize_options(), ) + if skip_compiling: + return ExportResult(quantize_job=quantize_job) - # 2. Compile the model to an on-device asset + # 3. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) print(f"Optimizing model {model_name} to run on-device") submitted_compile_job = hub.submit_compile_job( - model=source_model, + model=quantize_job.get_target_model(), input_specs=input_spec, device=hub_device, name=model_name, - calibration_data=model.get_calibration_data(target_runtime), options=model_compile_options, ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 4. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +177,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 5. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +198,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 6. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 7. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -198,12 +225,17 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + quantize_job=quantize_job, + ) def main(): warnings.filterwarnings("ignore") - parser = export_parser(model_cls=Model) + parser = export_parser(model_cls=Model, is_hub_quantized=True) args = parser.parse_args() export_model(**vars(args)) diff --git a/qai_hub_models/models/mobilenet_v3_large_quantized/model.py b/qai_hub_models/models/mobilenet_v3_large_quantized/model.py index b13a9d4c..ee3c1b02 100644 --- a/qai_hub_models/models/mobilenet_v3_large_quantized/model.py +++ b/qai_hub_models/models/mobilenet_v3_large_quantized/model.py @@ -4,78 +4,11 @@ # --------------------------------------------------------------------- from __future__ import annotations -# isort: off -# This verifies aimet is installed, and this must be included first. -from qai_hub_models.utils.quantization_aimet import ( - AIMETQuantizableMixin, - constrain_quantized_inputs_to_image_range, -) - -# isort: on - -import torch -from aimet_torch.cross_layer_equalization import equalize_model -from aimet_torch.model_preparer import prepare_model -from aimet_torch.quantsim import QuantizationSimModel, load_encodings_to_sim - from qai_hub_models.models.mobilenet_v3_large.model import MobileNetV3Large -from qai_hub_models.utils.aimet.config_loader import get_default_aimet_config -from qai_hub_models.utils.asset_loaders import CachedWebModelAsset +from qai_hub_models.utils.quantization import HubQuantizableMixin MODEL_ID = __name__.split(".")[-2] -MODEL_ASSET_VERSION = 2 -DEFAULT_ENCODINGS = "mobilenet_v3_large_quantized_encodings.json" - - -class MobileNetV3LargeQuantizable(AIMETQuantizableMixin, MobileNetV3Large): - """MobileNetV3Large with post train quantization support. - - Supports only 8 bit weights and activations, and only loads pre-quantized checkpoints. - Support for quantizing using your own weights & data will come at a later date.""" - - def __init__( - self, - sim_model: QuantizationSimModel, - ) -> None: - # Input is already normalized by sim_model. Disable it in the wrapper model. - MobileNetV3Large.__init__(self, sim_model.model, normalize_input=False) - AIMETQuantizableMixin.__init__( - self, - sim_model, - ) - - @classmethod - def from_pretrained( - cls, - aimet_encodings: str | None = "DEFAULT", - ) -> "MobileNetV3LargeQuantizable": - """ - Parameters: - aimet_encodings: - if "DEFAULT": Loads the model with aimet encodings calibrated on imagenette. - elif None: Doesn't load any encodings. Used when computing encodings. - else: Interprets as a filepath and loads the encodings stored there. 
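In place of the AIMET simulation setup being deleted here, the Hub-side quantization recipe in the new export.py amounts to two extra job submissions. Condensed into a standalone helper for readability — a sketch using only calls that appear in this patch, with error handling and naming details omitted:

```python
# Condensed from the export script above; parameters mirror the diff.
import qai_hub as hub


def quantize_on_hub(model, source_model, input_spec, hub_device, calibration_data, name):
    # First compile the traced TorchScript model to ONNX...
    onnx_compile_job = hub.submit_compile_job(
        model=source_model,
        input_specs=input_spec,
        device=hub_device,
        name=name,
        options="--target_runtime onnx",
    )
    # ...then quantize the ONNX asset, calibrated on sample data
    # (imagenette in this patch).
    quantize_job = hub.submit_quantize_job(
        model=onnx_compile_job.get_target_model(),
        calibration_data=calibration_data,
        weights_dtype=model.get_weights_dtype(),
        activations_dtype=model.get_activations_dtype(),
        name=name,
        options=model.get_quantize_options(),
    )
    return quantize_job
```

The resulting `quantize_job.get_target_model()` is what the regular `submit_compile_job` call consumes next, which is why `skip_compiling` can return early with only the quantize job populated.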
- """ - model = MobileNetV3Large.from_pretrained() - input_shape = cls.get_input_spec()["image_tensor"][0] - - model = prepare_model(model) - equalize_model(model, input_shape) - sim = QuantizationSimModel( - model, - quant_scheme="tf_enhanced", - default_param_bw=8, - default_output_bw=8, - config_file=get_default_aimet_config(), - dummy_input=torch.rand(input_shape), - ) - constrain_quantized_inputs_to_image_range(sim) - if aimet_encodings: - if aimet_encodings == "DEFAULT": - aimet_encodings = CachedWebModelAsset.from_asset_store( - MODEL_ID, MODEL_ASSET_VERSION, DEFAULT_ENCODINGS - ).fetch() - load_encodings_to_sim(sim, aimet_encodings) - return cls(sim) +class MobileNetV3LargeQuantizable(HubQuantizableMixin, MobileNetV3Large): + pass diff --git a/qai_hub_models/models/mobilenet_v3_large_quantized/perf.yaml b/qai_hub_models/models/mobilenet_v3_large_quantized/perf.yaml index 0a7c2e6c..659488a9 100644 --- a/qai_hub_models/models/mobilenet_v3_large_quantized/perf.yaml +++ b/qai_hub_models/models/mobilenet_v3_large_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,44 +20,39 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8775P Proxy models: - name: MobileNet-v3-Large-Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 336.0 - throughput: 2976.190476190476 + inference_time: 346.0 + throughput: 2890.173410404624 estimated_peak_memory_range: - min: 24576 - max: 1495384 + min: 12288 + max: 10174368 primary_compute_unit: NPU precision: int8 layer_info: @@ -64,29 +60,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 137 - job_id: j1p3kmr35 + job_id: jp2kxmnxp job_status: Passed torchscript_onnx_qnn: - inference_time: 627.0 - throughput: 1594.896331738437 + inference_time: 630.0 + throughput: 1587.3015873015872 estimated_peak_memory_range: - min: 12288 - max: 25425304 + min: 28672 + max: 14777032 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 126 + layers_on_npu: 145 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 126 - job_id: jvgdwvyr5 + total_layers: 145 + job_id: jgjvdl47g job_status: Passed torchscript_onnx: - inference_time: 681.0 - throughput: 1468.4287812041116 + inference_time: 656.0 + throughput: 1524.3902439024391 estimated_peak_memory_range: min: 12288 - max: 17557696 + max: 12047376 primary_compute_unit: NPU precision: int8 layer_info: @@ -94,7 +90,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 134 - job_id: joprkym75 + job_id: jgn609vv5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -103,13 +99,13 @@ models: 
os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:10:25Z' + timestamp: '2024-10-17T17:26:46Z' - torchscript_onnx_tflite: - inference_time: 318.0 - throughput: 3144.6540880503144 + inference_time: 239.0 + throughput: 4184.100418410042 estimated_peak_memory_range: min: 12288 - max: 55456912 + max: 54438352 primary_compute_unit: NPU precision: int8 layer_info: @@ -117,29 +113,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 137 - job_id: jwgoyv9q5 + job_id: jpy1zd0rp job_status: Passed torchscript_onnx_qnn: - inference_time: 457.0 - throughput: 2188.183807439825 + inference_time: 602.0 + throughput: 1661.1295681063123 estimated_peak_memory_range: min: 163840 - max: 18072000 + max: 19043536 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 126 + layers_on_npu: 145 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 126 - job_id: jz5wo90mp + total_layers: 145 + job_id: jpedov375 job_status: Passed torchscript_onnx: - inference_time: 521.0 - throughput: 1919.3857965451057 + inference_time: 495.0 + throughput: 2020.20202020202 estimated_peak_memory_range: - min: 0 - max: 84828160 + min: 12288 + max: 85886736 primary_compute_unit: NPU precision: int8 layer_info: @@ -147,7 +143,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 134 - job_id: jep28mqqp + job_id: jprv643vg job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -156,13 +152,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:10:25Z' + timestamp: '2024-10-17T17:26:48Z' - torchscript_onnx_tflite: - inference_time: 341.0 - throughput: 2932.551319648094 + inference_time: 1133.0 + throughput: 882.61253309797 estimated_peak_memory_range: min: 12288 - max: 1313456 + max: 33516144 primary_compute_unit: NPU precision: int8 layer_info: @@ -170,37 +166,60 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 137 - job_id: j1pv3wyk5 + job_id: jp0z4r725 job_status: Passed torchscript_onnx_qnn: - inference_time: 579.0 - throughput: 1727.1157167530225 + inference_time: 1699.0 + throughput: 588.5815185403178 estimated_peak_memory_range: - min: 184320 - max: 1431192 + min: 12288 + max: 7871536 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 126 + layers_on_npu: 145 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 126 - job_id: jnp108k75 + total_layers: 145 + job_id: jgz327kz5 job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS6490 Proxy + timestamp: '2024-10-17T17:26:32Z' + - torchscript_onnx_tflite: + inference_time: 6815.0 + throughput: 146.7351430667645 + estimated_peak_memory_range: + min: 40960 + max: 2494136 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 137 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 137 + job_id: jp8q27vzp + job_status: Passed + reference_device_info: + name: RB5 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:10:19Z' + chipset: QCS8250 Proxy + timestamp: '2024-10-17T17:26:17Z' - torchscript_onnx_tflite: - inference_time: 440.0 - throughput: 2272.7272727272725 + inference_time: 349.0 + throughput: 2865.3295128939826 estimated_peak_memory_range: - min: 20480 - max: 54918832 + min: 28672 + max: 1402784 primary_compute_unit: NPU precision: int8 layer_info: @@ -208,37 +227,37 @@ models: layers_on_gpu: 
0 layers_on_cpu: 0 total_layers: 137 - job_id: j7gjxl6vp + job_id: jgkevymyg job_status: Passed torchscript_onnx_qnn: - inference_time: 756.0 - throughput: 1322.7513227513227 + inference_time: 574.0 + throughput: 1742.1602787456445 estimated_peak_memory_range: - min: 163840 - max: 20758448 + min: 184320 + max: 1715536 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 126 + layers_on_npu: 145 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 126 - job_id: jo5mrz19g + total_layers: 145 + job_id: j5wew9nz5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:10:23Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-17T17:26:34Z' - torchscript_onnx_tflite: - inference_time: 344.0 - throughput: 2906.9767441860463 + inference_time: 336.0 + throughput: 2976.190476190476 estimated_peak_memory_range: min: 12288 - max: 2200328 + max: 1349808 primary_compute_unit: NPU precision: int8 layer_info: @@ -246,37 +265,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 137 - job_id: jlpe9v0og + job_id: j5q602o7p job_status: Passed torchscript_onnx_qnn: - inference_time: 579.0 - throughput: 1727.1157167530225 + inference_time: 574.0 + throughput: 1742.1602787456445 estimated_peak_memory_range: - min: 184320 - max: 1425392 + min: 188416 + max: 1410288 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 126 + layers_on_npu: 145 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 126 - job_id: jvgdwvyz5 + total_layers: 145 + job_id: jp1428xkp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:10:20Z' + chipset: SA8255P Proxy + timestamp: '2024-10-17T17:26:37Z' - torchscript_onnx_tflite: - inference_time: 339.0 - throughput: 2949.8525073746314 + inference_time: 348.0 + throughput: 2873.5632183908046 estimated_peak_memory_range: - min: 24576 - max: 129688360 + min: 16384 + max: 1496872 primary_compute_unit: NPU precision: int8 layer_info: @@ -284,22 +303,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 137 - job_id: jygze7qog + job_id: jglv4kre5 job_status: Passed torchscript_onnx_qnn: - inference_time: 575.0 - throughput: 1739.1304347826087 + inference_time: 577.0 + throughput: 1733.102253032929 estimated_peak_memory_range: min: 176128 - max: 1611400 + max: 1424912 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 126 + layers_on_npu: 145 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 126 - job_id: jz57zd19p + total_layers: 145 + job_id: jgdxnvlkp job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -307,14 +326,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:10:21Z' + chipset: SA8775P Proxy + timestamp: '2024-10-17T17:26:39Z' - torchscript_onnx_tflite: - inference_time: 348.0 - throughput: 2873.5632183908046 + inference_time: 439.0 + throughput: 2277.904328018223 estimated_peak_memory_range: min: 16384 - max: 3181552 + max: 55946528 primary_compute_unit: NPU precision: int8 layer_info: @@ -322,37 +341,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 137 - job_id: jz5wo903p + job_id: j56y21lvp job_status: Passed torchscript_onnx_qnn: - inference_time: 576.0 - throughput: 1736.111111111111 + 
inference_time: 767.0 + throughput: 1303.7809647979138 estimated_peak_memory_range: - min: 184320 - max: 1569304 + min: 163840 + max: 21186096 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 126 + layers_on_npu: 145 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 126 - job_id: j0pxv18lg + total_layers: 145 + job_id: j57y2d3q5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:10:22Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-17T17:26:40Z' - torchscript_onnx_tflite: - inference_time: 1151.0 - throughput: 868.8097306689835 + inference_time: 249.0 + throughput: 4016.0642570281125 estimated_peak_memory_range: - min: 12288 - max: 33354320 + min: 8192 + max: 31590176 primary_compute_unit: NPU precision: int8 layer_info: @@ -360,75 +379,67 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 137 - job_id: jmg9v47w5 + job_id: jp3jnm2xg job_status: Passed torchscript_onnx_qnn: - inference_time: 1734.0 - throughput: 576.7012687427913 + inference_time: 473.0 + throughput: 2114.164904862579 estimated_peak_memory_range: - min: 12288 - max: 7523600 + min: 159744 + max: 14395792 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 126 + layers_on_npu: 145 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 126 - job_id: jegn2edqg + total_layers: 145 + job_id: jp4lnw0q5 job_status: Passed - reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot - os_name: Android - manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T12:10:24Z' - - torchscript_onnx_tflite: - inference_time: 6871.0 - throughput: 145.53922282055015 + torchscript_onnx: + inference_time: 528.0 + throughput: 1893.939393939394 estimated_peak_memory_range: min: 12288 - max: 2058328 + max: 38851536 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 137 + layers_on_npu: 134 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 137 - job_id: jnp108k85 + total_layers: 134 + job_id: jpy1z43rp job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T12:10:15Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-17T17:26:51Z' - torchscript_onnx_qnn: - inference_time: 699.0 - throughput: 1430.615164520744 + inference_time: 716.0 + throughput: 1396.6480446927374 estimated_peak_memory_range: - min: 540672 - max: 540672 + min: 544768 + max: 544768 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 126 + layers_on_npu: 145 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 126 - job_id: jmg9v4785 + total_layers: 145 + job_id: jg9l04eqg job_status: Passed torchscript_onnx: - inference_time: 703.0 - throughput: 1422.475106685633 + inference_time: 720.0 + throughput: 1388.888888888889 estimated_peak_memory_range: - min: 11837440 - max: 11837440 + min: 10633216 + max: 10633216 primary_compute_unit: NPU precision: int8 layer_info: @@ -436,7 +447,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 134 - job_id: jqpyedklg + job_id: jp2kx7yxp job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -445,4 +456,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:10:26Z' + timestamp: 
'2024-10-17T17:26:49Z' diff --git a/qai_hub_models/models/mobilenet_v3_large_quantized/requirements.txt b/qai_hub_models/models/mobilenet_v3_large_quantized/requirements.txt deleted file mode 100644 index de5b80e8..00000000 --- a/qai_hub_models/models/mobilenet_v3_large_quantized/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -aimet-torch==1.32.1.post1; sys_platform == "linux" diff --git a/qai_hub_models/models/mobilenet_v3_large_quantized/test.py b/qai_hub_models/models/mobilenet_v3_large_quantized/test.py deleted file mode 100644 index 6767deef..00000000 --- a/qai_hub_models/models/mobilenet_v3_large_quantized/test.py +++ /dev/null @@ -1,29 +0,0 @@ -# --------------------------------------------------------------------- -# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. -# SPDX-License-Identifier: BSD-3-Clause -# --------------------------------------------------------------------- -from qai_hub_models.models._shared.imagenet_classifier.test_utils import ( - run_imagenet_classifier_test, -) -from qai_hub_models.models.mobilenet_v3_large_quantized.demo import main as demo_main -from qai_hub_models.models.mobilenet_v3_large_quantized.model import ( - MODEL_ASSET_VERSION, - MODEL_ID, - MobileNetV3LargeQuantizable, -) - - -def test_task(): - run_imagenet_classifier_test( - MobileNetV3LargeQuantizable.from_pretrained(), - MODEL_ID, - asset_version=MODEL_ASSET_VERSION, - diff_tol=0.005, - rtol=0.02, - atol=0.2, - ) - - -def test_demo(): - # Verify demo does not crash - demo_main(is_test=True) diff --git a/qai_hub_models/models/mobilenet_v3_small/README.md b/qai_hub_models/models/mobilenet_v3_small/README.md index acbc1178..fc4a6a5c 100644 --- a/qai_hub_models/models/mobilenet_v3_small/README.md +++ b/qai_hub_models/models/mobilenet_v3_small/README.md @@ -6,7 +6,7 @@ MobileNetV3Small is a machine learning model that can classify images from the ImageNet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of MobileNet-v3-Small found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/mobilenetv3.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/mobilenet_v3_small). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.mobilenet_v3_small.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of MobileNet-v3-Small can be found +* The license for the original implementation of MobileNet-v3-Small can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/mobilenetv3.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/mobilenet_v3_small/export.py b/qai_hub_models/models/mobilenet_v3_small/export.py index 1eac925c..28542f6f 100644 --- a/qai_hub_models/models/mobilenet_v3_small/export.py +++ b/qai_hub_models/models/mobilenet_v3_small/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.mobilenet_v3_small import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
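To make the new return type concrete, here is a minimal sketch of consuming `ExportResult` in place of the old 3-tuple; the device name is illustrative, and the `List[str]` branch of the signature is simply skipped:

```python
from qai_hub_models.models.common import ExportResult
from qai_hub_models.models.mobilenet_v3_small.export import export_model

# Illustrative device; any AI Hub-supported device string works here.
result = export_model(device="Samsung Galaxy S23")
if isinstance(result, ExportResult):
    # Named attributes replace positional unpacking of the old 3-tuple.
    target_model = result.compile_job.get_target_model()
    if result.profile_job is not None:
        profile_data = result.profile_job.download_profile()
```

Keyword access also makes the reordering documented above (the inference job now listed before the profile job) a non-issue for callers.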
""" model_name = "mobilenet_v3_small" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/mobilenet_v3_small/perf.yaml b/qai_hub_models/models/mobilenet_v3_small/perf.yaml index c8114238..6daac35e 100644 --- a/qai_hub_models/models/mobilenet_v3_small/perf.yaml +++ b/qai_hub_models/models/mobilenet_v3_small/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: MobileNet-v3-Small performance_metrics: - torchscript_onnx_tflite: - inference_time: 817.0 - throughput: 1223.9902080783354 + inference_time: 812.0 + throughput: 1231.527093596059 estimated_peak_memory_range: - min: 28672 - max: 147618024 + min: 12288 + max: 1330904 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 115 - job_id: joprk9oe5 + job_id: j5mnx80wp job_status: Passed torchscript_onnx_qnn: - inference_time: 869.0 - throughput: 1150.7479861910242 + inference_time: 864.0 + throughput: 1157.4074074074074 estimated_peak_memory_range: min: 16384 - max: 34205016 + max: 145318176 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: jvgdwvmr5 + job_id: j56y4w80p job_status: Passed torchscript_onnx: - inference_time: 819.0 - throughput: 1221.001221001221 + inference_time: 813.0 + throughput: 1230.0123001230013 estimated_peak_memory_range: - min: 12288 - max: 7244784 + min: 282624 + max: 13119392 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 128 - job_id: jqpyedy8g + job_id: j57yr9ol5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T23:08:41Z' + timestamp: '2024-10-15T00:05:40Z' - torchscript_onnx_tflite: - inference_time: 550.0 - throughput: 1818.1818181818182 + inference_time: 549.0 + throughput: 1821.4936247723133 estimated_peak_memory_range: - min: 12288 - max: 45436416 + min: 16384 + max: 46787648 primary_compute_unit: NPU 
precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 115 - job_id: jep28j4mp + job_id: jprv3wl9g job_status: Passed torchscript_onnx_qnn: - inference_time: 714.0 - throughput: 1400.5602240896358 + inference_time: 590.0 + throughput: 1694.915254237288 estimated_peak_memory_range: min: 618496 - max: 16021200 + max: 16897440 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: jz57zd8vp + job_id: jp3j06zlg job_status: Passed torchscript_onnx: - inference_time: 594.0 - throughput: 1683.5016835016836 + inference_time: 586.0 + throughput: 1706.4846416382252 estimated_peak_memory_range: min: 0 - max: 48345728 + max: 49373776 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 128 - job_id: j2p0yrx9g + job_id: jp4lr3ev5 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T23:08:42Z' + timestamp: '2024-10-15T00:05:41Z' - torchscript_onnx_tflite: - inference_time: 817.0 - throughput: 1223.9902080783354 + inference_time: 814.0 + throughput: 1228.5012285012285 estimated_peak_memory_range: min: 12288 - max: 1831712 + max: 17196376 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 115 - job_id: jqpyenq4g + job_id: jp2kyer4p job_status: Passed torchscript_onnx_qnn: - inference_time: 840.0 - throughput: 1190.4761904761904 + inference_time: 830.0 + throughput: 1204.8192771084337 estimated_peak_memory_range: min: 638976 - max: 2285872 + max: 1870192 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: j0pxv1z3g + job_id: jpv6k7lj5 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T23:08:43Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:05:32Z' - torchscript_onnx_tflite: - inference_time: 1094.0 - throughput: 914.0767824497258 + inference_time: 814.0 + throughput: 1228.5012285012285 estimated_peak_memory_range: - min: 16384 - max: 47850208 + min: 24576 + max: 18042136 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,37 +200,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 115 - job_id: j2p0ykdeg + job_id: jgkex822g job_status: Passed torchscript_onnx_qnn: - inference_time: 1160.0 - throughput: 862.0689655172414 + inference_time: 838.0 + throughput: 1193.3174224343675 estimated_peak_memory_range: - min: 618496 - max: 17611888 + min: 626688 + max: 1985320 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 128 + layers_on_npu: 126 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 128 - job_id: jep28mzrp + total_layers: 126 + job_id: jgz3dnlk5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T23:08:43Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:05:36Z' - torchscript_onnx_tflite: - inference_time: 819.0 - throughput: 1221.001221001221 + inference_time: 824.0 + throughput: 1213.5922330097087 
estimated_peak_memory_range: - min: 12288 - max: 9970960 + min: 20480 + max: 1643256 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 115 - job_id: j1p8o868g + job_id: jp8qy1exp job_status: Passed torchscript_onnx_qnn: - inference_time: 842.0 - throughput: 1187.648456057007 + inference_time: 839.0 + throughput: 1191.8951132300358 estimated_peak_memory_range: - min: 638976 - max: 1910272 + min: 651264 + max: 1766504 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: jo5mrzldg + job_id: jpedmy715 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T23:08:44Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:05:34Z' - torchscript_onnx_tflite: - inference_time: 815.0 - throughput: 1226.993865030675 + inference_time: 811.0 + throughput: 1233.0456226880394 estimated_peak_memory_range: - min: 12288 - max: 1585672 + min: 32768 + max: 1493512 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 115 - job_id: jogkzdoog + job_id: jp0z06m65 job_status: Passed torchscript_onnx_qnn: - inference_time: 847.0 - throughput: 1180.637544273908 + inference_time: 832.0 + throughput: 1201.923076923077 estimated_peak_memory_range: - min: 634880 - max: 1973856 + min: 667648 + max: 1965000 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: jegn2ewkg + job_id: jgjvnqrxg job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T23:08:45Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:05:33Z' - torchscript_onnx_tflite: - inference_time: 819.0 - throughput: 1221.001221001221 + inference_time: 1098.0 + throughput: 910.7468123861566 estimated_peak_memory_range: min: 12288 - max: 1528968 + max: 48386080 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,52 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 115 - job_id: jn5q8wzm5 + job_id: jpy13mo7p job_status: Passed torchscript_onnx_qnn: - inference_time: 838.0 - throughput: 1193.3174224343675 + inference_time: 1157.0 + throughput: 864.304235090752 estimated_peak_memory_range: - min: 634880 - max: 1874272 + min: 618496 + max: 18132000 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 128 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 128 + job_id: jp14z6o2p + job_status: Passed + reference_device_info: + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:05:38Z' + - torchscript_onnx_tflite: + inference_time: 457.0 + throughput: 2188.183807439825 + estimated_peak_memory_range: + min: 12288 + max: 22173664 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 115 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 115 + job_id: jglvmly85 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 476.0 + throughput: 2100.840336134454 + estimated_peak_memory_range: + min: 0 + max: 11806272 primary_compute_unit: NPU precision: fp16 
layer_info: @@ -331,19 +367,34 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: joprky705 + job_id: jgdx126ep + job_status: Passed + torchscript_onnx: + inference_time: 596.0 + throughput: 1677.8523489932886 + estimated_peak_memory_range: + min: 0 + max: 23416352 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 128 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 128 + job_id: jgn6vk1r5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) - os: '13' - form_factor: Auto + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T23:08:46Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:05:44Z' - torchscript_onnx_qnn: - inference_time: 968.0 - throughput: 1033.0578512396694 + inference_time: 1007.0 + throughput: 993.0486593843099 estimated_peak_memory_range: min: 602112 max: 602112 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: jqp4qw28g + job_id: jgo268lxp job_status: Passed torchscript_onnx: - inference_time: 841.0 - throughput: 1189.0606420927468 + inference_time: 970.0 + throughput: 1030.9278350515465 estimated_peak_memory_range: - min: 6303744 - max: 6303744 + min: 6238208 + max: 6238208 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 128 - job_id: j1p8o7kkg + job_id: jpxkox015 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:09:34Z' + timestamp: '2024-10-15T00:05:42Z' diff --git a/qai_hub_models/models/openai_clip/README.md b/qai_hub_models/models/openai_clip/README.md index 6bab600a..07421d68 100644 --- a/qai_hub_models/models/openai_clip/README.md +++ b/qai_hub_models/models/openai_clip/README.md @@ -6,7 +6,7 @@ Contrastive Language-Image Pre-Training (CLIP) uses a ViT-like transformer to get visual features and a causal language model to get the text features. Both the text and visual features can then be used for a variety of zero-shot learning tasks. This is based on the implementation of OpenAI-Clip found -[here](https://github.com/openai/CLIP/). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/openai_clip). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.openai_clip.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of OpenAI-Clip can be found +* The license for the original implementation of OpenAI-Clip can be found [here](https://github.com/openai/CLIP/blob/main/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Learning Transferable Visual Models From Natural Language Supervision](https://arxiv.org/abs/2103.00020) * [Source Model Implementation](https://github.com/openai/CLIP/) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/openai_clip/export.py b/qai_hub_models/models/openai_clip/export.py index 27dc1f2f..8f52b43e 100644 --- a/qai_hub_models/models/openai_clip/export.py +++ b/qai_hub_models/models/openai_clip/export.py @@ -10,14 +10,15 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Mapping, Optional, Tuple, cast +from typing import Any, Dict, List, Mapping, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.openai_clip import Model from qai_hub_models.utils.args import export_parser, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel, TargetRuntime +from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -45,20 +46,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Mapping[ - str, Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] -] | List[str]: +) -> Mapping[str, ExportResult] | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -83,10 +82,10 @@ def export_model( `model_cls.from_pretrained` Returns: - A Mapping from component_name to a 3-tuple of: + A Mapping from component_name to a struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). 
+ * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "openai_clip" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -118,7 +117,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) components_dict: Dict[str, BaseModel] = {} if "CLIPTextEncoder" in components: @@ -135,7 +134,7 @@ def export_model( component.to("cpu"), make_torch_inputs(input_spec) ) - # 2. Compile the models to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = component.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -151,7 +150,7 @@ def export_model( hub.client.CompileJob, submitted_compile_job ) - # 3. Profile the model assets on real devices + # 3. Profiles the model performance on a real device profile_jobs: Dict[str, hub.client.ProfileJob] = {} if not skip_profiling: for component_name in components: @@ -169,7 +168,7 @@ def export_model( hub.client.ProfileJob, submitted_profile_job ) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_jobs: Dict[str, hub.client.InferenceJob] = {} if not skip_inferencing: for component_name in components: @@ -193,14 +192,14 @@ def export_model( hub.client.InferenceJob, submitted_inference_job ) - # 5. Download the model assets to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) for component_name, compile_job in compile_jobs.items(): target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / component_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: for component_name in components: profile_job = profile_jobs[component_name] @@ -225,10 +224,10 @@ def export_model( ) return { - component_name: ( - compile_jobs[component_name], - profile_jobs.get(component_name, None), - inference_jobs.get(component_name, None), + component_name: ExportResult( + compile_job=compile_jobs[component_name], + inference_job=inference_jobs.get(component_name, None), + profile_job=profile_jobs.get(component_name, None), ) for component_name in components } diff --git a/qai_hub_models/models/openai_clip/perf.yaml b/qai_hub_models/models/openai_clip/perf.yaml index 6f2fde1f..4d2bdcd6 100644 --- a/qai_hub_models/models/openai_clip/perf.yaml +++ b/qai_hub_models/models/openai_clip/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: CLIPTextEncoder performance_metrics: - torchscript_onnx_tflite: - inference_time: 6808.0 - throughput: 146.88601645123384 + inference_time: 5779.0 + throughput: 173.04031839418585 estimated_peak_memory_range: min: 16384 - max: 2516288 + max: 2798120 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 2 total_layers: 660 - job_id: jz5wo9y3p + job_id: jgjvnmlvg job_status: Passed torchscript_onnx_qnn: - inference_time: 5858.0 - throughput: 170.7067258449983 + inference_time: 4774.0 + throughput: 209.46795140343528 estimated_peak_memory_range: - min: 20480 - max: 20442960 + min: 16384 + max: 16300928 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 445 - job_id: jogkzy6wg + job_id: jpy13wdlp job_status: Passed torchscript_onnx: - inference_time: 38965.0 - throughput: 25.664057487488773 + inference_time: 35403.0 + throughput: 28.24619382538203 estimated_peak_memory_range: - min: 98304 - max: 137364600 + min: 81920 + max: 136793360 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 1 total_layers: 508 - job_id: j0pxv1r3g + job_id: jgn6vy9q5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:08:48Z' + timestamp: '2024-10-15T17:26:05Z' - torchscript_onnx_tflite: - inference_time: 4873.0 - throughput: 205.21239482864766 + inference_time: 4079.0 + throughput: 245.15812699190977 estimated_peak_memory_range: - min: 16384 - max: 186171488 + min: 32768 + max: 202933264 
primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 2 total_layers: 660 - job_id: jnp108o85 + job_id: jgo2241dp job_status: Passed torchscript_onnx_qnn: - inference_time: 4160.0 - throughput: 240.3846153846154 + inference_time: 3405.0 + throughput: 293.68575624082234 estimated_peak_memory_range: min: 12288 - max: 56794672 + max: 69456128 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 445 - job_id: j1glnkwjp + job_id: jp8qy97op job_status: Passed torchscript_onnx: - inference_time: 29339.0 - throughput: 34.084324619107676 + inference_time: 26223.0 + throughput: 38.13446211341189 estimated_peak_memory_range: min: 61440 - max: 471243408 + max: 560002208 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 1 total_layers: 508 - job_id: jegn2eqkg + job_id: jp4ll9ml5 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:08:49Z' + timestamp: '2024-10-16T08:34:45Z' - torchscript_onnx_tflite: - inference_time: 6725.0 - throughput: 148.6988847583643 + inference_time: 5717.0 + throughput: 174.91691446562882 estimated_peak_memory_range: - min: 24576 - max: 2526952 + min: 16384 + max: 1691600 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 2 total_layers: 660 - job_id: jz57zdovp + job_id: jg9ln14wg job_status: Passed torchscript_onnx_qnn: - inference_time: 5791.0 - throughput: 172.68174753928508 + inference_time: 4856.0 + throughput: 205.9308072487644 estimated_peak_memory_range: - min: 28672 - max: 1623792 + min: 24576 + max: 1295896 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 445 - job_id: j1pv3wmk5 + job_id: j56y4j3yp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:08:39Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T17:25:53Z' - torchscript_onnx_tflite: - inference_time: 7421.0 - throughput: 134.75272874275703 + inference_time: 5711.0 + throughput: 175.1006828926633 estimated_peak_memory_range: min: 16384 - max: 163313920 + max: 2156408 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 2 total_layers: 660 - job_id: j0pxv103g + job_id: jp4lrow15 job_status: Passed torchscript_onnx_qnn: - inference_time: 6287.0 - throughput: 159.0583744234134 + inference_time: 4794.0 + throughput: 208.59407592824363 estimated_peak_memory_range: - min: 12288 - max: 62928032 + min: 49152 + max: 1298000 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 445 - job_id: jz57zdnvp + job_id: j5we6vdm5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:08:46Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T17:25:58Z' - torchscript_onnx_tflite: - inference_time: 6765.0 - throughput: 147.81966001478196 + inference_time: 5652.0 + throughput: 
176.92852087756546 estimated_peak_memory_range: - min: 28672 - max: 2280024 + min: 16384 + max: 2561760 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 2 total_layers: 660 - job_id: jegn2e1kg + job_id: jgdx19vzp job_status: Passed torchscript_onnx_qnn: - inference_time: 5773.0 - throughput: 173.22016282695307 + inference_time: 4897.0 + throughput: 204.20665713702266 estimated_peak_memory_range: - min: 61440 - max: 1238176 + min: 94208 + max: 1440560 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 445 - job_id: jlpe9vxog + job_id: jgjvnm0eg job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:08:41Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T17:25:56Z' - torchscript_onnx_tflite: - inference_time: 6740.0 - throughput: 148.3679525222552 + inference_time: 5683.0 + throughput: 175.96339961288052 estimated_peak_memory_range: - min: 49152 - max: 2948808 + min: 20480 + max: 301324312 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 2 total_layers: 660 - job_id: jep28morp + job_id: jg9ln148g job_status: Passed torchscript_onnx_qnn: - inference_time: 5787.0 - throughput: 172.80110592707794 + inference_time: 4903.0 + throughput: 203.95676116663267 estimated_peak_memory_range: - min: 81920 - max: 1325240 + min: 24576 + max: 1123192 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 445 - job_id: jz5wo9z3p + job_id: jgo2601kp job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:08:43Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T17:25:54Z' - torchscript_onnx_tflite: - inference_time: 6704.0 - throughput: 149.16467780429593 + inference_time: 6593.0 + throughput: 151.67602002123465 estimated_peak_memory_range: - min: 24576 - max: 2242120 + min: 16384 + max: 176541344 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 2 total_layers: 660 - job_id: j2p0yro9g + job_id: jgdx19vrp job_status: Passed torchscript_onnx_qnn: - inference_time: 5821.0 - throughput: 171.79178835251676 + inference_time: 5491.0 + throughput: 182.11619012930248 estimated_peak_memory_range: - min: 73728 - max: 1312600 + min: 12288 + max: 68984736 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,22 +329,75 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 445 - job_id: jnp108185 + job_id: j57yrwj95 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T17:26:02Z' + - torchscript_onnx_tflite: + inference_time: 3963.0 + throughput: 252.33409033560434 + estimated_peak_memory_range: + min: 12288 + max: 114708448 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 658 + layers_on_gpu: 0 + layers_on_cpu: 2 + total_layers: 660 + job_id: jprv3qy7g + job_status: Passed + torchscript_onnx_qnn: + inference_time: 3266.0 + throughput: 306.1849357011635 + 
estimated_peak_memory_range: + min: 8192 + max: 68595760 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 445 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 445 + job_id: jgo224odp + job_status: Passed + torchscript_onnx: + inference_time: 23780.0 + throughput: 42.05214465937763 + estimated_peak_memory_range: + min: 102400 + max: 335012416 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 507 + layers_on_gpu: 0 + layers_on_cpu: 1 + total_layers: 508 + job_id: jglvmzem5 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:08:45Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-16T08:37:16Z' - torchscript_onnx_qnn: - inference_time: 6204.0 - throughput: 161.18633139909736 + inference_time: 5196.0 + throughput: 192.4557351809084 estimated_peak_memory_range: - min: 126976 - max: 126976 + min: 266240 + max: 266240 primary_compute_unit: NPU precision: fp16 layer_info: @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 445 - job_id: j1p3kmo35 + job_id: j5q6qk2op job_status: Passed torchscript_onnx: - inference_time: 39622.0 - throughput: 25.23850386149109 + inference_time: 38329.0 + throughput: 26.089905815440005 estimated_peak_memory_range: - min: 132591616 - max: 132591616 + min: 132571136 + max: 132571136 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 1 total_layers: 508 - job_id: jep28mdrp + job_id: jp0z0q1n5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,15 +429,15 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:08:51Z' + timestamp: '2024-10-15T17:26:09Z' - name: CLIPImageEncoder performance_metrics: - torchscript_onnx_tflite: - inference_time: 41610.0 - throughput: 24.03268445085316 + inference_time: 38384.0 + throughput: 26.052521884118384 estimated_peak_memory_range: min: 69632 - max: 4087128 + max: 2451280 primary_compute_unit: NPU precision: fp16 layer_info: @@ -394,14 +445,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 659 - job_id: jmg9v4ow5 + job_id: jpedm1vo5 job_status: Passed torchscript_onnx_qnn: - inference_time: 32966.0 - throughput: 30.334283807559302 + inference_time: 27206.0 + throughput: 36.75659780930677 estimated_peak_memory_range: - min: 65536 - max: 61336256 + min: 61440 + max: 58920288 primary_compute_unit: NPU precision: fp16 layer_info: @@ -409,7 +460,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 438 - job_id: jn5q824n5 + job_id: jp0z0qrn5 + job_status: Passed + torchscript_onnx: + inference_time: 174036.0 + throughput: 5.745937622101175 + estimated_peak_memory_range: + min: 126976 + max: 203668048 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 501 + layers_on_gpu: 0 + layers_on_cpu: 1 + total_layers: 502 + job_id: jprv3q47g job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -418,13 +484,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:08:34Z' + timestamp: '2024-10-15T17:26:06Z' - torchscript_onnx_tflite: - inference_time: 33464.0 - throughput: 29.882859191967487 + inference_time: 33247.0 + throughput: 30.077901765572832 estimated_peak_memory_range: - min: 53248 - max: 601470704 + min: 32768 + max: 698029056 primary_compute_unit: NPU 
precision: fp16 layer_info: @@ -432,14 +498,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 659 - job_id: jvgdwv6r5 + job_id: jpv6691m5 job_status: Passed torchscript_onnx_qnn: - inference_time: 25442.0 - throughput: 39.30508607813851 + inference_time: 24164.0 + throughput: 41.38387684158252 estimated_peak_memory_range: - min: 638976 - max: 121004016 + min: 634880 + max: 178605712 primary_compute_unit: NPU precision: fp16 layer_info: @@ -447,7 +513,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 438 - job_id: jw5661o65 + job_id: jgkexnyng + job_status: Passed + torchscript_onnx: + inference_time: 118868.0 + throughput: 8.412693071305986 + estimated_peak_memory_range: + min: 843776 + max: 3744565520 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 501 + layers_on_gpu: 0 + layers_on_cpu: 1 + total_layers: 502 + job_id: jpxkkd395 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -456,13 +537,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:08:36Z' + timestamp: '2024-10-16T08:35:39Z' - torchscript_onnx_tflite: - inference_time: 41781.0 - throughput: 23.934324214355808 + inference_time: 37343.0 + throughput: 26.77878049433629 estimated_peak_memory_range: - min: 294912 - max: 2413024 + min: 16384 + max: 2185680 primary_compute_unit: NPU precision: fp16 layer_info: @@ -470,14 +551,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 659 - job_id: jqp4qwe8g + job_id: jp14zl88p job_status: Passed torchscript_onnx_qnn: - inference_time: 29215.0 - throughput: 34.22899195618689 + inference_time: 22015.0 + throughput: 45.423574835339544 estimated_peak_memory_range: - min: 696320 - max: 1809600 + min: 663552 + max: 1787712 primary_compute_unit: NPU precision: fp16 layer_info: @@ -485,7 +566,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 438 - job_id: j7gjxlyvp + job_id: jp3j034ng job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -493,14 +574,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:08:40Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T17:25:53Z' - torchscript_onnx_tflite: - inference_time: 41460.0 - throughput: 24.1196333815726 + inference_time: 37324.0 + throughput: 26.79241238881149 estimated_peak_memory_range: - min: 69632 - max: 506845568 + min: 90112 + max: 2489264 primary_compute_unit: NPU precision: fp16 layer_info: @@ -508,14 +589,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 659 - job_id: jo5mrz9dg + job_id: jpxkoj1l5 job_status: Passed torchscript_onnx_qnn: - inference_time: 35970.0 - throughput: 27.80094523213789 + inference_time: 22477.0 + throughput: 44.489923032433154 estimated_peak_memory_range: - min: 0 - max: 125174752 + min: 704512 + max: 1957024 primary_compute_unit: NPU precision: fp16 layer_info: @@ -523,22 +604,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 438 - job_id: jqp4qw48g + job_id: jg9ln138g job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:08:47Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T17:25:58Z' - torchscript_onnx_tflite: - inference_time: 42134.0 - throughput: 23.73380168035316 + inference_time: 36580.0 + throughput: 27.33734281027884 estimated_peak_memory_range: - min: 86016 - max: 3439456 + 
min: 90112 + max: 2280272 primary_compute_unit: NPU precision: fp16 layer_info: @@ -546,14 +627,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 659 - job_id: joprkyx05 + job_id: j57yrwd95 job_status: Passed torchscript_onnx_qnn: - inference_time: 29412.0 - throughput: 33.999728002175985 + inference_time: 22644.0 + throughput: 44.16180886769122 estimated_peak_memory_range: - min: 716800 - max: 1974088 + min: 307200 + max: 1567448 primary_compute_unit: NPU precision: fp16 layer_info: @@ -561,22 +642,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 438 - job_id: jygze7yog + job_id: jpedm1rv5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:08:41Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T17:25:57Z' - torchscript_onnx_tflite: - inference_time: 41566.0 - throughput: 24.058124428619546 + inference_time: 36958.0 + throughput: 27.057741219762974 estimated_peak_memory_range: - min: 94208 - max: 3002560 + min: 57344 + max: 3455976 primary_compute_unit: NPU precision: fp16 layer_info: @@ -584,14 +665,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 659 - job_id: jqpyed88g + job_id: jp14zl87p job_status: Passed torchscript_onnx_qnn: - inference_time: 29440.0 - throughput: 33.96739130434783 + inference_time: 22477.0 + throughput: 44.489923032433154 estimated_peak_memory_range: - min: 696320 - max: 2408968 + min: 753664 + max: 2113712 primary_compute_unit: NPU precision: fp16 layer_info: @@ -599,22 +680,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 438 - job_id: jmg9v42w5 + job_id: jpv6ko1r5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:08:43Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T17:25:55Z' - torchscript_onnx_tflite: - inference_time: 41836.0 - throughput: 23.902858781910318 + inference_time: 37123.0 + throughput: 26.937478113299033 estimated_peak_memory_range: - min: 57344 - max: 2921344 + min: 98304 + max: 575585456 primary_compute_unit: NPU precision: fp16 layer_info: @@ -622,14 +703,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 659 - job_id: j1p8o7jkg + job_id: j5we6v9m5 job_status: Passed torchscript_onnx_qnn: - inference_time: 29776.0 - throughput: 33.584094572810315 + inference_time: 30382.0 + throughput: 32.91422552827332 estimated_peak_memory_range: - min: 679936 - max: 1939688 + min: 0 + max: 178714672 primary_compute_unit: NPU precision: fp16 layer_info: @@ -637,19 +718,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 438 - job_id: jvgdwv4r5 + job_id: jp4lrox15 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T17:26:02Z' + - torchscript_onnx_tflite: + inference_time: 25495.0 + throughput: 39.223377132771134 + estimated_peak_memory_range: + min: 0 + max: 482536432 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 659 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 659 + job_id: jp2ky6mqp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 17137.0 + throughput: 58.35327070082278 + estimated_peak_memory_range: + min: 614400 + max: 180468880 + 
primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 438 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 438 + job_id: j5wee18j5 + job_status: Passed + torchscript_onnx: + inference_time: 17137.0 + throughput: 58.35327070082278 + estimated_peak_memory_range: + min: 614400 + max: 180468880 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 438 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 438 + job_id: j5wee18j5 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:08:45Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-16T08:38:40Z' - torchscript_onnx_qnn: - inference_time: 28828.0 - throughput: 34.688497294297214 + inference_time: 22135.0 + throughput: 45.177320984865595 estimated_peak_memory_range: min: 602112 max: 602112 @@ -660,11 +794,11 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 438 - job_id: jwgoyvdq5 + job_id: jglvmz0m5 job_status: Passed torchscript_onnx: - inference_time: 189660.0 - throughput: 5.272593061267531 + inference_time: 162155.0 + throughput: 6.166939039807591 estimated_peak_memory_range: min: 196714496 max: 196714496 primary_compute_unit: NPU precision: fp16 layer_info: @@ -675,7 +809,7 @@ layers_on_gpu: 0 layers_on_cpu: 1 total_layers: 502 - job_id: jqpyed28g + job_id: jgjvvw08g job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -684,4 +818,4 @@ os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:08:51Z' + timestamp: '2024-10-16T08:30:35Z' diff --git a/qai_hub_models/models/openpose/README.md b/qai_hub_models/models/openpose/README.md index 1585e5e7..601af4de 100644 --- a/qai_hub_models/models/openpose/README.md +++ b/qai_hub_models/models/openpose/README.md @@ -6,7 +6,7 @@ OpenPose is a machine learning model that estimates body and hand pose in an image and returns location and confidence for each of 19 joints. This is based on the implementation of OpenPose found -[here](https://github.com/CMU-Perceptual-Computing-Lab/openpose). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/openpose). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.openpose.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of OpenPose can be found +* The license for the original implementation of OpenPose can be found [here](https://cmu.flintbox.com/technologies/b820c21d-8443-4aa2-a49f-8919d93a8740).
-- The license for the compiled assets for on-device deployment can be found [here](https://cmu.flintbox.com/technologies/b820c21d-8443-4aa2-a49f-8919d93a8740) +* The license for the compiled assets for on-device deployment can be found [here](https://cmu.flintbox.com/technologies/b820c21d-8443-4aa2-a49f-8919d93a8740) + ## References * [OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields](https://arxiv.org/abs/1812.08008) * [Source Model Implementation](https://github.com/CMU-Perceptual-Computing-Lab/openpose) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/openpose/export.py b/qai_hub_models/models/openpose/export.py index 5a00f4eb..8d423a7e 100644 --- a/qai_hub_models/models/openpose/export.py +++ b/qai_hub_models/models/openpose/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.openpose import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
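The `ExportResult` imported from `qai_hub_models.models.common` in these export scripts is, per the `return ExportResult(...)` statements, a small container of Hub jobs. Its exact definition is not part of this diff, so the following is only a hedged sketch of an equivalent shape; modelling it as a dataclass with `None` defaults is an assumption:

```python
from dataclasses import dataclass
from typing import Optional

import qai_hub as hub


@dataclass
class ExportResult:
    # Field names come from the export_model() return statements in this diff.
    compile_job: hub.client.CompileJob
    inference_job: Optional[hub.client.InferenceJob] = None
    profile_job: Optional[hub.client.ProfileJob] = None
```

For component-based models such as openai_clip above, `export_model` returns a `Mapping[str, ExportResult]` keyed by component name rather than a single struct.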
""" model_name = "openpose" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/openpose/perf.yaml b/qai_hub_models/models/openpose/perf.yaml index 2dc6b971..5895d32b 100644 --- a/qai_hub_models/models/openpose/perf.yaml +++ b/qai_hub_models/models/openpose/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: OpenPose performance_metrics: - torchscript_onnx_tflite: - inference_time: 11699.0 - throughput: 85.47739123001966 + inference_time: 11959.0 + throughput: 83.61903169161302 estimated_peak_memory_range: - min: 208896 - max: 2089080 + min: 225280 + max: 1795280 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 103 - job_id: jnp108n85 + job_id: jgo268odp job_status: Passed torchscript_onnx_qnn: - inference_time: 11933.0 - throughput: 83.80122349786306 + inference_time: 11864.0 + throughput: 84.28860418071477 estimated_peak_memory_range: - min: 626688 - max: 229315488 + min: 655360 + max: 215276248 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 186 - job_id: joprkyl05 + job_id: j57yr97r5 job_status: Passed torchscript_onnx: - inference_time: 12254.0 - throughput: 81.60600620205648 + inference_time: 12080.0 + throughput: 82.78145695364239 estimated_peak_memory_range: - min: 1105920 - max: 3296864 + min: 49152 + max: 119390272 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 189 - job_id: jw5661865 + job_id: jgkex89og job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:07:31Z' + timestamp: '2024-10-15T00:03:15Z' - torchscript_onnx_tflite: - inference_time: 11402.0 - throughput: 87.70391159445711 + inference_time: 11393.0 + throughput: 87.77319406653208 estimated_peak_memory_range: min: 212992 - max: 40327632 + max: 42955552 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 
+109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 103 - job_id: jvgdwvdr5 + job_id: jpv6k7em5 job_status: Passed torchscript_onnx_qnn: - inference_time: 11476.0 - throughput: 87.1383757406762 + inference_time: 11479.0 + throughput: 87.11560240439063 estimated_peak_memory_range: - min: 634880 - max: 18893744 + min: 618496 + max: 18821424 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 186 - job_id: jep28mrrp + job_id: jp4lr39l5 job_status: Passed torchscript_onnx: - inference_time: 11671.0 - throughput: 85.68246080027419 + inference_time: 11482.0 + throughput: 87.09284096847239 estimated_peak_memory_range: - min: 1126400 - max: 45462048 + min: 0 + max: 47661712 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 189 - job_id: j1p3kmz35 + job_id: j5q6qvmmp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:07:32Z' + timestamp: '2024-10-15T00:03:16Z' - torchscript_onnx_tflite: - inference_time: 11652.0 - throughput: 85.82217645039478 + inference_time: 11740.0 + throughput: 85.17887563884156 estimated_peak_memory_range: - min: 204800 - max: 2085920 + min: 208896 + max: 2874496 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 103 - job_id: jz57zdevp + job_id: jgjvnqo8g job_status: Passed torchscript_onnx_qnn: - inference_time: 11606.0 - throughput: 86.16232982939859 + inference_time: 12078.0 + throughput: 82.79516476237788 estimated_peak_memory_range: - min: 638976 - max: 2658336 + min: 643072 + max: 1744056 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 186 - job_id: j2p0yrm9g + job_id: j5mnx8dqp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:07:26Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T00:03:07Z' - torchscript_onnx_tflite: - inference_time: 23631.0 - throughput: 42.31729507849858 + inference_time: 11737.0 + throughput: 85.2006475249212 estimated_peak_memory_range: - min: 192512 - max: 41372880 + min: 225280 + max: 2119696 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 103 - job_id: jqp4qwy8g + job_id: jg9lndkvg job_status: Passed torchscript_onnx_qnn: - inference_time: 23709.0 - throughput: 42.178075836180355 + inference_time: 12134.0 + throughput: 82.41305422778969 estimated_peak_memory_range: - min: 0 - max: 17770416 + min: 700416 + max: 2015240 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 186 - job_id: j1glnkyjp + job_id: jp2kyevmp job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:07:30Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T00:03:10Z' - torchscript_onnx_tflite: - inference_time: 11728.0 - throughput: 85.26603001364256 + inference_time: 11702.0 + throughput: 85.45547769612033 
estimated_peak_memory_range: - min: 208896 - max: 583859432 + min: 184320 + max: 1958856 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 103 - job_id: j0pxv1l3g + job_id: j5we648j5 job_status: Passed torchscript_onnx_qnn: - inference_time: 11676.0 - throughput: 85.64576909900651 + inference_time: 12105.0 + throughput: 82.61049153242462 estimated_peak_memory_range: - min: 638976 - max: 1857784 + min: 688128 + max: 1905592 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 186 - job_id: j1p8o7ekg + job_id: jprv3wneg job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:07:27Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T00:03:09Z' - torchscript_onnx_tflite: - inference_time: 11793.0 - throughput: 84.79606546256254 + inference_time: 11688.0 + throughput: 85.55783709787816 estimated_peak_memory_range: - min: 12288 - max: 4062312 + min: 233472 + max: 2213904 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 103 - job_id: jo5mrz0dg + job_id: jgz3dn865 job_status: Passed torchscript_onnx_qnn: - inference_time: 11597.0 - throughput: 86.229197206174 + inference_time: 12118.0 + throughput: 82.5218682950982 estimated_peak_memory_range: - min: 667648 - max: 1990376 + min: 675840 + max: 2011832 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 186 - job_id: jogkzy2wg + job_id: jgn6vk7m5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:07:28Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T00:03:08Z' - torchscript_onnx_tflite: - inference_time: 11676.0 - throughput: 85.64576909900651 + inference_time: 23527.0 + throughput: 42.504356696561395 estimated_peak_memory_range: - min: 212992 - max: 2224528 + min: 249856 + max: 43490928 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 103 - job_id: jegn2ezkg + job_id: jpedmy805 job_status: Passed torchscript_onnx_qnn: - inference_time: 11727.0 - throughput: 85.27330092947898 + inference_time: 23749.0 + throughput: 42.10703608572992 estimated_peak_memory_range: - min: 679936 - max: 2011376 + min: 634880 + max: 18569184 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 186 - job_id: jn5q82ln5 + job_id: jp0z06ve5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T00:03:13Z' + - torchscript_onnx_tflite: + inference_time: 8656.0 + throughput: 115.5268022181146 + estimated_peak_memory_range: + min: 204800 + max: 24459280 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 103 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 103 + job_id: jgdx128lp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 8742.0 + throughput: 114.39029970258522 + 
estimated_peak_memory_range: + min: 614400 + max: 16413312 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 186 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 186 + job_id: jp8qy148p + job_status: Passed + torchscript_onnx: + inference_time: 7177.0 + throughput: 139.33398355858995 + estimated_peak_memory_range: + min: 1134592 + max: 28735968 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 189 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 189 + job_id: jp3j06wzg + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:07:29Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T00:03:19Z' - torchscript_onnx_qnn: - inference_time: 12322.0 - throughput: 81.15565654926148 + inference_time: 12659.0 + throughput: 78.99518129394107 estimated_peak_memory_range: min: 602112 max: 602112 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 186 - job_id: jqpyedo8g + job_id: jpxkoxd95 job_status: Passed torchscript_onnx: - inference_time: 12346.0 - throughput: 80.99789405475458 + inference_time: 12628.0 + throughput: 79.18910357934749 estimated_peak_memory_range: - min: 106577920 - max: 106577920 + min: 106684416 + max: 106684416 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 189 - job_id: jwgoyvlq5 + job_id: jglvml1l5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:07:33Z' + timestamp: '2024-10-15T00:03:17Z' diff --git a/qai_hub_models/models/plamo_1b_quantized/README.md b/qai_hub_models/models/plamo_1b_quantized/README.md new file mode 100644 index 00000000..fe3322ea --- /dev/null +++ b/qai_hub_models/models/plamo_1b_quantized/README.md @@ -0,0 +1,50 @@ +[![Qualcomm® AI Hub Models](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/quic-logo.jpg)](../../README.md) + + +# [PLaMo-1B: State-of-the-art large language model useful on a variety of language understanding and generation tasks](https://aihub.qualcomm.com/models/plamo_1b_quantized) + +PLaMo-1B is the first small language model (SLM) in the PLaMo™ Lite series from Preferred Networks (PFN), designed to power AI applications for edge devices including mobile, automotive, and robots across various industrial sectors. This model builds on the advancements of PLaMo-100B, a 100-billion parameter large language model (LLM) developed from the ground up by PFN’s subsidiary Preferred Elements (PFE). Leveraging high-quality Japanese and English text data generated by PLaMo-100B, PLaMo-1B has been pre-trained on a total of 4 trillion tokens. As a result, it delivers exceptional performance in Japanese benchmarks, outperforming other SLMs with similar parameter sizes. In evaluations such as Jaster 0-shot and 4-shot, PLaMo-1B has demonstrated performance on par with larger LLMs, making it a highly efficient solution for edge-based AI tasks. + +This is based on the implementation of PLaMo-1B found +[here]({source_repo}). This repository contains scripts for optimized on-device +export suitable to run on Qualcomm® devices. More details on model performance +across various devices can be found [here](https://aihub.qualcomm.com/models/plamo_1b_quantized).
+ +[Sign up](https://myaccount.qualcomm.com/signup) to start using Qualcomm AI Hub and run these models on a hosted Qualcomm® device. + +## Deploying PLaMo-1B on-device + +Please follow the [LLM on-device deployment](https://github.com/quic/ai-hub-apps/tree/main/tutorials/llm_on_genie) tutorial. + + + + + + +## Community +* Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. +* For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). + + +## Usage and Limitations + +This model may not be used for or in connection with any of the following applications: + +- Accessing essential private and public services and benefits; +- Administration of justice and democratic processes; +- Assessing or recognizing the emotional state of a person; +- Biometric and biometrics-based systems, including categorization of persons based on sensitive characteristics; +- Education and vocational training; +- Employment and workers management; +- Exploitation of the vulnerabilities of persons resulting in harmful behavior; +- General purpose social scoring; +- Law enforcement; +- Management and operation of critical infrastructure; +- Migration, asylum and border control management; +- Predictive policing; +- Real-time remote biometric identification in public spaces; +- Recommender systems of social media platforms; +- Scraping of facial images (from the internet or otherwise); and/or +- Subliminal manipulation + + diff --git a/qai_hub_models/models/plamo_1b_quantized/info.yaml b/qai_hub_models/models/plamo_1b_quantized/info.yaml new file mode 100644 index 00000000..669632cc --- /dev/null +++ b/qai_hub_models/models/plamo_1b_quantized/info.yaml @@ -0,0 +1,39 @@ +name: PLaMo-1B +id: plamo_1b_quantized +status: public +headline: State-of-the-art large language model useful on a variety of language + understanding and generation tasks. +domain: Generative AI +description: PLaMo-1B is the first small language model (SLM) in the PLaMo™ Lite series from Preferred Networks (PFN), designed to power AI applications for edge devices including mobile, automotive, and robots across various industrial sectors. This model builds on the advancements of PLaMo-100B, a 100-billion parameter large language model (LLM) developed from the ground up by PFN’s subsidiary Preferred Elements (PFE). Leveraging high-quality Japanese and English text data generated by PLaMo-100B, PLaMo-1B has been pre-trained on a total of 4 trillion tokens. As a result, it delivers exceptional performance in Japanese benchmarks, outperforming other SLMs with similar parameter sizes. In evaluations such as Jaster 0-shot and 4-shot, PLaMo-1B has demonstrated performance on par with larger LLMs, making it a highly efficient solution for edge-based AI tasks. +model_maker_id: preferred-networks +use_case: Text Generation +tags: + - llm + - generative-ai + - quantized +technical_details: + Input sequence length for Prompt Processor: 128 + Context length: 4096 + Number of parameters: 1B + Precision: w4a16 + w8a16 (few layers) + Use: Initiate conversation with prompt-processor and then token generator for subsequent iterations. + Minimum QNN SDK version required: 2.27.7 + Supported languages: Japanese and English. + TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. 
The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens). + Response Rate: Rate of response generation after the first response token. +applicable_scenarios: + - Dialogue + - Content Generation + - Customer Support +related_models: [] +form_factors: + - Phone + - Tablet +has_static_banner: true +has_animated_banner: false +license_type: 'other' +dataset: [] +model_type_llm: true +restrict_model_sharing: true +llm_details: + call_to_action: 'contact_for_purchase' diff --git a/qai_hub_models/models/plamo_1b_quantized/perf.yaml b/qai_hub_models/models/plamo_1b_quantized/perf.yaml new file mode 100644 index 00000000..10f9dd57 --- /dev/null +++ b/qai_hub_models/models/plamo_1b_quantized/perf.yaml @@ -0,0 +1,25 @@ +aggregated: + supported_oses: + - Android + supported_devices: + - Snapdragon 8 Elite QRD + supported_chipsets: + - Snapdragon® 8 Elite +models: + name: 'PLaMo-1B' + performance_metrics: + - torchscript_onnx_qnn: + llm_metrics: + time_to_first_token_range: + min: 31448 + max: 1006336 + tokens_per_second: 68.21 + evaluation_metrics: null + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-16T00:32:42.210701Z' diff --git a/qai_hub_models/models/posenet_mobilenet/README.md b/qai_hub_models/models/posenet_mobilenet/README.md index 90a83c3b..7d2a086d 100644 --- a/qai_hub_models/models/posenet_mobilenet/README.md +++ b/qai_hub_models/models/posenet_mobilenet/README.md @@ -6,7 +6,7 @@ Posenet performs pose estimation on human images. This is based on the implementation of Posenet-Mobilenet found -[here](https://github.com/rwightman/posenet-pytorch). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/posenet_mobilenet). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.posenet_mobilenet.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Posenet-Mobilenet can be found +* The license for the original implementation of Posenet-Mobilenet can be found [here](https://github.com/rwightman/posenet-pytorch/blob/master/LICENSE.txt). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model](https://arxiv.org/abs/1803.08225) * [Source Model Implementation](https://github.com/rwightman/posenet-pytorch) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
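Every regenerated `export.py` in this patch (openpose above, posenet_mobilenet below, and the rest) makes the same breaking change: `export_model` now returns an `ExportResult` struct instead of a positional 3-tuple. A minimal caller sketch of the new convention follows; the device name, the chosen kwargs, and the `job_id` attribute on Hub job objects are illustrative assumptions, not part of the patch.

```python
# Sketch only (assumed usage): consuming the new ExportResult return type.
from qai_hub_models.models.posenet_mobilenet.export import export_model

result = export_model(
    device="Samsung Galaxy S23",  # illustrative device name
    skip_inferencing=True,        # step 4 of the recipe skipped
    skip_downloading=True,        # step 5 of the recipe skipped
)

if isinstance(result, list):
    # The List[str] branch of the new signature.
    print("\n".join(result))
else:
    # Old callers unpacked positionally:
    #   compile_job, profile_job, inference_job = export_model(...)
    # That no longer works; fields are now accessed by name, which is also
    # why the reordered Returns docstring (inference before profile) is harmless.
    print("compile job:", result.compile_job.job_id)
    if result.profile_job is not None:
        print("profile job:", result.profile_job.job_id)
```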
diff --git a/qai_hub_models/models/posenet_mobilenet/export.py b/qai_hub_models/models/posenet_mobilenet/export.py index ab0cd994..f6340c5e 100644 --- a/qai_hub_models/models/posenet_mobilenet/export.py +++ b/qai_hub_models/models/posenet_mobilenet/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.posenet_mobilenet import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "posenet_mobilenet" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. 
Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/posenet_mobilenet/perf.yaml b/qai_hub_models/models/posenet_mobilenet/perf.yaml index 44c6b725..8280a33b 100644 --- a/qai_hub_models/models/posenet_mobilenet/perf.yaml +++ b/qai_hub_models/models/posenet_mobilenet/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: Posenet-Mobilenet performance_metrics: - torchscript_onnx_tflite: - inference_time: 1367.0 - throughput: 731.528895391368 + inference_time: 1375.0 + throughput: 727.2727272727273 estimated_peak_memory_range: min: 12288 - max: 7543960 + max: 33771368 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: jo5mrzmwg + job_id: j5mnx84qp job_status: Passed 
torchscript_onnx_qnn: - inference_time: 1444.0 - throughput: 692.5207756232687 + inference_time: 1442.0 + throughput: 693.4812760055479 estimated_peak_memory_range: - min: 12288 - max: 13179184 + min: 36864 + max: 12659440 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 69 - job_id: jogkzy42g + job_id: jglvml7l5 job_status: Passed torchscript_onnx: - inference_time: 2098.0 - throughput: 476.64442326024783 + inference_time: 1894.0 + throughput: 527.9831045406547 estimated_peak_memory_range: min: 12288 - max: 8496224 + max: 7822264 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 70 - job_id: jlpe9vz1g + job_id: jp14z63lp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:05:00Z' + timestamp: '2024-10-14T23:59:32Z' - torchscript_onnx_tflite: - inference_time: 1095.0 - throughput: 913.2420091324201 + inference_time: 1102.0 + throughput: 907.4410163339383 estimated_peak_memory_range: - min: 12288 - max: 41395376 + min: 16384 + max: 42395216 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: jegn2enrg + job_id: jgn6vkxm5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1154.0 - throughput: 866.5511265164645 + inference_time: 1157.0 + throughput: 864.304235090752 estimated_peak_memory_range: - min: 32137216 - max: 47341776 + min: 1597440 + max: 19228640 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 69 - job_id: jn5q82y45 + job_id: j56y4wv7p job_status: Passed torchscript_onnx: - inference_time: 1741.0 - throughput: 574.3825387708214 + inference_time: 1493.0 + throughput: 669.7923643670462 estimated_peak_memory_range: - min: 794624 - max: 45730752 + min: 49152 + max: 47532832 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 70 - job_id: jygze7mkg + job_id: jgdx120lp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:05:01Z' + timestamp: '2024-10-14T23:59:33Z' - torchscript_onnx_tflite: - inference_time: 1358.0 - throughput: 736.3770250368188 + inference_time: 1396.0 + throughput: 716.3323782234957 estimated_peak_memory_range: min: 12288 - max: 9840912 + max: 1456816 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: joprky095 + job_id: jprv3w9eg job_status: Passed torchscript_onnx_qnn: - inference_time: 1383.0 - throughput: 723.0657989877079 + inference_time: 1387.0 + throughput: 720.9805335255949 estimated_peak_memory_range: - min: 1617920 - max: 3022376 + min: 1622016 + max: 2785808 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 69 - job_id: jw5661705 + job_id: jgo268mdp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:04:55Z' + chipset: QCS8550 Proxy + timestamp: 
'2024-10-14T23:59:25Z' - torchscript_onnx_tflite: - inference_time: 2186.0 - throughput: 457.45654162854527 + inference_time: 1369.0 + throughput: 730.4601899196493 estimated_peak_memory_range: min: 12288 - max: 42154592 + max: 2618912 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: jep28mw4p + job_id: jp8qy188p job_status: Passed torchscript_onnx_qnn: - inference_time: 2266.0 - throughput: 441.306266548985 + inference_time: 1396.0 + throughput: 716.3323782234957 estimated_peak_memory_range: - min: 1597440 - max: 19997008 + min: 1613824 + max: 3051224 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 69 - job_id: j7gjxl7xp + job_id: jpedmy205 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:04:59Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:59:28Z' - torchscript_onnx_tflite: - inference_time: 1367.0 - throughput: 731.528895391368 + inference_time: 1364.0 + throughput: 733.1378299120234 estimated_peak_memory_range: - min: 12288 - max: 1384288 + min: 36864 + max: 9949568 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: jqpyedx7g + job_id: jp0z06ke5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1388.0 - throughput: 720.4610951008646 + inference_time: 1390.0 + throughput: 719.4244604316547 estimated_peak_memory_range: - min: 1605632 - max: 2986128 + min: 1617920 + max: 3364464 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 69 - job_id: j1p3km9l5 + job_id: jgjvnq18g job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:04:56Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:59:27Z' - torchscript_onnx_tflite: - inference_time: 1366.0 - throughput: 732.0644216691069 + inference_time: 1372.0 + throughput: 728.862973760933 estimated_peak_memory_range: - min: 12288 - max: 8282504 + min: 315392 + max: 1901168 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: j2p0yrj6g + job_id: jpy13mn4p job_status: Passed torchscript_onnx_qnn: - inference_time: 1374.0 - throughput: 727.802037845706 + inference_time: 1399.0 + throughput: 714.7962830593281 estimated_peak_memory_range: - min: 1654784 - max: 3036896 + min: 1581056 + max: 3173216 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 69 - job_id: jwgoyvrx5 + job_id: jpv6k74m5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:04:57Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:59:26Z' - torchscript_onnx_tflite: - inference_time: 1365.0 - throughput: 732.6007326007326 + inference_time: 2195.0 + throughput: 455.58086560364467 estimated_peak_memory_range: - min: 12288 - max: 6272568 + min: 16384 + max: 42900576 
primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: j1p8o7xxg + job_id: jp2kyejmp job_status: Passed torchscript_onnx_qnn: - inference_time: 1389.0 - throughput: 719.9424046076314 + inference_time: 2293.0 + throughput: 436.1098996947231 estimated_peak_memory_range: - min: 1634304 - max: 2932016 + min: 1597440 + max: 22069984 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 69 - job_id: j1pv3wdj5 + job_id: j5we64xj5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:59:30Z' + - torchscript_onnx_tflite: + inference_time: 963.0 + throughput: 1038.4215991692627 + estimated_peak_memory_range: + min: 12288 + max: 22594784 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 41 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 41 + job_id: j5q6qvwmp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 1077.0 + throughput: 928.5051067780872 + estimated_peak_memory_range: + min: 1593344 + max: 15697760 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 69 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 69 + job_id: jg9lnd8vg + job_status: Passed + torchscript_onnx: + inference_time: 1076.0 + throughput: 929.368029739777 + estimated_peak_memory_range: + min: 0 + max: 25205696 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 70 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 70 + job_id: jpxkox395 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:04:58Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:59:36Z' - torchscript_onnx_qnn: - inference_time: 1569.0 - throughput: 637.3486297004462 + inference_time: 1556.0 + throughput: 642.6735218508998 estimated_peak_memory_range: min: 1589248 max: 1589248 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 69 - job_id: j1glnkx8p + job_id: jp3j068zg job_status: Passed torchscript_onnx: - inference_time: 2163.0 - throughput: 462.32085067036525 + inference_time: 2147.0 + throughput: 465.76618537494176 estimated_peak_memory_range: - min: 8146944 - max: 8146944 + min: 7008256 + max: 7008256 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 70 - job_id: jz5wo9l6p + job_id: j57yr9kr5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:05:02Z' + timestamp: '2024-10-14T23:59:34Z' diff --git a/qai_hub_models/models/posenet_mobilenet_quantized/README.md b/qai_hub_models/models/posenet_mobilenet_quantized/README.md index f039d3c5..a7622691 100644 --- a/qai_hub_models/models/posenet_mobilenet_quantized/README.md +++ b/qai_hub_models/models/posenet_mobilenet_quantized/README.md @@ -6,7 +6,7 @@ Posenet performs pose estimation on human images. This is based on the implementation of Posenet-Mobilenet-Quantized found -[here](https://github.com/rwightman/posenet-pytorch). 
This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/posenet_mobilenet_quantized). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.posenet_mobilenet_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Posenet-Mobilenet-Quantized can be found +* The license for the original implementation of Posenet-Mobilenet-Quantized can be found [here](https://github.com/rwightman/posenet-pytorch/blob/master/LICENSE.txt). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model](https://arxiv.org/abs/1803.08225) * [Source Model Implementation](https://github.com/rwightman/posenet-pytorch) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/posenet_mobilenet_quantized/export.py b/qai_hub_models/models/posenet_mobilenet_quantized/export.py index c0dd02b6..b3413484 100644 --- a/qai_hub_models/models/posenet_mobilenet_quantized/export.py +++ b/qai_hub_models/models/posenet_mobilenet_quantized/export.py @@ -10,17 +10,17 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.posenet_mobilenet_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.printing import ( print_inference_metrics, @@ -45,20 +45,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3.
Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -80,10 +78,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "posenet_mobilenet_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,7 +107,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( target_runtime, output_path, input_spec ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -135,7 +133,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +148,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +169,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -198,7 +196,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/posenet_mobilenet_quantized/perf.yaml b/qai_hub_models/models/posenet_mobilenet_quantized/perf.yaml index 1b31238e..1d253fd1 100644 --- a/qai_hub_models/models/posenet_mobilenet_quantized/perf.yaml +++ b/qai_hub_models/models/posenet_mobilenet_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,41 +20,38 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: Posenet-Mobilenet-Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 560.0 - throughput: 1785.7142857142858 + inference_time: 558.0 + throughput: 1792.1146953405018 estimated_peak_memory_range: min: 12288 - max: 64550200 + max: 1725168 primary_compute_unit: NPU precision: int8 layer_info: @@ -61,22 +59,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 48 - job_id: j0pxv1x1g + job_id: j57yr9jq5 job_status: Passed torchscript_onnx_qnn: - inference_time: 644.0 - throughput: 1552.7950310559006 + inference_time: 640.0 + throughput: 1562.5 estimated_peak_memory_range: - min: 28672 - max: 12530808 + min: 12288 + max: 11923560 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 42 + layers_on_npu: 69 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 42 - job_id: jn5q82v45 + total_layers: 69 + job_id: j5q6qv97p job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -85,13 +83,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:04:15Z' + timestamp: '2024-10-14T23:58:37Z' - torchscript_onnx_tflite: - inference_time: 393.0 - throughput: 2544.529262086514 + inference_time: 480.0 + throughput: 2083.3333333333335 estimated_peak_memory_range: min: 12288 - max: 48935440 + max: 49692032 primary_compute_unit: NPU precision: int8 layer_info: @@ -99,22 +97,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 48 - job_id: jo5mrz8wg + job_id: jp4lr3xq5 job_status: Passed torchscript_onnx_qnn: - inference_time: 541.0 - throughput: 1848.4288354898335 + inference_time: 445.0 + throughput: 2247.191011235955 
estimated_peak_memory_range: min: 409600 - max: 18725552 + max: 19133904 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 42 + layers_on_npu: 69 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 42 - job_id: j1glnkl8p + total_layers: 69 + job_id: jglvmlee5 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -123,13 +121,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:04:16Z' + timestamp: '2024-10-14T23:58:38Z' - torchscript_onnx_tflite: - inference_time: 563.0 - throughput: 1776.1989342806394 + inference_time: 2182.0 + throughput: 458.29514207149407 estimated_peak_memory_range: - min: 12288 - max: 109879568 + min: 40960 + max: 28749600 primary_compute_unit: NPU precision: int8 layer_info: @@ -137,37 +135,60 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 48 - job_id: jegn2ekrg + job_id: jp0z06e25 job_status: Passed torchscript_onnx_qnn: - inference_time: 563.0 - throughput: 1776.1989342806394 + inference_time: 2902.0 + throughput: 344.5899379738112 estimated_peak_memory_range: - min: 425984 - max: 1708840 + min: 12288 + max: 8312528 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 42 + layers_on_npu: 69 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 42 - job_id: j1p3km6l5 + total_layers: 69 + job_id: jg9lnd9qg job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS6490 Proxy + timestamp: '2024-10-14T23:58:46Z' + - torchscript_onnx_tflite: + inference_time: 12597.0 + throughput: 79.38398031277288 + estimated_peak_memory_range: + min: 450560 + max: 12687728 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 45 + layers_on_gpu: 3 + layers_on_cpu: 0 + total_layers: 48 + job_id: jp8qy1wzp + job_status: Passed + reference_device_info: + name: RB5 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:04:18Z' + chipset: QCS8250 Proxy + timestamp: '2024-10-14T23:58:34Z' - torchscript_onnx_tflite: - inference_time: 726.0 - throughput: 1377.4104683195592 + inference_time: 551.0 + throughput: 1814.8820326678765 estimated_peak_memory_range: min: 12288 - max: 50338816 + max: 1304952 primary_compute_unit: NPU precision: int8 layer_info: @@ -175,37 +196,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 48 - job_id: joprkyw95 + job_id: jpxkox7j5 job_status: Passed torchscript_onnx_qnn: - inference_time: 787.0 - throughput: 1270.6480304955528 + inference_time: 555.0 + throughput: 1801.8018018018017 estimated_peak_memory_range: - min: 409600 - max: 21638960 + min: 421888 + max: 1641552 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 42 + layers_on_npu: 69 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 42 - job_id: jlpe9vy1g + total_layers: 69 + job_id: jp3j06qxg job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:04:23Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:58:40Z' - torchscript_onnx_tflite: - inference_time: 556.0 - throughput: 1798.5611510791366 + inference_time: 560.0 + throughput: 1785.7142857142858 estimated_peak_memory_range: min: 12288 - max: 108564488 + max: 111488632 primary_compute_unit: NPU precision: 
int8 layer_info: @@ -213,37 +234,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 48 - job_id: jep28me4p + job_id: jp2kye3xp job_status: Passed torchscript_onnx_qnn: - inference_time: 549.0 - throughput: 1821.4936247723133 + inference_time: 561.0 + throughput: 1782.5311942959001 estimated_peak_memory_range: - min: 24576 - max: 1709704 + min: 434176 + max: 2253600 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 42 + layers_on_npu: 69 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 42 - job_id: jwgoyv8x5 + total_layers: 69 + job_id: jpedmy475 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:04:20Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:58:43Z' - torchscript_onnx_tflite: - inference_time: 559.0 - throughput: 1788.9087656529516 + inference_time: 557.0 + throughput: 1795.3321364452424 estimated_peak_memory_range: - min: 16384 - max: 111715192 + min: 12288 + max: 17778904 primary_compute_unit: NPU precision: int8 layer_info: @@ -251,22 +272,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 48 - job_id: jqpyedm7g + job_id: jprv3w1vg job_status: Passed torchscript_onnx_qnn: - inference_time: 566.0 - throughput: 1766.7844522968198 + inference_time: 560.0 + throughput: 1785.7142857142858 estimated_peak_memory_range: - min: 425984 - max: 1707728 + min: 417792 + max: 2038960 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 42 + layers_on_npu: 69 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 42 - job_id: j1pv3w7j5 + total_layers: 69 + job_id: jpv6k7z75 job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -274,14 +295,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:04:21Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:58:42Z' - torchscript_onnx_tflite: - inference_time: 561.0 - throughput: 1782.5311942959001 + inference_time: 559.0 + throughput: 1788.9087656529516 estimated_peak_memory_range: min: 12288 - max: 111987240 + max: 3012608 primary_compute_unit: NPU precision: int8 layer_info: @@ -289,37 +310,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 48 - job_id: j2p0yr66g + job_id: jgn6vkrv5 job_status: Passed torchscript_onnx_qnn: - inference_time: 555.0 - throughput: 1801.8018018018017 + inference_time: 561.0 + throughput: 1782.5311942959001 estimated_peak_memory_range: - min: 446464 - max: 1660944 + min: 442368 + max: 1759800 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 42 + layers_on_npu: 69 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 42 - job_id: j7gjxlqxp + total_layers: 69 + job_id: jgo268e4p job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:04:22Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:58:41Z' - torchscript_onnx_tflite: - inference_time: 2174.0 - throughput: 459.9816007359706 + inference_time: 714.0 + throughput: 1400.5602240896358 estimated_peak_memory_range: min: 12288 - max: 28761120 + max: 52877952 primary_compute_unit: NPU precision: int8 layer_info: @@ -327,68 +348,83 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 48 - job_id: j1p8o71xg + job_id: j5mnx8wyp job_status: Passed torchscript_onnx_qnn: - inference_time: 2940.0 
- throughput: 340.13605442176873 + inference_time: 794.0 + throughput: 1259.4458438287154 estimated_peak_memory_range: - min: 413696 - max: 8607040 + min: 430080 + max: 22998128 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 42 + layers_on_npu: 69 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 42 - job_id: jygze7nkg + total_layers: 69 + job_id: j5we64mz5 job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T12:04:24Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:58:45Z' - torchscript_onnx_tflite: - inference_time: 13079.0 - throughput: 76.45844483523206 + inference_time: 412.0 + throughput: 2427.1844660194174 estimated_peak_memory_range: - min: 454656 - max: 11918216 + min: 8192 + max: 27482688 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 45 - layers_on_gpu: 3 + layers_on_npu: 48 + layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 48 - job_id: jogkzy82g + job_id: jgkex8ryg + job_status: Passed + torchscript_onnx_qnn: + inference_time: 484.0 + throughput: 2066.115702479339 + estimated_peak_memory_range: + min: 409600 + max: 17955344 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 69 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 69 + job_id: jp14z6qkp job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T12:04:14Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:58:47Z' - torchscript_onnx_qnn: - inference_time: 673.0 - throughput: 1485.8841010401188 + inference_time: 679.0 + throughput: 1472.7540500736377 estimated_peak_memory_range: min: 397312 max: 397312 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 42 + layers_on_npu: 69 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 42 - job_id: jw5661w05 + total_layers: 69 + job_id: j56y4wqvp job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -397,4 +433,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:04:17Z' + timestamp: '2024-10-14T23:58:39Z' diff --git a/qai_hub_models/models/quicksrnetlarge/README.md b/qai_hub_models/models/quicksrnetlarge/README.md index 58607804..2ce082f7 100644 --- a/qai_hub_models/models/quicksrnetlarge/README.md +++ b/qai_hub_models/models/quicksrnetlarge/README.md @@ -6,7 +6,7 @@ QuickSRNet Large is designed for upscaling images on mobile platforms to sharpen in real-time. This is based on the implementation of QuickSRNetLarge found -[here](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/quicksrnet). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/quicksrnetlarge). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.quicksrnetlarge.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub.
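A note on units for the perf.yaml files regenerated throughout this patch: the timing fields carry no unit labels, but every entry satisfies throughput = 1e6 / inference_time, so the inference times are evidently in microseconds. On that reading, PLaMo-1B's `time_to_first_token_range` of 31448 to 1006336 spans roughly 31 ms (one 128-token prompt-processor iteration) to about 1.0 s (a full 4096-token context). Below is a quick sanity check against three entries quoted above; the entry labels are mine, not from the YAML.

```python
# Recompute throughput from inference_time for entries in this patch.
# Reproducing the YAML's throughput values confirms the microsecond unit.
entries_us = {
    "Posenet-Mobilenet-Quantized / Galaxy S23 / TFLite": 558.0,
    "Posenet-Mobilenet / Galaxy S23 / TFLite": 1375.0,
    "OpenPose / Galaxy S23 / TFLite": 11959.0,
}
for name, inference_time_us in entries_us.items():
    throughput_per_sec = 1_000_000 / inference_time_us
    print(f"{name}: {throughput_per_sec:.2f} inferences/sec")
# Prints 1792.11, 727.27 and 83.62, matching the YAML's
# 1792.1146953405018, 727.2727272727273 and 83.61903169161302.
```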
+ ## License -- The license for the original implementation of QuickSRNetLarge can be found +* The license for the original implementation of QuickSRNetLarge can be found [here](https://github.com/quic/aimet-model-zoo/blob/develop/LICENSE.pdf). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [QuickSRNet: Plain Single-Image Super-Resolution Architecture for Faster Inference on Mobile Platforms](https://arxiv.org/abs/2303.04336) * [Source Model Implementation](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/quicksrnet) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/quicksrnetlarge/export.py b/qai_hub_models/models/quicksrnetlarge/export.py index a683df51..d3b9a087 100644 --- a/qai_hub_models/models/quicksrnetlarge/export.py +++ b/qai_hub_models/models/quicksrnetlarge/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.quicksrnetlarge import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. 
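Because each of the last four steps is gated by its own flag, a compile-only export that still downloads the asset is just a matter of switching the rest off. A sketch using the flag names that appear later in this file; the device name and output directory are only examples:

```python
# Run steps 1, 2 and 5 only: trace and compile the model, download the
# compiled asset, and skip profiling, inference, and the printed summary.
from qai_hub_models.models.quicksrnetlarge.export import export_model

export_model(
    device="Samsung Galaxy S24",
    skip_profiling=True,
    skip_inferencing=True,
    skip_summary=True,
    output_dir="build/quicksrnetlarge",
)
```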
@@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "quicksrnetlarge" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/quicksrnetlarge/perf.yaml b/qai_hub_models/models/quicksrnetlarge/perf.yaml index e49d3c13..80548d14 100644 --- a/qai_hub_models/models/quicksrnetlarge/perf.yaml +++ b/qai_hub_models/models/quicksrnetlarge/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: QuickSRNetLarge performance_metrics: - torchscript_onnx_tflite: - inference_time: 2439.0 - throughput: 410.0041000410004 + inference_time: 2476.0 + throughput: 403.8772213247173 estimated_peak_memory_range: - min: 6365184 - max: 7849648 + min: 16384 + max: 1507448 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 31 - job_id: jegn2e7rg + job_id: jgz3dn7z5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2106.0 - throughput: 474.8338081671415 + inference_time: 2107.0 + throughput: 474.6084480303749 estimated_peak_memory_range: - min: 2117632 - max: 6739536 + min: 28672 + max: 3253456 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 31 - job_id: jn5q82m45 + job_id: jgn6vk9v5 job_status: Passed torchscript_onnx: - inference_time: 2662.0 - throughput: 375.6574004507889 + inference_time: 2750.0 + throughput: 363.6363636363636 estimated_peak_memory_range: - min: 12288 - max: 66866688 + min: 4096 + max: 2253520 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 33 - job_id: jygze74kg + job_id: jp3j064xg job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:03:40Z' + timestamp: '2024-10-14T23:57:56Z' - torchscript_onnx_tflite: - inference_time: 1957.0 - throughput: 510.98620337250895 + inference_time: 1933.0 + throughput: 517.3305742369374 estimated_peak_memory_range: min: 16384 - max: 32660624 + max: 33715280 primary_compute_unit: NPU precision: fp16 
layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 31 - job_id: joprkyn95 + job_id: j5we649z5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1781.0 - throughput: 561.4823133071309 + inference_time: 1901.0 + throughput: 526.0389268805892 estimated_peak_memory_range: - min: 212992 - max: 11543632 + min: 208896 + max: 11501120 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 31 - job_id: j1glnk18p + job_id: jprv3w4vg job_status: Passed torchscript_onnx: - inference_time: 2115.0 - throughput: 472.8132387706856 + inference_time: 2674.0 + throughput: 373.97157816005983 estimated_peak_memory_range: min: 0 - max: 35034944 + max: 35547056 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 33 - job_id: jz5wo946p + job_id: jgo26814p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:03:41Z' + timestamp: '2024-10-14T23:57:57Z' - torchscript_onnx_tflite: - inference_time: 2421.0 - throughput: 413.0524576621231 + inference_time: 2400.0 + throughput: 416.6666666666667 estimated_peak_memory_range: - min: 49152 - max: 23850336 + min: 16384 + max: 1680016 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 31 - job_id: jep28mv4p + job_id: jg9lnd4qg job_status: Passed torchscript_onnx_qnn: - inference_time: 2179.0 - throughput: 458.9261128958238 + inference_time: 2183.0 + throughput: 458.0852038479157 estimated_peak_memory_range: - min: 221184 - max: 1405688 + min: 225280 + max: 1614408 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 31 - job_id: j1p3kmwl5 + job_id: jpy13m4rp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:03:35Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:57:48Z' - torchscript_onnx_tflite: - inference_time: 4401.0 - throughput: 227.22108611679164 + inference_time: 2443.0 + throughput: 409.3327875562833 estimated_peak_memory_range: - min: 6332416 - max: 38759760 + min: 20480 + max: 14372720 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 31 - job_id: jqpyed77g + job_id: jp4lr3wq5 job_status: Passed torchscript_onnx_qnn: - inference_time: 3471.0 - throughput: 288.1014116969173 + inference_time: 2184.0 + throughput: 457.87545787545787 estimated_peak_memory_range: - min: 212992 - max: 15745712 + min: 225280 + max: 1604072 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 31 - job_id: jlpe9vl1g + job_id: jgkex8lyg job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:03:39Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:57:51Z' - torchscript_onnx_tflite: - inference_time: 2382.0 - throughput: 419.81528127623847 + inference_time: 2482.0 + throughput: 402.90088638195004 
estimated_peak_memory_range: - min: 20480 - max: 7572552 + min: 16384 + max: 8206304 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 31 - job_id: j2p0yrv6g + job_id: j57yr9dq5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2187.0 - throughput: 457.2473708276177 + inference_time: 2209.0 + throughput: 452.6935264825713 estimated_peak_memory_range: - min: 217088 - max: 1582792 + min: 221184 + max: 1602896 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 31 - job_id: jwgoyv4x5 + job_id: jp8qy13zp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:03:36Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:57:50Z' - torchscript_onnx_tflite: - inference_time: 2404.0 - throughput: 415.97337770382694 + inference_time: 2448.0 + throughput: 408.4967320261438 estimated_peak_memory_range: min: 16384 - max: 91562184 + max: 6329616 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 31 - job_id: j1p8o74xg + job_id: jgdx12vkp job_status: Passed torchscript_onnx_qnn: - inference_time: 2184.0 - throughput: 457.87545787545787 + inference_time: 2238.0 + throughput: 446.82752457551385 estimated_peak_memory_range: min: 233472 - max: 4994536 + max: 1502728 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 31 - job_id: j1pv3w9j5 + job_id: jp0z06125 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:03:37Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:57:49Z' - torchscript_onnx_tflite: - inference_time: 2399.0 - throughput: 416.84035014589415 + inference_time: 4174.0 + throughput: 239.57834211787255 estimated_peak_memory_range: - min: 16384 - max: 5143680 + min: 6336512 + max: 39484208 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 31 - job_id: jogkzy92g + job_id: jp14z68kp job_status: Passed torchscript_onnx_qnn: - inference_time: 2525.0 - throughput: 396.03960396039605 + inference_time: 3471.0 + throughput: 288.1014116969173 estimated_peak_memory_range: - min: 221184 - max: 4898896 + min: 208896 + max: 15573856 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,22 +329,75 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 31 - job_id: j7gjxlwxp + job_id: jglvml0e5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:03:38Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:57:54Z' + - torchscript_onnx_tflite: + inference_time: 1859.0 + throughput: 537.9236148466917 + estimated_peak_memory_range: + min: 12288 + max: 17013024 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 28 + layers_on_gpu: 0 + layers_on_cpu: 3 + total_layers: 31 + job_id: j5mnx8zyp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 1594.0 + throughput: 627.3525721455458 + 
estimated_peak_memory_range: + min: 0 + max: 10260544 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 31 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 31 + job_id: j56y4w3vp + job_status: Passed + torchscript_onnx: + inference_time: 1871.0 + throughput: 534.4735435595938 + estimated_peak_memory_range: + min: 0 + max: 15896080 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 33 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 33 + job_id: jpedmyr75 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:58:00Z' - torchscript_onnx_qnn: - inference_time: 2387.0 - throughput: 418.93590280687056 + inference_time: 2388.0 + throughput: 418.7604690117253 estimated_peak_memory_range: - min: 212992 - max: 212992 + min: 221184 + max: 221184 primary_compute_unit: NPU precision: fp16 layer_info: @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 31 - job_id: jw5661d05 + job_id: jp2kye7xp job_status: Passed torchscript_onnx: - inference_time: 2690.0 - throughput: 371.74721189591077 + inference_time: 2684.0 + throughput: 372.5782414307005 estimated_peak_memory_range: - min: 8937472 - max: 8937472 + min: 8847360 + max: 8847360 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 33 - job_id: jmg9v4dl5 + job_id: jpv6k7175 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:03:42Z' + timestamp: '2024-10-14T23:57:58Z' diff --git a/qai_hub_models/models/quicksrnetlarge_quantized/README.md b/qai_hub_models/models/quicksrnetlarge_quantized/README.md index 35690b40..1c550ce8 100644 --- a/qai_hub_models/models/quicksrnetlarge_quantized/README.md +++ b/qai_hub_models/models/quicksrnetlarge_quantized/README.md @@ -6,7 +6,7 @@ QuickSRNet Large is designed for upscaling images on mobile platforms to sharpen in real-time. This is based on the implementation of QuickSRNetLarge-Quantized found -[here](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/quicksrnet). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/quicksrnetlarge_quantized). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.quicksrnetlarge_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of QuickSRNetLarge-Quantized can be found +* The license for the original implementation of QuickSRNetLarge-Quantized can be found [here](https://github.com/quic/aimet-model-zoo/blob/develop/LICENSE.pdf). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [QuickSRNet: Plain Single-Image Super-Resolution Architecture for Faster Inference on Mobile Platforms](https://arxiv.org/abs/2303.04336) * [Source Model Implementation](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/quicksrnet) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/quicksrnetlarge_quantized/export.py b/qai_hub_models/models/quicksrnetlarge_quantized/export.py index aea0ea89..44ee3e43 100644 --- a/qai_hub_models/models/quicksrnetlarge_quantized/export.py +++ b/qai_hub_models/models/quicksrnetlarge_quantized/export.py @@ -10,17 +10,17 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.quicksrnetlarge_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.printing import ( print_inference_metrics, @@ -45,20 +45,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -80,10 +78,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). 
+ * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "quicksrnetlarge_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,7 +107,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( target_runtime, output_path, input_spec ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -135,7 +133,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +148,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +169,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -198,7 +196,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/quicksrnetlarge_quantized/perf.yaml b/qai_hub_models/models/quicksrnetlarge_quantized/perf.yaml index 48fce07e..a1610cbc 100644 --- a/qai_hub_models/models/quicksrnetlarge_quantized/perf.yaml +++ b/qai_hub_models/models/quicksrnetlarge_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,44 +20,41 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: QuickSRNetLarge-Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 1501.0 - throughput: 666.2225183211193 + inference_time: 1434.0 + throughput: 697.350069735007 estimated_peak_memory_range: - min: 28672 - max: 6381552 + min: 12288 + max: 3802792 primary_compute_unit: NPU precision: int8 layer_info: @@ -64,29 +62,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 33 - job_id: jo5mrzowg + job_id: j5q6qv37p job_status: Passed torchscript_onnx_qnn: - inference_time: 907.0 - throughput: 1102.5358324145534 + inference_time: 905.0 + throughput: 1104.9723756906078 estimated_peak_memory_range: - min: 65536 - max: 8346048 + min: 24576 + max: 8377864 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 18 + layers_on_npu: 31 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 18 - job_id: j1glnko8p + total_layers: 31 + job_id: jp14z6ekp job_status: Passed torchscript_onnx: - inference_time: 1058.0 - throughput: 945.179584120983 + inference_time: 902.0 + throughput: 1108.6474501108648 estimated_peak_memory_range: - min: 65536 - max: 16427112 + min: 57344 + max: 1672000 primary_compute_unit: NPU precision: int8 layer_info: @@ -94,7 +92,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 34 - job_id: jmg9v4xl5 + job_id: jgkex8yyg job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -103,13 +101,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:03:03Z' + timestamp: '2024-10-14T23:57:11Z' - torchscript_onnx_tflite: - 
inference_time: 1113.0 - throughput: 898.4725965858041 + inference_time: 1223.0 + throughput: 817.6614881439084 estimated_peak_memory_range: min: 12288 - max: 28921856 + max: 29971408 primary_compute_unit: NPU precision: int8 layer_info: @@ -117,29 +115,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 33 - job_id: jegn2eorg + job_id: jglvml3e5 job_status: Passed torchscript_onnx_qnn: - inference_time: 643.0 - throughput: 1555.2099533437015 + inference_time: 647.0 + throughput: 1545.595054095827 estimated_peak_memory_range: - min: 16384 - max: 13702768 + min: 12288 + max: 12152080 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 18 + layers_on_npu: 31 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 18 - job_id: jw5661r05 + total_layers: 31 + job_id: jgdx12okp job_status: Passed torchscript_onnx: - inference_time: 770.0 - throughput: 1298.7012987012988 + inference_time: 682.0 + throughput: 1466.275659824047 estimated_peak_memory_range: min: 0 - max: 30754016 + max: 30995088 primary_compute_unit: NPU precision: int8 layer_info: @@ -147,7 +145,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 34 - job_id: jnp108v25 + job_id: j5q6qv27p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -156,13 +154,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:03:04Z' + timestamp: '2024-10-14T23:57:12Z' - torchscript_onnx_tflite: - inference_time: 1445.0 - throughput: 692.0415224913495 + inference_time: 4239.0 + throughput: 235.90469450342061 estimated_peak_memory_range: - min: 28672 - max: 1592008 + min: 1810432 + max: 23251024 primary_compute_unit: NPU precision: int8 layer_info: @@ -170,37 +168,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 33 - job_id: joprkyo95 + job_id: jgz3dnrz5 job_status: Passed torchscript_onnx_qnn: - inference_time: 681.0 - throughput: 1468.4287812041116 + inference_time: 3128.0 + throughput: 319.693094629156 estimated_peak_memory_range: - min: 77824 - max: 1397472 + min: 65536 + max: 8110256 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 18 + layers_on_npu: 31 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 18 - job_id: jwgoyvox5 + total_layers: 31 + job_id: jp0z06r25 job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:02:56Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-14T23:57:09Z' - torchscript_onnx_tflite: - inference_time: 2256.0 - throughput: 443.26241134751774 + inference_time: 38641.0 + throughput: 25.879247431484693 estimated_peak_memory_range: - min: 1589248 - max: 31850544 + min: 1839104 + max: 8801856 primary_compute_unit: NPU precision: int8 layer_info: @@ -208,37 +206,60 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 33 - job_id: jep28m44p + job_id: j5we64qz5 job_status: Passed - torchscript_onnx_qnn: - inference_time: 1051.0 - throughput: 951.4747859181732 + reference_device_info: + name: RB5 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8250 Proxy + timestamp: '2024-10-14T23:56:56Z' + - torchscript_onnx_tflite: + inference_time: 1458.0 + throughput: 685.8710562414266 estimated_peak_memory_range: min: 12288 - max: 15042256 + max: 3084656 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 30 + layers_on_gpu: 0 + layers_on_cpu: 3 + total_layers: 33 + 
job_id: j56y4wnvp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 682.0 + throughput: 1466.275659824047 + estimated_peak_memory_range: + min: 0 + max: 1165008 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 18 + layers_on_npu: 31 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 18 - job_id: jygze78kg + total_layers: 31 + job_id: jp4lr3vq5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:03:01Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:57:02Z' - torchscript_onnx_tflite: - inference_time: 1447.0 - throughput: 691.0850034554251 + inference_time: 1439.0 + throughput: 694.9270326615705 estimated_peak_memory_range: - min: 20480 - max: 1439032 + min: 28672 + max: 1494800 primary_compute_unit: NPU precision: int8 layer_info: @@ -246,37 +267,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 33 - job_id: jqpyedq7g + job_id: jgjvnqe7g job_status: Passed torchscript_onnx_qnn: - inference_time: 685.0 - throughput: 1459.85401459854 + inference_time: 684.0 + throughput: 1461.9883040935672 estimated_peak_memory_range: - min: 81920 - max: 1380480 + min: 73728 + max: 1344312 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 18 + layers_on_npu: 31 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 18 - job_id: j1pv3wej5 + total_layers: 31 + job_id: jprv3wyvg job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:02:57Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:57:05Z' - torchscript_onnx_tflite: - inference_time: 1434.0 - throughput: 697.350069735007 + inference_time: 1453.0 + throughput: 688.2312456985547 estimated_peak_memory_range: - min: 12288 - max: 1390504 + min: 806912 + max: 5299552 primary_compute_unit: NPU precision: int8 layer_info: @@ -284,22 +305,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 33 - job_id: j2p0yrd6g + job_id: jpv6k7v75 job_status: Passed torchscript_onnx_qnn: - inference_time: 720.0 - throughput: 1388.888888888889 + inference_time: 679.0 + throughput: 1472.7540500736377 estimated_peak_memory_range: min: 81920 - max: 1342168 + max: 1741272 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 18 + layers_on_npu: 31 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 18 - job_id: j7gjxloxp + total_layers: 31 + job_id: jgn6vkev5 job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -307,14 +328,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:02:58Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:57:04Z' - torchscript_onnx_tflite: - inference_time: 1449.0 - throughput: 690.1311249137336 + inference_time: 1429.0 + throughput: 699.7900629811056 estimated_peak_memory_range: - min: 20480 - max: 1442000 + min: 24576 + max: 78505088 primary_compute_unit: NPU precision: int8 layer_info: @@ -322,37 +343,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 33 - job_id: j1p8o76xg + job_id: jgo26834p job_status: Passed torchscript_onnx_qnn: - inference_time: 681.0 - throughput: 1468.4287812041116 + inference_time: 723.0 + throughput: 1383.1258644536654 estimated_peak_memory_range: - min: 0 - max: 1413664 + min: 81920 + max: 1427008 
primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 18 + layers_on_npu: 31 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 18 - job_id: jlpe9v81g + total_layers: 31 + job_id: jpxkoxyj5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:02:59Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:57:03Z' - torchscript_onnx_tflite: - inference_time: 3920.0 - throughput: 255.10204081632654 + inference_time: 1923.0 + throughput: 520.0208008320333 estimated_peak_memory_range: - min: 1609728 - max: 22801536 + min: 16384 + max: 30993168 primary_compute_unit: NPU precision: int8 layer_info: @@ -360,37 +381,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 33 - job_id: jogkzyo2g + job_id: jp3j06exg job_status: Passed torchscript_onnx_qnn: - inference_time: 3175.0 - throughput: 314.96062992125985 + inference_time: 1059.0 + throughput: 944.2870632672333 estimated_peak_memory_range: - min: 65536 - max: 8335184 + min: 12288 + max: 15219760 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 18 + layers_on_npu: 31 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 18 - job_id: jz5wo916p + total_layers: 31 + job_id: jpy13mdrp job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T12:03:02Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:57:08Z' - torchscript_onnx_tflite: - inference_time: 38110.0 - throughput: 26.239832065074783 + inference_time: 1594.0 + throughput: 627.3525721455458 estimated_peak_memory_range: - min: 1503232 - max: 4870208 + min: 0 + max: 20576960 primary_compute_unit: NPU precision: int8 layer_info: @@ -398,37 +419,67 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 33 - job_id: jn5q82z45 + job_id: jg9lndwqg + job_status: Passed + torchscript_onnx_qnn: + inference_time: 655.0 + throughput: 1526.7175572519084 + estimated_peak_memory_range: + min: 8192 + max: 11344048 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 31 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 31 + job_id: jp8qy17zp + job_status: Passed + torchscript_onnx: + inference_time: 519.0 + throughput: 1926.7822736030828 + estimated_peak_memory_range: + min: 0 + max: 21261808 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 34 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 34 + job_id: jp3j06mxg job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T12:02:52Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:57:15Z' - torchscript_onnx_qnn: - inference_time: 791.0 - throughput: 1264.2225031605562 + inference_time: 814.0 + throughput: 1228.5012285012285 estimated_peak_memory_range: - min: 61440 - max: 61440 + min: 233472 + max: 233472 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 18 + layers_on_npu: 31 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 18 - job_id: j1p3kmxl5 + total_layers: 31 + job_id: j57yr9xq5 job_status: Passed torchscript_onnx: - inference_time: 1104.0 - throughput: 905.7971014492754 + inference_time: 1105.0 + 
throughput: 904.9773755656108 estimated_peak_memory_range: - min: 3301376 - max: 3301376 + min: 3379200 + max: 3379200 primary_compute_unit: NPU precision: int8 layer_info: @@ -436,7 +487,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 34 - job_id: jvgdwvze5 + job_id: jglvmlke5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -445,4 +496,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:03:05Z' + timestamp: '2024-10-14T23:57:13Z' diff --git a/qai_hub_models/models/quicksrnetmedium/README.md b/qai_hub_models/models/quicksrnetmedium/README.md index a3adabe4..35ffd138 100644 --- a/qai_hub_models/models/quicksrnetmedium/README.md +++ b/qai_hub_models/models/quicksrnetmedium/README.md @@ -6,7 +6,7 @@ QuickSRNet Medium is designed for upscaling images on mobile platforms to sharpen in real-time. This is based on the implementation of QuickSRNetMedium found -[here](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/quicksrnet). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/quicksrnetmedium). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.quicksrnetmedium.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of QuickSRNetMedium can be found +* The license for the original implementation of QuickSRNetMedium can be found [here](https://github.com/quic/aimet-model-zoo/blob/develop/LICENSE.pdf). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [QuickSRNet: Plain Single-Image Super-Resolution Architecture for Faster Inference on Mobile Platforms](https://arxiv.org/abs/2303.04336) * [Source Model Implementation](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/quicksrnet) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). 
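One migration note that applies to every `export.py` touched in this diff: callers that unpacked the old 3-tuple must switch to attribute access on `ExportResult` (and note that the Returns docstring now lists the inference job before the profile job). A hypothetical call site against the quicksrnetmedium entry point changed below:

```python
# Before this change, callers unpacked a 3-tuple:
#     compile_job, profile_job, inference_job = export_model(device=...)
# Afterwards they read attributes off the returned ExportResult.
from qai_hub_models.models.quicksrnetmedium.export import export_model

result = export_model(device="Samsung Galaxy S23")
if not isinstance(result, list):  # the signature also allows List[str]
    print("compile job:", result.compile_job.job_id)
    if result.profile_job is not None:
        print("profile job:", result.profile_job.job_id)
```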
diff --git a/qai_hub_models/models/quicksrnetmedium/export.py b/qai_hub_models/models/quicksrnetmedium/export.py index 42b487ec..dae61e56 100644 --- a/qai_hub_models/models/quicksrnetmedium/export.py +++ b/qai_hub_models/models/quicksrnetmedium/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.quicksrnetmedium import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "quicksrnetmedium" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. 
Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/quicksrnetmedium/perf.yaml b/qai_hub_models/models/quicksrnetmedium/perf.yaml index 7e0119c1..6ee1eb06 100644 --- a/qai_hub_models/models/quicksrnetmedium/perf.yaml +++ b/qai_hub_models/models/quicksrnetmedium/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: QuickSRNetMedium performance_metrics: - torchscript_onnx_tflite: - inference_time: 1334.0 - throughput: 749.6251874062968 + inference_time: 1359.0 + throughput: 735.8351729212657 estimated_peak_memory_range: - min: 28672 - max: 1477824 + min: 16384 + max: 1656408 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 17 - job_id: j1p8o788g + job_id: jgkex8qvg job_status: Passed 
torchscript_onnx_qnn: - inference_time: 994.0 - throughput: 1006.0362173038229 + inference_time: 1017.0 + throughput: 983.284169124877 estimated_peak_memory_range: - min: 233472 - max: 7231440 + min: 217088 + max: 2695224 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 17 - job_id: j1pv3w4m5 + job_id: jgz3dnj45 job_status: Passed torchscript_onnx: - inference_time: 1530.0 - throughput: 653.59477124183 + inference_time: 1512.0 + throughput: 661.3756613756614 estimated_peak_memory_range: - min: 28672 - max: 7838800 + min: 40960 + max: 6775640 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 19 - job_id: jz5wo986p + job_id: jpxkox6j5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:02:20Z' + timestamp: '2024-10-14T23:56:22Z' - torchscript_onnx_tflite: - inference_time: 904.0 - throughput: 1106.1946902654868 + inference_time: 981.0 + throughput: 1019.367991845056 estimated_peak_memory_range: - min: 20480 - max: 22514608 + min: 16384 + max: 22545904 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 17 - job_id: jogkzydog + job_id: j5q6qvrep job_status: Passed torchscript_onnx_qnn: - inference_time: 682.0 - throughput: 1466.275659824047 + inference_time: 674.0 + throughput: 1483.679525222552 estimated_peak_memory_range: - min: 208896 - max: 12075760 + min: 204800 + max: 11154848 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 17 - job_id: j7gjxl18p + job_id: j5we64345 job_status: Passed torchscript_onnx: - inference_time: 1367.0 - throughput: 731.528895391368 + inference_time: 1084.0 + throughput: 922.509225092251 estimated_peak_memory_range: min: 0 - max: 24822464 + max: 24816912 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 19 - job_id: jmg9v4kl5 + job_id: j5mnx86yp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:02:21Z' + timestamp: '2024-10-14T23:56:23Z' - torchscript_onnx_tflite: - inference_time: 1405.0 - throughput: 711.7437722419929 + inference_time: 1333.0 + throughput: 750.1875468867216 estimated_peak_memory_range: - min: 28672 - max: 1392592 + min: 36864 + max: 1384704 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 17 - job_id: jn5q82wm5 + job_id: jglvml225 job_status: Passed torchscript_onnx_qnn: - inference_time: 934.0 - throughput: 1070.6638115631692 + inference_time: 910.0 + throughput: 1098.901098901099 estimated_peak_memory_range: - min: 0 - max: 1310736 + min: 221184 + max: 1362640 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 17 - job_id: jygze7w6g + job_id: jgdx12q6p job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:02:14Z' + chipset: QCS8550 Proxy + timestamp: 
'2024-10-14T23:56:14Z' - torchscript_onnx_tflite: - inference_time: 2043.0 - throughput: 489.47626040137055 + inference_time: 1366.0 + throughput: 732.0644216691069 estimated_peak_memory_range: - min: 6307840 - max: 29877280 + min: 24576 + max: 1448016 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 17 - job_id: j1glnk7lp + job_id: jpv6k7rz5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1250.0 - throughput: 800.0 + inference_time: 932.0 + throughput: 1072.961373390558 estimated_peak_memory_range: - min: 208896 - max: 13835424 + min: 221184 + max: 1817048 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 17 - job_id: jvgdwv0l5 + job_id: jp14z6wkp job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:02:19Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:56:18Z' - torchscript_onnx_tflite: - inference_time: 1373.0 - throughput: 728.3321194464676 + inference_time: 1411.0 + throughput: 708.7172218284904 estimated_peak_memory_range: - min: 28672 - max: 1643312 + min: 36864 + max: 1406160 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 17 - job_id: jw5661v75 + job_id: jgo268n1p job_status: Passed torchscript_onnx_qnn: - inference_time: 919.0 - throughput: 1088.139281828074 + inference_time: 1003.0 + throughput: 997.0089730807578 estimated_peak_memory_range: - min: 217088 - max: 1431720 + min: 221184 + max: 1640680 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 17 - job_id: jz5wo9xjp + job_id: jg9lndyqg job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:02:15Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:56:16Z' - torchscript_onnx_tflite: - inference_time: 1333.0 - throughput: 750.1875468867216 + inference_time: 1324.0 + throughput: 755.2870090634441 estimated_peak_memory_range: min: 24576 - max: 1289536 + max: 1500104 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 17 - job_id: j1p3km8z5 + job_id: jp3j061mg job_status: Passed torchscript_onnx_qnn: - inference_time: 938.0 - throughput: 1066.0980810234541 + inference_time: 925.0 + throughput: 1081.081081081081 estimated_peak_memory_range: - min: 266240 - max: 4960960 + min: 229376 + max: 1460104 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 17 - job_id: jmg9v48v5 + job_id: j5we643z5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:02:17Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:56:15Z' - torchscript_onnx_tflite: - inference_time: 1343.0 - throughput: 744.6016381236038 + inference_time: 2746.0 + throughput: 364.1660597232338 estimated_peak_memory_range: - min: 1007616 - max: 2372504 + min: 6316032 + max: 29582256 
primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 17 - job_id: jwgoyvmd5 + job_id: j56y4wznp job_status: Passed torchscript_onnx_qnn: - inference_time: 1004.0 - throughput: 996.01593625498 + inference_time: 1234.0 + throughput: 810.3727714748784 estimated_peak_memory_range: - min: 229376 - max: 4888840 + min: 204800 + max: 15147392 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,22 +329,75 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 17 - job_id: jnp1083l5 + job_id: j57yr9lq5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:02:18Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:56:20Z' + - torchscript_onnx_tflite: + inference_time: 971.0 + throughput: 1029.8661174047375 + estimated_peak_memory_range: + min: 16384 + max: 15718240 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 14 + layers_on_gpu: 0 + layers_on_cpu: 3 + total_layers: 17 + job_id: jpedmyw85 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 684.0 + throughput: 1461.9883040935672 + estimated_peak_memory_range: + min: 0 + max: 8908608 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 17 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 17 + job_id: jp4lr3dq5 + job_status: Passed + torchscript_onnx: + inference_time: 925.0 + throughput: 1081.081081081081 + estimated_peak_memory_range: + min: 0 + max: 16116864 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 19 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 19 + job_id: jp2kyelxp + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:56:26Z' - torchscript_onnx_qnn: - inference_time: 1039.0 - throughput: 962.4639076034649 + inference_time: 1035.0 + throughput: 966.1835748792271 estimated_peak_memory_range: - min: 212992 - max: 212992 + min: 208896 + max: 208896 primary_compute_unit: NPU precision: fp16 layer_info: @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 17 - job_id: jlpe9v20g + job_id: jp14z6wnp job_status: Passed torchscript_onnx: - inference_time: 1515.0 - throughput: 660.0660066006601 + inference_time: 1552.0 + throughput: 644.3298969072165 estimated_peak_memory_range: - min: 8982528 - max: 8982528 + min: 8929280 + max: 8929280 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 19 - job_id: jnp108725 + job_id: jgn6vk3v5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:02:22Z' + timestamp: '2024-10-14T23:56:24Z' diff --git a/qai_hub_models/models/quicksrnetmedium_quantized/README.md b/qai_hub_models/models/quicksrnetmedium_quantized/README.md index ed5b04f5..64ae7804 100644 --- a/qai_hub_models/models/quicksrnetmedium_quantized/README.md +++ b/qai_hub_models/models/quicksrnetmedium_quantized/README.md @@ -6,7 +6,7 @@ QuickSRNet Medium is designed for upscaling images on mobile platforms to sharpen in real-time. 
This is based on the implementation of QuickSRNetMedium-Quantized found -[here](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/quicksrnet). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/quicksrnetmedium_quantized). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.quicksrnetmedium_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of QuickSRNetMedium-Quantized can be found +* The license for the original implementation of QuickSRNetMedium-Quantized can be found [here](https://github.com/quic/aimet-model-zoo/blob/develop/LICENSE.pdf). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [QuickSRNet: Plain Single-Image Super-Resolution Architecture for Faster Inference on Mobile Platforms](https://arxiv.org/abs/2303.04336) * [Source Model Implementation](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/quicksrnet) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/quicksrnetmedium_quantized/export.py b/qai_hub_models/models/quicksrnetmedium_quantized/export.py index 767f0ab4..83e43b06 100644 --- a/qai_hub_models/models/quicksrnetmedium_quantized/export.py +++ b/qai_hub_models/models/quicksrnetmedium_quantized/export.py @@ -10,17 +10,17 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.quicksrnetmedium_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.printing import ( print_inference_metrics, @@ -45,20 +45,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1.
Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -80,10 +78,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "quicksrnetmedium_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,7 +107,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( target_runtime, output_path, input_spec ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -135,7 +133,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +148,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +169,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -198,7 +196,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/quicksrnetmedium_quantized/perf.yaml b/qai_hub_models/models/quicksrnetmedium_quantized/perf.yaml index 08cf7fef..e69a2b3c 100644 --- a/qai_hub_models/models/quicksrnetmedium_quantized/perf.yaml +++ b/qai_hub_models/models/quicksrnetmedium_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,44 +20,41 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: QuickSRNetMedium-Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 1111.0 - throughput: 900.0900090009001 + inference_time: 1127.0 + throughput: 887.3114463176574 estimated_peak_memory_range: - min: 28672 - max: 1464480 + min: 831488 + max: 66800768 primary_compute_unit: NPU precision: int8 layer_info: @@ -64,29 +62,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 19 - job_id: j2p0yreeg + job_id: jgkexn7wg job_status: Passed torchscript_onnx_qnn: - inference_time: 512.0 - throughput: 1953.125 + inference_time: 519.0 + throughput: 1926.7822736030828 estimated_peak_memory_range: - min: 69632 - max: 66788144 + min: 12288 + max: 3826384 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 10 + layers_on_npu: 17 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 10 - job_id: j7gjxlk8p + total_layers: 17 + job_id: jp0z0q295 job_status: Passed torchscript_onnx: - inference_time: 771.0 - throughput: 1297.0168612191958 + inference_time: 676.0 + throughput: 1479.2899408284025 estimated_peak_memory_range: - min: 69632 - max: 1463824 + min: 65536 + max: 1402392 primary_compute_unit: NPU precision: int8 layer_info: @@ -94,7 +92,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 19 - job_id: j0pxv1m9g + job_id: jgdx19orp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -103,13 +101,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:01:43Z' + timestamp: '2024-10-15T17:24:40Z' - torchscript_onnx_tflite: - 
inference_time: 910.0 - throughput: 1098.901098901099 + inference_time: 899.0 + throughput: 1112.3470522803113 estimated_peak_memory_range: min: 16384 - max: 23857536 + max: 23513088 primary_compute_unit: NPU precision: int8 layer_info: @@ -117,29 +115,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 19 - job_id: j1p8o7w8g + job_id: jglvmz6j5 job_status: Passed torchscript_onnx_qnn: - inference_time: 470.0 - throughput: 2127.659574468085 + inference_time: 359.0 + throughput: 2785.515320334262 estimated_peak_memory_range: - min: 12288 - max: 12497168 + min: 65536 + max: 11297680 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 10 + layers_on_npu: 17 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 10 - job_id: jlpe9v40g + total_layers: 17 + job_id: jp8qy9mkp job_status: Passed torchscript_onnx: - inference_time: 550.0 - throughput: 1818.1818181818182 + inference_time: 503.0 + throughput: 1988.0715705765408 estimated_peak_memory_range: min: 0 - max: 24443968 + max: 24442864 primary_compute_unit: NPU precision: int8 layer_info: @@ -147,7 +145,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 19 - job_id: jo5mrz4qg + job_id: jpxkojy35 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -156,13 +154,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:01:43Z' + timestamp: '2024-10-15T17:24:41Z' - torchscript_onnx_tflite: - inference_time: 1116.0 - throughput: 896.0573476702509 + inference_time: 3558.0 + throughput: 281.0567734682406 estimated_peak_memory_range: - min: 16384 - max: 5672664 + min: 1622016 + max: 17850784 primary_compute_unit: NPU precision: int8 layer_info: @@ -170,37 +168,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 19 - job_id: jogkzyrog + job_id: jprv3q20g job_status: Passed torchscript_onnx_qnn: - inference_time: 413.0 - throughput: 2421.3075060532688 + inference_time: 1050.0 + throughput: 952.3809523809524 estimated_peak_memory_range: - min: 81920 - max: 1274688 + min: 61440 + max: 8135440 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 10 + layers_on_npu: 17 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 10 - job_id: jz5wo9mjp + total_layers: 17 + job_id: jgz3d9ro5 job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:01:37Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-15T17:24:38Z' - torchscript_onnx_tflite: - inference_time: 1842.0 - throughput: 542.8881650380022 + inference_time: 12711.0 + throughput: 78.6720163637794 estimated_peak_memory_range: - min: 1605632 - max: 25678304 + min: 1748992 + max: 7692808 primary_compute_unit: NPU precision: int8 layer_info: @@ -208,37 +206,60 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 19 - job_id: jn5q829m5 + job_id: jp2ky69rp + job_status: Passed + reference_device_info: + name: RB5 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8250 Proxy + timestamp: '2024-10-15T17:24:27Z' + - torchscript_onnx_tflite: + inference_time: 1115.0 + throughput: 896.8609865470852 + estimated_peak_memory_range: + min: 20480 + max: 1408456 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 16 + layers_on_gpu: 0 + layers_on_cpu: 3 + total_layers: 19 + job_id: jp3j03v3g job_status: Passed torchscript_onnx_qnn: - inference_time: 573.0 - throughput: 
1745.2006980802792 + inference_time: 412.0 + throughput: 2427.1844660194174 estimated_peak_memory_range: - min: 65536 - max: 13607824 + min: 77824 + max: 2562544 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 10 + layers_on_npu: 17 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 10 - job_id: jz57zd6rp + total_layers: 17 + job_id: j5q6qkrnp job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:01:41Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T17:24:33Z' - torchscript_onnx_tflite: - inference_time: 1119.0 - throughput: 893.6550491510277 + inference_time: 1106.0 + throughput: 904.1591320072333 estimated_peak_memory_range: min: 28672 - max: 1358744 + max: 5821904 primary_compute_unit: NPU precision: int8 layer_info: @@ -246,37 +267,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 19 - job_id: j1glnkelp + job_id: j57yrwlv5 job_status: Passed torchscript_onnx_qnn: inference_time: 413.0 throughput: 2421.3075060532688 estimated_peak_memory_range: - min: 81920 - max: 1262552 + min: 28672 + max: 1874272 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 10 + layers_on_npu: 17 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 10 - job_id: jmg9v49v5 + total_layers: 17 + job_id: jp3j0313g job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:01:38Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T17:24:36Z' - torchscript_onnx_tflite: - inference_time: 1124.0 - throughput: 889.6797153024911 + inference_time: 1134.0 + throughput: 881.8342151675485 estimated_peak_memory_range: - min: 20480 - max: 5770984 + min: 1601536 + max: 71370032 primary_compute_unit: NPU precision: int8 layer_info: @@ -284,22 +305,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 19 - job_id: jw5661q75 + job_id: jp14zlw8p job_status: Passed torchscript_onnx_qnn: - inference_time: 413.0 - throughput: 2421.3075060532688 + inference_time: 416.0 + throughput: 2403.846153846154 estimated_peak_memory_range: - min: 81920 - max: 2355968 + min: 16384 + max: 1941952 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 10 + layers_on_npu: 17 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 10 - job_id: jnp108ql5 + total_layers: 17 + job_id: j56y4jz6p job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -307,14 +328,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:01:39Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T17:24:35Z' - torchscript_onnx_tflite: - inference_time: 1134.0 - throughput: 881.8342151675485 + inference_time: 1108.0 + throughput: 902.5270758122743 estimated_peak_memory_range: - min: 40960 - max: 1397032 + min: 20480 + max: 3105112 primary_compute_unit: NPU precision: int8 layer_info: @@ -322,37 +343,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 19 - job_id: j1p3kmqz5 + job_id: j5we6v335 job_status: Passed torchscript_onnx_qnn: inference_time: 415.0 throughput: 2409.6385542168673 estimated_peak_memory_range: - min: 90112 - max: 1416856 + min: 81920 + max: 1384840 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 10 + layers_on_npu: 17 layers_on_gpu: 0 layers_on_cpu: 
0 - total_layers: 10 - job_id: jvgdwv7l5 + total_layers: 17 + job_id: jglvmz2j5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:01:40Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T17:24:34Z' - torchscript_onnx_tflite: - inference_time: 2491.0 - throughput: 401.4452027298274 + inference_time: 1368.0 + throughput: 730.9941520467836 estimated_peak_memory_range: - min: 1617920 - max: 17432640 + min: 16384 + max: 24876320 primary_compute_unit: NPU precision: int8 layer_info: @@ -360,37 +381,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 19 - job_id: jwgoyved5 + job_id: jgjvnm2vg job_status: Passed torchscript_onnx_qnn: - inference_time: 1089.0 - throughput: 918.2736455463728 + inference_time: 581.0 + throughput: 1721.170395869191 estimated_peak_memory_range: - min: 28672 - max: 7778400 + min: 65536 + max: 13853008 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 10 + layers_on_npu: 17 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 10 - job_id: jqp4qw8lg + total_layers: 17 + job_id: jpv6kovk5 job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T12:01:42Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-15T17:24:38Z' - torchscript_onnx_tflite: - inference_time: 11336.0 - throughput: 88.21453775582216 + inference_time: 845.0 + throughput: 1183.4319526627219 estimated_peak_memory_range: - min: 1798144 - max: 4980664 + min: 16384 + max: 15947360 primary_compute_unit: NPU precision: int8 layer_info: @@ -398,37 +419,67 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 19 - job_id: j1pv3wzm5 + job_id: jpy13wj8p + job_status: Passed + torchscript_onnx_qnn: + inference_time: 303.0 + throughput: 3300.3300330033003 + estimated_peak_memory_range: + min: 57344 + max: 9609648 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 17 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 17 + job_id: jg9ln1wwg + job_status: Passed + torchscript_onnx: + inference_time: 373.0 + throughput: 2680.9651474530833 + estimated_peak_memory_range: + min: 0 + max: 15758176 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 19 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 19 + job_id: jgkexn3wg job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T12:01:33Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T17:24:44Z' - torchscript_onnx_qnn: - inference_time: 516.0 - throughput: 1937.984496124031 + inference_time: 521.0 + throughput: 1919.3857965451057 estimated_peak_memory_range: - min: 69632 - max: 69632 + min: 229376 + max: 229376 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 10 + layers_on_npu: 17 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 10 - job_id: jygze7v6g + total_layers: 17 + job_id: jgkexnqwg job_status: Passed torchscript_onnx: - inference_time: 759.0 - throughput: 1317.5230566534915 + inference_time: 777.0 + throughput: 1287.001287001287 estimated_peak_memory_range: - min: 3301376 - max: 3301376 + min: 3325952 + max: 3325952 
primary_compute_unit: NPU precision: int8 layer_info: @@ -436,7 +487,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 19 - job_id: jegn2exmg + job_id: jprv3qe0g job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -445,4 +496,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:01:44Z' + timestamp: '2024-10-15T17:24:42Z' diff --git a/qai_hub_models/models/quicksrnetsmall/README.md b/qai_hub_models/models/quicksrnetsmall/README.md index 8b3c02c1..17dc918e 100644 --- a/qai_hub_models/models/quicksrnetsmall/README.md +++ b/qai_hub_models/models/quicksrnetsmall/README.md @@ -6,7 +6,7 @@ QuickSRNet Small is designed for upscaling images on mobile platforms to sharpen in real-time. This is based on the implementation of QuickSRNetSmall found -[here](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/quicksrnet). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/quicksrnetsmall). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.quicksrnetsmall.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of QuickSRNetSmall can be found +* The license for the original implementation of QuickSRNetSmall can be found [here](https://github.com/quic/aimet-model-zoo/blob/develop/LICENSE.pdf). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [QuickSRNet: Plain Single-Image Super-Resolution Architecture for Faster Inference on Mobile Platforms](https://arxiv.org/abs/2303.04336) * [Source Model Implementation](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/quicksrnet) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
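The `export.py` diffs in this change replace the positional `(compile_job, profile_job, inference_job)` return tuple with an `ExportResult` struct, so any caller that unpacked the tuple positionally needs updating. Below is a minimal, hedged sketch of the new calling convention; the device name is illustrative, and the handling of the `List[str]` branch of the return type is a defensive assumption rather than something this diff specifies:

```python
# Sketch (not part of this diff): consuming ExportResult instead of
# unpacking the old (compile_job, profile_job, inference_job) tuple.
from qai_hub_models.models.quicksrnetsmall.export import export_model

result = export_model(device="Samsung Galaxy S23")  # illustrative device
if not isinstance(result, list):  # the List[str] branch of the return type
    print(result.compile_job)  # hub.CompileJob metadata
    if result.profile_job is not None:  # None when profiling is skipped
        profile_data = result.profile_job.download_profile()
    if result.inference_job is not None:  # None when inferencing is skipped
        print(result.inference_job)
```

Because `ExportResult` is constructed with keyword arguments (`compile_job=`, `inference_job=`, `profile_job=`), field access by name is unaffected by the profile/inference ordering swap visible in the updated docstrings.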
diff --git a/qai_hub_models/models/quicksrnetsmall/export.py b/qai_hub_models/models/quicksrnetsmall/export.py index 7adf31b5..ce3dfe17 100644 --- a/qai_hub_models/models/quicksrnetsmall/export.py +++ b/qai_hub_models/models/quicksrnetsmall/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.quicksrnetsmall import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "quicksrnetsmall" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. 
Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/quicksrnetsmall/perf.yaml b/qai_hub_models/models/quicksrnetsmall/perf.yaml index 4ecdcd4c..6b7af2d2 100644 --- a/qai_hub_models/models/quicksrnetsmall/perf.yaml +++ b/qai_hub_models/models/quicksrnetsmall/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: QuickSRNetSmall performance_metrics: - torchscript_onnx_tflite: - inference_time: 1298.0 - throughput: 770.4160246533128 + inference_time: 1340.0 + throughput: 746.2686567164179 estimated_peak_memory_range: - min: 7168000 - max: 79020744 + min: 16384 + max: 8878600 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 11 - job_id: jogkzylog + job_id: jgdx12x6p job_status: Passed 
torchscript_onnx_qnn: - inference_time: 1010.0 - throughput: 990.0990099009902 + inference_time: 1061.0 + throughput: 942.5070688030161 estimated_peak_memory_range: - min: 12288 - max: 65498344 + min: 217088 + max: 7570656 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 11 - job_id: j7gjxl08p + job_id: jp0z06405 job_status: Passed torchscript_onnx: - inference_time: 1455.0 - throughput: 687.2852233676975 + inference_time: 1440.0 + throughput: 694.4444444444445 estimated_peak_memory_range: - min: 212992 - max: 11710960 + min: 217088 + max: 1760856 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 13 - job_id: jqp4qwjlg + job_id: jpedmyo85 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:01:03Z' + timestamp: '2024-10-14T23:54:49Z' - torchscript_onnx_tflite: - inference_time: 937.0 - throughput: 1067.2358591248667 + inference_time: 830.0 + throughput: 1204.8192771084337 estimated_peak_memory_range: min: 16384 - max: 21546912 + max: 21265472 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 11 - job_id: jn5q827m5 + job_id: j57yr9yn5 job_status: Passed torchscript_onnx_qnn: - inference_time: 636.0 - throughput: 1572.3270440251572 + inference_time: 638.0 + throughput: 1567.398119122257 estimated_peak_memory_range: - min: 208896 - max: 10675392 + min: 204800 + max: 12254000 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 11 - job_id: jlpe9vr0g + job_id: jp8qy12qp job_status: Passed torchscript_onnx: - inference_time: 987.0 - throughput: 1013.1712259371834 + inference_time: 997.0 + throughput: 1003.0090270812437 estimated_peak_memory_range: min: 0 - max: 22202016 + max: 21946496 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 13 - job_id: j0pxv1e9g + job_id: jgz3dn245 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:01:04Z' + timestamp: '2024-10-14T23:54:50Z' - torchscript_onnx_tflite: - inference_time: 1324.0 - throughput: 755.2870090634441 + inference_time: 1360.0 + throughput: 735.2941176470588 estimated_peak_memory_range: - min: 16384 - max: 3401384 + min: 24576 + max: 1367264 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 11 - job_id: j1glnk0lp + job_id: jp4lr3l25 job_status: Passed torchscript_onnx_qnn: - inference_time: 846.0 - throughput: 1182.033096926714 + inference_time: 863.0 + throughput: 1158.7485515643104 estimated_peak_memory_range: - min: 0 - max: 1412928 + min: 221184 + max: 1497272 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 11 - job_id: jz5wo9djp + job_id: j5q6qv0ep job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:00:58Z' + chipset: QCS8550 Proxy + timestamp: 
'2024-10-14T23:54:42Z' - torchscript_onnx_tflite: - inference_time: 1782.0 - throughput: 561.1672278338945 + inference_time: 1413.0 + throughput: 707.7140835102618 estimated_peak_memory_range: min: 16384 - max: 22383920 + max: 23004864 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 11 - job_id: jw5661375 + job_id: jprv3wvkg job_status: Passed torchscript_onnx_qnn: - inference_time: 1120.0 - throughput: 892.8571428571429 + inference_time: 877.0 + throughput: 1140.2508551881415 estimated_peak_memory_range: - min: 45056 - max: 11723152 + min: 28672 + max: 3997856 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 11 - job_id: jz57zdvrp + job_id: jp3j06nmg job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:01:02Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:54:45Z' - torchscript_onnx_tflite: - inference_time: 1297.0 - throughput: 771.0100231303007 + inference_time: 1361.0 + throughput: 734.7538574577517 estimated_peak_memory_range: - min: 16384 - max: 9217824 + min: 32768 + max: 1463672 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 11 - job_id: j1p3km4z5 + job_id: jgn6vk6j5 job_status: Passed torchscript_onnx_qnn: - inference_time: 864.0 - throughput: 1157.4074074074074 + inference_time: 863.0 + throughput: 1158.7485515643104 estimated_peak_memory_range: - min: 28672 - max: 4037384 + min: 233472 + max: 1538088 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 11 - job_id: jmg9v43v5 + job_id: j56y4w2np job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:00:59Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:54:44Z' - torchscript_onnx_tflite: - inference_time: 1429.0 - throughput: 699.7900629811056 + inference_time: 1316.0 + throughput: 759.8784194528876 estimated_peak_memory_range: - min: 16384 - max: 7894552 + min: 28672 + max: 1429920 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 11 - job_id: jwgoyv1d5 + job_id: j5mnx8n7p job_status: Passed torchscript_onnx_qnn: - inference_time: 858.0 - throughput: 1165.5011655011656 + inference_time: 872.0 + throughput: 1146.788990825688 estimated_peak_memory_range: min: 229376 - max: 1506264 + max: 1540536 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 11 - job_id: jnp108dl5 + job_id: jglvml425 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:01:00Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:54:43Z' - torchscript_onnx_tflite: - inference_time: 1356.0 - throughput: 737.4631268436578 + inference_time: 2807.0 + throughput: 356.2522265764161 estimated_peak_memory_range: - min: 1384448 - max: 2780032 + min: 16384 + max: 21002512 primary_compute_unit: NPU 
precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 11 - job_id: j1pv3w1m5 + job_id: jpxkoxk85 job_status: Passed torchscript_onnx_qnn: - inference_time: 879.0 - throughput: 1137.6564277588168 + inference_time: 1118.0 + throughput: 894.4543828264758 estimated_peak_memory_range: - min: 258048 - max: 1769616 + min: 208896 + max: 13023728 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,22 +329,75 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 11 - job_id: jvgdwvrl5 + job_id: jpv6k7qz5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:54:47Z' + - torchscript_onnx_tflite: + inference_time: 746.0 + throughput: 1340.4825737265417 + estimated_peak_memory_range: + min: 0 + max: 14930928 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 8 + layers_on_gpu: 0 + layers_on_cpu: 3 + total_layers: 11 + job_id: jpy13m10p + job_status: Passed + torchscript_onnx_qnn: + inference_time: 663.0 + throughput: 1508.2956259426849 + estimated_peak_memory_range: + min: 208896 + max: 9152912 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 11 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 11 + job_id: jgjvnqd1g + job_status: Passed + torchscript_onnx: + inference_time: 968.0 + throughput: 1033.0578512396694 + estimated_peak_memory_range: + min: 0 + max: 15075680 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 13 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 13 + job_id: jp14z62np + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:01:01Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:54:53Z' - torchscript_onnx_qnn: - inference_time: 935.0 - throughput: 1069.51871657754 + inference_time: 942.0 + throughput: 1061.5711252653928 estimated_peak_memory_range: - min: 212992 - max: 212992 + min: 204800 + max: 204800 primary_compute_unit: NPU precision: fp16 layer_info: @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 11 - job_id: jygze7x6g + job_id: jgkex8vvg job_status: Passed torchscript_onnx: - inference_time: 1453.0 - throughput: 688.2312456985547 + inference_time: 1468.0 + throughput: 681.1989100817439 estimated_peak_memory_range: - min: 8908800 - max: 8908800 + min: 8978432 + max: 8978432 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 13 - job_id: jo5mrzvqg + job_id: j5we64w45 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:01:05Z' + timestamp: '2024-10-14T23:54:51Z' diff --git a/qai_hub_models/models/quicksrnetsmall_quantized/README.md b/qai_hub_models/models/quicksrnetsmall_quantized/README.md index 9eb783fb..415b8d00 100644 --- a/qai_hub_models/models/quicksrnetsmall_quantized/README.md +++ b/qai_hub_models/models/quicksrnetsmall_quantized/README.md @@ -6,7 +6,7 @@ QuickSRNet Small is designed for upscaling images on mobile platforms to sharpen in real-time. 
This is based on the implementation of QuickSRNetSmall-Quantized found -[here](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/quicksrnet). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/quicksrnetsmall_quantized). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.quicksrnetsmall_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of QuickSRNetSmall-Quantized can be found +* The license for the original implementation of QuickSRNetSmall-Quantized can be found [here](https://github.com/quic/aimet-model-zoo/blob/develop/LICENSE.pdf). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [QuickSRNet: Plain Single-Image Super-Resolution Architecture for Faster Inference on Mobile Platforms](https://arxiv.org/abs/2303.04336) * [Source Model Implementation](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/quicksrnet) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/quicksrnetsmall_quantized/export.py b/qai_hub_models/models/quicksrnetsmall_quantized/export.py index 2fa4da5b..4b071aaf 100644 --- a/qai_hub_models/models/quicksrnetsmall_quantized/export.py +++ b/qai_hub_models/models/quicksrnetsmall_quantized/export.py @@ -10,17 +10,17 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.quicksrnetsmall_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.printing import ( print_inference_metrics, @@ -45,20 +45,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1.
Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -80,10 +78,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "quicksrnetsmall_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,7 +107,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( target_runtime, output_path, input_spec ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -135,7 +133,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +148,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +169,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -198,7 +196,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/quicksrnetsmall_quantized/perf.yaml b/qai_hub_models/models/quicksrnetsmall_quantized/perf.yaml index 115392b3..9f7b027d 100644 --- a/qai_hub_models/models/quicksrnetsmall_quantized/perf.yaml +++ b/qai_hub_models/models/quicksrnetsmall_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,44 +20,41 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: QuickSRNetSmall-Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 1072.0 - throughput: 932.8358208955224 + inference_time: 1083.0 + throughput: 923.3610341643582 estimated_peak_memory_range: - min: 16384 - max: 2049624 + min: 196608 + max: 1592560 primary_compute_unit: NPU precision: int8 layer_info: @@ -64,29 +62,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 13 - job_id: j1p8o778g + job_id: j5we646m5 job_status: Passed torchscript_onnx_qnn: inference_time: 466.0 throughput: 2145.922746781116 estimated_peak_memory_range: - min: 53248 - max: 65720216 + min: 69632 + max: 2296112 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 7 + layers_on_npu: 11 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 7 - job_id: jlpe9vv0g + total_layers: 11 + job_id: j5mnx8x7p job_status: Passed torchscript_onnx: - inference_time: 706.0 - throughput: 1416.4305949008499 + inference_time: 649.0 + throughput: 1540.8320493066255 estimated_peak_memory_range: min: 65536 - max: 1305568 + max: 1385928 primary_compute_unit: NPU precision: int8 layer_info: @@ -94,7 +92,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 13 - job_id: jo5mrzwqg + job_id: jp3j06jmg job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -103,13 +101,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T12:00:26Z' + timestamp: '2024-10-14T23:54:05Z' - torchscript_onnx_tflite: - inference_time: 924.0 - throughput: 1082.2510822510822 + inference_time: 891.0 + 
throughput: 1122.334455667789 estimated_peak_memory_range: min: 12288 - max: 20972496 + max: 21241088 primary_compute_unit: NPU precision: int8 layer_info: @@ -117,29 +115,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 13 - job_id: jogkzyyog + job_id: jg9lndn8g job_status: Passed torchscript_onnx_qnn: - inference_time: 320.0 - throughput: 3125.0 + inference_time: 424.0 + throughput: 2358.490566037736 estimated_peak_memory_range: min: 65536 - max: 10499312 + max: 10591200 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 7 + layers_on_npu: 11 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 7 - job_id: jygze776g + total_layers: 11 + job_id: jgn6vkvj5 job_status: Passed torchscript_onnx: - inference_time: 526.0 - throughput: 1901.1406844106464 + inference_time: 489.0 + throughput: 2044.9897750511248 estimated_peak_memory_range: min: 0 - max: 21326464 + max: 21853616 primary_compute_unit: NPU precision: int8 layer_info: @@ -147,7 +145,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 13 - job_id: jegn2e9mg + job_id: jgo26821p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -156,13 +154,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T12:00:27Z' + timestamp: '2024-10-14T23:54:06Z' - torchscript_onnx_tflite: - inference_time: 1096.0 - throughput: 912.4087591240876 + inference_time: 2310.0 + throughput: 432.9004329004329 estimated_peak_memory_range: - min: 24576 - max: 1363104 + min: 1601536 + max: 16737120 primary_compute_unit: NPU precision: int8 layer_info: @@ -170,37 +168,60 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 13 - job_id: jn5q822m5 + job_id: j57yr9rn5 job_status: Passed torchscript_onnx_qnn: - inference_time: 390.0 - throughput: 2564.102564102564 + inference_time: 957.0 + throughput: 1044.932079414838 estimated_peak_memory_range: - min: 77824 - max: 1338264 + min: 16384 + max: 7769488 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 7 + layers_on_npu: 11 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 7 - job_id: jmg9v44v5 + total_layers: 11 + job_id: jglvmlv25 job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T12:00:20Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-14T23:54:03Z' - torchscript_onnx_tflite: - inference_time: 1597.0 - throughput: 626.1740763932373 + inference_time: 10718.0 + throughput: 93.3009889904833 estimated_peak_memory_range: - min: 16384 - max: 22560000 + min: 1224704 + max: 4587208 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 10 + layers_on_gpu: 0 + layers_on_cpu: 3 + total_layers: 13 + job_id: jp4lr3r25 + job_status: Passed + reference_device_info: + name: RB5 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8250 Proxy + timestamp: '2024-10-14T23:53:51Z' + - torchscript_onnx_tflite: + inference_time: 1076.0 + throughput: 929.368029739777 + estimated_peak_memory_range: + min: 32768 + max: 1386368 primary_compute_unit: NPU precision: int8 layer_info: @@ -208,37 +229,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 13 - job_id: j1glnkklp + job_id: jp14z6z7p job_status: Passed torchscript_onnx_qnn: - inference_time: 545.0 - throughput: 1834.8623853211009 + inference_time: 385.0 + throughput: 2597.4025974025976 estimated_peak_memory_range: - min: 65536 - 
max: 12065680 + min: 0 + max: 1169280 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 7 + layers_on_npu: 11 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 7 - job_id: jqp4qwxlg + total_layers: 11 + job_id: jp2kyey6p job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T12:00:24Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:53:56Z' - torchscript_onnx_tflite: - inference_time: 1069.0 - throughput: 935.4536950420954 + inference_time: 1070.0 + throughput: 934.5794392523364 estimated_peak_memory_range: - min: 12288 - max: 5594104 + min: 24576 + max: 1480888 primary_compute_unit: NPU precision: int8 layer_info: @@ -246,37 +267,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 13 - job_id: jw5661175 + job_id: jp14z6znp job_status: Passed torchscript_onnx_qnn: inference_time: 390.0 throughput: 2564.102564102564 estimated_peak_memory_range: - min: 122880 - max: 1772512 + min: 73728 + max: 1311440 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 7 + layers_on_npu: 11 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 7 - job_id: jnp1088l5 + total_layers: 11 + job_id: jp8qy1qqp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T12:00:21Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:53:59Z' - torchscript_onnx_tflite: - inference_time: 1088.0 - throughput: 919.1176470588235 + inference_time: 1133.0 + throughput: 882.61253309797 estimated_peak_memory_range: - min: 40960 - max: 1450760 + min: 28672 + max: 1411208 primary_compute_unit: NPU precision: int8 layer_info: @@ -284,22 +305,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 13 - job_id: j1p3kmmz5 + job_id: jg9lndnmg job_status: Passed torchscript_onnx_qnn: - inference_time: 392.0 - throughput: 2551.0204081632655 + inference_time: 393.0 + throughput: 2544.529262086514 estimated_peak_memory_range: - min: 86016 - max: 1307992 + min: 81920 + max: 1500024 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 7 + layers_on_npu: 11 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 7 - job_id: jvgdwvvl5 + total_layers: 11 + job_id: jp0z06z05 job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -307,14 +328,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T12:00:22Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:53:58Z' - torchscript_onnx_tflite: - inference_time: 1067.0 - throughput: 937.207122774133 + inference_time: 1087.0 + throughput: 919.9632014719411 estimated_peak_memory_range: - min: 32768 - max: 1628848 + min: 16384 + max: 5843504 primary_compute_unit: NPU precision: int8 layer_info: @@ -322,37 +343,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 13 - job_id: jwgoyvvd5 + job_id: j5we64645 job_status: Passed torchscript_onnx_qnn: - inference_time: 396.0 - throughput: 2525.252525252525 + inference_time: 393.0 + throughput: 2544.529262086514 estimated_peak_memory_range: - min: 65536 - max: 1322144 + min: 81920 + max: 1487880 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 7 + layers_on_npu: 11 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 7 - job_id: jz57zdjrp + total_layers: 11 + 
job_id: jpy13m30p job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T12:00:23Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:53:57Z' - torchscript_onnx_tflite: - inference_time: 3216.0 - throughput: 310.9452736318408 + inference_time: 1451.0 + throughput: 689.1798759476223 estimated_peak_memory_range: - min: 16384 - max: 15786608 + min: 1634304 + max: 24164512 primary_compute_unit: NPU precision: int8 layer_info: @@ -360,37 +381,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 13 - job_id: j1pv3wwm5 + job_id: jgdx121zp job_status: Passed torchscript_onnx_qnn: - inference_time: 979.0 - throughput: 1021.4504596527069 + inference_time: 539.0 + throughput: 1855.287569573284 estimated_peak_memory_range: - min: 61440 - max: 7831568 + min: 0 + max: 11135840 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 7 + layers_on_npu: 11 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 7 - job_id: j0pxv179g + total_layers: 11 + job_id: j5q6qv6ep job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T12:00:25Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:54:02Z' - torchscript_onnx_tflite: - inference_time: 11413.0 - throughput: 87.61938140716727 + inference_time: 1268.0 + throughput: 788.6435331230284 estimated_peak_memory_range: - min: 1712128 - max: 7502224 + min: 16384 + max: 14624368 primary_compute_unit: NPU precision: int8 layer_info: @@ -398,37 +419,67 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 13 - job_id: j7gjxll8p + job_id: jpxkoxo85 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 336.0 + throughput: 2976.190476190476 + estimated_peak_memory_range: + min: 86016 + max: 9037360 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 11 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 11 + job_id: j56y4wynp + job_status: Passed + torchscript_onnx: + inference_time: 444.0 + throughput: 2252.252252252252 + estimated_peak_memory_range: + min: 57344 + max: 14252928 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 13 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 13 + job_id: jpedmyd85 job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T12:00:16Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:54:09Z' - torchscript_onnx_qnn: - inference_time: 494.0 - throughput: 2024.2914979757086 + inference_time: 482.0 + throughput: 2074.688796680498 estimated_peak_memory_range: - min: 61440 - max: 61440 + min: 139264 + max: 139264 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 7 + layers_on_npu: 11 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 7 - job_id: jz5wo99jp + total_layers: 11 + job_id: jprv3w3kg job_status: Passed torchscript_onnx: - inference_time: 705.0 - throughput: 1418.4397163120568 + inference_time: 736.0 + throughput: 1358.695652173913 estimated_peak_memory_range: - min: 3301376 - max: 3301376 + min: 3375104 + max: 3375104 primary_compute_unit: NPU precision: int8 layer_info: @@ -436,7 +487,7 @@ 
models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 13 - job_id: joprky4e5 + job_id: jpv6k76z5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -445,4 +496,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T12:00:28Z' + timestamp: '2024-10-14T23:54:07Z'
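A note on reading the perf.yaml hunks in this PR: inference_time is reported in microseconds, throughput is simply its reciprocal in inferences per second, and the estimated_peak_memory_range values are bytes. A minimal reader-side sketch of that relationship (the helper name is ours, not code from this repo):

```python
# Reader-side sketch (not part of this repo): perf.yaml reports
# inference_time in microseconds; throughput is inferences per second.
def throughput_from_inference_time_us(inference_time_us: float) -> float:
    """Convert a perf.yaml inference_time (microseconds) to inferences/sec."""
    return 1e6 / inference_time_us

# Matches the pairs in the hunks above, e.g. 736.0 us -> 1358.695652173913/s.
assert abs(throughput_from_inference_time_us(736.0) - 1358.695652173913) < 1e-9
```

This is why every throughput change in these hunks moves inversely to its inference_time change.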
diff --git a/qai_hub_models/models/qwen2_7b_instruct_quantized/README.md b/qai_hub_models/models/qwen2_7b_instruct_quantized/README.md new file mode 100644 index 00000000..83dc0e06 --- /dev/null +++ b/qai_hub_models/models/qwen2_7b_instruct_quantized/README.md @@ -0,0 +1,61 @@ +[![Qualcomm® AI Hub Models](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/quic-logo.jpg)](../../README.md) + + +# [Qwen2-7B-Instruct: State-of-the-art large language model useful on a variety of language understanding and generation tasks](https://aihub.qualcomm.com/models/qwen2_7b_instruct_quantized) + +Qwen2-7B-Instruct is a state-of-the-art multilingual language model with 7.07 billion parameters, excelling in language understanding, generation, coding, and mathematics. AI Hub provides four QNN context binaries (shared weights) that can be deployed on Snapdragon 8 Elite with the Genie SDK. + +This is based on the implementation of Qwen2-7B-Instruct found +[here](https://github.com/QwenLM/Qwen2.5). This repository contains scripts for optimized on-device +export suitable to run on Qualcomm® devices. More details on model performance +across various devices can be found [here](https://aihub.qualcomm.com/models/qwen2_7b_instruct_quantized). + +[Sign up](https://myaccount.qualcomm.com/signup) to start using Qualcomm AI Hub and run these models on a hosted Qualcomm® device. + +## Deploying Qwen2-7B-Instruct on-device + +Please follow the [LLM on-device deployment](https://github.com/quic/ai-hub-apps/tree/main/tutorials/llm_on_genie) tutorial. + + + + + +## License +* The license for the original implementation of Qwen2-7B-Instruct can be found + [here](https://huggingface.co/Qwen/Qwen2-7B-Instruct/blob/main/LICENSE). +* The license for the compiled assets for on-device deployment can be found [here](https://huggingface.co/Qwen/Qwen2-7B-Instruct/blob/main/LICENSE) + + +## References +* [Qwen2 Technical Report](https://arxiv.org/abs/2407.10671v1) +* [Source Model Implementation](https://github.com/QwenLM/Qwen2.5) + + + +## Community +* Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. +* For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). + + +## Usage and Limitations + +This model may not be used for or in connection with any of the following applications: + +- Accessing essential private and public services and benefits; +- Administration of justice and democratic processes; +- Assessing or recognizing the emotional state of a person; +- Biometric and biometrics-based systems, including categorization of persons based on sensitive characteristics; +- Education and vocational training; +- Employment and workers management; +- Exploitation of the vulnerabilities of persons resulting in harmful behavior; +- General purpose social scoring; +- Law enforcement; +- Management and operation of critical infrastructure; +- Migration, asylum and border control management; +- Predictive policing; +- Real-time remote biometric identification in public spaces; +- Recommender systems of social media platforms; +- Scraping of facial images (from the internet or otherwise); and/or +- Subliminal manipulation + + diff --git a/qai_hub_models/models/qwen2_7b_instruct_quantized/info.yaml b/qai_hub_models/models/qwen2_7b_instruct_quantized/info.yaml new file mode 100644 index 00000000..b20e93e9 --- /dev/null +++ b/qai_hub_models/models/qwen2_7b_instruct_quantized/info.yaml @@ -0,0 +1,59 @@ +name: Qwen2-7B-Instruct +id: qwen2_7b_instruct_quantized +status: public +headline: State-of-the-art large language model useful on a variety of language + understanding and generation tasks. +domain: Generative AI +description: Qwen2-7B-Instruct is a state-of-the-art multilingual language model with 7.07 billion parameters, excelling in language understanding, generation, coding, and mathematics. AI Hub provides four QNN context binaries (shared weights) that can be deployed on Snapdragon 8 Elite with the Genie SDK. +use_case: Text Generation +tags: + - llm + - generative-ai + - quantized +research_paper: https://arxiv.org/abs/2407.10671v1 +research_paper_title: "Qwen2 Technical Report" +license: https://huggingface.co/Qwen/Qwen2-7B-Instruct/blob/main/LICENSE +deploy_license: https://huggingface.co/Qwen/Qwen2-7B-Instruct/blob/main/LICENSE +source_repo: https://github.com/QwenLM/Qwen2.5 +technical_details: + Input sequence length for Prompt Processor: 128 + Context length: 4096 + Number of parameters: 7.07B + Precision: w4a16 + w8a16 (few layers) + Num of key-value heads: 8 + Information about the model parts: Prompt Processor and Token Generator are split into 5 parts each. Each corresponding Prompt Processor and Token Generator part share weights. + Prompt processor model size: 5.16 GB + Prompt processor input (part1): 128 tokens + Prompt processor output (part1): Embeddings output + Prompt processor input (other parts): 128 tokens + KVCache initialized with pad token + Prompt processor output (other parts): 128 output tokens + KVCache for token generator + Token generator model size: 5.16 GB + Token generator input (part1): 128 tokens + Token generator output (part1): Embeddings output + Token generator input (other parts): 1 input token + past KVCache + Token generator output (other parts): 1 output token + KVCache for next iteration + Use: Initiate conversation with prompt-processor and then token generator for subsequent iterations. + Minimum QNN SDK version required: 2.27.7 + Supported languages: English, Chinese, German, French, Spanish, Portuguese, Italian, Dutch, Russian, Czech, Polish, Arabic, Persian, Hebrew, Turkish, Japanese, Korean, Vietnamese, Thai, Indonesian, Malay, Lao, Burmese, Cebuano, Khmer, Tagalog, Hindi, Bengali, Urdu.
+ TTFT: Time To First Token is the time it takes to generate the first response token. This is expressed as a range because it varies based on the length of the prompt. The lower bound is for a short prompt (up to 128 tokens, i.e., one iteration of the prompt processor) and the upper bound is for a prompt using the full context length (4096 tokens). + Response Rate: Rate of response generation after the first response token. +applicable_scenarios: + - Dialogue + - Content Generation + - Customer Support +related_models: [] +form_factors: + - Phone + - Tablet +has_static_banner: true +has_animated_banner: true +license_type: apache-2.0 +deploy_license_type: apache-2.0 +dataset: [] +model_type_llm: true +llm_details: + call_to_action: 'download' + genie_compatible: true + Snapdragon 8 Elite QRD: + torchscript_onnx_qnn: + model_download_url: v2/snapdragon_8_elite/models.zip diff --git a/qai_hub_models/models/qwen2_7b_instruct_quantized/perf.yaml b/qai_hub_models/models/qwen2_7b_instruct_quantized/perf.yaml new file mode 100644 index 00000000..f3cfce2c --- /dev/null +++ b/qai_hub_models/models/qwen2_7b_instruct_quantized/perf.yaml @@ -0,0 +1,24 @@ +aggregated: + supported_oses: + - Android + supported_devices: + - Snapdragon 8 Elite QRD + supported_chipsets: + - Snapdragon® 8 Elite +models: + name: '' + performance_metrics: + - torchscript_onnx_qnn: + llm_metrics: + time_to_first_token_range: + min: 170593 + max: 5458976 + tokens_per_second: 13.65 + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-16T00:32:42.210701Z'
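To make the llm_metrics block above concrete: time_to_first_token_range is in microseconds, and per the technical_details in info.yaml the lower bound corresponds to a short prompt (one 128-token prompt-processor iteration) while the upper bound corresponds to a prompt filling the full 4096-token context. A rough reader-side latency estimate under those assumptions (the function and example values are ours, not repo code):

```python
# Reader-side sketch (not part of this repo), using the llm_metrics above.
TTFT_MIN_US = 170_593      # time to first token, short prompt (<= 128 tokens)
TTFT_MAX_US = 5_458_976    # time to first token, full 4096-token context
TOKENS_PER_SECOND = 13.65  # generation rate after the first token

def estimated_response_seconds(ttft_us: float, generated_tokens: int) -> float:
    """Time to first token plus steady-state generation time, in seconds."""
    return ttft_us / 1e6 + generated_tokens / TOKENS_PER_SECOND

# A 100-token reply to a short prompt: ~0.17 s TTFT + ~7.3 s of generation.
print(f"{estimated_response_seconds(TTFT_MIN_US, 100):.1f} s")  # ~7.5 s
```

In other words, TTFT spans roughly 0.17 s to 5.46 s depending on prompt length, and steady-state decoding adds about 73 ms per generated token.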
diff --git a/qai_hub_models/models/real_esrgan_general_x4v3/README.md b/qai_hub_models/models/real_esrgan_general_x4v3/README.md index 87d2214e..34138a03 100644 --- a/qai_hub_models/models/real_esrgan_general_x4v3/README.md +++ b/qai_hub_models/models/real_esrgan_general_x4v3/README.md @@ -6,7 +6,7 @@ Real-ESRGAN is a machine learning model that upscales an image with minimal loss in quality. This is based on the implementation of Real-ESRGAN-General-x4v3 found -[here](https://github.com/xinntao/Real-ESRGAN/tree/master). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/real_esrgan_general_x4v3). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.real_esrgan_general_x4v3.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Real-ESRGAN-General-x4v3 can be found +* The license for the original implementation of Real-ESRGAN-General-x4v3 can be found [here](https://github.com/xinntao/Real-ESRGAN/blob/master/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data](https://arxiv.org/abs/2107.10833) * [Source Model Implementation](https://github.com/xinntao/Real-ESRGAN/tree/master) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/real_esrgan_general_x4v3/export.py b/qai_hub_models/models/real_esrgan_general_x4v3/export.py index ea7905fe..e21e758b 100644 --- a/qai_hub_models/models/real_esrgan_general_x4v3/export.py +++ b/qai_hub_models/models/real_esrgan_general_x4v3/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.real_esrgan_general_x4v3 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped).
+ * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "real_esrgan_general_x4v3" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/real_esrgan_general_x4v3/perf.yaml b/qai_hub_models/models/real_esrgan_general_x4v3/perf.yaml index 36f1b74d..7fbb3e8c 100644 --- a/qai_hub_models/models/real_esrgan_general_x4v3/perf.yaml +++ b/qai_hub_models/models/real_esrgan_general_x4v3/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: Real-ESRGAN-General-x4v3 performance_metrics: - torchscript_onnx_tflite: - inference_time: 7154.0 - throughput: 139.78194017332962 + inference_time: 7337.0 + throughput: 136.2954886193267 estimated_peak_memory_range: - min: 9478144 - max: 112291720 + min: 8445952 + max: 9903736 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 72 - job_id: jwgoyv345 + job_id: jpv6k9xr5 job_status: Passed torchscript_onnx_qnn: - inference_time: 6287.0 - throughput: 159.0583744234134 + inference_time: 6333.0 + throughput: 157.9030475288173 estimated_peak_memory_range: - min: 28672 - max: 8153240 + min: 36864 + max: 12837416 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 72 - job_id: jnp108ek5 + job_id: jp4lr9015 job_status: Passed torchscript_onnx: - inference_time: 6955.0 - throughput: 143.78145219266713 + inference_time: 6784.0 + throughput: 147.4056603773585 estimated_peak_memory_range: - min: 12288 - max: 21765248 + min: 9166848 + max: 10588528 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 74 - job_id: j0pxv119g + job_id: j5q6qmoop job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:59:46Z' + timestamp: '2024-10-14T23:53:17Z' - torchscript_onnx_tflite: - inference_time: 5974.0 - throughput: 167.39203213927016 + inference_time: 5996.0 + throughput: 166.77785190126752 estimated_peak_memory_range: - min: 9457664 - max: 71029904 + min: 
9461760 + max: 73897776 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 72 - job_id: j1pv3wv75 + job_id: jgjvnw4eg job_status: Passed torchscript_onnx_qnn: - inference_time: 5196.0 - throughput: 192.4557351809084 + inference_time: 5195.0 + throughput: 192.49278152069297 estimated_peak_memory_range: - min: 212992 - max: 17542816 + min: 208896 + max: 19567936 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 72 - job_id: jvgdwvok5 + job_id: jpxkod2l5 job_status: Passed torchscript_onnx: - inference_time: 5828.0 - throughput: 171.58544955387782 + inference_time: 5664.0 + throughput: 176.5536723163842 estimated_peak_memory_range: - min: 4751360 - max: 72541696 + min: 4780032 + max: 77508288 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 74 - job_id: jo5mrzzqg + job_id: jglvmlmm5 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:59:47Z' + timestamp: '2024-10-14T23:53:18Z' - torchscript_onnx_tflite: - inference_time: 7253.0 - throughput: 137.87398317937405 + inference_time: 7333.0 + throughput: 136.36983499249965 estimated_peak_memory_range: - min: 9461760 - max: 10875304 + min: 9490432 + max: 10871472 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 72 - job_id: j7gjxle7p + job_id: jpedml3v5 job_status: Passed torchscript_onnx_qnn: - inference_time: 5754.0 - throughput: 173.79214459506431 + inference_time: 5743.0 + throughput: 174.1250217656277 estimated_peak_memory_range: min: 270336 - max: 1588352 + max: 5138304 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 72 - job_id: jmg9v4wv5 + job_id: jgn6v78q5 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:59:41Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:53:09Z' - torchscript_onnx_tflite: - inference_time: 12097.0 - throughput: 82.66512358435976 + inference_time: 7257.0 + throughput: 137.79798814937303 estimated_peak_memory_range: - min: 9474048 - max: 75480208 + min: 9478144 + max: 113089928 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 72 - job_id: jlpe9vk7g + job_id: jp14zvx7p job_status: Passed torchscript_onnx_qnn: - inference_time: 9659.0 - throughput: 103.53038616834041 + inference_time: 5789.0 + throughput: 172.74140611504578 estimated_peak_memory_range: - min: 548864 - max: 24289712 + min: 266240 + max: 1612080 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 72 - job_id: jqp4qwwlg + job_id: jpy1370lp job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:59:45Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:53:13Z' - torchscript_onnx_tflite: - inference_time: 7232.0 - throughput: 
138.27433628318585 + inference_time: 7390.0 + throughput: 135.31799729364005 estimated_peak_memory_range: - min: 9453568 - max: 18787984 + min: 9482240 + max: 16018264 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,52 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 72 - job_id: jygze7rzg + job_id: jg9lnxe8g + job_status: Passed + torchscript_onnx_qnn: + inference_time: 5761.0 + throughput: 173.58097552508247 + estimated_peak_memory_range: + min: 229376 + max: 1589192 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 72 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 72 + job_id: jp2kyvnqp + job_status: Passed + reference_device_info: + name: SA8775 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:53:12Z' + - torchscript_onnx_tflite: + inference_time: 7297.0 + throughput: 137.04262025489928 + estimated_peak_memory_range: + min: 9490432 + max: 13783512 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 69 + layers_on_gpu: 0 + layers_on_cpu: 3 + total_layers: 72 + job_id: j5we61nm5 job_status: Passed torchscript_onnx_qnn: inference_time: 5771.0 throughput: 173.28019407381737 estimated_peak_memory_range: - min: 233472 - max: 2006128 + min: 225280 + max: 1545880 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,7 +291,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 72 - job_id: jnp108el5 + job_id: jprv3nj7g job_status: Passed reference_device_info: name: SA8650 (Proxy) @@ -263,14 +299,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:59:42Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:53:11Z' - torchscript_onnx_tflite: - inference_time: 7216.0 - throughput: 138.5809312638581 + inference_time: 11015.0 + throughput: 90.78529278256923 estimated_peak_memory_range: - min: 9502720 - max: 11002432 + min: 9469952 + max: 80649056 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 72 - job_id: jz5wo9qzp + job_id: jgz3d4kx5 job_status: Passed torchscript_onnx_qnn: - inference_time: 5792.0 - throughput: 172.65193370165747 + inference_time: 9657.0 + throughput: 103.55182768975872 estimated_peak_memory_range: - min: 270336 - max: 1533720 + min: 208896 + max: 28375632 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +329,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 72 - job_id: jvgdwvol5 + job_id: jp8qy4vop job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:59:43Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:53:15Z' - torchscript_onnx_tflite: - inference_time: 7145.0 - throughput: 139.95801259622112 + inference_time: 4187.0 + throughput: 238.83448770002389 estimated_peak_memory_range: - min: 9465856 - max: 22162832 + min: 12288 + max: 30248944 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +352,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 72 - job_id: jmg9v4wq5 + job_id: j57yr7395 job_status: Passed torchscript_onnx_qnn: - inference_time: 5766.0 - throughput: 173.4304543877905 + inference_time: 3583.0 + throughput: 279.09572983533354 estimated_peak_memory_range: - min: 286720 - max: 1568128 + 
min: 0 + max: 19120256 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,22 +367,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 72 - job_id: jz57zddrp + job_id: jgkex9mng + job_status: Passed + torchscript_onnx: + inference_time: 4635.0 + throughput: 215.7497303128371 + estimated_peak_memory_range: + min: 7495680 + max: 36413632 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 74 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 74 + job_id: jgo2686kp job_status: Passed reference_device_info: - name: SA8255 (Proxy) - os: '13' - form_factor: Auto + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:59:44Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:53:21Z' - torchscript_onnx_qnn: - inference_time: 6137.0 - throughput: 162.94606485253382 + inference_time: 6160.0 + throughput: 162.33766233766235 estimated_peak_memory_range: - min: 212992 - max: 212992 + min: 204800 + max: 204800 primary_compute_unit: NPU precision: fp16 layer_info: @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 72 - job_id: jz5wo9qjp + job_id: j5mnxdy9p job_status: Passed torchscript_onnx: - inference_time: 7041.0 - throughput: 142.02528049992898 + inference_time: 7052.0 + throughput: 141.80374361883153 estimated_peak_memory_range: - min: 8908800 - max: 8908800 + min: 8912896 + max: 8912896 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 74 - job_id: jegn2eemg + job_id: j56y4w4yp job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:59:48Z' + timestamp: '2024-10-14T23:53:19Z' diff --git a/qai_hub_models/models/real_esrgan_x4plus/README.md b/qai_hub_models/models/real_esrgan_x4plus/README.md index 8dd6beec..f3544722 100644 --- a/qai_hub_models/models/real_esrgan_x4plus/README.md +++ b/qai_hub_models/models/real_esrgan_x4plus/README.md @@ -6,7 +6,7 @@ Real-ESRGAN is a machine learning model that upscales an image with minimal loss in quality. The implementation is a derivative of the Real-ESRGAN-x4plus architecture, a larger and more powerful version compared to the Real-ESRGAN-general-x4v3 architecture. This is based on the implementation of Real-ESRGAN-x4plus found -[here](https://github.com/xinntao/Real-ESRGAN). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/real_esrgan_x4plus). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.real_esrgan_x4plus.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Real-ESRGAN-x4plus can be found +* The license for the original implementation of Real-ESRGAN-x4plus can be found [here](https://github.com/xinntao/Real-ESRGAN/blob/master/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data](https://arxiv.org/abs/2107.10833) * [Source Model Implementation](https://github.com/xinntao/Real-ESRGAN) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/real_esrgan_x4plus/export.py b/qai_hub_models/models/real_esrgan_x4plus/export.py index 97f68065..ae0c31c0 100644 --- a/qai_hub_models/models/real_esrgan_x4plus/export.py +++ b/qai_hub_models/models/real_esrgan_x4plus/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.real_esrgan_x4plus import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
""" model_name = "real_esrgan_x4plus" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/real_esrgan_x4plus/perf.yaml b/qai_hub_models/models/real_esrgan_x4plus/perf.yaml index 3b6b172a..f15b2ca1 100644 --- a/qai_hub_models/models/real_esrgan_x4plus/perf.yaml +++ b/qai_hub_models/models/real_esrgan_x4plus/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: Real-ESRGAN-x4plus performance_metrics: - torchscript_onnx_tflite: - inference_time: 67256.0 - throughput: 14.868561912691804 + inference_time: 68798.0 + throughput: 14.535306258902875 estimated_peak_memory_range: - min: 3162112 - max: 11224464 + min: 3244032 + max: 6182328 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1028 - job_id: j7gjxl27p + job_id: jgkex9zng job_status: Passed torchscript_onnx_qnn: - inference_time: 70685.0 - throughput: 14.147273113107449 + inference_time: 67503.0 + throughput: 14.814156407863354 estimated_peak_memory_range: - min: 40960 - max: 36487680 + min: 233472 + max: 114789384 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1029 - job_id: jz57zdlqp + job_id: jgz3d49x5 job_status: Passed torchscript_onnx: - inference_time: 68808.0 - throughput: 14.533193814672712 + inference_time: 70730.0 + throughput: 14.138272303124559 estimated_peak_memory_range: min: 118784 - max: 44608840 + max: 44736312 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1030 - job_id: j2p0yrl2g + job_id: jpy137wlp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:59:08Z' + timestamp: '2024-10-14T23:52:31Z' - torchscript_onnx_tflite: - inference_time: 55698.0 - throughput: 17.953966031096268 + inference_time: 55834.0 + throughput: 17.910233907654835 estimated_peak_memory_range: - min: 3260416 - max: 629103440 + min: 3289088 + max: 693883200 
primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1028 - job_id: jlpe9vw7g + job_id: j5q6qm8op job_status: Passed torchscript_onnx_qnn: - inference_time: 56796.0 - throughput: 17.606873723501653 + inference_time: 55888.0 + throughput: 17.892928714572 estimated_peak_memory_range: - min: 86016 - max: 98274560 + min: 69632 + max: 113449408 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1029 - job_id: jqp4qwdqg + job_id: jg9lnx18g job_status: Passed torchscript_onnx: - inference_time: 56604.0 - throughput: 17.666596000282667 + inference_time: 55527.0 + throughput: 18.0092567579736 estimated_peak_memory_range: - min: 6447104 - max: 656213632 + min: 8118272 + max: 731345728 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1030 - job_id: j1p8o7zzg + job_id: jp0z0vqn5 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:59:09Z' + timestamp: '2024-10-14T23:52:32Z' - torchscript_onnx_tflite: - inference_time: 63513.0 - throughput: 15.744808149512698 + inference_time: 61376.0 + throughput: 16.29301355578728 estimated_peak_memory_range: - min: 3203072 - max: 7122240 + min: 1331200 + max: 4638360 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1028 - job_id: jygze7jzg + job_id: jglvm1zm5 job_status: Passed torchscript_onnx_qnn: - inference_time: 63644.0 - throughput: 15.712400226258563 + inference_time: 62924.0 + throughput: 15.892187400673828 estimated_peak_memory_range: - min: 380928 - max: 1577664 + min: 397312 + max: 1716056 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1029 - job_id: jo5mrz6yg + job_id: jgdx1z9zp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:59:03Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:52:23Z' - torchscript_onnx_tflite: - inference_time: 161796.0 - throughput: 6.180622512299439 + inference_time: 66879.0 + throughput: 14.95237668027333 estimated_peak_memory_range: - min: 3280896 - max: 591519728 + min: 3289088 + max: 5565552 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1028 - job_id: jz5wo93zp + job_id: jpv6k9or5 job_status: Passed torchscript_onnx_qnn: - inference_time: 126507.0 - throughput: 7.904700925640478 + inference_time: 63674.0 + throughput: 15.704997330150453 estimated_peak_memory_range: - min: 348160 - max: 80814512 + min: 372736 + max: 2007560 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1029 - job_id: jqpyed6rg + job_id: jpxkodjl5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:59:07Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:52:26Z' - torchscript_onnx_tflite: - inference_time: 67146.0 - 
throughput: 14.892919905876747 + inference_time: 63276.0 + throughput: 15.803780264239206 estimated_peak_memory_range: - min: 3174400 - max: 54688664 + min: 3256320 + max: 5876512 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1028 - job_id: jmg9v4yq5 + job_id: jgo2640kp job_status: Passed torchscript_onnx_qnn: - inference_time: 64154.0 - throughput: 15.587492595941017 + inference_time: 63755.0 + throughput: 15.685044310250177 estimated_peak_memory_range: - min: 397312 - max: 1547784 + min: 434176 + max: 1675216 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1029 - job_id: jegn2e3vg + job_id: jp4lr9o15 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:59:04Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:52:25Z' - torchscript_onnx_tflite: - inference_time: 65483.0 - throughput: 15.27113907426355 + inference_time: 66934.0 + throughput: 14.940090238145038 estimated_peak_memory_range: - min: 3223552 - max: 5782528 + min: 3264512 + max: 6511304 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1028 - job_id: jnp108wk5 + job_id: jp3j0w3ng job_status: Passed torchscript_onnx_qnn: - inference_time: 63668.0 - throughput: 15.70647735125966 + inference_time: 63721.0 + throughput: 15.693413474364808 estimated_peak_memory_range: - min: 413696 - max: 1686896 + min: 409600 + max: 1722368 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1029 - job_id: joprkyev5 + job_id: j57yr7w95 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:59:05Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:52:24Z' - torchscript_onnx_tflite: - inference_time: 65448.0 - throughput: 15.279305708348613 + inference_time: 143595.0 + throughput: 6.964030781016052 estimated_peak_memory_range: - min: 3182592 - max: 10550792 + min: 0 + max: 647411888 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1028 - job_id: jvgdwvqk5 + job_id: j56y4djyp job_status: Passed torchscript_onnx_qnn: - inference_time: 63341.0 - throughput: 15.787562558216637 + inference_time: 138583.0 + throughput: 7.215892281160027 estimated_peak_memory_range: - min: 434176 - max: 1651824 + min: 299008 + max: 89498112 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,22 +329,75 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1029 - job_id: jep28mlxp + job_id: jprv3nq7g job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:52:29Z' + - torchscript_onnx_tflite: + inference_time: 42951.0 + throughput: 23.282344997788176 + estimated_peak_memory_range: + min: 3158016 + max: 192916848 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 1028 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 1028 + job_id: jpedml1v5 + 
job_status: Passed + torchscript_onnx_qnn: + inference_time: 43454.0 + throughput: 23.012841165370276 + estimated_peak_memory_range: + min: 12288 + max: 135567936 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 1029 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 1029 + job_id: jp2kyv6qp + job_status: Passed + torchscript_onnx: + inference_time: 43103.0 + throughput: 23.20024128250934 + estimated_peak_memory_range: + min: 0 + max: 185570864 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 1030 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 1030 + job_id: j5q6qmkop + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:59:06Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:52:35Z' - torchscript_onnx_qnn: - inference_time: 65087.0 - throughput: 15.364051193018575 + inference_time: 65203.0 + throughput: 15.33671763569161 estimated_peak_memory_range: - min: 212992 - max: 212992 + min: 237568 + max: 237568 primary_compute_unit: NPU precision: fp16 layer_info: @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1029 - job_id: j0pxv16jg + job_id: jp14zvl7p job_status: Passed torchscript_onnx: - inference_time: 65518.0 - throughput: 15.262981165481241 + inference_time: 65666.0 + throughput: 15.228581000822343 estimated_peak_memory_range: - min: 39645184 - max: 39645184 + min: 40755200 + max: 40755200 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1030 - job_id: jogkzy3yg + job_id: jp8qy49op job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:59:10Z' + timestamp: '2024-10-14T23:52:33Z' diff --git a/qai_hub_models/models/regnet/README.md b/qai_hub_models/models/regnet/README.md index 9186826f..e9368f1b 100644 --- a/qai_hub_models/models/regnet/README.md +++ b/qai_hub_models/models/regnet/README.md @@ -6,7 +6,7 @@ RegNet is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of RegNet found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/regnet.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/regnet). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.regnet.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of RegNet can be found +* The license for the original implementation of RegNet can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Designing Network Design Spaces](https://arxiv.org/abs/2003.13678) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/regnet.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/regnet/export.py b/qai_hub_models/models/regnet/export.py index 21045695..fea11d5c 100644 --- a/qai_hub_models/models/regnet/export.py +++ b/qai_hub_models/models/regnet/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.regnet import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
""" model_name = "regnet" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/regnet/perf.yaml b/qai_hub_models/models/regnet/perf.yaml index dad1feeb..8140a459 100644 --- a/qai_hub_models/models/regnet/perf.yaml +++ b/qai_hub_models/models/regnet/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: RegNet performance_metrics: - torchscript_onnx_tflite: - inference_time: 2067.0 - throughput: 483.7929366231253 + inference_time: 2075.0 + throughput: 481.9277108433735 estimated_peak_memory_range: - min: 24576 - max: 2068992 + min: 12288 + max: 7107872 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 114 - job_id: jygze7ozg + job_id: jp2kyv2qp job_status: Passed torchscript_onnx_qnn: - inference_time: 2125.0 - throughput: 470.5882352941176 + inference_time: 2149.0 + throughput: 465.33271288971616 estimated_peak_memory_range: - min: 16384 - max: 72977072 + min: 618496 + max: 62817144 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 188 - job_id: j0pxv1njg + job_id: jgo264ykp job_status: Passed torchscript_onnx: - inference_time: 2285.0 - throughput: 437.636761487965 + inference_time: 2197.0 + throughput: 455.1661356395084 estimated_peak_memory_range: - min: 28672 - max: 2310240 + min: 499712 + max: 44367848 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 190 - job_id: jogkzyqyg + job_id: jp4lr9q15 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:58:14Z' + timestamp: '2024-10-14T23:51:30Z' - torchscript_onnx_tflite: - inference_time: 1600.0 - throughput: 625.0 + inference_time: 1601.0 + throughput: 624.6096189881324 estimated_peak_memory_range: min: 16384 - max: 145930384 + max: 149556896 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 
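One convention worth spelling out for the perf.yaml hunks that follow: `inference_time` is reported in microseconds and `throughput` in inferences per second, so each pair is redundant, related by a factor of 10^6. A quick check against the Galaxy S23 TFLite entry above:

```python
# perf.yaml convention: inference_time in microseconds, throughput in inferences/s.
inference_time_us = 2075.0  # torchscript_onnx_tflite on Samsung Galaxy S23
throughput = 1_000_000 / inference_time_us
print(throughput)  # 481.9277108433735, matching the YAML value
```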
layers_on_cpu: 0 total_layers: 114 - job_id: jz5wo92zp + job_id: jpy1379lp job_status: Passed torchscript_onnx_qnn: - inference_time: 1666.0 - throughput: 600.2400960384153 + inference_time: 1652.0 + throughput: 605.3268765133172 estimated_peak_memory_range: - min: 638976 - max: 29361840 + min: 618496 + max: 28703200 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 188 - job_id: jo5mrzqyg + job_id: jpv6k93r5 job_status: Passed torchscript_onnx: - inference_time: 1780.0 - throughput: 561.7977528089888 + inference_time: 1866.0 + throughput: 535.9056806002144 estimated_peak_memory_range: - min: 589824 - max: 149330512 + min: 368640 + max: 153487456 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 190 - job_id: jn5q82r75 + job_id: jpxkodvl5 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:58:14Z' + timestamp: '2024-10-14T23:51:31Z' - torchscript_onnx_tflite: - inference_time: 2035.0 - throughput: 491.4004914004914 + inference_time: 2041.0 + throughput: 489.9559039686428 estimated_peak_memory_range: - min: 192512 - max: 1638264 + min: 28672 + max: 1897064 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 114 - job_id: jmg9v4jq5 + job_id: jp0z0vnn5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2027.0 - throughput: 493.33991119881597 + inference_time: 2036.0 + throughput: 491.1591355599214 estimated_peak_memory_range: - min: 634880 - max: 1938752 + min: 663552 + max: 2116376 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 188 - job_id: joprky2v5 + job_id: jpedml9v5 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:58:08Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:51:23Z' - torchscript_onnx_tflite: - inference_time: 2794.0 - throughput: 357.9098067287044 + inference_time: 2039.0 + throughput: 490.43648847474253 estimated_peak_memory_range: - min: 12288 - max: 128889488 + min: 139264 + max: 2210168 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 114 - job_id: jnp108yk5 + job_id: jglvm1nm5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2946.0 - throughput: 339.44331296673454 + inference_time: 2039.0 + throughput: 490.43648847474253 estimated_peak_memory_range: - min: 618496 - max: 25499136 + min: 36864 + max: 1334920 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 188 - job_id: j1p8o7mzg + job_id: jg9lnxv8g job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:58:13Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:51:26Z' - torchscript_onnx_tflite: - inference_time: 2068.0 - throughput: 483.55899419729207 + inference_time: 2037.0 + throughput: 490.9180166912126 estimated_peak_memory_range: - min: 28672 - 
max: 1579576 + min: 16384 + max: 15438464 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 114 - job_id: jvgdwvek5 + job_id: j5q6qmjop job_status: Passed torchscript_onnx_qnn: - inference_time: 2047.0 - throughput: 488.5197850512946 + inference_time: 2028.0 + throughput: 493.0966469428008 estimated_peak_memory_range: - min: 638976 - max: 1993208 + min: 671744 + max: 1904200 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 188 - job_id: jep28m9xp + job_id: j5we61om5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:58:09Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:51:25Z' - torchscript_onnx_tflite: - inference_time: 2043.0 - throughput: 489.47626040137055 + inference_time: 2028.0 + throughput: 493.0966469428008 estimated_peak_memory_range: - min: 16384 - max: 2353856 + min: 28672 + max: 1793128 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 114 - job_id: jz57zd0qp + job_id: jgkex9jng job_status: Passed torchscript_onnx_qnn: - inference_time: 2034.0 - throughput: 491.6420845624385 + inference_time: 2042.0 + throughput: 489.71596474045054 estimated_peak_memory_range: - min: 630784 - max: 2004232 + min: 626688 + max: 2222304 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 188 - job_id: jqpyedjrg + job_id: jgz3d4ex5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:58:11Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:51:24Z' - torchscript_onnx_tflite: - inference_time: 2044.0 - throughput: 489.23679060665364 + inference_time: 2808.0 + throughput: 356.1253561253561 estimated_peak_memory_range: min: 16384 - max: 1996704 + max: 131330864 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 114 - job_id: jqp4qwkqg + job_id: jp8qy4lop job_status: Passed torchscript_onnx_qnn: - inference_time: 2027.0 - throughput: 493.33991119881597 + inference_time: 2905.0 + throughput: 344.2340791738382 estimated_peak_memory_range: - min: 643072 - max: 1875728 + min: 741376 + max: 23528016 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 188 - job_id: j2p0yr22g + job_id: jgdx1zwzp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:51:28Z' + - torchscript_onnx_tflite: + inference_time: 1363.0 + throughput: 733.6757153338225 + estimated_peak_memory_range: + min: 12288 + max: 74559008 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 114 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 114 + job_id: jp3j0wkng + job_status: Passed + torchscript_onnx_qnn: + inference_time: 1448.0 + throughput: 690.6077348066299 + estimated_peak_memory_range: + min: 0 + max: 28855920 + primary_compute_unit: NPU 
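To consume these perf.yaml files in scripts rather than by eye, something like the sketch below works. The key paths (`models[*].performance_metrics[*]`, keyed per runtime with a `reference_device_info` block) follow the layout visible in this diff, but the file format is repo-internal rather than a supported API, so treat them as assumptions:

```python
import yaml  # requires pyyaml

with open("qai_hub_models/models/regnet/perf.yaml") as f:
    perf = yaml.safe_load(f)

# One entry per reference device; not every entry carries every runtime
# (the Snapdragon X Elite CRD entry below reports only QNN and ONNX).
for entry in perf["models"][0]["performance_metrics"]:
    device = entry["reference_device_info"]["name"]
    tflite = entry.get("torchscript_onnx_tflite")
    if tflite is not None:
        print(f"{device}: {tflite['inference_time']} us on TFLite")
```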
+ precision: fp16 + layer_info: + layers_on_npu: 188 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 188 + job_id: j57yr7z95 + job_status: Passed + torchscript_onnx: + inference_time: 1561.0 + throughput: 640.6149903907751 + estimated_peak_memory_range: + min: 0 + max: 77292496 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 190 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 190 + job_id: jprv3nk7g + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:58:12Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:51:34Z' - torchscript_onnx_qnn: - inference_time: 2218.0 - throughput: 450.8566275924256 + inference_time: 2232.0 + throughput: 448.02867383512546 estimated_peak_memory_range: min: 602112 max: 602112 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 188 - job_id: jegn2emvg + job_id: jgjvnwxeg job_status: Passed torchscript_onnx: - inference_time: 2235.0 - throughput: 447.42729306487695 + inference_time: 2208.0 + throughput: 452.8985507246377 estimated_peak_memory_range: - min: 43081728 - max: 43081728 + min: 41877504 + max: 41877504 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 190 - job_id: j1glnk2ep + job_id: j5wee4e45 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:58:15Z' + timestamp: '2024-10-16T09:33:52Z' diff --git a/qai_hub_models/models/regnet_quantized/README.md b/qai_hub_models/models/regnet_quantized/README.md index 85fb43e9..6eb10e69 100644 --- a/qai_hub_models/models/regnet_quantized/README.md +++ b/qai_hub_models/models/regnet_quantized/README.md @@ -6,7 +6,7 @@ RegNet is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of RegNetQuantized found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/regnet.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/regnet_quantized). @@ -17,11 +17,6 @@ across various devices can be found [here](https://aihub.qualcomm.com/models/r ## Example & Usage -Install the package via pip: -```bash -pip install "qai_hub_models[regnet_quantized]" -``` - Once installed, run the following simple CLI demo: @@ -44,15 +39,19 @@ python -m qai_hub_models.models.regnet_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of RegNetQuantized can be found +* The license for the original implementation of RegNetQuantized can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Designing Network Design Spaces](https://arxiv.org/abs/2003.13678) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/regnet.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/regnet_quantized/evaluate.py b/qai_hub_models/models/regnet_quantized/evaluate.py index 4eb83eec..24b2be32 100644 --- a/qai_hub_models/models/regnet_quantized/evaluate.py +++ b/qai_hub_models/models/regnet_quantized/evaluate.py @@ -13,10 +13,8 @@ from qai_hub_models.models.regnet_quantized import MODEL_ID, Model from qai_hub_models.utils.args import evaluate_parser, get_hub_device, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.evaluate import evaluate_on_dataset from qai_hub_models.utils.inference import compile_model_from_args -from qai_hub_models.utils.quantization_aimet import AIMETQuantizableMixin SUPPORTED_DATASETS = ["imagenette", "imagenet"] @@ -27,6 +25,7 @@ def main(): model_cls=Model, default_split_size=2500, supported_datasets=SUPPORTED_DATASETS, + is_hub_quantized=True, ) args = parser.parse_args() args.device = None @@ -38,13 +37,7 @@ def main(): MODEL_ID, args, get_model_kwargs(Model, vars(args)) ) hub_device = get_hub_device(None, args.chipset) - - # Use Fp16 model for torch inference - for cls in Model.__mro__: - if issubclass(cls, BaseModel) and not issubclass(cls, AIMETQuantizableMixin): - torch_cls = cls - break - torch_model = torch_cls.from_pretrained(**get_model_kwargs(torch_cls, vars(args))) + torch_model = Model.from_pretrained(**get_model_kwargs(Model, vars(args))) evaluate_on_dataset( hub_model, torch_model, diff --git a/qai_hub_models/models/regnet_quantized/export.py b/qai_hub_models/models/regnet_quantized/export.py index bfd734a2..28212253 100644 --- a/qai_hub_models/models/regnet_quantized/export.py +++ b/qai_hub_models/models/regnet_quantized/export.py @@ -10,18 +10,20 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.regnet_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference +from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( print_inference_metrics, print_on_target_demo_cmd, @@ -31,11 +33,14 @@ can_access_qualcomm_ai_hub, export_without_hub_access, ) +from qai_hub_models.utils.quantization import get_calibration_data def export_model( device: str = "Samsung Galaxy S23 (Family)", chipset: Optional[str] = None, + num_calibration_samples: int = 100, + skip_compiling: 
bool = False, skip_profiling: bool = False, skip_inferencing: bool = False, skip_downloading: bool = False, @@ -45,20 +50,19 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + 3. Compiles the model to an asset that can be run on device + 4. Profiles the model performance on a real device + 5. Inferences the model on sample inputs + 6. Downloads the model asset to the local directory + 7. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 5 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -66,6 +70,9 @@ def export_model( Defaults to DEFAULT_DEVICE if not specified. chipset: If set, will choose a random device with this chipset. Overrides the `device` argument. + num_calibration_samples: The number of calibration data samples + to use for quantization. + skip_compiling: If set, skips compiling model to format that can run on device. skip_profiling: If set, skips profiling of compiled model on real devices. skip_inferencing: If set, skips computing on-device outputs from sample data. skip_downloading: If set, skips downloading of compiled model. @@ -80,10 +87,11 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: - * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). + A struct of: + * A CompileJob object containing metadata about the compile job submitted to hub (None if compiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). + * A QuantizeJob object containing metadata about the quantize job submitted to hub """ model_name = "regnet_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,33 +117,52 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. 
Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) ) # Trace the model - source_model = model.convert_to_hub_source_model( - target_runtime, output_path, input_spec + source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) + + print(f"Quantizing model {model_name} with {num_calibration_samples} samples.") + # 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + onnx_compile_job = hub.submit_compile_job( + model=source_model, + input_specs=input_spec, + device=hub_device, + name=model_name, + options="--target_runtime onnx", + ) + quantize_job = hub.submit_quantize_job( + model=onnx_compile_job.get_target_model(), + calibration_data=get_calibration_data( + input_spec, "imagenette", num_calibration_samples + ), + weights_dtype=model.get_weights_dtype(), + activations_dtype=model.get_activations_dtype(), + name=model_name, + options=model.get_quantize_options(), ) + if skip_compiling: + return ExportResult(quantize_job=quantize_job) - # 2. Compile the model to an on-device asset + # 3. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) print(f"Optimizing model {model_name} to run on-device") submitted_compile_job = hub.submit_compile_job( - model=source_model, + model=quantize_job.get_target_model(), input_specs=input_spec, device=hub_device, name=model_name, - calibration_data=model.get_calibration_data(target_runtime), options=model_compile_options, ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 4. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +177,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 5. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +198,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 6. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 7. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -202,12 +229,17 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + quantize_job=quantize_job, + ) def main(): warnings.filterwarnings("ignore") - parser = export_parser(model_cls=Model) + parser = export_parser(model_cls=Model, is_hub_quantized=True) args = parser.parse_args() export_model(**vars(args)) diff --git a/qai_hub_models/models/regnet_quantized/model.py b/qai_hub_models/models/regnet_quantized/model.py index 47e79fed..c348ae26 100644 --- a/qai_hub_models/models/regnet_quantized/model.py +++ b/qai_hub_models/models/regnet_quantized/model.py @@ -4,83 +4,11 @@ # --------------------------------------------------------------------- from __future__ import annotations -# isort: off -# This verifies aimet is installed, and this must be included first. -from qai_hub_models.utils.quantization_aimet import ( - AIMETQuantizableMixin, - constrain_quantized_inputs_to_image_range, -) - -# isort: on - -import torch -from aimet_torch.cross_layer_equalization import ( - equalize_bn_folded_model, - fold_all_batch_norms, -) -from aimet_torch.model_preparer import prepare_model -from aimet_torch.quantsim import QuantizationSimModel, load_encodings_to_sim - from qai_hub_models.models.regnet.model import RegNet -from qai_hub_models.utils.aimet.config_loader import get_default_aimet_config -from qai_hub_models.utils.asset_loaders import CachedWebModelAsset +from qai_hub_models.utils.quantization import HubQuantizableMixin MODEL_ID = __name__.split(".")[-2] -MODEL_ASSET_VERSION = 4 -DEFAULT_ENCODINGS = "regnet_quantized_encodings.json" - - -class RegNetQuantizable(AIMETQuantizableMixin, RegNet): - """RegNet with post train quantization support. - - Supports only 8 bit weights and activations, and only loads pre-quantized checkpoints. - Support for quantizing using your own weights & data will come at a later date.""" - - def __init__( - self, - sim_model: QuantizationSimModel, - ) -> None: - # Input is already normalized by sim_model. Disable it in the wrapper model. - RegNet.__init__(self, sim_model.model, normalize_input=False) - AIMETQuantizableMixin.__init__( - self, - sim_model, - ) - - @classmethod - def from_pretrained( - cls, - aimet_encodings: str | None = "DEFAULT", - ) -> "RegNetQuantizable": - """ - Parameters: - aimet_encodings: - if "DEFAULT": Loads the model with aimet encodings calibrated on imagenette. - elif None: Doesn't load any encodings. Used when computing encodings. - else: Interprets as a filepath and loads the encodings stored there. 
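The AIMET machinery being deleted here is what the new `hub.submit_quantize_job` flow in export.py above replaces: trace the float model, compile it to ONNX, quantize the ONNX asset on AI Hub against calibration data, then feed the quantized artifact into the usual compile step. Condensed from the code in this diff, with the device name an illustrative choice and the dataset/sample count as shown there:

```python
import qai_hub as hub
import torch

from qai_hub_models.models.regnet_quantized import Model
from qai_hub_models.utils.input_spec import make_torch_inputs
from qai_hub_models.utils.quantization import get_calibration_data

model = Model.from_pretrained()
input_spec = model.get_input_spec()
traced = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec))

device = hub.Device("Samsung Galaxy S23 (Family)")  # illustrative device
onnx_job = hub.submit_compile_job(
    model=traced,
    input_specs=input_spec,
    device=device,
    name="regnet_quantized",
    options="--target_runtime onnx",
)
quantize_job = hub.submit_quantize_job(
    model=onnx_job.get_target_model(),
    calibration_data=get_calibration_data(input_spec, "imagenette", 100),
    weights_dtype=model.get_weights_dtype(),
    activations_dtype=model.get_activations_dtype(),
    name="regnet_quantized",
    options=model.get_quantize_options(),
)
# The quantized ONNX then goes through the normal on-device compile:
compile_job = hub.submit_compile_job(
    model=quantize_job.get_target_model(),
    input_specs=input_spec,
    device=device,
    name="regnet_quantized",
)
```

With quantization done hub-side, the model class itself collapses to the two-line `HubQuantizableMixin` definition shown next.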
- """ - model = RegNet.from_pretrained() - input_shape = cls.get_input_spec()["image_tensor"][0] - model = prepare_model(model) - dummy_input = torch.rand(input_shape) - - pairs = fold_all_batch_norms(model, input_shape, dummy_input) - equalize_bn_folded_model(model, input_shape, pairs, dummy_input) - sim = QuantizationSimModel( - model, - quant_scheme="tf_enhanced", - default_param_bw=8, - default_output_bw=8, - config_file=get_default_aimet_config(), - dummy_input=dummy_input, - ) - constrain_quantized_inputs_to_image_range(sim) - if aimet_encodings: - if aimet_encodings == "DEFAULT": - aimet_encodings = CachedWebModelAsset.from_asset_store( - MODEL_ID, MODEL_ASSET_VERSION, DEFAULT_ENCODINGS - ).fetch() - load_encodings_to_sim(sim, aimet_encodings) - return cls(sim) +class RegNetQuantizable(HubQuantizableMixin, RegNet): + pass diff --git a/qai_hub_models/models/regnet_quantized/perf.yaml b/qai_hub_models/models/regnet_quantized/perf.yaml index 45b95dac..43cc6cdf 100644 --- a/qai_hub_models/models/regnet_quantized/perf.yaml +++ b/qai_hub_models/models/regnet_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,41 +20,36 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8775P Proxy models: - name: RegNetQuantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 904.0 - throughput: 1106.1946902654868 + inference_time: 896.0 + throughput: 1116.0714285714287 estimated_peak_memory_range: - min: 20480 - max: 1550536 + min: 16384 + max: 2233688 primary_compute_unit: NPU precision: int8 layer_info: @@ -61,29 +57,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 116 - job_id: jlpe9vo7g + job_id: jg9l04vmg job_status: Passed torchscript_onnx_qnn: - inference_time: 1034.0 - throughput: 967.1179883945841 + inference_time: 1042.0 + throughput: 959.6928982725528 estimated_peak_memory_range: - min: 12288 - max: 14327824 + min: 0 + max: 13613808 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 113 + layers_on_npu: 189 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 113 - job_id: jo5mrzeyg + total_layers: 189 + job_id: jgn60eyv5 job_status: Passed torchscript_onnx: - inference_time: 1547.0 - throughput: 646.4124111182934 + inference_time: 1524.0 + throughput: 656.1679790026246 estimated_peak_memory_range: - min: 94208 - max: 2600640 + min: 12288 + max: 27258352 primary_compute_unit: NPU precision: int8 layer_info: @@ -91,7 +87,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 218 - job_id: j1glnk6ep + job_id: jgo2zv04p job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -100,13 +96,13 @@ models: os_name: Android manufacturer: 
Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:57:33Z' + timestamp: '2024-10-17T17:25:31Z' - torchscript_onnx_tflite: - inference_time: 610.0 - throughput: 1639.344262295082 + inference_time: 750.0 + throughput: 1333.3333333333333 estimated_peak_memory_range: min: 12288 - max: 137910400 + max: 140573344 primary_compute_unit: NPU precision: int8 layer_info: @@ -114,29 +110,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 116 - job_id: jygze72zg + job_id: jp14280np job_status: Passed torchscript_onnx_qnn: - inference_time: 751.0 - throughput: 1331.5579227696405 + inference_time: 744.0 + throughput: 1344.0860215053763 estimated_peak_memory_range: - min: 167936 - max: 31013392 + min: 163840 + max: 31485744 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 113 + layers_on_npu: 189 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 113 - job_id: jegn2elvg + total_layers: 189 + job_id: jprv6yqvg job_status: Passed torchscript_onnx: - inference_time: 1316.0 - throughput: 759.8784194528876 + inference_time: 1107.0 + throughput: 903.342366757001 estimated_peak_memory_range: - min: 0 - max: 171753424 + min: 28672 + max: 177195600 primary_compute_unit: NPU precision: int8 layer_info: @@ -144,7 +140,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 218 - job_id: jw5661ev5 + job_id: jpv6qwo75 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -153,51 +149,74 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:57:34Z' + timestamp: '2024-10-17T17:25:33Z' - torchscript_onnx_tflite: - inference_time: 885.0 - throughput: 1129.9435028248588 + inference_time: 30336.0 + throughput: 32.96413502109704 estimated_peak_memory_range: - min: 24576 - max: 5661376 - primary_compute_unit: NPU + min: 86016 + max: 75736048 + primary_compute_unit: GPU precision: int8 layer_info: - layers_on_npu: 116 - layers_on_gpu: 0 + layers_on_npu: 0 + layers_on_gpu: 116 layers_on_cpu: 0 total_layers: 116 - job_id: jz5wo9wzp + job_id: jgdxnvw6p job_status: Passed torchscript_onnx_qnn: - inference_time: 953.0 - throughput: 1049.3179433368311 + inference_time: 4094.0 + throughput: 244.2598925256473 estimated_peak_memory_range: - min: 188416 - max: 1334288 + min: 217088 + max: 9048560 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 113 + layers_on_npu: 189 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 113 - job_id: jep28m0xp + total_layers: 189 + job_id: jp2kxm6xp job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:57:28Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-17T17:25:17Z' - torchscript_onnx_tflite: - inference_time: 1037.0 - throughput: 964.3201542912246 + inference_time: 39674.0 + throughput: 25.205424207289408 estimated_peak_memory_range: - min: 16384 - max: 141484048 + min: 2801664 + max: 91186368 + primary_compute_unit: GPU + precision: int8 + layer_info: + layers_on_npu: 12 + layers_on_gpu: 91 + layers_on_cpu: 13 + total_layers: 116 + job_id: j5wew9oz5 + job_status: Passed + reference_device_info: + name: RB5 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8250 Proxy + timestamp: '2024-10-17T17:25:02Z' + - torchscript_onnx_tflite: + inference_time: 905.0 + throughput: 1104.9723756906078 + estimated_peak_memory_range: + min: 20480 + max: 1443656 
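Two observations on the quantized numbers in this hunk. First, int8 cuts latency by more than half against the fp16 RegNet on the same flagship: 2075 us fp16 vs 896 us int8 on the Galaxy S23 TFLite path, roughly a 2.3x speedup. Second, `primary_compute_unit` matters more than the headline latency: on RB3 Gen 2 all 116 TFLite layers fall back to GPU, and on RB5 the graph splits across NPU/GPU/CPU, which is why those boards land at 30-40 ms where the NPU-backed QCS8550 runs the same int8 model in about 0.9 ms. A tiny helper for flagging such fallbacks, reusing the key layout assumed in the earlier sketch:

```python
def has_fallback(runtime_entry: dict) -> bool:
    """True if any layers left the NPU for this runtime's profile."""
    info = runtime_entry["layer_info"]
    return info["layers_on_gpu"] > 0 or info["layers_on_cpu"] > 0

# The RB3 Gen 2 TFLite entry above: 0 NPU / 116 GPU / 0 CPU layers.
rb3_tflite = {"layer_info": {"layers_on_npu": 0, "layers_on_gpu": 116, "layers_on_cpu": 0}}
print(has_fallback(rb3_tflite))  # True
```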
primary_compute_unit: NPU precision: int8 layer_info: @@ -205,37 +224,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 116 - job_id: jmg9v40q5 + job_id: jg9l04vqg job_status: Passed torchscript_onnx_qnn: - inference_time: 1174.0 - throughput: 851.7887563884157 + inference_time: 964.0 + throughput: 1037.344398340249 estimated_peak_memory_range: - min: 163840 - max: 31496784 + min: 176128 + max: 1571688 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 113 + layers_on_npu: 189 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 113 - job_id: jogkzy7yg + total_layers: 189 + job_id: jpy1zdwrp job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:57:32Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-17T17:25:19Z' - torchscript_onnx_tflite: - inference_time: 905.0 - throughput: 1104.9723756906078 + inference_time: 898.0 + throughput: 1113.5857461024498 estimated_peak_memory_range: min: 12288 - max: 5407592 + max: 9672640 primary_compute_unit: NPU precision: int8 layer_info: @@ -243,37 +262,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 116 - job_id: jnp1082k5 + job_id: jp14280kp job_status: Passed torchscript_onnx_qnn: - inference_time: 961.0 - throughput: 1040.5827263267429 + inference_time: 966.0 + throughput: 1035.1966873706003 estimated_peak_memory_range: - min: 184320 - max: 1426312 + min: 217088 + max: 1482600 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 113 + layers_on_npu: 189 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 113 - job_id: jqpyedrrg + total_layers: 189 + job_id: jp8q279zp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:57:28Z' + chipset: SA8255P Proxy + timestamp: '2024-10-17T17:25:22Z' - torchscript_onnx_tflite: - inference_time: 902.0 - throughput: 1108.6474501108648 + inference_time: 900.0 + throughput: 1111.111111111111 estimated_peak_memory_range: min: 12288 - max: 12826976 + max: 8539544 primary_compute_unit: NPU precision: int8 layer_info: @@ -281,22 +300,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 116 - job_id: jvgdwvnk5 + job_id: jgdxnvwkp job_status: Passed torchscript_onnx_qnn: inference_time: 962.0 throughput: 1039.5010395010395 estimated_peak_memory_range: - min: 188416 - max: 1472072 + min: 176128 + max: 1467512 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 113 + layers_on_npu: 189 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 113 - job_id: j2p0yr32g + total_layers: 189 + job_id: jgkevynyg job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -304,14 +323,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:57:30Z' + chipset: SA8775P Proxy + timestamp: '2024-10-17T17:25:24Z' - torchscript_onnx_tflite: - inference_time: 902.0 - throughput: 1108.6474501108648 + inference_time: 1023.0 + throughput: 977.5171065493646 estimated_peak_memory_range: - min: 36864 - max: 1556416 + min: 0 + max: 143886928 primary_compute_unit: NPU precision: int8 layer_info: @@ -319,113 +338,105 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 116 - job_id: jz57zd2qp + job_id: j57y2dzq5 job_status: Passed torchscript_onnx_qnn: - 
inference_time: 950.0 - throughput: 1052.6315789473683 + inference_time: 1190.0 + throughput: 840.3361344537815 estimated_peak_memory_range: - min: 217088 - max: 1406984 + min: 163840 + max: 35308880 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 113 + layers_on_npu: 189 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 113 - job_id: j1p8o70zg + total_layers: 189 + job_id: j5q602k7p job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:57:31Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-17T17:25:25Z' - torchscript_onnx_tflite: - inference_time: 30147.0 - throughput: 33.17079643082231 + inference_time: 626.0 + throughput: 1597.444089456869 estimated_peak_memory_range: - min: 102400 - max: 75838848 - primary_compute_unit: GPU + min: 8192 + max: 69178288 + primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 0 - layers_on_gpu: 116 + layers_on_npu: 116 + layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 116 - job_id: jqp4qwnqg + job_id: jp4lnwqq5 job_status: Passed torchscript_onnx_qnn: - inference_time: 4055.0 - throughput: 246.6091245376079 + inference_time: 740.0 + throughput: 1351.3513513513512 estimated_peak_memory_range: - min: 167936 - max: 7980064 + min: 29278208 + max: 59289440 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 113 + layers_on_npu: 189 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 113 - job_id: jn5q82e75 + total_layers: 189 + job_id: jglv4kze5 job_status: Passed - reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot - os_name: Android - manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T11:57:32Z' - - torchscript_onnx_tflite: - inference_time: 41510.0 - throughput: 24.09058058299205 + torchscript_onnx: + inference_time: 1087.0 + throughput: 919.9632014719411 estimated_peak_memory_range: - min: 12288 - max: 64667064 - primary_compute_unit: GPU + min: 0 + max: 89205056 + primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 12 - layers_on_gpu: 91 - layers_on_cpu: 13 - total_layers: 116 - job_id: j0pxv19jg + layers_on_npu: 218 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 218 + job_id: jpedov175 job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T11:57:23Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-17T17:25:36Z' - torchscript_onnx_qnn: - inference_time: 1116.0 - throughput: 896.0573476702509 + inference_time: 1129.0 + throughput: 885.7395925597874 estimated_peak_memory_range: min: 442368 max: 442368 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 113 + layers_on_npu: 189 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 113 - job_id: joprky8v5 + total_layers: 189 + job_id: jp0z4rq25 job_status: Passed torchscript_onnx: - inference_time: 1540.0 - throughput: 649.3506493506494 + inference_time: 1532.0 + throughput: 652.7415143603133 estimated_peak_memory_range: - min: 23322624 - max: 23322624 + min: 23302144 + max: 23302144 primary_compute_unit: NPU precision: int8 layer_info: @@ -433,7 +444,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 218 - job_id: j1p3kmvx5 + job_id: jgjvdlm7g job_status: Passed reference_device_info: name: 
Snapdragon X Elite CRD @@ -442,4 +453,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:57:35Z' + timestamp: '2024-10-17T17:25:34Z' diff --git a/qai_hub_models/models/regnet_quantized/requirements.txt b/qai_hub_models/models/regnet_quantized/requirements.txt deleted file mode 100644 index de5b80e8..00000000 --- a/qai_hub_models/models/regnet_quantized/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -aimet-torch==1.32.1.post1; sys_platform == "linux" diff --git a/qai_hub_models/models/regnet_quantized/test.py b/qai_hub_models/models/regnet_quantized/test.py deleted file mode 100644 index 6018cb2a..00000000 --- a/qai_hub_models/models/regnet_quantized/test.py +++ /dev/null @@ -1,30 +0,0 @@ -# --------------------------------------------------------------------- -# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. -# SPDX-License-Identifier: BSD-3-Clause -# --------------------------------------------------------------------- -from qai_hub_models.models._shared.imagenet_classifier.test_utils import ( - run_imagenet_classifier_test, -) -from qai_hub_models.models.regnet_quantized.demo import main as demo_main -from qai_hub_models.models.regnet_quantized.model import ( - MODEL_ASSET_VERSION, - MODEL_ID, - RegNetQuantizable, -) - - -def test_task(): - run_imagenet_classifier_test( - RegNetQuantizable.from_pretrained(), - MODEL_ID, - probability_threshold=0.45, - diff_tol=0.005, - atol=0.2, - rtol=0.02, - asset_version=MODEL_ASSET_VERSION, - ) - - -def test_demo(): - # Verify demo does not crash - demo_main(is_test=True) diff --git a/qai_hub_models/models/resnet101/README.md b/qai_hub_models/models/resnet101/README.md index 858e0ecd..c9f9189a 100644 --- a/qai_hub_models/models/resnet101/README.md +++ b/qai_hub_models/models/resnet101/README.md @@ -6,7 +6,7 @@ ResNet101 is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of ResNet101 found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/resnet101). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.resnet101.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of ResNet101 can be found +* The license for the original implementation of ResNet101 can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE).
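The requirements.txt deletion above is the other half of the AIMET removal: the Linux-only `aimet-torch` pin was the reason these quantized models needed a dedicated pip extra (and an AIMET-specific test.py, deleted alongside it). With quantization running on AI Hub, the base package should now suffice; a hedged install line, since the extras actually published for a given release may differ:

```bash
# No aimet-torch pin anymore, so the plain package should be enough:
pip install qai_hub_models
```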
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/resnet101/export.py b/qai_hub_models/models/resnet101/export.py index 08f9c3c7..6d9679af 100644 --- a/qai_hub_models/models/resnet101/export.py +++ b/qai_hub_models/models/resnet101/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.resnet101 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
""" model_name = "resnet101" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/resnet101/perf.yaml b/qai_hub_models/models/resnet101/perf.yaml index fc7067c1..e3d72737 100644 --- a/qai_hub_models/models/resnet101/perf.yaml +++ b/qai_hub_models/models/resnet101/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: ResNet101 performance_metrics: - torchscript_onnx_tflite: - inference_time: 3458.0 - throughput: 289.1844997108155 + inference_time: 3460.0 + throughput: 289.01734104046244 estimated_peak_memory_range: - min: 24576 - max: 2168072 + min: 16384 + max: 2472304 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: jz5wo9ezp + job_id: jgjvnx4xg job_status: Passed torchscript_onnx_qnn: - inference_time: 3505.0 - throughput: 285.30670470756064 + inference_time: 3504.0 + throughput: 285.38812785388126 estimated_peak_memory_range: - min: 622592 - max: 160835080 + min: 16384 + max: 156687344 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: jo5mrznyg + job_id: jp2ky6y4p job_status: Passed torchscript_onnx: - inference_time: 3620.0 - throughput: 276.24309392265195 + inference_time: 3631.0 + throughput: 275.40622418066647 estimated_peak_memory_range: min: 618496 - max: 3142688 + max: 2503400 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 247 - job_id: jn5q82075 + job_id: jg9ln1llg job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:56:26Z' + timestamp: '2024-10-15T17:22:41Z' - torchscript_onnx_tflite: - inference_time: 3654.0 - throughput: 273.6726874657909 + inference_time: 2926.0 + throughput: 341.7634996582365 estimated_peak_memory_range: min: 16384 - max: 115147440 + max: 119118768 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: 
layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: jmg9v4lq5 + job_id: j5we6v665 job_status: Passed torchscript_onnx_qnn: - inference_time: 2860.0 - throughput: 349.65034965034965 + inference_time: 3734.0 + throughput: 267.8093197643278 estimated_peak_memory_range: min: 618496 - max: 34785840 + max: 35940912 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: jegn2e0vg + job_id: jpy13w37p job_status: Passed torchscript_onnx: - inference_time: 2991.0 - throughput: 334.33634236041456 + inference_time: 3791.0 + throughput: 263.7826431020839 estimated_peak_memory_range: min: 0 - max: 122388784 + max: 123516480 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 247 - job_id: j1glnk4ep + job_id: jgdx19xep job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:56:27Z' + timestamp: '2024-10-15T17:22:42Z' - torchscript_onnx_tflite: - inference_time: 3394.0 - throughput: 294.6375957572186 + inference_time: 3371.0 + throughput: 296.6478789676654 estimated_peak_memory_range: min: 20480 - max: 2054584 + max: 2168592 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: jnp1084k5 + job_id: jp14zlz2p job_status: Passed torchscript_onnx_qnn: - inference_time: 3301.0 - throughput: 302.9385034837928 + inference_time: 3275.0 + throughput: 305.3435114503817 estimated_peak_memory_range: - min: 630784 - max: 2486328 + min: 634880 + max: 1855344 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: jep28mxxp + job_id: jp8qy9yxp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,52 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:56:21Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T17:22:34Z' + - torchscript_onnx_tflite: + inference_time: 3407.0 + throughput: 293.51335485764605 + estimated_peak_memory_range: + min: 20480 + max: 2514272 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 147 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 147 + job_id: j5mnx2xwp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 3283.0 + throughput: 304.5994517209869 + estimated_peak_memory_range: + min: 630784 + max: 1840808 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 245 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 245 + job_id: jglvmzm85 + job_status: Passed + reference_device_info: + name: SA8255 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8255P Proxy + timestamp: '2024-10-15T17:22:37Z' - torchscript_onnx_tflite: - inference_time: 4764.0 - throughput: 209.90764063811923 + inference_time: 3391.0 + throughput: 294.8982601002654 estimated_peak_memory_range: min: 16384 - max: 95821536 + max: 6130056 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: jvgdwvxk5 + job_id: jpxkojo15 job_status: Passed torchscript_onnx_qnn: - inference_time: 4806.0 - throughput: 208.07324178110696 + inference_time: 3321.0 + 
throughput: 301.11412225233363 estimated_peak_memory_range: - min: 618496 - max: 22898176 + min: 634880 + max: 1922832 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: jogkzyvyg + job_id: j5q6qkq4p job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8775 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:56:25Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T17:22:36Z' - torchscript_onnx_tflite: inference_time: 3422.0 throughput: 292.22676797194623 estimated_peak_memory_range: - min: 12288 - max: 1968544 + min: 16384 + max: 5496088 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: jz57zdyqp + job_id: jp4lrorv5 job_status: Passed torchscript_onnx_qnn: - inference_time: 3299.0 - throughput: 303.12215822976657 + inference_time: 3286.0 + throughput: 304.32136335970785 estimated_peak_memory_range: - min: 663552 - max: 1901344 + min: 634880 + max: 1919448 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,7 +291,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: jqpyedzrg + job_id: jgkexnx2g job_status: Passed reference_device_info: name: SA8650 (Proxy) @@ -263,14 +299,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:56:22Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T17:22:35Z' - torchscript_onnx_tflite: - inference_time: 3398.0 - throughput: 294.2907592701589 + inference_time: 4810.0 + throughput: 207.9002079002079 estimated_peak_memory_range: - min: 16384 - max: 1817200 + min: 20480 + max: 96531168 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: jqp4qwlqg + job_id: j57yrwrl5 job_status: Passed torchscript_onnx_qnn: - inference_time: 3307.0 - throughput: 302.3888720895071 + inference_time: 4825.0 + throughput: 207.2538860103627 estimated_peak_memory_range: - min: 647168 - max: 1947368 + min: 638976 + max: 23148960 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +329,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: j2p0yr42g + job_id: jpv6kokj5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:56:23Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-15T17:22:39Z' - torchscript_onnx_tflite: - inference_time: 3396.0 - throughput: 294.4640753828033 + inference_time: 2377.0 + throughput: 420.69835927639883 estimated_peak_memory_range: - min: 24576 - max: 2347792 + min: 12288 + max: 44661680 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +352,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: j0pxv1kjg + job_id: jprv3q39g job_status: Passed torchscript_onnx_qnn: - inference_time: 3339.0 - throughput: 299.4908655286014 + inference_time: 2401.0 + throughput: 416.49312786339027 estimated_peak_memory_range: - min: 634880 - max: 1866304 + min: 614400 + max: 34292064 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +367,34 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: j1p8o72zg + 
job_id: jpedm1m15 + job_status: Passed + torchscript_onnx: + inference_time: 2510.0 + throughput: 398.40637450199205 + estimated_peak_memory_range: + min: 0 + max: 47890400 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 247 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 247 + job_id: jp2ky6k4p + job_status: Passed + reference_device_info: - name: SA8255 (Proxy) - os: '13' - form_factor: Auto + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:56:24Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T17:22:44Z' - torchscript_onnx_qnn: - inference_time: 3475.0 - throughput: 287.76978417266184 + inference_time: 3453.0 + throughput: 289.6032435563278 estimated_peak_memory_range: min: 602112 max: 602112 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: joprky6v5 + job_id: jp0z0q065 job_status: Passed torchscript_onnx: - inference_time: 3549.0 - throughput: 281.7695125387433 + inference_time: 3580.0 + throughput: 279.3296089385475 estimated_peak_memory_range: - min: 90714112 - max: 90714112 + min: 90652672 + max: 90652672 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 247 - job_id: jw56612v5 + job_id: jgn667rm5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:56:28Z' + timestamp: '2024-10-16T08:31:22Z' diff --git a/qai_hub_models/models/resnet101_quantized/README.md b/qai_hub_models/models/resnet101_quantized/README.md index acc1f71d..8c3c49d1 100644 --- a/qai_hub_models/models/resnet101_quantized/README.md +++ b/qai_hub_models/models/resnet101_quantized/README.md @@ -6,7 +6,7 @@ ResNet101 is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of ResNet101Quantized found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/resnet101_quantized). @@ -17,11 +17,6 @@ across various devices can be found [here](https://aihub.qualcomm.com/models/r ## Example & Usage -Install the package via pip: -```bash -pip install "qai_hub_models[resnet101_quantized]" -``` - Once installed, run the following simple CLI demo: @@ -44,15 +39,19 @@ python -m qai_hub_models.models.resnet101_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of ResNet101Quantized can be found +* The license for the original implementation of ResNet101Quantized can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/resnet101_quantized/evaluate.py b/qai_hub_models/models/resnet101_quantized/evaluate.py index fde921e3..9aba239a 100644 --- a/qai_hub_models/models/resnet101_quantized/evaluate.py +++ b/qai_hub_models/models/resnet101_quantized/evaluate.py @@ -13,10 +13,8 @@ from qai_hub_models.models.resnet101_quantized import MODEL_ID, Model from qai_hub_models.utils.args import evaluate_parser, get_hub_device, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.evaluate import evaluate_on_dataset from qai_hub_models.utils.inference import compile_model_from_args -from qai_hub_models.utils.quantization_aimet import AIMETQuantizableMixin SUPPORTED_DATASETS = ["imagenette", "imagenet"] @@ -27,6 +25,7 @@ def main(): model_cls=Model, default_split_size=2500, supported_datasets=SUPPORTED_DATASETS, + is_hub_quantized=True, ) args = parser.parse_args() args.device = None @@ -38,13 +37,7 @@ def main(): MODEL_ID, args, get_model_kwargs(Model, vars(args)) ) hub_device = get_hub_device(None, args.chipset) - - # Use Fp16 model for torch inference - for cls in Model.__mro__: - if issubclass(cls, BaseModel) and not issubclass(cls, AIMETQuantizableMixin): - torch_cls = cls - break - torch_model = torch_cls.from_pretrained(**get_model_kwargs(torch_cls, vars(args))) + torch_model = Model.from_pretrained(**get_model_kwargs(Model, vars(args))) evaluate_on_dataset( hub_model, torch_model, diff --git a/qai_hub_models/models/resnet101_quantized/export.py b/qai_hub_models/models/resnet101_quantized/export.py index bf8ced05..c406c3f4 100644 --- a/qai_hub_models/models/resnet101_quantized/export.py +++ b/qai_hub_models/models/resnet101_quantized/export.py @@ -10,18 +10,20 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.resnet101_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference +from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( print_inference_metrics, print_on_target_demo_cmd, @@ -31,11 +33,14 @@ can_access_qualcomm_ai_hub, export_without_hub_access, ) +from qai_hub_models.utils.quantization import get_calibration_data def export_model( device: str = "Samsung Galaxy S23 (Family)", chipset: Optional[str] = None, + 
num_calibration_samples: int = 100, + skip_compiling: bool = False, skip_profiling: bool = False, skip_inferencing: bool = False, skip_downloading: bool = False, @@ -45,20 +50,19 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + 3. Compiles the model to an asset that can be run on device + 4. Profiles the model performance on a real device + 5. Inferences the model on sample inputs + 6. Downloads the model asset to the local directory + 7. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 5 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -66,6 +70,9 @@ def export_model( Defaults to DEFAULT_DEVICE if not specified. chipset: If set, will choose a random device with this chipset. Overrides the `device` argument. + num_calibration_samples: The number of calibration data samples + to use for quantization. + skip_compiling: If set, skips compiling model to format that can run on device. skip_profiling: If set, skips profiling of compiled model on real devices. skip_inferencing: If set, skips computing on-device outputs from sample data. skip_downloading: If set, skips downloading of compiled model. @@ -80,10 +87,11 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: - * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). + A struct of: + * A CompileJob object containing metadata about the compile job submitted to hub (None if compiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). + * A QuantizeJob object containing metadata about the quantize job submitted to hub """ model_name = "resnet101_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,33 +117,52 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. 
Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) ) # Trace the model - source_model = model.convert_to_hub_source_model( - target_runtime, output_path, input_spec + source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) + + print(f"Quantizing model {model_name} with {num_calibration_samples} samples.") + # 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + onnx_compile_job = hub.submit_compile_job( + model=source_model, + input_specs=input_spec, + device=hub_device, + name=model_name, + options="--target_runtime onnx", + ) + quantize_job = hub.submit_quantize_job( + model=onnx_compile_job.get_target_model(), + calibration_data=get_calibration_data( + input_spec, "imagenette", num_calibration_samples + ), + weights_dtype=model.get_weights_dtype(), + activations_dtype=model.get_activations_dtype(), + name=model_name, + options=model.get_quantize_options(), ) + if skip_compiling: + return ExportResult(quantize_job=quantize_job) - # 2. Compile the model to an on-device asset + # 3. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) print(f"Optimizing model {model_name} to run on-device") submitted_compile_job = hub.submit_compile_job( - model=source_model, + model=quantize_job.get_target_model(), input_specs=input_spec, device=hub_device, name=model_name, - calibration_data=model.get_calibration_data(target_runtime), options=model_compile_options, ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 4. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +177,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 5. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +198,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 6. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 7. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -202,12 +229,17 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + quantize_job=quantize_job, + ) def main(): warnings.filterwarnings("ignore") - parser = export_parser(model_cls=Model) + parser = export_parser(model_cls=Model, is_hub_quantized=True) args = parser.parse_args() export_model(**vars(args)) diff --git a/qai_hub_models/models/resnet101_quantized/model.py b/qai_hub_models/models/resnet101_quantized/model.py index c4cfa229..12c3d4d6 100644 --- a/qai_hub_models/models/resnet101_quantized/model.py +++ b/qai_hub_models/models/resnet101_quantized/model.py @@ -4,86 +4,11 @@ # --------------------------------------------------------------------- from __future__ import annotations -# isort: off -# This verifies aimet is installed, and this must be included first. -from qai_hub_models.utils.quantization_aimet import ( - AIMETQuantizableMixin, - constrain_quantized_inputs_to_image_range, -) - -# isort: on - -import torch -from aimet_torch.cross_layer_equalization import ( - equalize_bn_folded_model, - fold_all_batch_norms, -) -from aimet_torch.model_preparer import prepare_model -from aimet_torch.quantsim import QuantizationSimModel, load_encodings_to_sim - from qai_hub_models.models.resnet101.model import ResNet101 -from qai_hub_models.utils.aimet.config_loader import get_default_aimet_config -from qai_hub_models.utils.asset_loaders import CachedWebModelAsset +from qai_hub_models.utils.quantization import HubQuantizableMixin MODEL_ID = __name__.split(".")[-2] -MODEL_ASSET_VERSION = 5 -DEFAULT_ENCODINGS = "resnet101_quantized_encodings.json" - - -class ResNet101Quantizable( - AIMETQuantizableMixin, - ResNet101, -): - """ResNet101 with post train quantization support. - - Supports only 8 bit weights and activations, and only loads pre-quantized checkpoints. - Support for quantizing using your own weights & data will come at a later date.""" - - def __init__( - self, - sim_model: QuantizationSimModel, - ) -> None: - # Input is already normalized by sim_model. Disable it in the wrapper model. - ResNet101.__init__(self, sim_model.model, normalize_input=False) - AIMETQuantizableMixin.__init__( - self, - sim_model, - ) - - @classmethod - def from_pretrained( - cls, - aimet_encodings: str | None = "DEFAULT", - ) -> "ResNet101Quantizable": - """ - Parameters: - aimet_encodings: - if "DEFAULT": Loads the model with aimet encodings calibrated on imagenette. - elif None: Doesn't load any encodings. Used when computing encodings. - else: Interprets as a filepath and loads the encodings stored there. 
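The AIMET-specific `from_pretrained` recipe being deleted here is superseded by quantization on AI Hub itself. A condensed sketch of the new flow implemented by the rewritten export.py above (a sketch only: it assumes a configured `qai_hub` client, uses the defaults from this change — "Samsung Galaxy S23 (Family)" and 100 imagenette samples — and omits the profiling, inference, download, and summary steps):

```python
import qai_hub as hub
import torch

from qai_hub_models.models.resnet101_quantized import Model
from qai_hub_models.utils.input_spec import make_torch_inputs
from qai_hub_models.utils.quantization import get_calibration_data

device = hub.Device("Samsung Galaxy S23 (Family)")
model = Model.from_pretrained()
input_spec = model.get_input_spec()

# Step 1: trace the fp32 PyTorch model to TorchScript.
source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec))

# Step 2: convert to ONNX on AI Hub, then quantize that ONNX model with
# calibration data -- this replaces the deleted AIMET encodings path.
onnx_job = hub.submit_compile_job(
    model=source_model,
    input_specs=input_spec,
    device=device,
    name="resnet101_quantized",
    options="--target_runtime onnx",
)
quantize_job = hub.submit_quantize_job(
    model=onnx_job.get_target_model(),
    calibration_data=get_calibration_data(input_spec, "imagenette", 100),
    weights_dtype=model.get_weights_dtype(),
    activations_dtype=model.get_activations_dtype(),
    name="resnet101_quantized",
    options=model.get_quantize_options(),
)

# Step 3: compile the quantized model into an on-device asset
# (per-runtime compile options from get_hub_compile_options() omitted here).
compile_job = hub.submit_compile_job(
    model=quantize_job.get_target_model(),
    input_specs=input_spec,
    device=device,
    name="resnet101_quantized",
)
```

Note that `export_model` now returns an `ExportResult` struct rather than the old 3-tuple, and that `skip_compiling=True` short-circuits after the quantize job with only `result.quantize_job` populated.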
- """ - model = ResNet101.from_pretrained() - input_shape = cls.get_input_spec()["image_tensor"][0] - model = prepare_model(model) - dummy_input = torch.rand(input_shape) - pairs = fold_all_batch_norms(model, input_shape, dummy_input) - equalize_bn_folded_model(model, input_shape, pairs, dummy_input) - - sim = QuantizationSimModel( - model, - quant_scheme="tf_enhanced", - default_param_bw=8, - default_output_bw=8, - config_file=get_default_aimet_config(), - dummy_input=torch.rand(input_shape), - ) - constrain_quantized_inputs_to_image_range(sim) - if aimet_encodings: - if aimet_encodings == "DEFAULT": - aimet_encodings = CachedWebModelAsset.from_asset_store( - MODEL_ID, MODEL_ASSET_VERSION, DEFAULT_ENCODINGS - ).fetch() - load_encodings_to_sim(sim, aimet_encodings) - return cls(sim) +class ResNet101Quantizable(HubQuantizableMixin, ResNet101): + pass diff --git a/qai_hub_models/models/resnet101_quantized/perf.yaml b/qai_hub_models/models/resnet101_quantized/perf.yaml index f65c1de5..21242612 100644 --- a/qai_hub_models/models/resnet101_quantized/perf.yaml +++ b/qai_hub_models/models/resnet101_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,44 +20,39 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8775P Proxy models: - name: ResNet101Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 1153.0 - throughput: 867.3026886383348 + inference_time: 1159.0 + throughput: 862.8127696289905 estimated_peak_memory_range: - min: 32768 - max: 2011888 + min: 12288 + max: 54352760 primary_compute_unit: NPU precision: int8 layer_info: @@ -64,29 +60,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 150 - job_id: jvgdwv165 + job_id: jp0z4rn05 job_status: Passed torchscript_onnx_qnn: - inference_time: 1373.0 - throughput: 728.3321194464676 + inference_time: 1382.0 + throughput: 723.589001447178 estimated_peak_memory_range: - min: 16384 - max: 47362416 + min: 32768 + max: 10864768 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 246 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: j2p0yrz0g + total_layers: 246 + job_id: jgz327145 job_status: Passed torchscript_onnx: - inference_time: 2397.0 - throughput: 417.18815185648725 + inference_time: 2239.0 + throughput: 446.6279589102278 estimated_peak_memory_range: - min: 217088 - max: 52387320 + min: 12288 + max: 52546072 primary_compute_unit: NPU precision: int8 layer_info: @@ -94,7 +90,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 283 - job_id: j7gjxlv1p + job_id: jp2kxm26p job_status: 
Passed reference_device_info: name: Samsung Galaxy S23 @@ -103,13 +99,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:55:45Z' + timestamp: '2024-10-17T17:24:15Z' - torchscript_onnx_tflite: - inference_time: 869.0 - throughput: 1150.7479861910242 + inference_time: 867.0 + throughput: 1153.4025374855826 estimated_peak_memory_range: min: 12288 - max: 101112112 + max: 101898880 primary_compute_unit: NPU precision: int8 layer_info: @@ -117,29 +113,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 150 - job_id: jz57zdrnp + job_id: jp8q27lqp job_status: Passed torchscript_onnx_qnn: - inference_time: 1210.0 - throughput: 826.4462809917355 + inference_time: 1043.0 + throughput: 958.7727708533077 estimated_peak_memory_range: - min: 167936 - max: 22519552 + min: 163840 + max: 22249632 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 246 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: j1p8o7qqg + total_layers: 246 + job_id: j5wew9j45 job_status: Passed torchscript_onnx: - inference_time: 2218.0 - throughput: 450.8566275924256 + inference_time: 1597.0 + throughput: 626.1740763932373 estimated_peak_memory_range: min: 12288 - max: 149319232 + max: 152934336 primary_compute_unit: NPU precision: int8 layer_info: @@ -147,7 +143,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 283 - job_id: jlpe9vd8g + job_id: jpy1zd90p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -156,13 +152,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:55:46Z' + timestamp: '2024-10-17T17:24:17Z' - torchscript_onnx_tflite: - inference_time: 1144.0 - throughput: 874.1258741258741 + inference_time: 4486.0 + throughput: 222.91573785109227 estimated_peak_memory_range: - min: 12288 - max: 54250312 + min: 36864 + max: 37005520 primary_compute_unit: NPU precision: int8 layer_info: @@ -170,37 +166,60 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 150 - job_id: jqp4qwr2g + job_id: jgkevyjvg job_status: Passed torchscript_onnx_qnn: - inference_time: 1323.0 - throughput: 755.8578987150415 + inference_time: 6377.0 + throughput: 156.81354869060686 estimated_peak_memory_range: - min: 188416 - max: 1486584 + min: 208896 + max: 8274624 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 246 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jn5q826e5 + total_layers: 246 + job_id: jg9l046mg job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:55:39Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-17T17:24:01Z' - torchscript_onnx_tflite: - inference_time: 1358.0 - throughput: 736.3770250368188 + inference_time: 17354.0 + throughput: 57.62360262763628 estimated_peak_memory_range: - min: 28672 - max: 103102416 + min: 208896 + max: 2368376 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 150 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 150 + job_id: j5q602jep + job_status: Passed + reference_device_info: + name: RB5 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8250 Proxy + timestamp: '2024-10-17T17:23:45Z' + - torchscript_onnx_tflite: + inference_time: 1159.0 + throughput: 862.8127696289905 + estimated_peak_memory_range: + 
min: 20480 + max: 1400088 primary_compute_unit: NPU precision: int8 layer_info: @@ -208,37 +227,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 150 - job_id: j0pxv1o8g + job_id: jglv4kj25 job_status: Passed torchscript_onnx_qnn: - inference_time: 1591.0 - throughput: 628.5355122564425 + inference_time: 1324.0 + throughput: 755.2870090634441 estimated_peak_memory_range: - min: 507904 - max: 24608272 + min: 176128 + max: 1511296 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 246 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jwgoyv215 + total_layers: 246 + job_id: jp1428rnp job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:55:43Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-17T17:24:03Z' - torchscript_onnx_tflite: - inference_time: 1154.0 - throughput: 866.5511265164645 + inference_time: 1157.0 + throughput: 864.304235090752 estimated_peak_memory_range: min: 12288 - max: 18030704 + max: 28724152 primary_compute_unit: NPU precision: int8 layer_info: @@ -246,37 +265,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 150 - job_id: jo5mrzx7g + job_id: j56y21knp job_status: Passed torchscript_onnx_qnn: - inference_time: 1320.0 - throughput: 757.5757575757576 + inference_time: 1324.0 + throughput: 755.2870090634441 estimated_peak_memory_range: min: 184320 - max: 1429024 + max: 1543680 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 246 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: j1glnkv2p + total_layers: 246 + job_id: j57y2dqn5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:55:40Z' + chipset: SA8255P Proxy + timestamp: '2024-10-17T17:24:06Z' - torchscript_onnx_tflite: - inference_time: 1161.0 - throughput: 861.3264427217915 + inference_time: 1162.0 + throughput: 860.5851979345955 estimated_peak_memory_range: min: 28672 - max: 17746288 + max: 388559256 primary_compute_unit: NPU precision: int8 layer_info: @@ -284,22 +303,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 150 - job_id: jegn2evjg + job_id: jp3jnmymg job_status: Passed torchscript_onnx_qnn: - inference_time: 1330.0 - throughput: 751.8796992481203 + inference_time: 1325.0 + throughput: 754.7169811320755 estimated_peak_memory_range: - min: 176128 - max: 1414360 + min: 184320 + max: 1467840 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 246 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jw5661yn5 + total_layers: 246 + job_id: jp4lnwz25 job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -307,14 +326,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:55:41Z' + chipset: SA8775P Proxy + timestamp: '2024-10-17T17:24:08Z' - torchscript_onnx_tflite: - inference_time: 1165.0 - throughput: 858.3690987124463 + inference_time: 1367.0 + throughput: 731.528895391368 estimated_peak_memory_range: min: 16384 - max: 14728928 + max: 104200096 primary_compute_unit: NPU precision: int8 layer_info: @@ -322,37 +341,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 150 - job_id: joprky3k5 + 
job_id: jgo2zvj1p job_status: Passed torchscript_onnx_qnn: - inference_time: 1333.0 - throughput: 750.1875468867216 + inference_time: 1592.0 + throughput: 628.1407035175879 estimated_peak_memory_range: - min: 184320 - max: 1544360 + min: 172032 + max: 26655536 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 246 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: j1p3kmjm5 + total_layers: 246 + job_id: jpxk91w85 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:55:42Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-17T17:24:09Z' - torchscript_onnx_tflite: - inference_time: 4481.0 - throughput: 223.1644722160232 + inference_time: 832.0 + throughput: 1201.923076923077 estimated_peak_memory_range: - min: 12288 - max: 36912016 + min: 8192 + max: 31175136 primary_compute_unit: NPU precision: int8 layer_info: @@ -360,75 +379,67 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 150 - job_id: jep28my6p + job_id: jpv6qwjz5 job_status: Passed torchscript_onnx_qnn: - inference_time: 6360.0 - throughput: 157.23270440251574 + inference_time: 1004.0 + throughput: 996.01593625498 estimated_peak_memory_range: - min: 192512 - max: 8045248 + min: 163840 + max: 23243040 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 246 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: j1pv3w6z5 + total_layers: 246 + job_id: j5mnezj7p job_status: Passed - reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot - os_name: Android - manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T11:55:44Z' - - torchscript_onnx_tflite: - inference_time: 17279.0 - throughput: 57.87371954395509 + torchscript_onnx: + inference_time: 1585.0 + throughput: 630.9148264984227 estimated_peak_memory_range: - min: 53248 - max: 10200904 + min: 0 + max: 63100464 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 150 + layers_on_npu: 283 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 150 - job_id: jqpyed30g + total_layers: 283 + job_id: jp8q27oqp job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T11:55:34Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-17T17:24:20Z' - torchscript_onnx_qnn: - inference_time: 1309.0 - throughput: 763.9419404125287 + inference_time: 1327.0 + throughput: 753.5795026375282 estimated_peak_memory_range: - min: 348160 - max: 348160 + min: 442368 + max: 442368 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 246 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jogkzyevg + total_layers: 246 + job_id: jgdxnvj6p job_status: Passed torchscript_onnx: inference_time: 2350.0 throughput: 425.531914893617 estimated_peak_memory_range: - min: 48607232 - max: 48607232 + min: 48603136 + max: 48603136 primary_compute_unit: NPU precision: int8 layer_info: @@ -436,7 +447,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 283 - job_id: jygze734g + job_id: jp0z4ry05 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -445,4 +456,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: 
Snapdragon® X Elite - timestamp: '2024-09-25T11:55:47Z' + timestamp: '2024-10-17T17:24:18Z' diff --git a/qai_hub_models/models/resnet101_quantized/requirements.txt b/qai_hub_models/models/resnet101_quantized/requirements.txt deleted file mode 100644 index de5b80e8..00000000 --- a/qai_hub_models/models/resnet101_quantized/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -aimet-torch==1.32.1.post1; sys_platform == "linux" diff --git a/qai_hub_models/models/resnet101_quantized/test.py b/qai_hub_models/models/resnet101_quantized/test.py deleted file mode 100644 index 876ebffe..00000000 --- a/qai_hub_models/models/resnet101_quantized/test.py +++ /dev/null @@ -1,30 +0,0 @@ -# --------------------------------------------------------------------- -# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. -# SPDX-License-Identifier: BSD-3-Clause -# --------------------------------------------------------------------- -from qai_hub_models.models._shared.imagenet_classifier.test_utils import ( - run_imagenet_classifier_test, -) -from qai_hub_models.models.resnet101_quantized.demo import main as demo_main -from qai_hub_models.models.resnet101_quantized.model import ( - MODEL_ASSET_VERSION, - MODEL_ID, - ResNet101Quantizable, -) - - -def test_task(): - run_imagenet_classifier_test( - ResNet101Quantizable.from_pretrained(), - MODEL_ID, - probability_threshold=0.45, - diff_tol=0.005, - rtol=0.02, - atol=0.2, - asset_version=MODEL_ASSET_VERSION, - ) - - -def test_demo(): - # Verify demo does not crash - demo_main(is_test=True) diff --git a/qai_hub_models/models/resnet18/README.md b/qai_hub_models/models/resnet18/README.md index c78fd6a9..299ae472 100644 --- a/qai_hub_models/models/resnet18/README.md +++ b/qai_hub_models/models/resnet18/README.md @@ -6,7 +6,7 @@ ResNet18 is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of ResNet18 found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/resnet18). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.resnet18.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of ResNet18 can be found +* The license for the original implementation of ResNet18 can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/resnet18/export.py b/qai_hub_models/models/resnet18/export.py index 1baff7ce..cca1addd 100644 --- a/qai_hub_models/models/resnet18/export.py +++ b/qai_hub_models/models/resnet18/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.resnet18 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
""" model_name = "resnet18" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/resnet18/perf.yaml b/qai_hub_models/models/resnet18/perf.yaml index 6404bb77..631f9ed3 100644 --- a/qai_hub_models/models/resnet18/perf.yaml +++ b/qai_hub_models/models/resnet18/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: ResNet18 performance_metrics: - torchscript_onnx_tflite: - inference_time: 1383.0 - throughput: 723.0657989877079 + inference_time: 1384.0 + throughput: 722.543352601156 estimated_peak_memory_range: - min: 16384 - max: 2213008 + min: 32768 + max: 2412488 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 38 - job_id: jz57zx3np + job_id: jgo264dxp job_status: Passed torchscript_onnx_qnn: - inference_time: 1460.0 - throughput: 684.931506849315 + inference_time: 1459.0 + throughput: 685.4009595613434 estimated_peak_memory_range: - min: 16384 - max: 4586272 + min: 167936 + max: 82743112 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 53 - job_id: jqpye600g + job_id: j5we61z35 job_status: Passed torchscript_onnx: - inference_time: 1354.0 - throughput: 738.5524372230428 + inference_time: 1337.0 + throughput: 747.9431563201197 estimated_peak_memory_range: - min: 36864 - max: 25839080 + min: 16384 + max: 25939760 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 55 - job_id: jwgoyv615 + job_id: jp2kyvdrp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:53:41Z' + timestamp: '2024-10-14T23:46:27Z' - torchscript_onnx_tflite: - inference_time: 1075.0 - throughput: 930.2325581395348 + inference_time: 1074.0 + throughput: 931.0986964618249 estimated_peak_memory_range: min: 16384 - max: 28491264 + max: 29274576 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: 
layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 38 - job_id: jqp4qv02g + job_id: jpv6k92j5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1423.0 - throughput: 702.7406886858749 + inference_time: 1114.0 + throughput: 897.6660682226212 estimated_peak_memory_range: - min: 618496 - max: 15481040 + min: 634880 + max: 13621584 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 53 - job_id: j2p0yr00g + job_id: jg9lnx2wg job_status: Passed torchscript_onnx: - inference_time: 1384.0 - throughput: 722.543352601156 + inference_time: 1057.0 + throughput: 946.073793755913 estimated_peak_memory_range: - min: 638976 - max: 30037392 + min: 303104 + max: 29953216 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 55 - job_id: j1pv3wkz5 + job_id: jpy13728p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:53:42Z' + timestamp: '2024-10-14T23:46:28Z' - torchscript_onnx_tflite: inference_time: 1383.0 throughput: 723.0657989877079 estimated_peak_memory_range: - min: 24576 - max: 314789328 + min: 32768 + max: 27591104 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 38 - job_id: j0pxvy28g + job_id: jgjvnw3xg job_status: Passed torchscript_onnx_qnn: - inference_time: 1318.0 - throughput: 758.7253414264036 + inference_time: 1322.0 + throughput: 756.4296520423601 estimated_peak_memory_range: - min: 643072 - max: 2015960 + min: 671744 + max: 2247928 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 53 - job_id: jogkzyxvg + job_id: jgdx1z4rp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:53:36Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:46:20Z' - torchscript_onnx_tflite: - inference_time: 1942.0 - throughput: 514.9330587023687 + inference_time: 1387.0 + throughput: 720.9805335255949 estimated_peak_memory_range: - min: 32768 - max: 26344192 + min: 16384 + max: 5378256 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 38 - job_id: jo5mr3y7g + job_id: jg9lnx2lg job_status: Passed torchscript_onnx_qnn: - inference_time: 2001.0 - throughput: 499.7501249375312 + inference_time: 1322.0 + throughput: 756.4296520423601 estimated_peak_memory_range: - min: 618496 - max: 18347168 + min: 634880 + max: 2372912 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 53 - job_id: j1p3km0m5 + job_id: jpxkodr35 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:53:40Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:46:23Z' - torchscript_onnx_tflite: - inference_time: 1387.0 - throughput: 720.9805335255949 + inference_time: 1385.0 + throughput: 722.0216606498195 estimated_peak_memory_range: min: 16384 - max: 2574440 + max: 20826416 primary_compute_unit: NPU 
precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 38 - job_id: jegn238jg + job_id: j5we61z65 job_status: Passed torchscript_onnx_qnn: - inference_time: 1334.0 - throughput: 749.6251874062968 + inference_time: 1327.0 + throughput: 753.5795026375282 estimated_peak_memory_range: - min: 630784 - max: 2122696 + min: 634880 + max: 2016360 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 53 - job_id: jn5q82qe5 + job_id: jp4lr9485 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:53:37Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:46:22Z' - torchscript_onnx_tflite: - inference_time: 1384.0 - throughput: 722.543352601156 + inference_time: 1386.0 + throughput: 721.5007215007215 estimated_peak_memory_range: min: 28672 - max: 1556416 + max: 1431392 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 38 - job_id: joprkejk5 + job_id: jgz3d4zk5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1329.0 - throughput: 752.4454477050414 + inference_time: 1326.0 + throughput: 754.1478129713424 estimated_peak_memory_range: - min: 626688 - max: 2136400 + min: 634880 + max: 1931784 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 53 - job_id: j1glnkm2p + job_id: j57yr7nv5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:53:38Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:46:21Z' - torchscript_onnx_tflite: - inference_time: 1385.0 - throughput: 722.0216606498195 + inference_time: 1943.0 + throughput: 514.668039114771 estimated_peak_memory_range: - min: 40960 - max: 1291888 + min: 24576 + max: 27766528 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 38 - job_id: jep28ln6p + job_id: jpedml615 job_status: Passed torchscript_onnx_qnn: - inference_time: 1353.0 - throughput: 739.0983000739099 + inference_time: 1994.0 + throughput: 501.5045135406219 estimated_peak_memory_range: - min: 634880 - max: 1879608 + min: 618496 + max: 17848032 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,57 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 53 - job_id: jw56614n5 + job_id: jgn6v7qk5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:46:25Z' + - torchscript_onnx_tflite: + inference_time: 799.0 + throughput: 1251.5644555694619 + estimated_peak_memory_range: + min: 12288 + max: 17085392 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 38 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 38 + job_id: jgdx1z4ep + job_status: Passed + torchscript_onnx: + inference_time: 971.0 + throughput: 1029.8661174047375 + estimated_peak_memory_range: + min: 0 + max: 16409728 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 55 + layers_on_gpu: 0 + layers_on_cpu: 
0 + total_layers: 55 + job_id: jgkex90wg + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:53:39Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:46:32Z' - torchscript_onnx_qnn: - inference_time: 1447.0 - throughput: 691.0850034554251 + inference_time: 1432.0 + throughput: 698.3240223463687 estimated_peak_memory_range: min: 602112 max: 602112 @@ -354,14 +390,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 53 - job_id: j1p8o7yqg + job_id: jp14zv18p job_status: Passed torchscript_onnx: - inference_time: 1309.0 - throughput: 763.9419404125287 + inference_time: 1316.0 + throughput: 759.8784194528876 estimated_peak_memory_range: - min: 24379392 - max: 24379392 + min: 24326144 + max: 24326144 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +405,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 55 - job_id: j7gjxln1p + job_id: jp0z0v995 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +414,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:53:43Z' + timestamp: '2024-10-14T23:46:29Z' diff --git a/qai_hub_models/models/resnet18_quantized/README.md b/qai_hub_models/models/resnet18_quantized/README.md index 705dd764..907fffb5 100644 --- a/qai_hub_models/models/resnet18_quantized/README.md +++ b/qai_hub_models/models/resnet18_quantized/README.md @@ -6,7 +6,7 @@ ResNet18 is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of ResNet18Quantized found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/resnet18_quantized). @@ -17,11 +17,6 @@ across various devices can be found [here](https://aihub.qualcomm.com/models/r ## Example & Usage -Install the package via pip: -```bash -pip install "qai_hub_models[resnet18_quantized]" -``` - Once installed, run the following simple CLI demo: @@ -44,15 +39,19 @@ python -m qai_hub_models.models.resnet18_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of ResNet18Quantized can be found +* The license for the original implementation of ResNet18Quantized can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/resnet18_quantized/evaluate.py b/qai_hub_models/models/resnet18_quantized/evaluate.py index d98aec44..11452ab1 100644 --- a/qai_hub_models/models/resnet18_quantized/evaluate.py +++ b/qai_hub_models/models/resnet18_quantized/evaluate.py @@ -13,10 +13,8 @@ from qai_hub_models.models.resnet18_quantized import MODEL_ID, Model from qai_hub_models.utils.args import evaluate_parser, get_hub_device, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.evaluate import evaluate_on_dataset from qai_hub_models.utils.inference import compile_model_from_args -from qai_hub_models.utils.quantization_aimet import AIMETQuantizableMixin SUPPORTED_DATASETS = ["imagenette", "imagenet"] @@ -27,6 +25,7 @@ def main(): model_cls=Model, default_split_size=2500, supported_datasets=SUPPORTED_DATASETS, + is_hub_quantized=True, ) args = parser.parse_args() args.device = None @@ -38,13 +37,7 @@ def main(): MODEL_ID, args, get_model_kwargs(Model, vars(args)) ) hub_device = get_hub_device(None, args.chipset) - - # Use Fp16 model for torch inference - for cls in Model.__mro__: - if issubclass(cls, BaseModel) and not issubclass(cls, AIMETQuantizableMixin): - torch_cls = cls - break - torch_model = torch_cls.from_pretrained(**get_model_kwargs(torch_cls, vars(args))) + torch_model = Model.from_pretrained(**get_model_kwargs(Model, vars(args))) evaluate_on_dataset( hub_model, torch_model, diff --git a/qai_hub_models/models/resnet18_quantized/export.py b/qai_hub_models/models/resnet18_quantized/export.py index ebf77975..b803f909 100644 --- a/qai_hub_models/models/resnet18_quantized/export.py +++ b/qai_hub_models/models/resnet18_quantized/export.py @@ -10,18 +10,20 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.resnet18_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference +from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( print_inference_metrics, print_on_target_demo_cmd, @@ -31,11 +33,14 @@ can_access_qualcomm_ai_hub, export_without_hub_access, ) +from qai_hub_models.utils.quantization import get_calibration_data def export_model( device: str = "Samsung Galaxy S23 (Family)", chipset: Optional[str] = None, + num_calibration_samples: 
int = 100, + skip_compiling: bool = False, skip_profiling: bool = False, skip_inferencing: bool = False, skip_downloading: bool = False, @@ -45,20 +50,19 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + 3. Compiles the model to an asset that can be run on device + 4. Profiles the model performance on a real device + 5. Inferences the model on sample inputs + 6. Downloads the model asset to the local directory + 7. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 5 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -66,6 +70,9 @@ def export_model( Defaults to DEFAULT_DEVICE if not specified. chipset: If set, will choose a random device with this chipset. Overrides the `device` argument. + num_calibration_samples: The number of calibration data samples + to use for quantization. + skip_compiling: If set, skips compiling model to format that can run on device. skip_profiling: If set, skips profiling of compiled model on real devices. skip_inferencing: If set, skips computing on-device outputs from sample data. skip_downloading: If set, skips downloading of compiled model. @@ -80,10 +87,11 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: - * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). + A struct of: + * A CompileJob object containing metadata about the compile job submitted to hub (None if compiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). + * A QuantizeJob object containing metadata about the quantize job submitted to hub """ model_name = "resnet18_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,33 +117,52 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. 
Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) ) # Trace the model - source_model = model.convert_to_hub_source_model( - target_runtime, output_path, input_spec + source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) + + print(f"Quantizing model {model_name} with {num_calibration_samples} samples.") + # 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + onnx_compile_job = hub.submit_compile_job( + model=source_model, + input_specs=input_spec, + device=hub_device, + name=model_name, + options="--target_runtime onnx", + ) + quantize_job = hub.submit_quantize_job( + model=onnx_compile_job.get_target_model(), + calibration_data=get_calibration_data( + input_spec, "imagenette", num_calibration_samples + ), + weights_dtype=model.get_weights_dtype(), + activations_dtype=model.get_activations_dtype(), + name=model_name, + options=model.get_quantize_options(), ) + if skip_compiling: + return ExportResult(quantize_job=quantize_job) - # 2. Compile the model to an on-device asset + # 3. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) print(f"Optimizing model {model_name} to run on-device") submitted_compile_job = hub.submit_compile_job( - model=source_model, + model=quantize_job.get_target_model(), input_specs=input_spec, device=hub_device, name=model_name, - calibration_data=model.get_calibration_data(target_runtime), options=model_compile_options, ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 4. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +177,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 5. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +198,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 6. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 7. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -202,12 +229,17 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + quantize_job=quantize_job, + ) def main(): warnings.filterwarnings("ignore") - parser = export_parser(model_cls=Model) + parser = export_parser(model_cls=Model, is_hub_quantized=True) args = parser.parse_args() export_model(**vars(args)) diff --git a/qai_hub_models/models/resnet18_quantized/model.py b/qai_hub_models/models/resnet18_quantized/model.py index c0c56598..a9139836 100644 --- a/qai_hub_models/models/resnet18_quantized/model.py +++ b/qai_hub_models/models/resnet18_quantized/model.py @@ -4,78 +4,11 @@ # --------------------------------------------------------------------- from __future__ import annotations -# isort: off -# This verifies aimet is installed, and this must be included first. -from qai_hub_models.utils.quantization_aimet import ( - AIMETQuantizableMixin, - constrain_quantized_inputs_to_image_range, -) - -# isort: on - -import torch -from aimet_torch.cross_layer_equalization import equalize_model -from aimet_torch.model_preparer import prepare_model -from aimet_torch.quantsim import QuantizationSimModel, load_encodings_to_sim - from qai_hub_models.models.resnet18.model import ResNet18 -from qai_hub_models.utils.aimet.config_loader import get_default_aimet_config -from qai_hub_models.utils.asset_loaders import CachedWebModelAsset +from qai_hub_models.utils.quantization import HubQuantizableMixin MODEL_ID = __name__.split(".")[-2] -MODEL_ASSET_VERSION = 8 -DEFAULT_ENCODINGS = "resnet18_quantized_encodings.json" - - -class ResNet18Quantizable(AIMETQuantizableMixin, ResNet18): - """ResNet with post train quantization support. - - Supports only 8 bit weights and activations, and only loads pre-quantized checkpoints. - Support for quantizing using your own weights & data will come at a later date.""" - - def __init__( - self, - resnet18_model: QuantizationSimModel, - ) -> None: - # Input is already normalized by sim_model. Disable it in the wrapper model. - ResNet18.__init__(self, resnet18_model.model, normalize_input=False) - AIMETQuantizableMixin.__init__( - self, - resnet18_model, - ) - - @classmethod - def from_pretrained( - cls, - aimet_encodings: str | None = "DEFAULT", - ) -> "ResNet18Quantizable": - """ - Parameters: - aimet_encodings: - if "DEFAULT": Loads the model with aimet encodings calibrated on imagenette. - elif None: Doesn't load any encodings. Used when computing encodings. - else: Interprets as a filepath and loads the encodings stored there. 
- """ - model = ResNet18.from_pretrained() - input_shape = cls.get_input_spec()["image_tensor"][0] - - model = prepare_model(model) - equalize_model(model, input_shape) - sim = QuantizationSimModel( - model, - quant_scheme="tf_enhanced", - default_param_bw=8, - default_output_bw=8, - config_file=get_default_aimet_config(), - dummy_input=torch.rand(input_shape), - ) - constrain_quantized_inputs_to_image_range(sim) - if aimet_encodings: - if aimet_encodings == "DEFAULT": - aimet_encodings = CachedWebModelAsset.from_asset_store( - MODEL_ID, MODEL_ASSET_VERSION, DEFAULT_ENCODINGS - ).fetch() - load_encodings_to_sim(sim, aimet_encodings) - return cls(sim) +class ResNet18Quantizable(HubQuantizableMixin, ResNet18): + pass diff --git a/qai_hub_models/models/resnet18_quantized/perf.yaml b/qai_hub_models/models/resnet18_quantized/perf.yaml index cc3fa5fe..fd587c06 100644 --- a/qai_hub_models/models/resnet18_quantized/perf.yaml +++ b/qai_hub_models/models/resnet18_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,44 +20,39 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8775P Proxy models: - name: ResNet18Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 409.0 - throughput: 2444.987775061125 + inference_time: 406.0 + throughput: 2463.054187192118 estimated_peak_memory_range: min: 12288 - max: 14808408 + max: 1536376 primary_compute_unit: NPU precision: int8 layer_info: @@ -64,29 +60,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: jnp10eln5 + job_id: jgz327q45 job_status: Passed torchscript_onnx_qnn: - inference_time: 633.0 - throughput: 1579.778830963665 + inference_time: 624.0 + throughput: 1602.5641025641025 estimated_peak_memory_range: min: 16384 - max: 8307960 + max: 8236344 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 37 + layers_on_npu: 54 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 37 - job_id: jqpye6w0g + total_layers: 54 + job_id: jp2kxmq6p job_status: Passed torchscript_onnx: - inference_time: 723.0 - throughput: 1383.1258644536654 + inference_time: 708.0 + throughput: 1412.4293785310736 estimated_peak_memory_range: - min: 77824 - max: 1547736 + min: 16384 + max: 13985416 primary_compute_unit: NPU precision: int8 layer_info: @@ -94,7 +90,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 74 - job_id: j7gjxe41p + job_id: jgjvdl91g job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -103,13 +99,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:53:02Z' 
+ timestamp: '2024-10-17T17:22:56Z' - torchscript_onnx_tflite: - inference_time: 354.0 - throughput: 2824.858757062147 + inference_time: 311.0 + throughput: 3215.434083601286 estimated_peak_memory_range: - min: 20480 - max: 27231504 + min: 12288 + max: 28261504 primary_compute_unit: NPU precision: int8 layer_info: @@ -117,29 +113,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: jvgdwo965 + job_id: j5wew9045 job_status: Passed torchscript_onnx_qnn: - inference_time: 478.0 - throughput: 2092.050209205021 + inference_time: 477.0 + throughput: 2096.4360587002097 estimated_peak_memory_range: - min: 0 - max: 13139152 + min: 180224 + max: 12752880 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 37 + layers_on_npu: 54 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 37 - job_id: j1p8ozvqg + total_layers: 54 + job_id: jpy1zdk0p job_status: Passed torchscript_onnx: - inference_time: 545.0 - throughput: 1834.8623853211009 + inference_time: 505.0 + throughput: 1980.1980198019803 estimated_peak_memory_range: min: 12288 - max: 34316352 + max: 34808496 primary_compute_unit: NPU precision: int8 layer_info: @@ -147,7 +143,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 74 - job_id: jlpe9k38g + job_id: jpedovq85 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -156,13 +152,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:53:03Z' + timestamp: '2024-10-17T17:22:58Z' - torchscript_onnx_tflite: - inference_time: 406.0 - throughput: 2463.054187192118 + inference_time: 1418.0 + throughput: 705.2186177715091 estimated_peak_memory_range: min: 12288 - max: 1377008 + max: 18548880 primary_compute_unit: NPU precision: int8 layer_info: @@ -170,37 +166,60 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: jz57zxwnp + job_id: jg9l047mg job_status: Passed torchscript_onnx_qnn: - inference_time: 599.0 - throughput: 1669.449081803005 + inference_time: 2033.0 + throughput: 491.88391539596654 estimated_peak_memory_range: - min: 184320 - max: 1443088 + min: 163840 + max: 7996624 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 37 + layers_on_npu: 54 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 37 - job_id: jn5q83oe5 + total_layers: 54 + job_id: jp0z4rw05 job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:52:56Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-17T17:22:42Z' - torchscript_onnx_tflite: - inference_time: 470.0 - throughput: 2127.659574468085 + inference_time: 7144.0 + throughput: 139.97760358342666 + estimated_peak_memory_range: + min: 12288 + max: 6250336 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 41 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 41 + job_id: jp1428knp + job_status: Passed + reference_device_info: + name: RB5 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8250 Proxy + timestamp: '2024-10-17T17:22:27Z' + - torchscript_onnx_tflite: + inference_time: 406.0 + throughput: 2463.054187192118 estimated_peak_memory_range: - min: 20480 - max: 28156960 + min: 12288 + max: 1447784 primary_compute_unit: NPU precision: int8 layer_info: @@ -208,37 +227,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: jqp4qvo2g + job_id: jgdxnvy6p job_status: Passed 
torchscript_onnx_qnn: - inference_time: 702.0 - throughput: 1424.5014245014245 + inference_time: 600.0 + throughput: 1666.6666666666667 estimated_peak_memory_range: - min: 163840 - max: 14723536 + min: 176128 + max: 1446944 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 37 + layers_on_npu: 54 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 37 - job_id: jwgoy3q15 + total_layers: 54 + job_id: jp8q27nqp job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:53:00Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-17T17:22:44Z' - torchscript_onnx_tflite: - inference_time: 406.0 - throughput: 2463.054187192118 + inference_time: 401.0 + throughput: 2493.7655860349128 estimated_peak_memory_range: min: 16384 - max: 8250584 + max: 1446832 primary_compute_unit: NPU precision: int8 layer_info: @@ -246,37 +265,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: j0pxvyj8g + job_id: j57y2d1n5 job_status: Passed torchscript_onnx_qnn: inference_time: 603.0 throughput: 1658.374792703151 estimated_peak_memory_range: min: 180224 - max: 1981520 + max: 1882592 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 37 + layers_on_npu: 54 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 37 - job_id: j1gln3r2p + total_layers: 54 + job_id: j5q602nep job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:52:57Z' + chipset: SA8255P Proxy + timestamp: '2024-10-17T17:22:47Z' - torchscript_onnx_tflite: - inference_time: 412.0 - throughput: 2427.1844660194174 + inference_time: 405.0 + throughput: 2469.135802469136 estimated_peak_memory_range: min: 12288 - max: 1487056 + max: 1501904 primary_compute_unit: NPU precision: int8 layer_info: @@ -284,22 +303,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: jo5mr327g + job_id: jp4lnw625 job_status: Passed torchscript_onnx_qnn: - inference_time: 601.0 - throughput: 1663.8935108153078 + inference_time: 600.0 + throughput: 1666.6666666666667 estimated_peak_memory_range: - min: 188416 - max: 1522872 + min: 184320 + max: 1963040 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 37 + layers_on_npu: 54 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 37 - job_id: jw566nln5 + total_layers: 54 + job_id: jglv4kd25 job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -307,14 +326,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:52:58Z' + chipset: SA8775P Proxy + timestamp: '2024-10-17T17:22:48Z' - torchscript_onnx_tflite: - inference_time: 407.0 - throughput: 2457.002457002457 + inference_time: 470.0 + throughput: 2127.659574468085 estimated_peak_memory_range: min: 12288 - max: 14964664 + max: 29889024 primary_compute_unit: NPU precision: int8 layer_info: @@ -322,37 +341,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: jegn23yjg + job_id: jpxk91885 job_status: Passed torchscript_onnx_qnn: - inference_time: 602.0 - throughput: 1661.1295681063123 + inference_time: 703.0 + throughput: 1422.475106685633 estimated_peak_memory_range: - min: 180224 - max: 1363016 + min: 163840 + max: 15623552 primary_compute_unit: NPU precision: int8 
layer_info: - layers_on_npu: 37 + layers_on_npu: 54 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 37 - job_id: j1p3ke2m5 + total_layers: 54 + job_id: j56y21xnp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:52:59Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-17T17:22:50Z' - torchscript_onnx_tflite: - inference_time: 1347.0 - throughput: 742.3904974016333 + inference_time: 299.0 + throughput: 3344.4816053511704 estimated_peak_memory_range: min: 12288 - max: 18336272 + max: 16167008 primary_compute_unit: NPU precision: int8 layer_info: @@ -360,75 +379,67 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: joprkeqk5 + job_id: j5mnez17p job_status: Passed torchscript_onnx_qnn: - inference_time: 2244.0 - throughput: 445.63279857397504 + inference_time: 437.0 + throughput: 2288.329519450801 estimated_peak_memory_range: - min: 16384 - max: 7546080 + min: 0 + max: 9301760 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 37 + layers_on_npu: 54 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 37 - job_id: j1pv3vxz5 + total_layers: 54 + job_id: jp3jnmdmg job_status: Passed - reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot - os_name: Android - manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T11:53:01Z' - - torchscript_onnx_tflite: - inference_time: 7096.0 - throughput: 140.92446448703495 + torchscript_onnx: + inference_time: 525.0 + throughput: 1904.7619047619048 estimated_peak_memory_range: - min: 90112 - max: 7545872 + min: 0 + max: 20260384 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 41 + layers_on_npu: 74 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 41 - job_id: jep28l66p + total_layers: 74 + job_id: j5wew9k45 job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T11:52:52Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-17T17:23:01Z' - torchscript_onnx_qnn: - inference_time: 687.0 - throughput: 1455.604075691412 + inference_time: 691.0 + throughput: 1447.178002894356 estimated_peak_memory_range: - min: 520192 - max: 520192 + min: 487424 + max: 487424 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 37 + layers_on_npu: 54 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 37 - job_id: jogkz3mvg + total_layers: 54 + job_id: jgkevy1vg job_status: Passed torchscript_onnx: - inference_time: 712.0 - throughput: 1404.4943820224719 + inference_time: 717.0 + throughput: 1394.700139470014 estimated_peak_memory_range: - min: 13742080 - max: 13742080 + min: 14733312 + max: 14733312 primary_compute_unit: NPU precision: int8 layer_info: @@ -436,7 +447,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 74 - job_id: jygzerk4g + job_id: jgz327645 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -445,4 +456,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:53:04Z' + timestamp: '2024-10-17T17:23:00Z' diff --git a/qai_hub_models/models/resnet18_quantized/requirements.txt b/qai_hub_models/models/resnet18_quantized/requirements.txt deleted file mode 100644 index de5b80e8..00000000 --- 
a/qai_hub_models/models/resnet18_quantized/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -aimet-torch==1.32.1.post1; sys_platform == "linux" diff --git a/qai_hub_models/models/resnet18_quantized/test.py b/qai_hub_models/models/resnet18_quantized/test.py deleted file mode 100644 index 4405e8d2..00000000 --- a/qai_hub_models/models/resnet18_quantized/test.py +++ /dev/null @@ -1,30 +0,0 @@ -# --------------------------------------------------------------------- -# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. -# SPDX-License-Identifier: BSD-3-Clause -# --------------------------------------------------------------------- -from qai_hub_models.models._shared.imagenet_classifier.test_utils import ( - run_imagenet_classifier_test, -) -from qai_hub_models.models.resnet18_quantized.demo import main as demo_main -from qai_hub_models.models.resnet18_quantized.model import ( - MODEL_ASSET_VERSION, - MODEL_ID, - ResNet18Quantizable, -) - - -def test_task(): - run_imagenet_classifier_test( - ResNet18Quantizable.from_pretrained(), - MODEL_ID, - probability_threshold=0.45, - diff_tol=0.005, - rtol=0.02, - atol=0.2, - asset_version=MODEL_ASSET_VERSION, - ) - - -def test_demo(): - # Verify demo does not crash - demo_main(is_test=True) diff --git a/qai_hub_models/models/resnet50/README.md b/qai_hub_models/models/resnet50/README.md index 96ba5cac..38950979 100644 --- a/qai_hub_models/models/resnet50/README.md +++ b/qai_hub_models/models/resnet50/README.md @@ -6,7 +6,7 @@ ResNet50 is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of ResNet50 found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/resnet50). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.resnet50.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of ResNet50 can be found +* The license for the original implementation of ResNet50 can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
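The rewritten `export.py` above moves quantization from local AIMET simulation onto AI Hub itself: trace the FP32 model, compile it to ONNX, submit a quantize job with imagenette calibration data, then compile the quantized ONNX to an on-device asset. A minimal standalone sketch of that recipe, assuming AI Hub access — the device name and the sample count of 100 are the illustrative defaults taken from the script:

```python
import qai_hub as hub
import torch

from qai_hub_models.models.resnet18_quantized import Model
from qai_hub_models.utils.input_spec import make_torch_inputs
from qai_hub_models.utils.quantization import get_calibration_data

# Step 1: instantiate the pretrained FP32 model and trace it on CPU.
model = Model.from_pretrained()
input_spec = model.get_input_spec()
source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec))

device = hub.Device("Samsung Galaxy S23 (Family)")  # illustrative default

# Step 2: convert the traced model to ONNX on Hub, then quantize the ONNX
# model using imagenette calibration samples.
onnx_compile_job = hub.submit_compile_job(
    model=source_model,
    input_specs=input_spec,
    device=device,
    name="resnet18_quantized",
    options="--target_runtime onnx",
)
quantize_job = hub.submit_quantize_job(
    model=onnx_compile_job.get_target_model(),
    calibration_data=get_calibration_data(input_spec, "imagenette", 100),
    weights_dtype=model.get_weights_dtype(),
    activations_dtype=model.get_activations_dtype(),
    name="resnet18_quantized",
    options=model.get_quantize_options(),
)

# Step 3: compile the quantized ONNX model to an asset that runs on device.
compile_job = hub.submit_compile_job(
    model=quantize_job.get_target_model(),
    input_specs=input_spec,
    device=device,
    name="resnet18_quantized",
)
```

With `skip_compiling=True` the generated script stops after step 2 and returns `ExportResult(quantize_job=quantize_job)`; every later stage builds on `quantize_job.get_target_model()`.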
diff --git a/qai_hub_models/models/resnet50/export.py b/qai_hub_models/models/resnet50/export.py index c61bb0a7..cf8e0502 100644 --- a/qai_hub_models/models/resnet50/export.py +++ b/qai_hub_models/models/resnet50/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.resnet50 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "resnet50" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. 
Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/resnet50/perf.yaml b/qai_hub_models/models/resnet50/perf.yaml index 5d94dd8c..71f36a73 100644 --- a/qai_hub_models/models/resnet50/perf.yaml +++ b/qai_hub_models/models/resnet50/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: ResNet50 performance_metrics: - torchscript_onnx_tflite: - inference_time: 2268.0 - throughput: 440.9171075837742 + inference_time: 2278.0 + throughput: 438.98156277436345 estimated_peak_memory_range: - min: 45056 - max: 1952880 + min: 36864 + max: 1999784 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 79 - job_id: jmg9vwvm5 + job_id: jgo26yxdp job_status: Passed torchscript_onnx_qnn: - inference_time: 
2399.0 - throughput: 416.84035014589415 + inference_time: 2384.0 + throughput: 419.46308724832215 estimated_peak_memory_range: - min: 622592 - max: 183955168 + min: 618496 + max: 182094504 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: jegn232jg + job_id: jprv3kzeg job_status: Passed torchscript_onnx: - inference_time: 2363.0 - throughput: 423.1908590774439 + inference_time: 2347.0 + throughput: 426.075841499787 estimated_peak_memory_range: - min: 16384 - max: 754917776 + min: 618496 + max: 2422224 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 128 - job_id: jw566njn5 + job_id: jgo26yjdp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:52:05Z' + timestamp: '2024-10-15T17:20:50Z' - torchscript_onnx_tflite: - inference_time: 1789.0 - throughput: 558.9714924538848 + inference_time: 1785.0 + throughput: 560.2240896358544 estimated_peak_memory_range: - min: 16384 - max: 78037504 + min: 12288 + max: 79880464 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 79 - job_id: jnp10e0n5 + job_id: jpedm9q05 job_status: Passed torchscript_onnx_qnn: - inference_time: 1870.0 - throughput: 534.75935828877 + inference_time: 2356.0 + throughput: 424.44821731748726 estimated_peak_memory_range: min: 618496 - max: 24347936 + max: 27992256 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: joprkekk5 + job_id: jp2ky82mp job_status: Passed torchscript_onnx: - inference_time: 1975.0 - throughput: 506.32911392405066 + inference_time: 1889.0 + throughput: 529.3806246691371 estimated_peak_memory_range: - min: 0 - max: 80285824 + min: 618496 + max: 82447184 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 128 - job_id: j1p3ke3m5 + job_id: jgjvnxj8g job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:52:06Z' + timestamp: '2024-10-15T17:20:51Z' - torchscript_onnx_tflite: inference_time: 2253.0 throughput: 443.85264092321347 estimated_peak_memory_range: - min: 0 - max: 700208024 + min: 28672 + max: 725029512 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 79 - job_id: jvgdwow65 + job_id: jg9lnvrvg job_status: Passed torchscript_onnx_qnn: - inference_time: 2177.0 - throughput: 459.34772622875516 + inference_time: 2157.0 + throughput: 463.60686138154847 estimated_peak_memory_range: - min: 634880 - max: 1858536 + min: 659456 + max: 1866720 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: j2p0ylq0g + job_id: jp0z0yne5 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:52:00Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T17:20:44Z' - torchscript_onnx_tflite: - inference_time: 3080.0 - 
throughput: 324.6753246753247 + inference_time: 2273.0 + throughput: 439.9472063352398 estimated_peak_memory_range: - min: 16384 - max: 67111280 + min: 32768 + max: 2046360 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 79 - job_id: jz57zxznp + job_id: jpxkovw95 job_status: Passed torchscript_onnx_qnn: - inference_time: 3222.0 - throughput: 310.36623215394167 + inference_time: 2185.0 + throughput: 457.66590389016017 estimated_peak_memory_range: - min: 618496 - max: 18811248 + min: 634880 + max: 1925704 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: j1gln3z2p + job_id: j5q6q8jmp job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:52:05Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T17:20:47Z' - torchscript_onnx_tflite: - inference_time: 2276.0 - throughput: 439.3673110720562 + inference_time: 2277.0 + throughput: 439.17435221783046 estimated_peak_memory_range: - min: 0 - max: 2148096 + min: 36864 + max: 2635800 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 79 - job_id: jqp4qvq2g + job_id: jp4lrqzl5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2192.0 - throughput: 456.2043795620438 + inference_time: 2185.0 + throughput: 457.66590389016017 estimated_peak_memory_range: - min: 638976 - max: 2148800 + min: 659456 + max: 2019208 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: j1p8oz9qg + job_id: jgkexzjog job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:52:01Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T17:20:46Z' - torchscript_onnx_tflite: - inference_time: 2265.0 - throughput: 441.5011037527594 + inference_time: 2268.0 + throughput: 440.9171075837742 estimated_peak_memory_range: - min: 16384 - max: 2511264 + min: 20480 + max: 2788896 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 79 - job_id: j0pxvyv8g + job_id: j57yrzqr5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2198.0 - throughput: 454.9590536851683 + inference_time: 2179.0 + throughput: 458.9261128958238 estimated_peak_memory_range: - min: 630784 - max: 2162136 + min: 688128 + max: 1940992 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: jogkz3nvg + job_id: jp8qyol8p job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:52:02Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T17:20:45Z' - torchscript_onnx_tflite: - inference_time: 2271.0 - throughput: 440.33465433729634 + inference_time: 3097.0 + throughput: 322.8931223764934 estimated_peak_memory_range: - min: 28672 - max: 1915544 + min: 16384 + max: 67716480 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 
+314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 79 - job_id: jo5mr3r7g + job_id: jgdx1wklp job_status: Passed torchscript_onnx_qnn: - inference_time: 2224.0 - throughput: 449.64028776978415 + inference_time: 3127.0 + throughput: 319.79533098816756 estimated_peak_memory_range: - min: 626688 - max: 2224152 + min: 618496 + max: 21181120 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: jn5q83ke5 + job_id: j56y46k7p job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:52:03Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-15T17:20:48Z' + - torchscript_onnx_tflite: + inference_time: 1550.0 + throughput: 645.1612903225806 + estimated_peak_memory_range: + min: 12288 + max: 31548912 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 79 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 79 + job_id: jgn6v2jm5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 1669.0 + throughput: 599.1611743559017 + estimated_peak_memory_range: + min: 0 + max: 23772240 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 126 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 126 + job_id: jp3j0kyzg + job_status: Passed + torchscript_onnx: + inference_time: 1668.0 + throughput: 599.5203836930456 + estimated_peak_memory_range: + min: 0 + max: 31393232 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 128 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 128 + job_id: j57yrzzr5 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T17:20:54Z' - torchscript_onnx_qnn: - inference_time: 2317.0 - throughput: 431.59257660768236 + inference_time: 2312.0 + throughput: 432.52595155709344 estimated_peak_memory_range: min: 602112 max: 602112 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: jqpye6e0g + job_id: jpy13e94p job_status: Passed torchscript_onnx: - inference_time: 2312.0 - throughput: 432.52595155709344 + inference_time: 2328.0 + throughput: 429.553264604811 estimated_peak_memory_range: - min: 52379648 - max: 52379648 + min: 52461568 + max: 52461568 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 128 - job_id: jwgoy3015 + job_id: j5we6ojj5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:52:07Z' + timestamp: '2024-10-15T17:20:52Z' diff --git a/qai_hub_models/models/resnet50_quantized/README.md b/qai_hub_models/models/resnet50_quantized/README.md index e6e0a463..726fb4b4 100644 --- a/qai_hub_models/models/resnet50_quantized/README.md +++ b/qai_hub_models/models/resnet50_quantized/README.md @@ -6,7 +6,7 @@ ResNet50 is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. 
This is based on the implementation of ResNet50Quantized found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/resnet50_quantized). @@ -17,11 +17,6 @@ ## Example & Usage -Install the package via pip: -```bash -pip install "qai_hub_models[resnet50_quantized]" -``` - Once installed, run the following simple CLI demo: @@ -44,15 +39,19 @@ python -m qai_hub_models.models.resnet50_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of ResNet50Quantized can be found +* The license for the original implementation of ResNet50Quantized can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
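The `resnet50_quantized` scripts below follow the same on-Hub quantization recipe, and `export_model` now returns an `ExportResult` struct instead of the old 3-tuple. A hypothetical caller-side sketch, assuming AI Hub access; the device string and sample count mirror the script defaults:

```python
from qai_hub_models.models.resnet50_quantized.export import export_model

# Quantize and compile on AI Hub, skipping on-device profiling and
# inference; 100 calibration samples is the script default.
result = export_model(
    device="Samsung Galaxy S23 (Family)",
    num_calibration_samples=100,
    skip_profiling=True,
    skip_inferencing=True,
)

# ExportResult replaces the old (compile_job, profile_job, inference_job)
# tuple; stages that were skipped come back as None.
print(result.quantize_job)  # quantize job handle, always submitted
print(result.compile_job)   # compile job handle (None if skip_compiling=True)
assert result.profile_job is None    # skipped above
assert result.inference_job is None  # skipped above
```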
diff --git a/qai_hub_models/models/resnet50_quantized/evaluate.py b/qai_hub_models/models/resnet50_quantized/evaluate.py index 8fdc840e..42a16a6e 100644 --- a/qai_hub_models/models/resnet50_quantized/evaluate.py +++ b/qai_hub_models/models/resnet50_quantized/evaluate.py @@ -13,10 +13,8 @@ from qai_hub_models.models.resnet50_quantized import MODEL_ID, Model from qai_hub_models.utils.args import evaluate_parser, get_hub_device, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.evaluate import evaluate_on_dataset from qai_hub_models.utils.inference import compile_model_from_args -from qai_hub_models.utils.quantization_aimet import AIMETQuantizableMixin SUPPORTED_DATASETS = ["imagenette", "imagenet"] @@ -27,6 +25,7 @@ def main(): model_cls=Model, default_split_size=2500, supported_datasets=SUPPORTED_DATASETS, + is_hub_quantized=True, ) args = parser.parse_args() args.device = None @@ -38,13 +37,7 @@ def main(): MODEL_ID, args, get_model_kwargs(Model, vars(args)) ) hub_device = get_hub_device(None, args.chipset) - - # Use Fp16 model for torch inference - for cls in Model.__mro__: - if issubclass(cls, BaseModel) and not issubclass(cls, AIMETQuantizableMixin): - torch_cls = cls - break - torch_model = torch_cls.from_pretrained(**get_model_kwargs(torch_cls, vars(args))) + torch_model = Model.from_pretrained(**get_model_kwargs(Model, vars(args))) evaluate_on_dataset( hub_model, torch_model, diff --git a/qai_hub_models/models/resnet50_quantized/export.py b/qai_hub_models/models/resnet50_quantized/export.py index bc8004f9..dcc61718 100644 --- a/qai_hub_models/models/resnet50_quantized/export.py +++ b/qai_hub_models/models/resnet50_quantized/export.py @@ -10,18 +10,20 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.resnet50_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference +from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( print_inference_metrics, print_on_target_demo_cmd, @@ -31,11 +33,14 @@ can_access_qualcomm_ai_hub, export_without_hub_access, ) +from qai_hub_models.utils.quantization import get_calibration_data def export_model( device: str = "Samsung Galaxy S23 (Family)", chipset: Optional[str] = None, + num_calibration_samples: int = 100, + skip_compiling: bool = False, skip_profiling: bool = False, skip_inferencing: bool = False, skip_downloading: bool = False, @@ -45,20 +50,19 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. 
Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + 3. Compiles the model to an asset that can be run on device + 4. Profiles the model performance on a real device + 5. Inferences the model on sample inputs + 6. Downloads the model asset to the local directory + 7. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 5 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -66,6 +70,9 @@ def export_model( Defaults to DEFAULT_DEVICE if not specified. chipset: If set, will choose a random device with this chipset. Overrides the `device` argument. + num_calibration_samples: The number of calibration data samples + to use for quantization. + skip_compiling: If set, skips compiling model to format that can run on device. skip_profiling: If set, skips profiling of compiled model on real devices. skip_inferencing: If set, skips computing on-device outputs from sample data. skip_downloading: If set, skips downloading of compiled model. @@ -80,10 +87,11 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: - * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). + A struct of: + * A CompileJob object containing metadata about the compile job submitted to hub (None if compiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). + * A QuantizeJob object containing metadata about the quantize job submitted to hub """ model_name = "resnet50_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,33 +117,52 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) ) # Trace the model - source_model = model.convert_to_hub_source_model( - target_runtime, output_path, input_spec + source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) + + print(f"Quantizing model {model_name} with {num_calibration_samples} samples.") + # 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + onnx_compile_job = hub.submit_compile_job( + model=source_model, + input_specs=input_spec, + device=hub_device, + name=model_name, + options="--target_runtime onnx", + ) + quantize_job = hub.submit_quantize_job( + model=onnx_compile_job.get_target_model(), + calibration_data=get_calibration_data( + input_spec, "imagenette", num_calibration_samples + ), + weights_dtype=model.get_weights_dtype(), + activations_dtype=model.get_activations_dtype(), + name=model_name, + options=model.get_quantize_options(), ) + if skip_compiling: + return ExportResult(quantize_job=quantize_job) - # 2. Compile the model to an on-device asset + # 3. 
Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) print(f"Optimizing model {model_name} to run on-device") submitted_compile_job = hub.submit_compile_job( - model=source_model, + model=quantize_job.get_target_model(), input_specs=input_spec, device=hub_device, name=model_name, - calibration_data=model.get_calibration_data(target_runtime), options=model_compile_options, ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 4. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +177,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 5. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +198,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 6. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 7. Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -202,12 +229,17 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + quantize_job=quantize_job, + ) def main(): warnings.filterwarnings("ignore") - parser = export_parser(model_cls=Model) + parser = export_parser(model_cls=Model, is_hub_quantized=True) args = parser.parse_args() export_model(**vars(args)) diff --git a/qai_hub_models/models/resnet50_quantized/model.py b/qai_hub_models/models/resnet50_quantized/model.py index 54f44eb1..35c5399f 100644 --- a/qai_hub_models/models/resnet50_quantized/model.py +++ b/qai_hub_models/models/resnet50_quantized/model.py @@ -4,78 +4,11 @@ # --------------------------------------------------------------------- from __future__ import annotations -# isort: off -# This verifies aimet is installed, and this must be included first. 
-from qai_hub_models.utils.quantization_aimet import ( - AIMETQuantizableMixin, - constrain_quantized_inputs_to_image_range, -) - -# isort: on - -import torch -from aimet_torch.cross_layer_equalization import equalize_model -from aimet_torch.model_preparer import prepare_model -from aimet_torch.quantsim import QuantizationSimModel, load_encodings_to_sim - from qai_hub_models.models.resnet50.model import ResNet50 -from qai_hub_models.utils.aimet.config_loader import get_default_aimet_config -from qai_hub_models.utils.asset_loaders import CachedWebModelAsset +from qai_hub_models.utils.quantization import HubQuantizableMixin MODEL_ID = __name__.split(".")[-2] -MODEL_ASSET_VERSION = 6 -DEFAULT_ENCODINGS = "resnet50_quantized_encodings.json" - - -class ResNet50Quantizable(AIMETQuantizableMixin, ResNet50): - """ResNet with post train quantization support. - - Supports only 8 bit weights and activations, and only loads pre-quantized checkpoints. - Support for quantizing using your own weights & data will come at a later date.""" - - def __init__( - self, - resnet50_model: QuantizationSimModel, - ) -> None: - # Input is already normalized by sim_model. Disable it in the wrapper model. - ResNet50.__init__(self, resnet50_model.model, normalize_input=False) - AIMETQuantizableMixin.__init__( - self, - resnet50_model, - ) - - @classmethod - def from_pretrained( - cls, - aimet_encodings: str | None = "DEFAULT", - ) -> "ResNet50Quantizable": - """ - Parameters: - aimet_encodings: - if "DEFAULT": Loads the model with aimet encodings calibrated on imagenette. - elif None: Doesn't load any encodings. Used when computing encodings. - else: Interprets as a filepath and loads the encodings stored there. - """ - model = ResNet50.from_pretrained() - input_shape = cls.get_input_spec()["image_tensor"][0] - - model = prepare_model(model) - equalize_model(model, input_shape) - sim = QuantizationSimModel( - model, - quant_scheme="tf_enhanced", - default_param_bw=8, - default_output_bw=8, - config_file=get_default_aimet_config(), - dummy_input=torch.rand(input_shape), - ) - constrain_quantized_inputs_to_image_range(sim) - if aimet_encodings: - if aimet_encodings == "DEFAULT": - aimet_encodings = CachedWebModelAsset.from_asset_store( - MODEL_ID, MODEL_ASSET_VERSION, DEFAULT_ENCODINGS - ).fetch() - load_encodings_to_sim(sim, aimet_encodings) - return cls(sim) +class ResNet50Quantizable(HubQuantizableMixin, ResNet50): + pass diff --git a/qai_hub_models/models/resnet50_quantized/perf.yaml b/qai_hub_models/models/resnet50_quantized/perf.yaml index 89800b77..4ccd953f 100644 --- a/qai_hub_models/models/resnet50_quantized/perf.yaml +++ b/qai_hub_models/models/resnet50_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,41 +20,36 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p 
Proxy - - Qcs8250 Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8775P Proxy models: - name: ResNet50Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 783.0 - throughput: 1277.139208173691 + inference_time: 788.0 + throughput: 1269.0355329949239 estimated_peak_memory_range: - min: 24576 - max: 2120120 + min: 12288 + max: 11412928 primary_compute_unit: NPU precision: int8 layer_info: @@ -61,29 +57,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: jz57zxq9p + job_id: jgkevy0ng job_status: Passed torchscript_onnx_qnn: - inference_time: 1003.0 - throughput: 997.0089730807578 + inference_time: 1001.0 + throughput: 999.000999000999 estimated_peak_memory_range: - min: 12288 - max: 255020432 + min: 16384 + max: 33294216 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: j1p8ozlog + total_layers: 127 + job_id: jg9l04q8g job_status: Passed torchscript_onnx: - inference_time: 1591.0 - throughput: 628.5355122564425 + inference_time: 1526.0 + throughput: 655.307994757536 estimated_peak_memory_range: min: 16384 - max: 30887536 + max: 31718720 primary_compute_unit: NPU precision: int8 layer_info: @@ -91,7 +87,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: jlpe9k9vg + job_id: jgn60ewj5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -100,13 +96,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:51:26Z' + timestamp: '2024-10-17T17:21:42Z' - torchscript_onnx_tflite: - inference_time: 643.0 - throughput: 1555.2099533437015 + inference_time: 584.0 + throughput: 1712.3287671232877 estimated_peak_memory_range: - min: 16384 - max: 64793440 + min: 12288 + max: 66134624 primary_compute_unit: NPU precision: int8 layer_info: @@ -114,29 +110,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: jqp4qvz1g + job_id: j5q6021op job_status: Passed torchscript_onnx_qnn: - inference_time: 754.0 - throughput: 1326.2599469496022 + inference_time: 749.0 + throughput: 1335.1134846461948 estimated_peak_memory_range: min: 167936 - max: 19516848 + max: 15555376 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jogkz3jng + total_layers: 127 + job_id: jp1428m7p job_status: Passed torchscript_onnx: - inference_time: 1179.0 - throughput: 848.1764206955047 + inference_time: 1128.0 + throughput: 886.5248226950355 estimated_peak_memory_range: - min: 0 - max: 95508896 + min: 151552 + max: 97130560 primary_compute_unit: NPU precision: int8 layer_info: @@ -144,7 +140,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: jygzerexg + job_id: jprv6y7kg job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -153,13 +149,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:51:27Z' + timestamp: '2024-10-17T17:21:44Z' - torchscript_onnx_tflite: - inference_time: 781.0 - throughput: 1280.4097311139565 + inference_time: 2827.0 + throughput: 353.73187124159887 estimated_peak_memory_range: - min: 40960 - max: 1465440 + min: 0 + max: 27793920 primary_compute_unit: NPU precision: int8 layer_info: @@ -167,37 +163,60 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: 
j0pxvywlg + job_id: jglv4kqm5 job_status: Passed torchscript_onnx_qnn: - inference_time: 943.0 - throughput: 1060.4453870625662 + inference_time: 4072.0 + throughput: 245.5795677799607 estimated_peak_memory_range: - min: 200704 - max: 1743616 + min: 208896 + max: 8039184 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: j1gln3nmp + total_layers: 127 + job_id: jgdxnvmzp job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:51:20Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-17T17:21:28Z' - torchscript_onnx_tflite: - inference_time: 912.0 - throughput: 1096.4912280701753 + inference_time: 11444.0 + throughput: 87.38203425375742 estimated_peak_memory_range: - min: 16384 - max: 65629312 + min: 32768 + max: 7067008 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 82 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 82 + job_id: j56y210yp + job_status: Passed + reference_device_info: + name: RB5 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8250 Proxy + timestamp: '2024-10-17T17:21:13Z' + - torchscript_onnx_tflite: + inference_time: 785.0 + throughput: 1273.8853503184714 + estimated_peak_memory_range: + min: 12288 + max: 3533448 primary_compute_unit: NPU precision: int8 layer_info: @@ -205,37 +224,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: jo5mr3j9g + job_id: jp3jnmrng job_status: Passed torchscript_onnx_qnn: - inference_time: 1148.0 - throughput: 871.0801393728223 + inference_time: 943.0 + throughput: 1060.4453870625662 estimated_peak_memory_range: - min: 167936 - max: 19544448 + min: 184320 + max: 1386016 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: j1pv3v3r5 + total_layers: 127 + job_id: j5wew9r45 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:51:24Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-17T17:21:30Z' - torchscript_onnx_tflite: - inference_time: 779.0 - throughput: 1283.6970474967907 + inference_time: 782.0 + throughput: 1278.772378516624 estimated_peak_memory_range: - min: 16384 - max: 32231048 + min: 12288 + max: 15579336 primary_compute_unit: NPU precision: int8 layer_info: @@ -243,37 +262,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: jegn23jqg + job_id: jgo2zv9kp job_status: Passed torchscript_onnx_qnn: - inference_time: 948.0 - throughput: 1054.8523206751054 + inference_time: 947.0 + throughput: 1055.9662090813094 estimated_peak_memory_range: - min: 208896 - max: 1931696 + min: 172032 + max: 1898736 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jw566n6y5 + total_layers: 127 + job_id: jp1428mnp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:51:21Z' + chipset: SA8255P Proxy + timestamp: 
'2024-10-17T17:21:33Z' - torchscript_onnx_tflite: - inference_time: 810.0 - throughput: 1234.567901234568 + inference_time: 787.0 + throughput: 1270.6480304955528 estimated_peak_memory_range: - min: 12288 - max: 3808264 + min: 16384 + max: 16498400 primary_compute_unit: NPU precision: int8 layer_info: @@ -281,22 +300,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: joprkez75 + job_id: jpv6qwnr5 job_status: Passed torchscript_onnx_qnn: - inference_time: 943.0 - throughput: 1060.4453870625662 + inference_time: 948.0 + throughput: 1054.8523206751054 estimated_peak_memory_range: - min: 16384 - max: 1642128 + min: 172032 + max: 1935696 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: j1p3kekn5 + total_layers: 127 + job_id: jgdxnvm6p job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -304,14 +323,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:51:22Z' + chipset: SA8775P Proxy + timestamp: '2024-10-17T17:21:34Z' - torchscript_onnx_tflite: - inference_time: 776.0 - throughput: 1288.659793814433 + inference_time: 909.0 + throughput: 1100.1100110011 estimated_peak_memory_range: - min: 12288 - max: 13155864 + min: 16384 + max: 66889440 primary_compute_unit: NPU precision: int8 layer_info: @@ -319,37 +338,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: jep28l2qp + job_id: jgjvdl8eg job_status: Passed torchscript_onnx_qnn: - inference_time: 959.0 - throughput: 1042.752867570386 + inference_time: 1135.0 + throughput: 881.0572687224669 estimated_peak_memory_range: - min: 184320 - max: 1657520 + min: 167936 + max: 18672800 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jwgoy3yk5 + total_layers: 127 + job_id: j57y2d8n5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:51:23Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-17T17:21:36Z' - torchscript_onnx_tflite: - inference_time: 2776.0 - throughput: 360.2305475504323 + inference_time: 518.0 + throughput: 1930.5019305019305 estimated_peak_memory_range: - min: 184320 - max: 28048624 + min: 12288 + max: 24413712 primary_compute_unit: NPU precision: int8 layer_info: @@ -357,75 +376,67 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: jqpye69lg + job_id: jpedovnv5 job_status: Passed torchscript_onnx_qnn: - inference_time: 4015.0 - throughput: 249.06600249066003 + inference_time: 693.0 + throughput: 1443.001443001443 estimated_peak_memory_range: - min: 204800 - max: 8527328 + min: 0 + max: 17452784 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: j7gjxexep + total_layers: 127 + job_id: jp4lnw225 job_status: Passed - reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot - os_name: Android - manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T11:51:25Z' - - torchscript_onnx_tflite: - inference_time: 11484.0 - throughput: 87.07767328456984 + torchscript_onnx: + inference_time: 918.0 + throughput: 1089.3246187363834 estimated_peak_memory_range: - min: 53248 
- max: 6890120 + min: 0 + max: 40616048 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 82 + layers_on_npu: 147 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 82 - job_id: j2p0ylnng + total_layers: 147 + job_id: jpy1zdy0p job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T11:51:16Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-17T17:21:46Z' - torchscript_onnx_qnn: - inference_time: 1008.0 - throughput: 992.063492063492 + inference_time: 1009.0 + throughput: 991.0802775024777 estimated_peak_memory_range: - min: 434176 - max: 434176 + min: 397312 + max: 397312 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jn5q83jo5 + total_layers: 127 + job_id: jg9l04qmg job_status: Passed torchscript_onnx: - inference_time: 1602.0 - throughput: 624.2197253433209 + inference_time: 1569.0 + throughput: 637.3486297004462 estimated_peak_memory_range: - min: 29212672 - max: 29212672 + min: 29220864 + max: 29220864 primary_compute_unit: NPU precision: int8 layer_info: @@ -433,7 +444,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: jz5woqomp + job_id: jp2kxmz6p job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -442,4 +453,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:51:28Z' + timestamp: '2024-10-17T17:21:45Z' diff --git a/qai_hub_models/models/resnet50_quantized/requirements.txt b/qai_hub_models/models/resnet50_quantized/requirements.txt deleted file mode 100644 index de5b80e8..00000000 --- a/qai_hub_models/models/resnet50_quantized/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -aimet-torch==1.32.1.post1; sys_platform == "linux" diff --git a/qai_hub_models/models/resnet50_quantized/test.py b/qai_hub_models/models/resnet50_quantized/test.py deleted file mode 100644 index 55efb858..00000000 --- a/qai_hub_models/models/resnet50_quantized/test.py +++ /dev/null @@ -1,30 +0,0 @@ -# --------------------------------------------------------------------- -# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. -# SPDX-License-Identifier: BSD-3-Clause -# --------------------------------------------------------------------- -from qai_hub_models.models._shared.imagenet_classifier.test_utils import ( - run_imagenet_classifier_test, -) -from qai_hub_models.models.resnet50_quantized.demo import main as demo_main -from qai_hub_models.models.resnet50_quantized.model import ( - MODEL_ASSET_VERSION, - MODEL_ID, - ResNet50Quantizable, -) - - -def test_task(): - run_imagenet_classifier_test( - ResNet50Quantizable.from_pretrained(), - MODEL_ID, - probability_threshold=0.45, - diff_tol=0.005, - rtol=0.02, - atol=0.2, - asset_version=MODEL_ASSET_VERSION, - ) - - -def test_demo(): - # Verify demo does not crash - demo_main(is_test=True) diff --git a/qai_hub_models/models/resnext101/README.md b/qai_hub_models/models/resnext101/README.md index 1776cdf7..a23499fd 100644 --- a/qai_hub_models/models/resnext101/README.md +++ b/qai_hub_models/models/resnext101/README.md @@ -6,7 +6,7 @@ ResNeXt101 is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. 
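The resnet50_quantized changes earlier in this diff (model.py, requirements.txt, test.py) all serve one migration: quantization moves from a local AIMET `QuantizationSimModel` plus downloaded encodings to AI Hub quantize jobs, so the model class shrinks to `HubQuantizableMixin` over the FP32 `ResNet50` and the `aimet-torch` pin disappears. A minimal, hedged sketch of what loading the slimmed-down class might look like (signatures inferred from helpers referenced elsewhere in this diff, not verified against a released package):

```python
# Sketch: the Hub-quantizable ResNet50 is now just the FP32 model plus a mixin;
# no AIMET encodings file or aimet-torch install is needed locally.
from qai_hub_models.models.resnet50_quantized.model import ResNet50Quantizable

model = ResNet50Quantizable.from_pretrained()  # plain pretrained FP32 weights
input_spec = model.get_input_spec()            # {"image_tensor": ...}, per the removed code

# export.py passes these to the Hub quantize job; their existence is implied
# by the resnext101_quantized/export.py hunk later in this diff.
weights_dtype = model.get_weights_dtype()
activations_dtype = model.get_activations_dtype()
```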
This is based on the implementation of ResNeXt101 found
-[here](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py). This repository contains scripts for optimized on-device
+[here]({source_repo}). This repository contains scripts for optimized on-device
 export suitable to run on Qualcomm® devices. More details on model performance
 across various devices can be found [here](https://aihub.qualcomm.com/models/resnext101).

@@ -39,15 +39,19 @@ python -m qai_hub_models.models.resnext101.export
 Additional options are documented with the `--help` option. Note that the above
 script requires access to Deployment instructions for Qualcomm® AI Hub.

+
 ## License
-- The license for the original implementation of ResNeXt101 can be found
+* The license for the original implementation of ResNeXt101 can be found
   [here](https://github.com/pytorch/vision/blob/main/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf)
+* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf)
+
 ## References
 * [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/abs/1611.05431)
 * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py)
+
+
 ## Community
 * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI.
 * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
diff --git a/qai_hub_models/models/resnext101/export.py b/qai_hub_models/models/resnext101/export.py
index 4dfc4ad8..b9c1bdce 100644
--- a/qai_hub_models/models/resnext101/export.py
+++ b/qai_hub_models/models/resnext101/export.py
@@ -10,18 +10,18 @@
 import os
 import warnings
 from pathlib import Path
-from typing import Any, Dict, List, Optional, Tuple, cast
+from typing import Any, Dict, List, Optional, cast

 import qai_hub as hub
 import torch

+from qai_hub_models.models.common import ExportResult, TargetRuntime
 from qai_hub_models.models.resnext101 import Model
 from qai_hub_models.utils.args import (
     export_parser,
     get_input_spec_kwargs,
     get_model_kwargs,
 )
-from qai_hub_models.utils.base_model import TargetRuntime
 from qai_hub_models.utils.compare import torch_inference
 from qai_hub_models.utils.input_spec import make_torch_inputs
 from qai_hub_models.utils.printing import (
@@ -47,20 +47,18 @@ def export_model(
     compile_options: str = "",
     profile_options: str = "",
     **additional_model_kwargs,
-) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[
-    str
-]:
+) -> ExportResult | List[str]:
     """
-    This function accomplishes 6 main tasks:
+    This function executes the following recipe:

-    1. Instantiates a PyTorch model and converts it to a traced TorchScript format.
-    2. Compiles the model to an asset that can be run on device.
-    3. Profiles the model performance on real devices.
-    4. Inferences the model on sample inputs.
-    5. Downloads the model asset to the local directory.
-    6. Summarizes the results from profiling and inference.
+    1. Instantiates a PyTorch model and converts it to a traced TorchScript format
+    2. Compiles the model to an asset that can be run on device
+    3. Profiles the model performance on a real device
+    4. Inferences the model on sample inputs
+    5. Downloads the model asset to the local directory
+    6. Summarizes the results from profiling and inference

-    Each of the last four steps can be optionally skipped using the input options.
+    Each of the last 4 steps can be optionally skipped using the input options.

     Parameters:
         device: Device for which to export the model.
@@ -82,10 +80,10 @@ def export_model(
         `model_cls.from_pretrained` and `model.get_input_spec`

     Returns:
-       A 3-tuple of:
+       A struct of:
            * A CompileJob object containing metadata about the compile job submitted to hub.
-           * A ProfileJob containing metadata about the profile job (None if profiling skipped).
            * An InferenceJob containing metadata about the inference job (None if inferencing skipped).
+           * A ProfileJob containing metadata about the profile job (None if profiling skipped).
     """
     model_name = "resnext101"
     output_path = Path(output_dir or Path.cwd() / "build" / model_name)
@@ -111,7 +109,7 @@ def export_model(
     # On-device perf improves with I/O in channel_last format except when using ONNX.
     use_channel_last_format = target_runtime != TargetRuntime.ONNX

-    # 1. Initialize PyTorch model
+    # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format
     model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs))
     input_spec = model.get_input_spec(
         **get_input_spec_kwargs(model, additional_model_kwargs)
     )
@@ -120,7 +118,7 @@
     # Trace the model
     source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec))

-    # 2. Compile the model to an on-device asset
+    # 2. Compiles the model to an asset that can be run on device
     model_compile_options = model.get_hub_compile_options(
         target_runtime, compile_options, hub_device
     )
@@ -134,7 +132,7 @@
     )
     compile_job = cast(hub.client.CompileJob, submitted_compile_job)

-    # 3. Profile the model asset on real devices
+    # 3. Profiles the model performance on a real device
     profile_job: Optional[hub.client.ProfileJob] = None
     if not skip_profiling:
         profile_options_all = model.get_hub_profile_options(
@@ -149,7 +147,7 @@
         )
         profile_job = cast(hub.client.ProfileJob, submitted_profile_job)

-    # 4. Run inference on-device with sample inputs
+    # 4. Inferences the model on sample inputs
     inference_job: Optional[hub.client.InferenceJob] = None
     if not skip_inferencing:
         profile_options_all = model.get_hub_profile_options(
@@ -170,13 +168,13 @@
         )
         inference_job = cast(hub.client.InferenceJob, submitted_inference_job)

-    # 5. Download the model asset to a local file
+    # 5. Downloads the model asset to the local directory
     if not skip_downloading:
         os.makedirs(output_path, exist_ok=True)
         target_model: hub.Model = compile_job.get_target_model()  # type: ignore
         target_model.download(str(output_path / model_name))

-    # 6. Summarize the results from profiling and inference
+    # 6.
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/resnext101/perf.yaml b/qai_hub_models/models/resnext101/perf.yaml index 24d2ec21..4e370e28 100644 --- a/qai_hub_models/models/resnext101/perf.yaml +++ b/qai_hub_models/models/resnext101/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: ResNeXt101 performance_metrics: - torchscript_onnx_tflite: - inference_time: 6525.0 - throughput: 153.25670498084293 + inference_time: 6555.0 + throughput: 152.55530129672007 estimated_peak_memory_range: - min: 53248 - max: 2463624 + min: 20480 + max: 2412432 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: j0pxvyqlg + job_id: jgjvnxy7g job_status: Passed torchscript_onnx_qnn: - inference_time: 6685.0 - throughput: 149.58863126402395 + inference_time: 6671.0 + throughput: 149.902563333833 estimated_peak_memory_range: min: 16384 - max: 36743712 + max: 34405528 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: j1p8oznog + job_id: jpxkovrj5 job_status: Passed torchscript_onnx: - inference_time: 7135.0 - throughput: 140.1541695865452 + inference_time: 7073.0 + throughput: 141.38272303124558 estimated_peak_memory_range: - min: 630784 - max: 2624480 + min: 16384 + max: 203636728 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 247 - job_id: j7gjxejep + job_id: jglvmnqe5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:50:02Z' + timestamp: '2024-10-15T17:18:36Z' - torchscript_onnx_tflite: - inference_time: 5381.0 - throughput: 185.8390633711206 + inference_time: 5363.0 + throughput: 186.46280067126608 estimated_peak_memory_range: - min: 24576 - max: 375771296 + min: 20480 + max: 386795184 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 
+109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: jo5mr379g + job_id: jpedm9x75 job_status: Passed torchscript_onnx_qnn: - inference_time: 5477.0 - throughput: 182.58170531312763 + inference_time: 5439.0 + throughput: 183.85732671446956 estimated_peak_memory_range: min: 618496 - max: 80674720 + max: 96030544 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: jogkz31ng + job_id: j5mnnd4qp job_status: Passed torchscript_onnx: - inference_time: 5913.0 - throughput: 169.1188905800778 + inference_time: 5887.0 + throughput: 169.86580601324954 estimated_peak_memory_range: - min: 675840 - max: 379219632 + min: 0 + max: 393211568 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 247 - job_id: jlpe9kjvg + job_id: j56y460vp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:50:03Z' + timestamp: '2024-10-16T08:32:55Z' - torchscript_onnx_tflite: - inference_time: 6501.0 - throughput: 153.82248884786955 + inference_time: 6521.0 + throughput: 153.35071308081584 estimated_peak_memory_range: - min: 28672 - max: 2035600 + min: 24576 + max: 2086928 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: jegn234qg + job_id: jgz3deyz5 job_status: Passed torchscript_onnx_qnn: - inference_time: 6799.0 - throughput: 147.08045300779526 + inference_time: 6844.0 + throughput: 146.11338398597312 estimated_peak_memory_range: - min: 659456 - max: 1862880 + min: 647168 + max: 1972464 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: j1gln3jmp + job_id: jprv3k7vg job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:49:56Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T17:18:30Z' - torchscript_onnx_tflite: - inference_time: 9202.0 - throughput: 108.67202782003912 + inference_time: 6558.0 + throughput: 152.48551387618176 estimated_peak_memory_range: - min: 40960 - max: 165382464 + min: 81920 + max: 2447424 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: joprker75 + job_id: jgdx1w4kp job_status: Passed torchscript_onnx_qnn: - inference_time: 9267.0 - throughput: 107.90978741771879 + inference_time: 6784.0 + throughput: 147.4056603773585 estimated_peak_memory_range: - min: 0 - max: 49579808 + min: 647168 + max: 2373096 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: j1pv3vjr5 + job_id: jp0z0yx25 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:50:01Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T17:18:33Z' - torchscript_onnx_tflite: - inference_time: 6527.0 - throughput: 153.2097441397273 + inference_time: 6484.0 + throughput: 154.22578655151142 estimated_peak_memory_range: - 
min: 57344 - max: 1659520 + min: 32768 + max: 2401624 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: jep28l1qp + job_id: jp14z01kp job_status: Passed torchscript_onnx_qnn: - inference_time: 6864.0 - throughput: 145.6876456876457 + inference_time: 6802.0 + throughput: 147.0155836518671 estimated_peak_memory_range: - min: 643072 - max: 1899104 + min: 675840 + max: 1885160 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: jw566nky5 + job_id: jpy13eyrp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:49:58Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T17:18:32Z' - torchscript_onnx_tflite: - inference_time: 6538.0 - throughput: 152.95197308045275 + inference_time: 6518.0 + throughput: 153.42129487572876 estimated_peak_memory_range: min: 32768 - max: 2195640 + max: 2345440 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: jqpye6llg + job_id: jg9lnv2qg job_status: Passed torchscript_onnx_qnn: - inference_time: 6846.0 - throughput: 146.0706982179375 + inference_time: 6778.0 + throughput: 147.5361463558572 estimated_peak_memory_range: - min: 643072 - max: 2281544 + min: 634880 + max: 2482112 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: j1p3keyn5 + job_id: jp2ky8zxp job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:49:59Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T17:18:31Z' - torchscript_onnx_tflite: - inference_time: 6485.0 - throughput: 154.20200462606013 + inference_time: 9211.0 + throughput: 108.56584518510476 estimated_peak_memory_range: - min: 32768 - max: 2219208 + min: 20480 + max: 172903680 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: j2p0ylwng + job_id: j5we6ozz5 job_status: Passed torchscript_onnx_qnn: - inference_time: 6856.0 - throughput: 145.85764294049008 + inference_time: 9353.0 + throughput: 106.91756655618518 estimated_peak_memory_range: - min: 638976 - max: 1877880 + min: 0 + max: 55099440 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: jwgoy3jk5 + job_id: jgkexzkyg job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T17:18:34Z' + - torchscript_onnx_tflite: + inference_time: 4612.0 + throughput: 216.8256721595837 + estimated_peak_memory_range: + min: 12288 + max: 165095984 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 147 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 147 + job_id: jp4lrq4q5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 4754.0 + throughput: 210.34917963819942 + estimated_peak_memory_range: + min: 0 + max: 100248816 + 
primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 245 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 245 + job_id: j5q6q8d7p + job_status: Passed + torchscript_onnx: + inference_time: 5023.0 + throughput: 199.0842126219391 + estimated_peak_memory_range: + min: 626688 + max: 169371552 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 247 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 247 + job_id: jpv6k3n75 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:50:00Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T17:18:40Z' - torchscript_onnx_qnn: - inference_time: 6911.0 - throughput: 144.6968600781363 + inference_time: 6899.0 + throughput: 144.94854326714017 estimated_peak_memory_range: min: 602112 max: 602112 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 245 - job_id: jn5q83no5 + job_id: jgn6v2wv5 job_status: Passed torchscript_onnx: - inference_time: 6790.0 - throughput: 147.27540500736376 + inference_time: 6813.0 + throughput: 146.7782181124321 estimated_peak_memory_range: - min: 181276672 - max: 181276672 + min: 181223424 + max: 181223424 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 247 - job_id: jygzer1xg + job_id: jp3j0krxg job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:50:03Z' + timestamp: '2024-10-15T17:18:38Z' diff --git a/qai_hub_models/models/resnext101_quantized/README.md b/qai_hub_models/models/resnext101_quantized/README.md index 3ddf62df..6ed9ae1b 100644 --- a/qai_hub_models/models/resnext101_quantized/README.md +++ b/qai_hub_models/models/resnext101_quantized/README.md @@ -6,7 +6,7 @@ ResNeXt101 is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of ResNeXt101Quantized found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/resnext101_quantized). @@ -17,11 +17,6 @@ accross various devices, can be found [here](https://aihub.qualcomm.com/models/r ## Example & Usage -Install the package via pip: -```bash -pip install "qai_hub_models[resnext101_quantized]" -``` - Once installed, run the following simple CLI demo: @@ -44,15 +39,19 @@ python -m qai_hub_models.models.resnext101_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of ResNeXt101Quantized can be found +* The license for the original implementation of ResNeXt101Quantized can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/abs/1611.05431) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/resnext101_quantized/evaluate.py b/qai_hub_models/models/resnext101_quantized/evaluate.py index 9652d8f6..26fe838f 100644 --- a/qai_hub_models/models/resnext101_quantized/evaluate.py +++ b/qai_hub_models/models/resnext101_quantized/evaluate.py @@ -13,10 +13,8 @@ from qai_hub_models.models.resnext101_quantized import MODEL_ID, Model from qai_hub_models.utils.args import evaluate_parser, get_hub_device, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.evaluate import evaluate_on_dataset from qai_hub_models.utils.inference import compile_model_from_args -from qai_hub_models.utils.quantization_aimet import AIMETQuantizableMixin SUPPORTED_DATASETS = ["imagenette", "imagenet"] @@ -27,6 +25,7 @@ def main(): model_cls=Model, default_split_size=2500, supported_datasets=SUPPORTED_DATASETS, + is_hub_quantized=True, ) args = parser.parse_args() args.device = None @@ -38,13 +37,7 @@ def main(): MODEL_ID, args, get_model_kwargs(Model, vars(args)) ) hub_device = get_hub_device(None, args.chipset) - - # Use Fp16 model for torch inference - for cls in Model.__mro__: - if issubclass(cls, BaseModel) and not issubclass(cls, AIMETQuantizableMixin): - torch_cls = cls - break - torch_model = torch_cls.from_pretrained(**get_model_kwargs(torch_cls, vars(args))) + torch_model = Model.from_pretrained(**get_model_kwargs(Model, vars(args))) evaluate_on_dataset( hub_model, torch_model, diff --git a/qai_hub_models/models/resnext101_quantized/export.py b/qai_hub_models/models/resnext101_quantized/export.py index da30ea7b..2b35d4a0 100644 --- a/qai_hub_models/models/resnext101_quantized/export.py +++ b/qai_hub_models/models/resnext101_quantized/export.py @@ -10,18 +10,20 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.resnext101_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference +from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( print_inference_metrics, print_on_target_demo_cmd, @@ -31,11 +33,14 @@ can_access_qualcomm_ai_hub, export_without_hub_access, ) +from qai_hub_models.utils.quantization import get_calibration_data def export_model( device: str = "Samsung Galaxy S23 (Family)", chipset: 
Optional[str] = None, + num_calibration_samples: int = 100, + skip_compiling: bool = False, skip_profiling: bool = False, skip_inferencing: bool = False, skip_downloading: bool = False, @@ -45,20 +50,19 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + 3. Compiles the model to an asset that can be run on device + 4. Profiles the model performance on a real device + 5. Inferences the model on sample inputs + 6. Downloads the model asset to the local directory + 7. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 5 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -66,6 +70,9 @@ def export_model( Defaults to DEFAULT_DEVICE if not specified. chipset: If set, will choose a random device with this chipset. Overrides the `device` argument. + num_calibration_samples: The number of calibration data samples + to use for quantization. + skip_compiling: If set, skips compiling model to format that can run on device. skip_profiling: If set, skips profiling of compiled model on real devices. skip_inferencing: If set, skips computing on-device outputs from sample data. skip_downloading: If set, skips downloading of compiled model. @@ -80,10 +87,11 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: - * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). + A struct of: + * A CompileJob object containing metadata about the compile job submitted to hub (None if compiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). + * A QuantizeJob object containing metadata about the quantize job submitted to hub """ model_name = "resnext101_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,33 +117,52 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. 
Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) ) # Trace the model - source_model = model.convert_to_hub_source_model( - target_runtime, output_path, input_spec + source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) + + print(f"Quantizing model {model_name} with {num_calibration_samples} samples.") + # 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + onnx_compile_job = hub.submit_compile_job( + model=source_model, + input_specs=input_spec, + device=hub_device, + name=model_name, + options="--target_runtime onnx", + ) + quantize_job = hub.submit_quantize_job( + model=onnx_compile_job.get_target_model(), + calibration_data=get_calibration_data( + input_spec, "imagenette", num_calibration_samples + ), + weights_dtype=model.get_weights_dtype(), + activations_dtype=model.get_activations_dtype(), + name=model_name, + options=model.get_quantize_options(), ) + if skip_compiling: + return ExportResult(quantize_job=quantize_job) - # 2. Compile the model to an on-device asset + # 3. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) print(f"Optimizing model {model_name} to run on-device") submitted_compile_job = hub.submit_compile_job( - model=source_model, + model=quantize_job.get_target_model(), input_specs=input_spec, device=hub_device, name=model_name, - calibration_data=model.get_calibration_data(target_runtime), options=model_compile_options, ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 4. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +177,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 5. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +198,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 6. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 7. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -202,12 +229,17 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + quantize_job=quantize_job, + ) def main(): warnings.filterwarnings("ignore") - parser = export_parser(model_cls=Model) + parser = export_parser(model_cls=Model, is_hub_quantized=True) args = parser.parse_args() export_model(**vars(args)) diff --git a/qai_hub_models/models/resnext101_quantized/model.py b/qai_hub_models/models/resnext101_quantized/model.py index 82c48e43..be95544c 100644 --- a/qai_hub_models/models/resnext101_quantized/model.py +++ b/qai_hub_models/models/resnext101_quantized/model.py @@ -4,78 +4,11 @@ # --------------------------------------------------------------------- from __future__ import annotations -# isort: off -# This verifies aimet is installed, and this must be included first. -from qai_hub_models.utils.quantization_aimet import ( - AIMETQuantizableMixin, - constrain_quantized_inputs_to_image_range, -) - -# isort: on - -import torch -from aimet_torch.cross_layer_equalization import equalize_model -from aimet_torch.model_preparer import prepare_model -from aimet_torch.quantsim import QuantizationSimModel, load_encodings_to_sim - from qai_hub_models.models.resnext101.model import ResNeXt101 -from qai_hub_models.utils.aimet.config_loader import get_default_aimet_config -from qai_hub_models.utils.asset_loaders import CachedWebModelAsset +from qai_hub_models.utils.quantization import HubQuantizableMixin MODEL_ID = __name__.split(".")[-2] -MODEL_ASSET_VERSION = 5 -DEFAULT_ENCODINGS = "resnext101_quantized_encodings.json" - - -class ResNeXt101Quantizable(AIMETQuantizableMixin, ResNeXt101): - """ResNeXt101 with post train quantization support. - - Supports only 8 bit weights and activations, and only loads pre-quantized checkpoints. - Support for quantizing using your own weights & data will come at a later date.""" - - def __init__( - self, - sim_model: QuantizationSimModel, - ) -> None: - # Input is already normalized by sim_model. Disable it in the wrapper model. - ResNeXt101.__init__(self, sim_model.model, normalize_input=False) - AIMETQuantizableMixin.__init__( - self, - sim_model, - ) - - @classmethod - def from_pretrained( - cls, - aimet_encodings: str | None = "DEFAULT", - ) -> "ResNeXt101Quantizable": - """ - Parameters: - aimet_encodings: - if "DEFAULT": Loads the model with aimet encodings calibrated on imagenette. - elif None: Doesn't load any encodings. Used when computing encodings. - else: Interprets as a filepath and loads the encodings stored there. 
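To make the renumbered recipe above concrete: the quantized export now submits two extra Hub jobs before the on-device compile, an ONNX conversion compile job and a quantize job fed with imagenette calibration samples. A condensed, hedged sketch of steps 1 through 3 follows (the device name and the 100-sample default are taken from the diff; this is an illustration, not the full `export_model` implementation), after which the model.py hunk below removes the old AIMET `from_pretrained` body that this recipe replaces:

```python
# Hedged sketch of the new Hub-side quantization flow in export.py.
import qai_hub as hub
import torch

from qai_hub_models.models.resnext101_quantized import Model
from qai_hub_models.utils.input_spec import make_torch_inputs
from qai_hub_models.utils.quantization import get_calibration_data

model = Model.from_pretrained()
input_spec = model.get_input_spec()
traced = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec))
device = hub.Device("Samsung Galaxy S23 (Family)")

# Step 2: convert the traced model to ONNX on Hub.
onnx_job = hub.submit_compile_job(
    model=traced,
    input_specs=input_spec,
    device=device,
    name="resnext101_quantized",
    options="--target_runtime onnx",
)

# Quantize the ONNX model with calibration data; dtypes come from the model class.
quantize_job = hub.submit_quantize_job(
    model=onnx_job.get_target_model(),
    calibration_data=get_calibration_data(input_spec, "imagenette", 100),
    weights_dtype=model.get_weights_dtype(),
    activations_dtype=model.get_activations_dtype(),
    name="resnext101_quantized",
    options=model.get_quantize_options(),
)

# Step 3: compile the quantized model for the target device.
compile_job = hub.submit_compile_job(
    model=quantize_job.get_target_model(),
    input_specs=input_spec,
    device=device,
    name="resnext101_quantized",
)
```

Passing `skip_compiling=True` to `export_model` stops after the quantize job and returns `ExportResult(quantize_job=quantize_job)`, which appears intended for cases where only the quantized ONNX asset is wanted.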
- """ - model = ResNeXt101.from_pretrained() - input_shape = cls.get_input_spec()["image_tensor"][0] - - model = prepare_model(model) - equalize_model(model, input_shape) - sim = QuantizationSimModel( - model, - quant_scheme="tf_enhanced", - default_param_bw=8, - default_output_bw=8, - config_file=get_default_aimet_config(), - dummy_input=torch.rand(input_shape), - ) - constrain_quantized_inputs_to_image_range(sim) - if aimet_encodings: - if aimet_encodings == "DEFAULT": - aimet_encodings = CachedWebModelAsset.from_asset_store( - MODEL_ID, MODEL_ASSET_VERSION, DEFAULT_ENCODINGS - ).fetch() - load_encodings_to_sim(sim, aimet_encodings) - return cls(sim) +class ResNeXt101Quantizable(HubQuantizableMixin, ResNeXt101): + pass diff --git a/qai_hub_models/models/resnext101_quantized/perf.yaml b/qai_hub_models/models/resnext101_quantized/perf.yaml index 45176f27..77645754 100644 --- a/qai_hub_models/models/resnext101_quantized/perf.yaml +++ b/qai_hub_models/models/resnext101_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,44 +20,39 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8775P Proxy models: - name: ResNeXt101Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 2842.0 - throughput: 351.8648838845883 + inference_time: 2869.0 + throughput: 348.5535029627048 estimated_peak_memory_range: - min: 28672 - max: 2602440 + min: 12288 + max: 1741792 primary_compute_unit: NPU precision: int8 layer_info: @@ -64,29 +60,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 150 - job_id: jqp4qv61g + job_id: jg9l04o8g job_status: Passed torchscript_onnx_qnn: - inference_time: 3096.0 - throughput: 322.99741602067184 + inference_time: 3112.0 + throughput: 321.3367609254499 estimated_peak_memory_range: - min: 16384 - max: 35690376 + min: 0 + max: 31007424 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 246 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jogkz3wng + total_layers: 246 + job_id: jp0z4ron5 job_status: Passed torchscript_onnx: - inference_time: 4357.0 - throughput: 229.51572182694514 + inference_time: 4022.0 + throughput: 248.6325211337643 estimated_peak_memory_range: min: 12288 - max: 102884408 + max: 102958696 primary_compute_unit: NPU precision: int8 layer_info: @@ -94,7 +90,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 283 - job_id: jygzer6xg + job_id: jgz327yx5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -103,13 +99,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 
Gen 2 - timestamp: '2024-09-25T11:49:15Z' + timestamp: '2024-10-17T17:20:25Z' - torchscript_onnx_tflite: - inference_time: 2259.0 - throughput: 442.67374944665784 + inference_time: 2051.0 + throughput: 487.56704046806436 estimated_peak_memory_range: - min: 32768 - max: 277751568 + min: 12288 + max: 287469904 primary_compute_unit: NPU precision: int8 layer_info: @@ -117,29 +113,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 150 - job_id: j0pxvy8lg + job_id: jp1428o7p job_status: Passed torchscript_onnx_qnn: - inference_time: 2550.0 - throughput: 392.15686274509807 + inference_time: 2536.0 + throughput: 394.3217665615142 estimated_peak_memory_range: min: 12288 - max: 87827760 + max: 96556512 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 246 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jn5q83xo5 + total_layers: 246 + job_id: jp8q27jop job_status: Passed torchscript_onnx: - inference_time: 3094.0 - throughput: 323.2062055591467 + inference_time: 2786.0 + throughput: 358.9375448671931 estimated_peak_memory_range: min: 0 - max: 336788912 + max: 352709936 primary_compute_unit: NPU precision: int8 layer_info: @@ -147,7 +143,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 283 - job_id: jz5woqkmp + job_id: j5wew9zm5 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -156,13 +152,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:49:16Z' + timestamp: '2024-10-17T17:20:27Z' - torchscript_onnx_tflite: - inference_time: 2790.0 - throughput: 358.42293906810033 + inference_time: 9831.0 + throughput: 101.71905197843556 estimated_peak_memory_range: - min: 24576 - max: 1597984 + min: 73728 + max: 209272896 primary_compute_unit: NPU precision: int8 layer_info: @@ -170,37 +166,60 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 150 - job_id: jo5mr319g + job_id: jgdxnv6zp job_status: Passed torchscript_onnx_qnn: - inference_time: 2964.0 - throughput: 337.38191632928476 + inference_time: 14602.0 + throughput: 68.48376934666484 estimated_peak_memory_range: - min: 180224 - max: 1462256 + min: 200704 + max: 8236288 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 246 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jw566nxy5 + total_layers: 246 + job_id: jgkevy6ng job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:49:09Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-17T17:20:11Z' - torchscript_onnx_tflite: - inference_time: 3423.0 - throughput: 292.141396435875 + inference_time: 134358.0 + throughput: 7.442802066121853 estimated_peak_memory_range: - min: 16384 - max: 282532096 + min: 28672 + max: 546404624 + primary_compute_unit: GPU + precision: int8 + layer_info: + layers_on_npu: 14 + layers_on_gpu: 125 + layers_on_cpu: 11 + total_layers: 150 + job_id: j57y2do95 + job_status: Passed + reference_device_info: + name: RB5 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8250 Proxy + timestamp: '2024-10-17T17:19:56Z' + - torchscript_onnx_tflite: + inference_time: 2886.0 + throughput: 346.5003465003465 + estimated_peak_memory_range: + min: 24576 + max: 2271104 primary_compute_unit: NPU precision: int8 layer_info: @@ -208,37 +227,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 
total_layers: 150 - job_id: jegn23dqg + job_id: jp4lnwe15 job_status: Passed torchscript_onnx_qnn: - inference_time: 3499.0 - throughput: 285.7959416976279 + inference_time: 2933.0 + throughput: 340.94783498124787 estimated_peak_memory_range: - min: 12288 - max: 89765216 + min: 176128 + max: 1327792 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 246 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: j7gjxe9ep + total_layers: 246 + job_id: j5q6024op job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:49:14Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-17T17:20:13Z' - torchscript_onnx_tflite: - inference_time: 2826.0 - throughput: 353.8570417551309 + inference_time: 2845.0 + throughput: 351.493848857645 estimated_peak_memory_range: - min: 16384 - max: 2008088 + min: 20480 + max: 2130952 primary_compute_unit: NPU precision: int8 layer_info: @@ -246,37 +265,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 150 - job_id: joprkem75 + job_id: jpxk910l5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2937.0 - throughput: 340.4834865509023 + inference_time: 3010.0 + throughput: 332.22591362126246 estimated_peak_memory_range: - min: 176128 - max: 1458216 + min: 180224 + max: 1426344 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 246 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: j1p3kedn5 + total_layers: 246 + job_id: j56y21myp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:49:10Z' + chipset: SA8255P Proxy + timestamp: '2024-10-17T17:20:16Z' - torchscript_onnx_tflite: - inference_time: 2844.0 - throughput: 351.6174402250352 + inference_time: 2788.0 + throughput: 358.6800573888092 estimated_peak_memory_range: - min: 20480 - max: 2332144 + min: 32768 + max: 2689768 primary_compute_unit: NPU precision: int8 layer_info: @@ -284,22 +303,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 150 - job_id: jep28lqqp + job_id: j5mnez99p job_status: Passed torchscript_onnx_qnn: inference_time: 2931.0 throughput: 341.180484476288 estimated_peak_memory_range: - min: 180224 - max: 1498328 + min: 184320 + max: 1383432 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 246 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jwgoy3xk5 + total_layers: 246 + job_id: jp3jnm7ng job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -307,52 +326,37 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:49:11Z' - - torchscript_onnx_tflite: - inference_time: 2791.0 - throughput: 358.29451809387314 - estimated_peak_memory_range: - min: 28672 - max: 2554160 - primary_compute_unit: NPU - precision: int8 - layer_info: - layers_on_npu: 150 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 150 - job_id: jqpye6klg - job_status: Passed - torchscript_onnx_qnn: - inference_time: 2938.0 - throughput: 340.3675970047652 + chipset: SA8775P Proxy + timestamp: '2024-10-17T17:20:17Z' + - torchscript_onnx_qnn: + inference_time: 3456.0 + throughput: 289.35185185185185 estimated_peak_memory_range: - min: 200704 - 
max: 1391336 + min: 12288 + max: 102493584 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 246 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: j1pv3v8r5 + total_layers: 246 + job_id: jgo2zvwkp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:49:12Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-17T17:20:19Z' - torchscript_onnx_tflite: - inference_time: 9894.0 - throughput: 101.07135637760258 + inference_time: 2085.0 + throughput: 479.6163069544364 estimated_peak_memory_range: - min: 12288 - max: 208908880 + min: 8192 + max: 199280432 primary_compute_unit: NPU precision: int8 layer_info: @@ -360,75 +364,67 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 150 - job_id: j2p0yl8ng + job_id: jprv6yx7g job_status: Passed torchscript_onnx_qnn: - inference_time: 14969.0 - throughput: 66.80472977486806 + inference_time: 2040.0 + throughput: 490.19607843137254 estimated_peak_memory_range: - min: 217088 - max: 8505648 + min: 0 + max: 95817216 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 246 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: jlpe9kqvg + total_layers: 246 + job_id: jpv6qwmr5 job_status: Passed - reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot - os_name: Android - manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T11:49:15Z' - - torchscript_onnx_tflite: - inference_time: 131178.0 - throughput: 7.623229504947476 + torchscript_onnx: + inference_time: 2941.0 + throughput: 340.02040122407345 estimated_peak_memory_range: - min: 49152 - max: 350266512 - primary_compute_unit: GPU + min: 0 + max: 237657248 + primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 14 - layers_on_gpu: 125 - layers_on_cpu: 11 - total_layers: 150 - job_id: j1p8ozdog + layers_on_npu: 283 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 283 + job_id: jp142817p job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T11:49:05Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-17T17:20:30Z' - torchscript_onnx_qnn: - inference_time: 3077.0 - throughput: 324.99187520311995 + inference_time: 3076.0 + throughput: 325.0975292587776 estimated_peak_memory_range: - min: 204800 - max: 204800 + min: 262144 + max: 262144 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 146 + layers_on_npu: 246 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 146 - job_id: j1gln3dmp + total_layers: 246 + job_id: jglv4k8m5 job_status: Passed torchscript_onnx: - inference_time: 4190.0 - throughput: 238.6634844868735 + inference_time: 4219.0 + throughput: 237.02299123014933 estimated_peak_memory_range: - min: 94441472 - max: 94441472 + min: 94502912 + max: 94502912 primary_compute_unit: NPU precision: int8 layer_info: @@ -436,7 +432,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 283 - job_id: jmg9vwr85 + job_id: jg9l0428g job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -445,4 +441,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:49:17Z' + timestamp: 
'2024-10-17T17:20:28Z' diff --git a/qai_hub_models/models/resnext101_quantized/requirements.txt b/qai_hub_models/models/resnext101_quantized/requirements.txt deleted file mode 100644 index de5b80e8..00000000 --- a/qai_hub_models/models/resnext101_quantized/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -aimet-torch==1.32.1.post1; sys_platform == "linux" diff --git a/qai_hub_models/models/resnext101_quantized/test.py b/qai_hub_models/models/resnext101_quantized/test.py deleted file mode 100644 index 1df1173a..00000000 --- a/qai_hub_models/models/resnext101_quantized/test.py +++ /dev/null @@ -1,30 +0,0 @@ -# --------------------------------------------------------------------- -# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. -# SPDX-License-Identifier: BSD-3-Clause -# --------------------------------------------------------------------- -from qai_hub_models.models._shared.imagenet_classifier.test_utils import ( - run_imagenet_classifier_test, -) -from qai_hub_models.models.resnext101_quantized.demo import main as demo_main -from qai_hub_models.models.resnext101_quantized.model import ( - MODEL_ASSET_VERSION, - MODEL_ID, - ResNeXt101Quantizable, -) - - -def test_task(): - run_imagenet_classifier_test( - ResNeXt101Quantizable.from_pretrained(), - MODEL_ID, - probability_threshold=0.46, - diff_tol=0.005, - rtol=0.02, - atol=0.2, - asset_version=MODEL_ASSET_VERSION, - ) - - -def test_demo(): - # Verify demo does not crash - demo_main(is_test=True) diff --git a/qai_hub_models/models/resnext50/README.md b/qai_hub_models/models/resnext50/README.md index bb1c8865..2c206d02 100644 --- a/qai_hub_models/models/resnext50/README.md +++ b/qai_hub_models/models/resnext50/README.md @@ -6,7 +6,7 @@ ResNeXt50 is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of ResNeXt50 found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/resnext50). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.resnext50.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of ResNeXt50 can be found +* The license for the original implementation of ResNeXt50 can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/abs/1611.05431) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/resnext50/export.py b/qai_hub_models/models/resnext50/export.py index a2abb9fe..81cd7927 100644 --- a/qai_hub_models/models/resnext50/export.py +++ b/qai_hub_models/models/resnext50/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.resnext50 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
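The `ExportResult` struct replaces the positional 3-tuple these `export_model` functions used to return. A minimal sketch of how a caller might consume the new return value; the device name is just the default from the signature above, and the sketch assumes AI Hub access is configured (without it, `export_model` returns a list of strings instead):

```python
# Illustrative caller of the new ExportResult-based API; not part of this
# diff. Assumes qai_hub_models is installed and AI Hub access is set up.
from qai_hub_models.models.resnext50.export import export_model

result = export_model(device="Samsung Galaxy S23 (Family)")

# Fields are accessed by name, so callers no longer depend on tuple order.
print("compile job:", result.compile_job.job_id)
if result.profile_job is not None:    # None when skip_profiling=True
    print("profile job:", result.profile_job.job_id)
if result.inference_job is not None:  # None when skip_inferencing=True
    print("inference job:", result.inference_job.job_id)
```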
""" model_name = "resnext50" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/resnext50/perf.yaml b/qai_hub_models/models/resnext50/perf.yaml index fb43c3f1..e08359c0 100644 --- a/qai_hub_models/models/resnext50/perf.yaml +++ b/qai_hub_models/models/resnext50/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: ResNeXt50 performance_metrics: - torchscript_onnx_tflite: - inference_time: 2547.0 - throughput: 392.61876717707105 + inference_time: 2525.0 + throughput: 396.03960396039605 estimated_peak_memory_range: - min: 12288 - max: 2299032 + min: 16384 + max: 2647544 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 79 - job_id: jnp10em75 + job_id: jp3j0k9mg job_status: Passed torchscript_onnx_qnn: - inference_time: 2603.0 - throughput: 384.172109104879 + inference_time: 2601.0 + throughput: 384.46751249519417 estimated_peak_memory_range: min: 618496 - max: 63811024 + max: 84120456 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: joprke775 + job_id: jgdx1w36p job_status: Passed torchscript_onnx: - inference_time: 2750.0 - throughput: 363.6363636363636 + inference_time: 2794.0 + throughput: 357.9098067287044 estimated_peak_memory_range: min: 12288 - max: 60543136 + max: 2177184 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 128 - job_id: jw566n9y5 + job_id: jp2ky8rxp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:47:03Z' + timestamp: '2024-10-15T17:15:29Z' - torchscript_onnx_tflite: - inference_time: 1970.0 - throughput: 507.61421319796955 + inference_time: 1967.0 + throughput: 508.38840874428064 estimated_peak_memory_range: min: 12288 - max: 179658848 + max: 183090352 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 
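A note on reading these perf.yaml hunks: the `throughput` values are consistent with `inference_time` being microseconds per inference (throughput = 10^6 / inference_time), and the `estimated_peak_memory_range` bounds look like bytes (min: 16384 is 16 KiB). A quick check against the Galaxy S23 entry above; this is an observed relationship in the data, not a documented schema:

```python
# Verify the apparent inference_time (µs) <-> throughput (inferences/s)
# relationship, using values copied from the hunk above.
inference_time_us = 2525.0
throughput = 396.03960396039605

assert abs(throughput - 1e6 / inference_time_us) < 1e-6
print(f"{1e6 / inference_time_us:.2f} inferences/s")  # ~396.04
```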
layers_on_cpu: 0 total_layers: 79 - job_id: jvgdwomz5 + job_id: jglvv1ke5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2028.0 - throughput: 493.0966469428008 + inference_time: 2173.0 + throughput: 460.1932811780948 estimated_peak_memory_range: min: 618496 - max: 35498192 + max: 37904480 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: jep28lzqp + job_id: j5we6o7z5 job_status: Passed torchscript_onnx: - inference_time: 2281.0 - throughput: 438.4042086804033 + inference_time: 2342.0 + throughput: 426.9854824935952 estimated_peak_memory_range: - min: 442368 - max: 181333472 + min: 0 + max: 185964448 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 128 - job_id: j1p3keln5 + job_id: jp0z0ym25 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:47:04Z' + timestamp: '2024-10-16T08:20:59Z' - torchscript_onnx_tflite: - inference_time: 2501.0 - throughput: 399.8400639744102 + inference_time: 2503.0 + throughput: 399.52057530962844 estimated_peak_memory_range: - min: 12288 - max: 1642008 + min: 32768 + max: 2131744 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 79 - job_id: jz57zx89p + job_id: jpv6k3dz5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2491.0 - throughput: 401.4452027298274 + inference_time: 2515.0 + throughput: 397.61431411530816 estimated_peak_memory_range: - min: 626688 - max: 1918440 + min: 634880 + max: 1887112 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: j2p0ylxng + job_id: jp14z0jkp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:46:58Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T17:15:23Z' - torchscript_onnx_tflite: - inference_time: 3274.0 - throughput: 305.43677458766035 + inference_time: 2480.0 + throughput: 403.2258064516129 estimated_peak_memory_range: - min: 16384 - max: 116556784 + min: 40960 + max: 1990192 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 79 - job_id: jqp4qv21g + job_id: j5we6o745 job_status: Passed torchscript_onnx_qnn: - inference_time: 3360.0 - throughput: 297.6190476190476 + inference_time: 2480.0 + throughput: 403.2258064516129 estimated_peak_memory_range: - min: 618496 - max: 24560816 + min: 622592 + max: 2243496 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: j1gln39mp + job_id: jp4lrq1q5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:47:02Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T17:15:26Z' - torchscript_onnx_tflite: - inference_time: 2478.0 - throughput: 403.5512510088781 + inference_time: 2501.0 + throughput: 399.8400639744102 estimated_peak_memory_range: - min: 20480 - max: 2038184 + min: 28672 + 
max: 2388824 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 79 - job_id: j0pxvyzlg + job_id: jgz3dem45 job_status: Passed torchscript_onnx_qnn: - inference_time: 2507.0 - throughput: 398.8831272437176 + inference_time: 2540.0 + throughput: 393.7007874015748 estimated_peak_memory_range: - min: 655360 - max: 1980280 + min: 643072 + max: 1967920 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: j1p8ozkog + job_id: j57yrz4q5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:46:59Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T17:15:25Z' - torchscript_onnx_tflite: - inference_time: 2505.0 - throughput: 399.2015968063872 + inference_time: 2486.0 + throughput: 402.2526146419952 estimated_peak_memory_range: min: 28672 - max: 1902384 + max: 2146288 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 79 - job_id: jo5mr3l9g + job_id: jpedm9z85 job_status: Passed torchscript_onnx_qnn: - inference_time: 2493.0 - throughput: 401.1231448054553 + inference_time: 2488.0 + throughput: 401.92926045016077 estimated_peak_memory_range: - min: 630784 - max: 1816408 + min: 659456 + max: 1823200 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: jogkz3kng + job_id: jgdx1w3kp job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:47:00Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T17:15:24Z' - torchscript_onnx_tflite: - inference_time: 2514.0 - throughput: 397.77247414478916 + inference_time: 3247.0 + throughput: 307.9765937788728 estimated_peak_memory_range: - min: 40960 - max: 2616136 + min: 0 + max: 118191280 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 79 - job_id: jegn23wqg + job_id: jgjvnx71g job_status: Passed torchscript_onnx_qnn: - inference_time: 2547.0 - throughput: 392.61876717707105 + inference_time: 3368.0 + throughput: 296.91211401425176 estimated_peak_memory_range: - min: 634880 - max: 2307472 + min: 618496 + max: 28677200 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: jn5q83do5 + job_id: j5mnxrmyp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T17:15:28Z' + - torchscript_onnx_tflite: + inference_time: 1705.0 + throughput: 586.5102639296188 + estimated_peak_memory_range: + min: 12288 + max: 63791216 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 79 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 79 + job_id: jp14z0jnp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 1799.0 + throughput: 555.864369093941 + estimated_peak_memory_range: + min: 614400 + max: 40800384 + primary_compute_unit: NPU + precision: fp16 + layer_info: + 
layers_on_npu: 126 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 126 + job_id: jgn6v2zv5 + job_status: Passed + torchscript_onnx: + inference_time: 1686.0 + throughput: 593.1198102016607 + estimated_peak_memory_range: + min: 0 + max: 65267168 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 128 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 128 + job_id: jgo26yl4p + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:47:01Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T17:15:33Z' - torchscript_onnx_qnn: - inference_time: 2663.0 - throughput: 375.51633496057076 + inference_time: 2669.0 + throughput: 374.6721618583739 estimated_peak_memory_range: min: 602112 max: 602112 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: jqpye6ylg + job_id: jg9lnvmqg job_status: Passed torchscript_onnx: - inference_time: 2634.0 - throughput: 379.65072133637057 + inference_time: 2676.0 + throughput: 373.69207772795215 estimated_peak_memory_range: - min: 53100544 - max: 53100544 + min: 53096448 + max: 53096448 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 128 - job_id: jwgoy37k5 + job_id: jgkexz2yg job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:47:05Z' + timestamp: '2024-10-15T17:15:31Z' diff --git a/qai_hub_models/models/resnext50_quantized/README.md b/qai_hub_models/models/resnext50_quantized/README.md index 6c69089a..d03bd9ef 100644 --- a/qai_hub_models/models/resnext50_quantized/README.md +++ b/qai_hub_models/models/resnext50_quantized/README.md @@ -6,7 +6,7 @@ ResNeXt50 is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of ResNeXt50Quantized found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/resnext50_quantized). @@ -17,11 +17,6 @@ accross various devices, can be found [here](https://aihub.qualcomm.com/models/r ## Example & Usage -Install the package via pip: -```bash -pip install "qai_hub_models[resnext50_quantized]" -``` - Once installed, run the following simple CLI demo: @@ -44,15 +39,19 @@ python -m qai_hub_models.models.resnext50_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of ResNeXt50Quantized can be found +* The license for the original implementation of ResNeXt50Quantized can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/abs/1611.05431) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/resnext50_quantized/evaluate.py b/qai_hub_models/models/resnext50_quantized/evaluate.py index 1eb23114..87f75d68 100644 --- a/qai_hub_models/models/resnext50_quantized/evaluate.py +++ b/qai_hub_models/models/resnext50_quantized/evaluate.py @@ -13,10 +13,8 @@ from qai_hub_models.models.resnext50_quantized import MODEL_ID, Model from qai_hub_models.utils.args import evaluate_parser, get_hub_device, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.evaluate import evaluate_on_dataset from qai_hub_models.utils.inference import compile_model_from_args -from qai_hub_models.utils.quantization_aimet import AIMETQuantizableMixin SUPPORTED_DATASETS = ["imagenette", "imagenet"] @@ -27,6 +25,7 @@ def main(): model_cls=Model, default_split_size=2500, supported_datasets=SUPPORTED_DATASETS, + is_hub_quantized=True, ) args = parser.parse_args() args.device = None @@ -38,13 +37,7 @@ def main(): MODEL_ID, args, get_model_kwargs(Model, vars(args)) ) hub_device = get_hub_device(None, args.chipset) - - # Use Fp16 model for torch inference - for cls in Model.__mro__: - if issubclass(cls, BaseModel) and not issubclass(cls, AIMETQuantizableMixin): - torch_cls = cls - break - torch_model = torch_cls.from_pretrained(**get_model_kwargs(torch_cls, vars(args))) + torch_model = Model.from_pretrained(**get_model_kwargs(Model, vars(args))) evaluate_on_dataset( hub_model, torch_model, diff --git a/qai_hub_models/models/resnext50_quantized/export.py b/qai_hub_models/models/resnext50_quantized/export.py index 285dd9f4..83bca661 100644 --- a/qai_hub_models/models/resnext50_quantized/export.py +++ b/qai_hub_models/models/resnext50_quantized/export.py @@ -10,18 +10,20 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.resnext50_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference +from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( print_inference_metrics, print_on_target_demo_cmd, @@ -31,11 +33,14 @@ can_access_qualcomm_ai_hub, export_without_hub_access, ) +from qai_hub_models.utils.quantization import get_calibration_data def export_model( device: str = "Samsung Galaxy S23 (Family)", chipset: Optional[str] = None, 
+ num_calibration_samples: int = 100, + skip_compiling: bool = False, skip_profiling: bool = False, skip_inferencing: bool = False, skip_downloading: bool = False, @@ -45,20 +50,19 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + 3. Compiles the model to an asset that can be run on device + 4. Profiles the model performance on a real device + 5. Inferences the model on sample inputs + 6. Downloads the model asset to the local directory + 7. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 5 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -66,6 +70,9 @@ def export_model( Defaults to DEFAULT_DEVICE if not specified. chipset: If set, will choose a random device with this chipset. Overrides the `device` argument. + num_calibration_samples: The number of calibration data samples + to use for quantization. + skip_compiling: If set, skips compiling model to format that can run on device. skip_profiling: If set, skips profiling of compiled model on real devices. skip_inferencing: If set, skips computing on-device outputs from sample data. skip_downloading: If set, skips downloading of compiled model. @@ -80,10 +87,11 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: - * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). + A struct of: + * A CompileJob object containing metadata about the compile job submitted to hub (None if compiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). + * A QuantizeJob object containing metadata about the quantize job submitted to hub """ model_name = "resnext50_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,33 +117,52 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. 
Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) ) # Trace the model - source_model = model.convert_to_hub_source_model( - target_runtime, output_path, input_spec + source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) + + print(f"Quantizing model {model_name} with {num_calibration_samples} samples.") + # 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + onnx_compile_job = hub.submit_compile_job( + model=source_model, + input_specs=input_spec, + device=hub_device, + name=model_name, + options="--target_runtime onnx", + ) + quantize_job = hub.submit_quantize_job( + model=onnx_compile_job.get_target_model(), + calibration_data=get_calibration_data( + input_spec, "imagenette", num_calibration_samples + ), + weights_dtype=model.get_weights_dtype(), + activations_dtype=model.get_activations_dtype(), + name=model_name, + options=model.get_quantize_options(), ) + if skip_compiling: + return ExportResult(quantize_job=quantize_job) - # 2. Compile the model to an on-device asset + # 3. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) print(f"Optimizing model {model_name} to run on-device") submitted_compile_job = hub.submit_compile_job( - model=source_model, + model=quantize_job.get_target_model(), input_specs=input_spec, device=hub_device, name=model_name, - calibration_data=model.get_calibration_data(target_runtime), options=model_compile_options, ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 4. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +177,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 5. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +198,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 6. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 7. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -202,12 +229,17 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + quantize_job=quantize_job, + ) def main(): warnings.filterwarnings("ignore") - parser = export_parser(model_cls=Model) + parser = export_parser(model_cls=Model, is_hub_quantized=True) args = parser.parse_args() export_model(**vars(args)) diff --git a/qai_hub_models/models/resnext50_quantized/model.py b/qai_hub_models/models/resnext50_quantized/model.py index 101378c3..7708245a 100644 --- a/qai_hub_models/models/resnext50_quantized/model.py +++ b/qai_hub_models/models/resnext50_quantized/model.py @@ -4,78 +4,11 @@ # --------------------------------------------------------------------- from __future__ import annotations -# isort: off -# This verifies aimet is installed, and this must be included first. -from qai_hub_models.utils.quantization_aimet import ( - AIMETQuantizableMixin, - constrain_quantized_inputs_to_image_range, -) - -# isort: on - -import torch -from aimet_torch.cross_layer_equalization import equalize_model -from aimet_torch.model_preparer import prepare_model -from aimet_torch.quantsim import QuantizationSimModel, load_encodings_to_sim - from qai_hub_models.models.resnext50.model import ResNeXt50 -from qai_hub_models.utils.aimet.config_loader import get_default_aimet_config -from qai_hub_models.utils.asset_loaders import CachedWebModelAsset +from qai_hub_models.utils.quantization import HubQuantizableMixin MODEL_ID = __name__.split(".")[-2] -MODEL_ASSET_VERSION = 2 -DEFAULT_ENCODINGS = "resnext50_quantized_encodings.json" - - -class ResNeXt50Quantizable(AIMETQuantizableMixin, ResNeXt50): - """ResNeXt50 with post train quantization support. - - Supports only 8 bit weights and activations, and only loads pre-quantized checkpoints. - Support for quantizing using your own weights & data will come at a later date.""" - - def __init__( - self, - sim_model: QuantizationSimModel, - ) -> None: - # Input is already normalized by sim_model. Disable it in the wrapper model. - ResNeXt50.__init__(self, sim_model.model, normalize_input=False) - AIMETQuantizableMixin.__init__( - self, - sim_model, - ) - - @classmethod - def from_pretrained( - cls, - aimet_encodings: str | None = "DEFAULT", - ) -> "ResNeXt50Quantizable": - """ - Parameters: - aimet_encodings: - if "DEFAULT": Loads the model with aimet encodings calibrated on imagenette. - elif None: Doesn't load any encodings. Used when computing encodings. - else: Interprets as a filepath and loads the encodings stored there. 
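Taken together, the hunks above replace the AIMET pre-quantized-checkpoint path (whose model wrapper is being deleted in `model.py`) with quantization on AI Hub itself: trace, compile to ONNX, quantize against calibration data, then compile the quantized artifact. A condensed, standalone sketch of that recipe, using only calls that appear in this diff; the device and sample count are illustrative, and the optional `name=` arguments are dropped for brevity:

```python
# Condensed sketch of the two-stage Hub quantization flow introduced above.
import qai_hub as hub
import torch

from qai_hub_models.models.resnext50_quantized import Model
from qai_hub_models.utils.input_spec import make_torch_inputs
from qai_hub_models.utils.quantization import get_calibration_data

model = Model.from_pretrained()
input_spec = model.get_input_spec()
traced = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec))
device = hub.Device("Samsung Galaxy S23")

# Stage 1: convert the traced model to ONNX on Hub.
onnx_job = hub.submit_compile_job(
    model=traced,
    input_specs=input_spec,
    device=device,
    options="--target_runtime onnx",
)

# Stage 2: quantize the ONNX model with imagenette calibration data.
quantize_job = hub.submit_quantize_job(
    model=onnx_job.get_target_model(),
    calibration_data=get_calibration_data(input_spec, "imagenette", 100),
    weights_dtype=model.get_weights_dtype(),
    activations_dtype=model.get_activations_dtype(),
    options=model.get_quantize_options(),
)

# The quantized model then feeds the normal on-device compile step.
compile_job = hub.submit_compile_job(
    model=quantize_job.get_target_model(),
    input_specs=input_spec,
    device=device,
)
```

This is also why the export function gains a `skip_compiling` flag: it short-circuits after stage 2 and returns an `ExportResult` carrying only the quantize job.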
- """ - model = ResNeXt50.from_pretrained() - input_shape = cls.get_input_spec()["image_tensor"][0] - - model = prepare_model(model) - equalize_model(model, input_shape) - sim = QuantizationSimModel( - model, - quant_scheme="tf_enhanced", - default_param_bw=8, - default_output_bw=8, - config_file=get_default_aimet_config(), - dummy_input=torch.rand(input_shape), - ) - constrain_quantized_inputs_to_image_range(sim) - if aimet_encodings: - if aimet_encodings == "DEFAULT": - aimet_encodings = CachedWebModelAsset.from_asset_store( - MODEL_ID, MODEL_ASSET_VERSION, DEFAULT_ENCODINGS - ).fetch() - load_encodings_to_sim(sim, aimet_encodings) - return cls(sim) +class ResNeXt50Quantizable(HubQuantizableMixin, ResNeXt50): + pass diff --git a/qai_hub_models/models/resnext50_quantized/perf.yaml b/qai_hub_models/models/resnext50_quantized/perf.yaml index d4ae1c1f..749bcdc2 100644 --- a/qai_hub_models/models/resnext50_quantized/perf.yaml +++ b/qai_hub_models/models/resnext50_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,44 +20,39 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8775P Proxy models: - name: ResNeXt50Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 933.0 - throughput: 1071.8113612004288 + inference_time: 929.0 + throughput: 1076.4262648008612 estimated_peak_memory_range: - min: 12288 - max: 6057144 + min: 16384 + max: 2198064 primary_compute_unit: NPU precision: int8 layer_info: @@ -64,29 +60,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: jlpe9kxog + job_id: jpy1zdxlp job_status: Passed torchscript_onnx_qnn: - inference_time: 1195.0 - throughput: 836.8200836820083 + inference_time: 1180.0 + throughput: 847.457627118644 estimated_peak_memory_range: - min: 28672 - max: 65421024 + min: 12288 + max: 67891904 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jo5mr3kdg + total_layers: 127 + job_id: jpedov7v5 job_status: Passed torchscript_onnx: - inference_time: 1977.0 - throughput: 505.8168942842691 + inference_time: 1870.0 + throughput: 534.75935828877 estimated_peak_memory_range: - min: 36864 - max: 30920440 + min: 12288 + max: 31404304 primary_compute_unit: NPU precision: int8 layer_info: @@ -94,7 +90,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: jwgoy3wq5 + job_id: jp2kxmrqp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -103,13 +99,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 
Gen 2 - timestamp: '2024-09-25T11:46:23Z' + timestamp: '2024-10-17T17:19:03Z' - torchscript_onnx_tflite: - inference_time: 679.0 - throughput: 1472.7540500736377 + inference_time: 687.0 + throughput: 1455.604075691412 estimated_peak_memory_range: min: 12288 - max: 107105616 + max: 111011792 primary_compute_unit: NPU precision: int8 layer_info: @@ -117,29 +113,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: jygzeryog + job_id: jp0z4rjn5 job_status: Passed torchscript_onnx_qnn: - inference_time: 879.0 - throughput: 1137.6564277588168 + inference_time: 882.0 + throughput: 1133.7868480725624 estimated_peak_memory_range: min: 167936 - max: 32276064 + max: 34667296 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: joprked05 + total_layers: 127 + job_id: jgz327lx5 job_status: Passed torchscript_onnx: - inference_time: 1431.0 - throughput: 698.8120195667366 + inference_time: 1279.0 + throughput: 781.8608287724785 estimated_peak_memory_range: min: 28672 - max: 142368480 + max: 146151648 primary_compute_unit: NPU precision: int8 layer_info: @@ -147,7 +143,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: j1pv3vnk5 + job_id: jpy1zdolp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -156,13 +152,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:46:24Z' + timestamp: '2024-10-17T17:19:04Z' - torchscript_onnx_tflite: - inference_time: 910.0 - throughput: 1098.901098901099 + inference_time: 3202.0 + throughput: 312.3048094940662 estimated_peak_memory_range: min: 12288 - max: 1396096 + max: 60662176 primary_compute_unit: NPU precision: int8 layer_info: @@ -170,37 +166,60 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: jz5woqz3p + job_id: jp8q27xop job_status: Passed torchscript_onnx_qnn: - inference_time: 1126.0 - throughput: 888.0994671403197 + inference_time: 4530.0 + throughput: 220.7505518763797 estimated_peak_memory_range: - min: 184320 - max: 1340512 + min: 200704 + max: 8100560 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jqpye628g + total_layers: 127 + job_id: j5wew9lm5 job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:46:17Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-17T17:18:48Z' - torchscript_onnx_tflite: - inference_time: 1091.0 - throughput: 916.5902841429881 + inference_time: 64073.0 + throughput: 15.607198039735927 estimated_peak_memory_range: - min: 12288 - max: 111592032 + min: 24576 + max: 89783464 + primary_compute_unit: GPU + precision: int8 + layer_info: + layers_on_npu: 14 + layers_on_gpu: 57 + layers_on_cpu: 11 + total_layers: 82 + job_id: jgkevy4ng + job_status: Passed + reference_device_info: + name: RB5 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8250 Proxy + timestamp: '2024-10-17T17:18:33Z' + - torchscript_onnx_tflite: + inference_time: 920.0 + throughput: 1086.9565217391305 + estimated_peak_memory_range: + min: 32768 + max: 6176376 primary_compute_unit: NPU precision: int8 layer_info: @@ -208,37 +227,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: jmg9vw2w5 + 
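A note on the refreshed numbers: comparing the like-for-like Samsung Galaxy S23 TFLite entries across the two perf.yaml files, int8 quantization takes ResNeXt50 from about 2525 µs (fp16) to 929 µs, a speedup of roughly 2525 / 929 ≈ 2.7×, with every layer resident on the NPU in both cases (79 layers fp16, 82 layers int8).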
job_id: j5q602yop job_status: Passed torchscript_onnx_qnn: - inference_time: 1391.0 - throughput: 718.9072609633357 + inference_time: 1124.0 + throughput: 889.6797153024911 estimated_peak_memory_range: - min: 167936 - max: 33372320 + min: 172032 + max: 1466640 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: j1gln38jp + total_layers: 127 + job_id: jg9l04z8g job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:46:21Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-17T17:18:50Z' - torchscript_onnx_tflite: - inference_time: 935.0 - throughput: 1069.51871657754 + inference_time: 943.0 + throughput: 1060.4453870625662 estimated_peak_memory_range: min: 12288 - max: 2570320 + max: 1425152 primary_compute_unit: NPU precision: int8 layer_info: @@ -246,37 +265,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: jnp10e185 + job_id: jglv4kym5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1146.0 - throughput: 872.6003490401396 + inference_time: 1142.0 + throughput: 875.6567425569177 estimated_peak_memory_range: - min: 212992 - max: 1395456 + min: 192512 + max: 1796784 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: j2p0yl99g + total_layers: 127 + job_id: jgdxnvdzp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:46:18Z' + chipset: SA8255P Proxy + timestamp: '2024-10-17T17:18:53Z' - torchscript_onnx_tflite: - inference_time: 928.0 - throughput: 1077.5862068965516 + inference_time: 946.0 + throughput: 1057.0824524312895 estimated_peak_memory_range: - min: 20480 - max: 9269656 + min: 12288 + max: 2211160 primary_compute_unit: NPU precision: int8 layer_info: @@ -284,22 +303,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: jvgdwo4r5 + job_id: j56y218yp job_status: Passed torchscript_onnx_qnn: - inference_time: 1138.0 - throughput: 878.7346221441124 + inference_time: 1140.0 + throughput: 877.1929824561404 estimated_peak_memory_range: - min: 184320 - max: 1316032 + min: 192512 + max: 1365624 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jogkz30wg + total_layers: 127 + job_id: j57y2de95 job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -307,14 +326,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:46:19Z' + chipset: SA8775P Proxy + timestamp: '2024-10-17T17:18:54Z' - torchscript_onnx_tflite: - inference_time: 921.0 - throughput: 1085.7763300760043 + inference_time: 1077.0 + throughput: 928.5051067780872 estimated_peak_memory_range: - min: 12288 - max: 28288840 + min: 4096 + max: 110828336 primary_compute_unit: NPU precision: int8 layer_info: @@ -322,37 +341,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: jz57zxnvp + job_id: jp3jnmzng job_status: Passed torchscript_onnx_qnn: - inference_time: 1144.0 - throughput: 874.1258741258741 + inference_time: 1432.0 + throughput: 
698.3240223463687 estimated_peak_memory_range: - min: 221184 - max: 1511136 + min: 167936 + max: 37381104 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jn5q831n5 + total_layers: 127 + job_id: jp4lnwy15 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:46:20Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-17T17:18:56Z' - torchscript_onnx_tflite: - inference_time: 3068.0 - throughput: 325.94524119947846 + inference_time: 650.0 + throughput: 1538.4615384615386 estimated_peak_memory_range: - min: 12288 - max: 60359568 + min: 8192 + max: 55905408 primary_compute_unit: NPU precision: int8 layer_info: @@ -360,75 +379,67 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: jqp4qv48g + job_id: jgo2zvlkp job_status: Passed torchscript_onnx_qnn: - inference_time: 4643.0 - throughput: 215.37798836958862 + inference_time: 737.0 + throughput: 1356.85210312076 estimated_peak_memory_range: - min: 225280 - max: 8241344 + min: 159744 + max: 34219488 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: j1p3ke735 + total_layers: 127 + job_id: jpxk91ll5 job_status: Passed - reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot - os_name: Android - manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T11:46:22Z' - - torchscript_onnx_tflite: - inference_time: 59523.0 - throughput: 16.80022848310737 + torchscript_onnx: + inference_time: 1252.0 + throughput: 798.7220447284345 estimated_peak_memory_range: - min: 24576 - max: 135568256 - primary_compute_unit: GPU + min: 0 + max: 79889536 + primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 14 - layers_on_gpu: 57 - layers_on_cpu: 11 - total_layers: 82 - job_id: j0pxvyr3g + layers_on_npu: 147 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 147 + job_id: jp8q27eop job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T11:46:13Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-17T17:19:07Z' - torchscript_onnx_qnn: - inference_time: 1240.0 - throughput: 806.4516129032259 + inference_time: 1324.0 + throughput: 755.2870090634441 estimated_peak_memory_range: - min: 442368 - max: 442368 + min: 462848 + max: 462848 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jep28ldrp + total_layers: 127 + job_id: jp1428n7p job_status: Passed torchscript_onnx: - inference_time: 1934.0 - throughput: 517.063081695967 + inference_time: 1948.0 + throughput: 513.347022587269 estimated_peak_memory_range: - min: 29798400 - max: 29798400 + min: 30650368 + max: 30650368 primary_compute_unit: NPU precision: int8 layer_info: @@ -436,7 +447,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: jlpe9knog + job_id: jp0z4rmn5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -445,4 +456,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: 
'2024-09-25T11:46:25Z' + timestamp: '2024-10-17T17:19:06Z' diff --git a/qai_hub_models/models/resnext50_quantized/requirements.txt b/qai_hub_models/models/resnext50_quantized/requirements.txt deleted file mode 100644 index de5b80e8..00000000 --- a/qai_hub_models/models/resnext50_quantized/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -aimet-torch==1.32.1.post1; sys_platform == "linux" diff --git a/qai_hub_models/models/resnext50_quantized/test.py b/qai_hub_models/models/resnext50_quantized/test.py deleted file mode 100644 index 4cd1dbbd..00000000 --- a/qai_hub_models/models/resnext50_quantized/test.py +++ /dev/null @@ -1,30 +0,0 @@ -# --------------------------------------------------------------------- -# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. -# SPDX-License-Identifier: BSD-3-Clause -# --------------------------------------------------------------------- -from qai_hub_models.models._shared.imagenet_classifier.test_utils import ( - run_imagenet_classifier_test, -) -from qai_hub_models.models.resnext50_quantized.demo import main as demo_main -from qai_hub_models.models.resnext50_quantized.model import ( - MODEL_ASSET_VERSION, - MODEL_ID, - ResNeXt50Quantizable, -) - - -def test_task(): - run_imagenet_classifier_test( - ResNeXt50Quantizable.from_pretrained(), - MODEL_ID, - probability_threshold=0.46, - diff_tol=0.005, - rtol=0.02, - atol=0.2, - asset_version=MODEL_ASSET_VERSION, - ) - - -def test_demo(): - # Verify demo does not crash - demo_main(is_test=True) diff --git a/qai_hub_models/models/riffusion_quantized/README.md b/qai_hub_models/models/riffusion_quantized/README.md index fedfc292..d889deb9 100644 --- a/qai_hub_models/models/riffusion_quantized/README.md +++ b/qai_hub_models/models/riffusion_quantized/README.md @@ -6,7 +6,7 @@ Generates high resolution spectrograms images from text prompts using a latent diffusion model. This model uses CLIP ViT-L/14 as text encoder, U-Net based latent denoising, and VAE based decoder to generate the final image. This is based on the implementation of Riffusion found -[here](https://github.com/CompVis/stable-diffusion/tree/main). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/riffusion_quantized). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.riffusion_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Riffusion can be found +* The license for the original implementation of Riffusion can be found [here](https://github.com/CompVis/stable-diffusion/blob/main/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://github.com/CompVis/stable-diffusion/blob/main/LICENSE) +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/CompVis/stable-diffusion/blob/main/LICENSE) + ## References * [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752) * [Source Model Implementation](https://github.com/CompVis/stable-diffusion/tree/main) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/riffusion_quantized/export.py b/qai_hub_models/models/riffusion_quantized/export.py index 1cc4be82..496229aa 100644 --- a/qai_hub_models/models/riffusion_quantized/export.py +++ b/qai_hub_models/models/riffusion_quantized/export.py @@ -9,13 +9,14 @@ import warnings from pathlib import Path -from typing import Any, Dict, List, Mapping, Optional, Tuple, cast +from typing import Any, Dict, List, Mapping, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.riffusion_quantized import Model from qai_hub_models.utils.args import export_parser -from qai_hub_models.utils.base_model import BasePrecompiledModel, TargetRuntime +from qai_hub_models.utils.base_model import BasePrecompiledModel from qai_hub_models.utils.printing import print_profile_metrics_from_job from qai_hub_models.utils.qai_hub_helpers import ( can_access_qualcomm_ai_hub, @@ -36,19 +37,16 @@ def export_model( output_dir: Optional[str] = None, profile_options: str = "", **additional_model_kwargs, -) -> Mapping[str, Tuple[Optional[hub.ProfileJob], Optional[hub.InferenceJob]]] | List[ - str -]: +) -> Mapping[str, ExportResult] | List[str]: """ - This function accomplishes 5 main tasks: + This function executes the following recipe: - 1. Initialize model. - 2. Upload model assets to hub. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Summarizes the results from profiling. + 1. Initialize model + 2. Upload model assets to hub + 3. Profiles the model performance on a real device + 4. Summarizes the results from profiling - Each of the last three steps can be optionally skipped using the input options. + Each of the last 2 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -70,9 +68,8 @@ def export_model( `model_cls.from_precompiled` Returns: - A Mapping from component_name to a 2-tuple of: + A Mapping from component_name to a struct of: * A ProfileJob containing metadata about the profile job (None if profiling skipped). - * An InferenceJob containing metadata about the inference job (None if inferencing skipped). """ model_name = "riffusion_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -101,9 +98,7 @@ def export_model( component_arg, ) - target_runtime = TargetRuntime.TFLITE - # On-device perf improves with I/O in channel_last format except when using ONNX. - use_channel_last_format = target_runtime != TargetRuntime.ONNX + target_runtime = TargetRuntime.QNN # 1. 
Initialize model print("Initializing model class") @@ -123,8 +118,11 @@ def export_model( uploaded_models[component_name] = hub.upload_model( components_dict[component_name].get_target_model_path() ) + print( + f"The {component_name} model is saved here: {components_dict[component_name].get_target_model_path()}" + ) - # 3. Profile the model assets on real devices + # 3. Profiles the model performance on a real device profile_jobs: Dict[str, hub.client.ProfileJob] = {} if not skip_profiling: for component_name in components: @@ -142,31 +140,7 @@ def export_model( hub.client.ProfileJob, submitted_profile_job ) - # 4. Run inference on-device with sample inputs - inference_jobs: Dict[str, hub.client.InferenceJob] = {} - if not skip_inferencing: - for component_name in components: - print( - f"Running inference for {component_name} on a hosted device with example inputs." - ) - profile_options_all = components_dict[ - component_name - ].get_hub_profile_options(target_runtime, profile_options) - sample_inputs = components_dict[component_name].sample_inputs( - use_channel_last_format=use_channel_last_format - ) - submitted_inference_job = hub.submit_inference_job( - model=uploaded_models[component_name], - inputs=sample_inputs, - device=hub_device, - name=f"{model_name}_{component_name}", - options=profile_options_all, - ) - inference_jobs[component_name] = cast( - hub.client.InferenceJob, submitted_inference_job - ) - - # 5. Summarize the results from profiling + # 4. Summarizes the results from profiling if not skip_summary and not skip_profiling: for component_name in components: profile_job = profile_jobs[component_name] @@ -175,9 +149,8 @@ def export_model( print_profile_metrics_from_job(profile_job, profile_data) return { - component_name: ( - profile_jobs.get(component_name, None), - inference_jobs.get(component_name, None), + component_name: ExportResult( + profile_job=profile_jobs.get(component_name, None), ) for component_name in components } diff --git a/qai_hub_models/models/riffusion_quantized/perf.yaml b/qai_hub_models/models/riffusion_quantized/perf.yaml index 0706f066..9f7b5b11 100644 --- a/qai_hub_models/models/riffusion_quantized/perf.yaml +++ b/qai_hub_models/models/riffusion_quantized/perf.yaml @@ -26,7 +26,7 @@ aggregated: - Xiaomi 12 - Xiaomi 12 Pro supported_chipsets: - - Qcs8550 Proxy + - QCS8550 Proxy - Snapdragon® 8 Gen 1 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 3 @@ -102,7 +102,7 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy + chipset: QCS8550 Proxy timestamp: '2024-10-03T16:20:19Z' - torchscript_onnx_qnn: inference_time: 7594.0 @@ -196,7 +196,7 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy + chipset: QCS8550 Proxy timestamp: '2024-10-03T16:20:19Z' - torchscript_onnx_qnn: inference_time: 227581.0 @@ -290,7 +290,7 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy + chipset: QCS8550 Proxy timestamp: '2024-10-03T16:20:19Z' - torchscript_onnx_qnn: inference_time: 129856.0 diff --git a/qai_hub_models/models/sam/README.md b/qai_hub_models/models/sam/README.md index 5ad08396..ac8adc3b 100644 --- a/qai_hub_models/models/sam/README.md +++ b/qai_hub_models/models/sam/README.md @@ -6,7 +6,7 @@ Transformer based encoder-decoder where prompts specify what to segment in an image thereby allowing segmentation without the need for additional training. 
The image encoder generates embeddings and the lightweight decoder operates on the embeddings for point and mask based image segmentation. This is based on the implementation of Segment-Anything-Model found -[here](https://github.com/facebookresearch/segment-anything). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/sam). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.sam.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Segment-Anything-Model can be found +* The license for the original implementation of Segment-Anything-Model can be found [here](https://github.com/facebookresearch/segment-anything/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Segment Anything](https://arxiv.org/abs/2304.02643) * [Source Model Implementation](https://github.com/facebookresearch/segment-anything) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/sam/export.py b/qai_hub_models/models/sam/export.py index 0188a6e7..c527843c 100644 --- a/qai_hub_models/models/sam/export.py +++ b/qai_hub_models/models/sam/export.py @@ -10,15 +10,16 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Mapping, Optional, Tuple, cast +from typing import Any, Dict, List, Mapping, Optional, cast import qai_hub as hub import torch from torch.utils import mobile_optimizer +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.sam import Model from qai_hub_models.utils.args import export_parser, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel, TargetRuntime +from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -46,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Mapping[ - str, Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] -] | List[str]: +) -> Mapping[str, ExportResult] | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. 
Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -84,10 +83,10 @@ def export_model( `model_cls.from_pretrained` Returns: - A Mapping from component_name to a 3-tuple of: + A Mapping from component_name to a struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "sam" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -119,7 +118,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) components_dict: Dict[str, BaseModel] = {} if "SAMDecoder" in components: @@ -145,7 +144,7 @@ def export_model( }, ) - # 2. Compile the models to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = component.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -161,7 +160,7 @@ def export_model( hub.client.CompileJob, submitted_compile_job ) - # 3. Profile the model assets on real devices + # 3. Profiles the model performance on a real device profile_jobs: Dict[str, hub.client.ProfileJob] = {} if not skip_profiling: for component_name in components: @@ -179,7 +178,7 @@ def export_model( hub.client.ProfileJob, submitted_profile_job ) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_jobs: Dict[str, hub.client.InferenceJob] = {} if not skip_inferencing: for component_name in components: @@ -203,14 +202,14 @@ def export_model( hub.client.InferenceJob, submitted_inference_job ) - # 5. Download the model assets to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) for component_name, compile_job in compile_jobs.items(): target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / component_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: for component_name in components: profile_job = profile_jobs[component_name] @@ -235,10 +234,10 @@ def export_model( ) return { - component_name: ( - compile_jobs[component_name], - profile_jobs.get(component_name, None), - inference_jobs.get(component_name, None), + component_name: ExportResult( + compile_job=compile_jobs[component_name], + inference_job=inference_jobs.get(component_name, None), + profile_job=profile_jobs.get(component_name, None), ) for component_name in components } diff --git a/qai_hub_models/models/sam/perf.yaml b/qai_hub_models/models/sam/perf.yaml index a2f365f6..f3a86019 100644 --- a/qai_hub_models/models/sam/perf.yaml +++ b/qai_hub_models/models/sam/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: SAMDecoder performance_metrics: - torchscript_onnx_tflite: - inference_time: 29972.0 - throughput: 33.364473508608036 + inference_time: 29098.0 + throughput: 34.366623135610695 estimated_peak_memory_range: - min: 4259840 - max: 12540360 + min: 2162688 + max: 21300704 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,7 +56,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 337 - job_id: j1gln3yjp + job_id: jgz3dewx5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -67,13 +65,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:44:43Z' + timestamp: '2024-10-15T17:13:02Z' - torchscript_onnx_tflite: - inference_time: 20690.0 - throughput: 48.33252779120348 + inference_time: 20232.0 + throughput: 49.426650850138394 estimated_peak_memory_range: - min: 3805184 - max: 227103408 + min: 2363392 + max: 238361616 primary_compute_unit: NPU precision: fp16 layer_info: @@ -81,7 +79,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 337 - job_id: j1p3kez35 + job_id: jg9lnv88g job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -90,13 +88,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:44:44Z' + timestamp: '2024-10-15T17:13:04Z' - torchscript_onnx_tflite: - inference_time: 29720.0 - throughput: 33.64737550471063 + inference_time: 28959.0 + throughput: 34.531579129113574 estimated_peak_memory_range: - min: 3985408 - max: 7065168 + min: 3997696 + max: 12470208 primary_compute_unit: NPU precision: fp16 layer_info: @@ -104,7 +102,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 337 - job_id: j1pv3v2k5 + job_id: jgdx1w0zp job_status: Passed 
reference_device_info: name: QCS8550 (Proxy) @@ -112,14 +110,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:44:46Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T17:13:06Z' - torchscript_onnx_tflite: - inference_time: 33136.0 - throughput: 30.17865765330758 + inference_time: 29061.0 + throughput: 34.41037817005609 estimated_peak_memory_range: - min: 4046848 - max: 214762848 + min: 4005888 + max: 26106816 primary_compute_unit: NPU precision: fp16 layer_info: @@ -127,22 +125,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 337 - job_id: jlpe9k6og + job_id: jp2ky8j6p job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:44:48Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T17:13:13Z' - torchscript_onnx_tflite: - inference_time: 30005.0 - throughput: 33.327778703549406 + inference_time: 28990.0 + throughput: 34.494653328734046 estimated_peak_memory_range: min: 4030464 - max: 17367192 + max: 49143704 primary_compute_unit: NPU precision: fp16 layer_info: @@ -150,22 +148,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 337 - job_id: jz5woqy3p + job_id: jgn6v2xj5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:44:49Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T17:13:11Z' - torchscript_onnx_tflite: - inference_time: 29956.0 - throughput: 33.38229403124583 + inference_time: 29004.0 + throughput: 34.478003034064265 estimated_peak_memory_range: - min: 4022272 - max: 22289056 + min: 4042752 + max: 6870472 primary_compute_unit: NPU precision: fp16 layer_info: @@ -173,22 +171,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 337 - job_id: jnp10eo85 + job_id: jpxkovm85 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:44:51Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T17:13:09Z' - torchscript_onnx_tflite: - inference_time: 29912.0 - throughput: 33.431398769724524 + inference_time: 32396.0 + throughput: 30.868008396098283 estimated_peak_memory_range: - min: 4001792 - max: 11746344 + min: 4046848 + max: 233043552 primary_compute_unit: NPU precision: fp16 layer_info: @@ -196,24 +194,47 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 337 - job_id: jz57zxovp + job_id: j57yrz6n5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T17:13:08Z' + - torchscript_onnx_tflite: + inference_time: 20466.0 + throughput: 48.8615264340858 + estimated_peak_memory_range: + min: 2555904 + max: 164731968 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 337 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 337 + job_id: jgkexzovg + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:44:53Z' + chipset: Snapdragon® 8 Elite + timestamp: 
'2024-10-15T17:13:16Z' - name: SAMEncoder performance_metrics: - torchscript_onnx_tflite: - inference_time: 11293293.0 - throughput: 0.0885481320638719 + inference_time: 11323510.0 + throughput: 0.08831183970341351 estimated_peak_memory_range: - min: 39505920 - max: 225192072 + min: 12288 + max: 285674960 primary_compute_unit: CPU precision: fp32 layer_info: @@ -221,7 +242,7 @@ models: layers_on_gpu: 36 layers_on_cpu: 782 total_layers: 818 - job_id: jw566n865 + job_id: j5we6oxm5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -230,13 +251,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:44:43Z' + timestamp: '2024-10-15T17:13:03Z' - torchscript_onnx_tflite: - inference_time: 8339280.0 - throughput: 0.11991442906342034 + inference_time: 8300484.0 + throughput: 0.12047490242737652 estimated_peak_memory_range: - min: 43937792 - max: 1631367904 + min: 129224704 + max: 1718444144 primary_compute_unit: CPU precision: fp32 layer_info: @@ -244,7 +265,7 @@ models: layers_on_gpu: 36 layers_on_cpu: 782 total_layers: 818 - job_id: jwgoy3lq5 + job_id: jp14z037p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -253,13 +274,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:44:45Z' + timestamp: '2024-10-15T17:13:05Z' - torchscript_onnx_tflite: - inference_time: 10940893.0 - throughput: 0.09140021751423764 + inference_time: 10870158.0 + throughput: 0.09199498296160921 estimated_peak_memory_range: - min: 129261568 - max: 132892712 + min: 129540096 + max: 300233944 primary_compute_unit: CPU precision: fp32 layer_info: @@ -267,7 +288,7 @@ models: layers_on_gpu: 36 layers_on_cpu: 782 total_layers: 818 - job_id: j7gjxe3vp + job_id: j5we6ox45 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -275,14 +296,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:44:46Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T17:13:06Z' - torchscript_onnx_tflite: - inference_time: 17506449.0 - throughput: 0.05712180694097358 + inference_time: 10178345.0 + throughput: 0.09824779961771782 estimated_peak_memory_range: - min: 87793664 - max: 1726140736 + min: 126943232 + max: 130050864 primary_compute_unit: CPU precision: fp32 layer_info: @@ -290,22 +311,22 @@ models: layers_on_gpu: 36 layers_on_cpu: 782 total_layers: 818 - job_id: jygzerzog + job_id: jpy13en0p job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:44:48Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T17:13:13Z' - torchscript_onnx_tflite: - inference_time: 10431216.0 - throughput: 0.09586610036643858 + inference_time: 11283428.0 + throughput: 0.08862554890233712 estimated_peak_memory_range: - min: 129617920 - max: 133152664 + min: 126205952 + max: 130851936 primary_compute_unit: CPU precision: fp32 layer_info: @@ -313,22 +334,22 @@ models: layers_on_gpu: 36 layers_on_cpu: 782 total_layers: 818 - job_id: jmg9vwow5 + job_id: jprv3k9kg job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:44:50Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T17:13:11Z' - torchscript_onnx_tflite: - 
inference_time: 10423682.0 - throughput: 0.09593539020089062 + inference_time: 10102843.0 + throughput: 0.09898203901614624 estimated_peak_memory_range: - min: 128888832 - max: 132565768 + min: 127098880 + max: 131236328 primary_compute_unit: CPU precision: fp32 layer_info: @@ -336,22 +357,22 @@ models: layers_on_gpu: 36 layers_on_cpu: 782 total_layers: 818 - job_id: jvgdwo6r5 + job_id: j5mnxr47p job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:44:52Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T17:13:10Z' - torchscript_onnx_tflite: - inference_time: 11339804.0 - throughput: 0.08818494570100154 + inference_time: 13526091.0 + throughput: 0.07393118972805965 estimated_peak_memory_range: - min: 129880064 - max: 133421840 + min: 137879552 + max: 1774634432 primary_compute_unit: CPU precision: fp32 layer_info: @@ -359,13 +380,36 @@ models: layers_on_gpu: 36 layers_on_cpu: 782 total_layers: 818 - job_id: jqp4qve8g + job_id: jp4lrq825 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T17:13:08Z' + - torchscript_onnx_tflite: + inference_time: 6334196.0 + throughput: 0.15787323284596813 + estimated_peak_memory_range: + min: 102768640 + max: 1649431984 + primary_compute_unit: CPU + precision: fp32 + layer_info: + layers_on_npu: 0 + layers_on_gpu: 36 + layers_on_cpu: 782 + total_layers: 818 + job_id: jg9llx4qg + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:44:53Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-16T08:25:26Z' diff --git a/qai_hub_models/models/sesr_m5/README.md b/qai_hub_models/models/sesr_m5/README.md index 35457e96..2c7bfd68 100644 --- a/qai_hub_models/models/sesr_m5/README.md +++ b/qai_hub_models/models/sesr_m5/README.md @@ -6,7 +6,7 @@ SESR M5 performs efficient on-device upscaling of images. This is based on the implementation of SESR-M5 found -[here](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/sesr). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/sesr_m5). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.sesr_m5.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of SESR-M5 can be found +* The license for the original implementation of SESR-M5 can be found [here](https://github.com/quic/aimet-model-zoo/blob/develop/LICENSE.pdf). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Collapsible Linear Blocks for Super-Efficient Super Resolution](https://arxiv.org/abs/2103.09404) * [Source Model Implementation](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/sesr) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/sesr_m5/export.py b/qai_hub_models/models/sesr_m5/export.py index 0ec23534..86f043b3 100644 --- a/qai_hub_models/models/sesr_m5/export.py +++ b/qai_hub_models/models/sesr_m5/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.sesr_m5 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
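+
+    Example (a minimal sketch; assumes Qualcomm® AI Hub access is configured,
+    and the device name is illustrative):
+        result = export_model(device="Samsung Galaxy S23")
+        if isinstance(result, ExportResult):
+            print(result.compile_job, result.profile_job, result.inference_job)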
""" model_name = "sesr_m5" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/sesr_m5/perf.yaml b/qai_hub_models/models/sesr_m5/perf.yaml index d74eee34..3f6dc320 100644 --- a/qai_hub_models/models/sesr_m5/perf.yaml +++ b/qai_hub_models/models/sesr_m5/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: SESR-M5 performance_metrics: - torchscript_onnx_tflite: - inference_time: 2184.0 - throughput: 457.87545787545787 + inference_time: 2175.0 + throughput: 459.7701149425287 estimated_peak_memory_range: - min: 20480 - max: 11545544 + min: 16384 + max: 21924928 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 25 - job_id: j1p3ke935 + job_id: jgdx1ze6p job_status: Passed torchscript_onnx_qnn: - inference_time: 2154.0 - throughput: 464.2525533890436 + inference_time: 2129.0 + throughput: 469.7040864255519 estimated_peak_memory_range: - min: 20480 - max: 3826424 + min: 16384 + max: 60533184 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 31 - job_id: jmg9vwzw5 + job_id: jprv3n2vg job_status: Passed torchscript_onnx: - inference_time: 2884.0 - throughput: 346.74063800277395 + inference_time: 2687.0 + throughput: 372.1622627465575 estimated_peak_memory_range: - min: 20480 - max: 74815880 + min: 212992 + max: 1417096 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 33 - job_id: joprkel05 + job_id: jgo264n4p job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:43:40Z' + timestamp: '2024-10-14T23:35:04Z' - torchscript_onnx_tflite: - inference_time: 1799.0 - throughput: 555.864369093941 + inference_time: 1778.0 + throughput: 562.429696287964 estimated_peak_memory_range: min: 16384 - max: 28480528 + max: 28718528 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: 
layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 25 - job_id: jwgoy3rq5 + job_id: jg9lnxjqg job_status: Passed torchscript_onnx_qnn: - inference_time: 1811.0 - throughput: 552.1811154058531 + inference_time: 1652.0 + throughput: 605.3268765133172 estimated_peak_memory_range: - min: 12288 - max: 12853232 + min: 208896 + max: 13220800 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 31 - job_id: jnp10en85 + job_id: jp2kyv9xp job_status: Passed torchscript_onnx: - inference_time: 2388.0 - throughput: 418.7604690117253 + inference_time: 2121.0 + throughput: 471.4757190004715 estimated_peak_memory_range: min: 0 - max: 30576416 + max: 31376288 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 33 - job_id: jep28lrrp + job_id: jpv6k9r75 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:43:41Z' + timestamp: '2024-10-14T23:35:05Z' - torchscript_onnx_tflite: - inference_time: 2187.0 - throughput: 457.2473708276177 + inference_time: 2241.0 + throughput: 446.2293618920125 estimated_peak_memory_range: min: 24576 - max: 43874120 + max: 1384976 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 25 - job_id: j1pv3vlk5 + job_id: jp14zvykp job_status: Passed torchscript_onnx_qnn: - inference_time: 2163.0 - throughput: 462.32085067036525 + inference_time: 2144.0 + throughput: 466.4179104477612 estimated_peak_memory_range: - min: 24576 - max: 4132456 + min: 221184 + max: 4746840 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 31 - job_id: jz57zxevp + job_id: jp0z0v225 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:43:35Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:34:56Z' - torchscript_onnx_tflite: - inference_time: 3450.0 - throughput: 289.8550724637681 + inference_time: 2191.0 + throughput: 456.41259698767686 estimated_peak_memory_range: - min: 6336512 - max: 35321392 + min: 36864 + max: 8154584 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 25 - job_id: j7gjxervp + job_id: jpxkodnj5 job_status: Passed torchscript_onnx_qnn: - inference_time: 3163.0 - throughput: 316.1555485298767 + inference_time: 2146.0 + throughput: 465.98322460391427 estimated_peak_memory_range: - min: 208896 - max: 16094048 + min: 225280 + max: 1479912 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 31 - job_id: jegn23zkg + job_id: j5q6qmr7p job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:43:39Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:34:59Z' - torchscript_onnx_tflite: - inference_time: 2202.0 - throughput: 454.1326067211626 + inference_time: 2311.0 + throughput: 432.7131112072696 estimated_peak_memory_range: - min: 28672 - max: 8680096 + min: 20480 + max: 
91047872 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 25 - job_id: jlpe9k7og + job_id: jp4lr9kq5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2160.0 - throughput: 462.962962962963 + inference_time: 2169.0 + throughput: 461.04195481788844 estimated_peak_memory_range: - min: 225280 - max: 1670544 + min: 229376 + max: 1645592 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 31 - job_id: jqp4qvy8g + job_id: jgkex9qyg job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:43:36Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:34:58Z' - torchscript_onnx_tflite: - inference_time: 2185.0 - throughput: 457.66590389016017 + inference_time: 2198.0 + throughput: 454.9590536851683 estimated_peak_memory_range: min: 24576 - max: 2066464 + max: 7414256 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 25 - job_id: jygzerlog + job_id: j57yr70q5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2162.0 - throughput: 462.53469010175763 + inference_time: 2146.0 + throughput: 465.98322460391427 estimated_peak_memory_range: - min: 225280 - max: 4812536 + min: 221184 + max: 1487632 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 31 - job_id: j0pxvyl3g + job_id: jp8qy4mzp job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:43:37Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:34:57Z' - torchscript_onnx_tflite: - inference_time: 2209.0 - throughput: 452.6935264825713 + inference_time: 3978.0 + throughput: 251.38260432378078 estimated_peak_memory_range: - min: 36864 - max: 7783424 + min: 16384 + max: 27377792 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 25 - job_id: jz5woql3p + job_id: jgdx1zekp job_status: Passed torchscript_onnx_qnn: - inference_time: 2522.0 - throughput: 396.5107057890563 + inference_time: 3202.0 + throughput: 312.3048094940662 estimated_peak_memory_range: - min: 221184 - max: 1397968 + min: 208896 + max: 16850496 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,22 +329,75 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 31 - job_id: jo5mr30dg + job_id: j56y4dzvp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:43:38Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:35:02Z' + - torchscript_onnx_tflite: + inference_time: 1694.0 + throughput: 590.318772136954 + estimated_peak_memory_range: + min: 12288 + max: 17250624 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 22 + layers_on_gpu: 0 + layers_on_cpu: 3 + total_layers: 25 + job_id: jgn6v7mv5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 1520.0 + throughput: 657.8947368421053 + estimated_peak_memory_range: + min: 208896 + max: 10952832 + 
primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 31 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 31 + job_id: jp3j0w1xg + job_status: Passed + torchscript_onnx: + inference_time: 1994.0 + throughput: 501.5045135406219 + estimated_peak_memory_range: + min: 0 + max: 16932288 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 33 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 33 + job_id: j5we613z5 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:35:08Z' - torchscript_onnx_qnn: - inference_time: 2340.0 - throughput: 427.35042735042737 + inference_time: 2358.0 + throughput: 424.08821034775235 estimated_peak_memory_range: - min: 212992 - max: 212992 + min: 237568 + max: 237568 primary_compute_unit: NPU precision: fp16 layer_info: @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 31 - job_id: jvgdwodr5 + job_id: jpy137jrp job_status: Passed torchscript_onnx: - inference_time: 2934.0 - throughput: 340.83162917518746 + inference_time: 2968.0 + throughput: 336.92722371967653 estimated_peak_memory_range: - min: 8941568 - max: 8941568 + min: 8953856 + max: 8953856 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 33 - job_id: jqpye6o8g + job_id: jpedmlw75 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:43:42Z' + timestamp: '2024-10-14T23:35:06Z' diff --git a/qai_hub_models/models/sesr_m5_quantized/README.md b/qai_hub_models/models/sesr_m5_quantized/README.md index 18ee0ea3..4af66e25 100644 --- a/qai_hub_models/models/sesr_m5_quantized/README.md +++ b/qai_hub_models/models/sesr_m5_quantized/README.md @@ -6,7 +6,7 @@ SESR M5 performs efficient on-device upscaling of images. This is based on the implementation of SESR-M5-Quantized found -[here](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/sesr). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/sesr_m5_quantized). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.sesr_m5_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of SESR-M5-Quantized can be found +* The license for the original implementation of SESR-M5-Quantized can be found [here](https://github.com/quic/aimet-model-zoo/blob/develop/LICENSE.pdf). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Collapsible Linear Blocks for Super-Efficient Super Resolution](https://arxiv.org/abs/2103.09404) * [Source Model Implementation](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/sesr) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/sesr_m5_quantized/export.py b/qai_hub_models/models/sesr_m5_quantized/export.py index 2475648d..94bf1c30 100644 --- a/qai_hub_models/models/sesr_m5_quantized/export.py +++ b/qai_hub_models/models/sesr_m5_quantized/export.py @@ -10,17 +10,17 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.sesr_m5_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.printing import ( print_inference_metrics, @@ -45,20 +45,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -80,10 +78,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
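+
+    Example (a minimal sketch; assumes Qualcomm® AI Hub access is configured,
+    and the device name is illustrative):
+        result = export_model(device="Samsung Galaxy S24", skip_inferencing=True)
+        if isinstance(result, ExportResult):
+            assert result.inference_job is None  # inferencing was skipped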
""" model_name = "sesr_m5_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,7 +107,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( target_runtime, output_path, input_spec ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -135,7 +133,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +148,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +169,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -198,7 +196,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/sesr_m5_quantized/perf.yaml b/qai_hub_models/models/sesr_m5_quantized/perf.yaml index 7281550b..e151066c 100644 --- a/qai_hub_models/models/sesr_m5_quantized/perf.yaml +++ b/qai_hub_models/models/sesr_m5_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,41 +20,38 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: SESR-M5-Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 1332.0 - throughput: 750.7507507507507 + inference_time: 1339.0 + throughput: 746.8259895444362 estimated_peak_memory_range: - min: 24576 - max: 1307512 + min: 28672 + max: 1352888 primary_compute_unit: NPU precision: int8 layer_info: @@ -61,29 +59,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 27 - job_id: j7gjxeqxp + job_id: jpv6k9qz5 job_status: Passed torchscript_onnx_qnn: inference_time: 973.0 throughput: 1027.749229188078 estimated_peak_memory_range: - min: 20480 - max: 4007080 + min: 16384 + max: 77034472 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 25 + layers_on_npu: 31 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 25 - job_id: jnp10ej85 + total_layers: 31 + job_id: j5mnxde7p job_status: Passed torchscript_onnx: - inference_time: 1190.0 - throughput: 840.3361344537815 + inference_time: 1083.0 + throughput: 923.3610341643582 estimated_peak_memory_range: - min: 65536 - max: 1516480 + min: 12288 + max: 1986272 primary_compute_unit: NPU precision: int8 layer_info: @@ -91,7 +89,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 48 - job_id: jqpye6x8g + job_id: jp3j0wvmg job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -100,13 +98,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:43:03Z' + timestamp: '2024-10-14T23:34:19Z' - torchscript_onnx_tflite: - inference_time: 1112.0 - throughput: 899.2805755395683 + inference_time: 1109.0 + throughput: 901.7132551848512 estimated_peak_memory_range: min: 16384 - max: 26595040 + max: 27098080 
primary_compute_unit: NPU precision: int8 layer_info: @@ -114,29 +112,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 27 - job_id: jlpe9ky1g + job_id: jgjvnwd1g job_status: Passed torchscript_onnx_qnn: - inference_time: 856.0 - throughput: 1168.2242990654206 + inference_time: 714.0 + throughput: 1400.5602240896358 estimated_peak_memory_range: - min: 61440 - max: 14073056 + min: 77824 + max: 14626672 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 25 + layers_on_npu: 31 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 25 - job_id: jvgdwo3r5 + total_layers: 31 + job_id: jgn6v70j5 job_status: Passed torchscript_onnx: - inference_time: 859.0 - throughput: 1164.1443538998835 + inference_time: 821.0 + throughput: 1218.026796589525 estimated_peak_memory_range: min: 0 - max: 30469168 + max: 30963296 primary_compute_unit: NPU precision: int8 layer_info: @@ -144,7 +142,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 48 - job_id: j2p0ylj9g + job_id: jgo264k1p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -153,13 +151,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:43:04Z' + timestamp: '2024-10-14T23:34:20Z' - torchscript_onnx_tflite: - inference_time: 1348.0 - throughput: 741.839762611276 + inference_time: 3597.0 + throughput: 278.00945232137894 estimated_peak_memory_range: - min: 815104 - max: 3639248 + min: 12288 + max: 19733760 primary_compute_unit: NPU precision: int8 layer_info: @@ -167,37 +165,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 27 - job_id: jygzernkg + job_id: j57yr72n5 job_status: Passed torchscript_onnx_qnn: - inference_time: 689.0 - throughput: 1451.3788098693758 + inference_time: 2908.0 + throughput: 343.878954607978 estimated_peak_memory_range: - min: 73728 - max: 1239280 + min: 65536 + max: 7152192 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 25 + layers_on_npu: 31 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 25 - job_id: jqp4qv18g + total_layers: 31 + job_id: jglvm1625 job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:42:57Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-14T23:34:17Z' - torchscript_onnx_tflite: - inference_time: 1766.0 - throughput: 566.2514156285391 + inference_time: 19669.0 + throughput: 50.841425593573646 estimated_peak_memory_range: - min: 12288 - max: 25904848 + min: 1699840 + max: 4295000 primary_compute_unit: NPU precision: int8 layer_info: @@ -205,37 +203,60 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 27 - job_id: jz5woq76p + job_id: jp4lr9n25 + job_status: Passed + reference_device_info: + name: RB5 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8250 Proxy + timestamp: '2024-10-14T23:34:05Z' + - torchscript_onnx_tflite: + inference_time: 1338.0 + throughput: 747.3841554559043 + estimated_peak_memory_range: + min: 1597440 + max: 50979000 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 24 + layers_on_gpu: 0 + layers_on_cpu: 3 + total_layers: 27 + job_id: jpedmlo85 job_status: Passed torchscript_onnx_qnn: - inference_time: 1117.0 - throughput: 895.2551477170994 + inference_time: 684.0 + throughput: 1461.9883040935672 estimated_peak_memory_range: - min: 65536 - max: 15966752 + min: 77824 + max: 1362768 primary_compute_unit: 
NPU precision: int8 layer_info: - layers_on_npu: 25 + layers_on_npu: 31 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 25 - job_id: joprke005 + total_layers: 31 + job_id: jp2kyvx6p job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:43:01Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:34:11Z' - torchscript_onnx_tflite: - inference_time: 1365.0 - throughput: 732.6007326007326 + inference_time: 1344.0 + throughput: 744.047619047619 estimated_peak_memory_range: - min: 28672 - max: 1409808 + min: 20480 + max: 8527456 primary_compute_unit: NPU precision: int8 layer_info: @@ -243,37 +264,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 27 - job_id: jmg9vwml5 + job_id: jp14zv2np job_status: Passed torchscript_onnx_qnn: - inference_time: 691.0 - throughput: 1447.178002894356 + inference_time: 684.0 + throughput: 1461.9883040935672 estimated_peak_memory_range: - min: 73728 - max: 2423632 + min: 81920 + max: 1439032 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 25 + layers_on_npu: 31 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 25 - job_id: j0pxvy43g + total_layers: 31 + job_id: jp8qy40qp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:42:58Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:34:14Z' - torchscript_onnx_tflite: - inference_time: 1348.0 - throughput: 741.839762611276 + inference_time: 1342.0 + throughput: 745.156482861401 estimated_peak_memory_range: min: 16384 - max: 1510760 + max: 2323080 primary_compute_unit: NPU precision: int8 layer_info: @@ -281,22 +302,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 27 - job_id: jnp10ej25 + job_id: jg9lnx0mg job_status: Passed torchscript_onnx_qnn: - inference_time: 690.0 - throughput: 1449.2753623188405 + inference_time: 686.0 + throughput: 1457.725947521866 estimated_peak_memory_range: - min: 16384 - max: 1747224 + min: 81920 + max: 1792632 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 25 + layers_on_npu: 31 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 25 - job_id: jo5mr3mdg + total_layers: 31 + job_id: jp0z0v305 job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -304,14 +325,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:42:59Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:34:13Z' - torchscript_onnx_tflite: - inference_time: 1357.0 - throughput: 736.9196757553427 + inference_time: 1351.0 + throughput: 740.1924500370096 estimated_peak_memory_range: - min: 12288 - max: 7607032 + min: 20480 + max: 1757768 primary_compute_unit: NPU precision: int8 layer_info: @@ -319,37 +340,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 27 - job_id: jvgdwo3e5 + job_id: j5we61w45 job_status: Passed torchscript_onnx_qnn: - inference_time: 692.0 - throughput: 1445.086705202312 + inference_time: 687.0 + throughput: 1455.604075691412 estimated_peak_memory_range: - min: 73728 - max: 2495152 + min: 77824 + max: 1297840 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 25 + layers_on_npu: 31 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 25 - job_id: jegn23nkg + total_layers: 31 + job_id: 
jpy137z0p job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:43:00Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:34:12Z' - torchscript_onnx_tflite: - inference_time: 3574.0 - throughput: 279.79854504756577 + inference_time: 1985.0 + throughput: 503.77833753148616 estimated_peak_memory_range: min: 1609728 - max: 21305424 + max: 29215200 primary_compute_unit: NPU precision: int8 layer_info: @@ -357,37 +378,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 27 - job_id: jz5woq73p + job_id: jgz3d4245 job_status: Passed torchscript_onnx_qnn: - inference_time: 3010.0 - throughput: 332.22591362126246 + inference_time: 1106.0 + throughput: 904.1591320072333 estimated_peak_memory_range: - min: 12288 - max: 7239120 + min: 61440 + max: 17044800 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 25 + layers_on_npu: 31 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 25 - job_id: jep28lwrp + total_layers: 31 + job_id: j5q6qmeep job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T11:43:02Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:34:16Z' - torchscript_onnx_tflite: - inference_time: 19818.0 - throughput: 50.45917852457362 + inference_time: 1351.0 + throughput: 740.1924500370096 estimated_peak_memory_range: - min: 1675264 - max: 5049056 + min: 12288 + max: 18017408 primary_compute_unit: NPU precision: int8 layer_info: @@ -395,37 +416,67 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 27 - job_id: jmg9vwmw5 + job_id: jpxkod985 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 599.0 + throughput: 1669.449081803005 + estimated_peak_memory_range: + min: 61440 + max: 12144352 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 31 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 31 + job_id: j56y4denp + job_status: Passed + torchscript_onnx: + inference_time: 592.0 + throughput: 1689.1891891891892 + estimated_peak_memory_range: + min: 0 + max: 21257104 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 48 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 48 + job_id: jpedmle85 job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T11:42:53Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:34:23Z' - torchscript_onnx_qnn: - inference_time: 821.0 - throughput: 1218.026796589525 + inference_time: 798.0 + throughput: 1253.1328320802006 estimated_peak_memory_range: - min: 61440 - max: 61440 + min: 139264 + max: 139264 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 25 + layers_on_npu: 31 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 25 - job_id: jz57zx4vp + total_layers: 31 + job_id: jprv3n6kg job_status: Passed torchscript_onnx: - inference_time: 1207.0 - throughput: 828.5004142502071 + inference_time: 1201.0 + throughput: 832.6394671107411 estimated_peak_memory_range: - min: 3301376 - max: 3301376 + min: 3321856 + max: 3321856 primary_compute_unit: NPU precision: int8 layer_info: @@ -433,7 +484,7 @@ models: 
layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 48 - job_id: j1p8ozxkg + job_id: jpv6k90z5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -442,4 +493,4 @@ os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:43:05Z' + timestamp: '2024-10-14T23:34:21Z' diff --git a/qai_hub_models/models/shufflenet_v2/README.md b/qai_hub_models/models/shufflenet_v2/README.md index 14b27b6d..87bbee99 100644 --- a/qai_hub_models/models/shufflenet_v2/README.md +++ b/qai_hub_models/models/shufflenet_v2/README.md @@ -6,7 +6,7 @@ ShufflenetV2 is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of Shufflenet-v2 found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/shufflenetv2.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/shufflenet_v2). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.shufflenet_v2.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Shufflenet-v2 can be found +* The license for the original implementation of Shufflenet-v2 can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design](https://arxiv.org/abs/1807.11164) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/shufflenetv2.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
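The `export.py` diffs that follow replace the old `(compile_job, profile_job, inference_job)` 3-tuple return of `export_model` with an `ExportResult` struct. A minimal sketch of what that change means for callers (a sketch only: it assumes `qai_hub_models` is installed and Qualcomm® AI Hub access is configured, and the device string is simply the script's default):

```python
# Hedged sketch: consuming the new ExportResult return value of export_model.
# Assumes qai_hub_models is installed and AI Hub credentials are configured.
from qai_hub_models.models.shufflenet_v2.export import export_model

result = export_model(
    device="Samsung Galaxy S23 (Family)",  # the script's default device
    skip_inferencing=True,                 # late steps can be skipped individually
    output_dir="build/shufflenet_v2",
)
if isinstance(result, list):
    # Without AI Hub access, export_model returns a list of messages
    # (the List[str] branch of its return annotation) instead of jobs.
    print("\n".join(result))
else:
    # Jobs are read by field name rather than tuple position; skipped
    # steps surface as None fields.
    print(result.compile_job, result.profile_job, result.inference_job)
```

Accessing jobs by name is also what lets the quantized variants further below add a `quantize_job` field without breaking existing call sites.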
diff --git a/qai_hub_models/models/shufflenet_v2/export.py b/qai_hub_models/models/shufflenet_v2/export.py index 064a14ea..3d0f97d7 100644 --- a/qai_hub_models/models/shufflenet_v2/export.py +++ b/qai_hub_models/models/shufflenet_v2/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.shufflenet_v2 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "shufflenet_v2" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. 
Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/shufflenet_v2/perf.yaml b/qai_hub_models/models/shufflenet_v2/perf.yaml index f58a25a3..040f539a 100644 --- a/qai_hub_models/models/shufflenet_v2/perf.yaml +++ b/qai_hub_models/models/shufflenet_v2/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: Shufflenet-v2 performance_metrics: - torchscript_onnx_tflite: - inference_time: 1210.0 - throughput: 826.4462809917355 + inference_time: 1201.0 + throughput: 832.6394671107411 estimated_peak_memory_range: - min: 16384 - max: 4278128 + min: 12288 + max: 1323640 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 204 - job_id: jygzer4kg + job_id: j57yrzd95 job_status: Passed 
torchscript_onnx_qnn: - inference_time: 775.0 - throughput: 1290.3225806451612 + inference_time: 774.0 + throughput: 1291.9896640826873 estimated_peak_memory_range: - min: 618496 - max: 6042968 + min: 16384 + max: 15972656 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 158 - job_id: j0pxvyx1g + job_id: j56y463yp job_status: Passed torchscript_onnx: - inference_time: 1088.0 - throughput: 919.1176470588235 + inference_time: 1128.0 + throughput: 886.5248226950355 estimated_peak_memory_range: - min: 651264 - max: 2178824 + min: 368640 + max: 1853424 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 223 - job_id: jogkz382g + job_id: jglvmnem5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:42:23Z' + timestamp: '2024-10-15T17:11:55Z' - torchscript_onnx_tflite: - inference_time: 971.0 - throughput: 1029.8661174047375 + inference_time: 975.0 + throughput: 1025.6410256410256 estimated_peak_memory_range: min: 12288 - max: 39435904 + max: 39986048 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 204 - job_id: jz5woq46p + job_id: jp4lrqw15 job_status: Passed torchscript_onnx_qnn: - inference_time: 516.0 - throughput: 1937.984496124031 + inference_time: 518.0 + throughput: 1930.5019305019305 estimated_peak_memory_range: - min: 0 - max: 11313536 + min: 618496 + max: 13868592 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 158 - job_id: jo5mr38wg + job_id: jgo26y1kp job_status: Passed torchscript_onnx: - inference_time: 737.0 - throughput: 1356.85210312076 + inference_time: 728.0 + throughput: 1373.6263736263736 estimated_peak_memory_range: min: 0 - max: 41810496 + max: 42560016 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 223 - job_id: jn5q83v45 + job_id: jgo26yekp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:42:24Z' + timestamp: '2024-10-15T17:11:56Z' - torchscript_onnx_tflite: - inference_time: 1193.0 - throughput: 838.2229673093043 + inference_time: 1197.0 + throughput: 835.421888053467 estimated_peak_memory_range: - min: 12288 - max: 5973288 + min: 24576 + max: 1453192 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 204 - job_id: jmg9vwdl5 + job_id: jpxkov1l5 job_status: Passed torchscript_onnx_qnn: - inference_time: 738.0 - throughput: 1355.0135501355014 + inference_time: 732.0 + throughput: 1366.120218579235 estimated_peak_memory_range: - min: 663552 - max: 1916040 + min: 634880 + max: 1988536 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 158 - job_id: joprkew95 + job_id: j5we6odm5 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:42:18Z' + chipset: QCS8550 Proxy + timestamp: 
'2024-10-15T17:11:48Z' - torchscript_onnx_tflite: - inference_time: 1319.0 - throughput: 758.1501137225171 + inference_time: 1200.0 + throughput: 833.3333333333334 estimated_peak_memory_range: - min: 12288 - max: 39963808 + min: 16384 + max: 1965656 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 204 - job_id: jnp10e625 + job_id: jp2ky8mqp job_status: Passed torchscript_onnx_qnn: - inference_time: 894.0 - throughput: 1118.5682326621925 + inference_time: 741.0 + throughput: 1349.527665317139 estimated_peak_memory_range: - min: 618496 - max: 14610576 + min: 630784 + max: 1889664 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 158 - job_id: j1p8oz1xg + job_id: j5mnxrw9p job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:42:22Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T17:11:51Z' - torchscript_onnx_tflite: - inference_time: 1208.0 - throughput: 827.8145695364238 + inference_time: 1196.0 + throughput: 836.1204013377926 estimated_peak_memory_range: - min: 20480 - max: 111905904 + min: 24576 + max: 1539008 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 204 - job_id: jvgdwo2e5 + job_id: jprv3ky7g job_status: Passed torchscript_onnx_qnn: - inference_time: 752.0 - throughput: 1329.787234042553 + inference_time: 733.0 + throughput: 1364.256480218281 estimated_peak_memory_range: - min: 647168 - max: 2427880 + min: 634880 + max: 1879296 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 158 - job_id: jep28le4p + job_id: j57yrzj95 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:42:19Z' + chipset: SA8775P Proxy + timestamp: '2024-10-15T17:11:50Z' - torchscript_onnx_tflite: - inference_time: 1206.0 - throughput: 829.1873963515754 + inference_time: 1204.0 + throughput: 830.5647840531561 estimated_peak_memory_range: - min: 20480 - max: 1571024 + min: 28672 + max: 1371800 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 204 - job_id: jz57zx9lp + job_id: jgn6v2eq5 job_status: Passed torchscript_onnx_qnn: inference_time: 741.0 throughput: 1349.527665317139 estimated_peak_memory_range: - min: 647168 - max: 2308712 + min: 638976 + max: 1991128 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 158 - job_id: jqpye6m7g + job_id: jp14z0d7p job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:42:20Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T17:11:49Z' - torchscript_onnx_tflite: - inference_time: 1208.0 - throughput: 827.8145695364238 + inference_time: 1315.0 + throughput: 760.4562737642585 estimated_peak_memory_range: - min: 28672 - max: 82256768 + min: 12288 + max: 40734320 primary_compute_unit: NPU precision: fp16 
layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 204 - job_id: jqp4qv3vg + job_id: j5mnxrz9p job_status: Passed torchscript_onnx_qnn: - inference_time: 736.0 - throughput: 1358.695652173913 + inference_time: 885.0 + throughput: 1129.9435028248588 estimated_peak_memory_range: - min: 634880 - max: 1859048 + min: 618496 + max: 14760976 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 158 - job_id: j2p0yl66g + job_id: jpy13e4lp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-15T17:11:53Z' + - torchscript_onnx_tflite: + inference_time: 803.0 + throughput: 1245.3300124533 + estimated_peak_memory_range: + min: 12288 + max: 22415936 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 204 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 204 + job_id: j5q6q82op + job_status: Passed + torchscript_onnx_qnn: + inference_time: 407.0 + throughput: 2457.002457002457 + estimated_peak_memory_range: + min: 0 + max: 10565600 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 158 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 158 + job_id: jgkexzlng + job_status: Passed + torchscript_onnx: + inference_time: 786.0 + throughput: 1272.264631043257 + estimated_peak_memory_range: + min: 0 + max: 23696464 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 223 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 223 + job_id: jpedm94v5 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:42:21Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T17:11:59Z' - torchscript_onnx_qnn: - inference_time: 881.0 - throughput: 1135.0737797956867 + inference_time: 894.0 + throughput: 1118.5682326621925 estimated_peak_memory_range: min: 602112 max: 602112 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 158 - job_id: jegn23krg + job_id: jpedm9rv5 job_status: Passed torchscript_onnx: - inference_time: 1126.0 - throughput: 888.0994671403197 + inference_time: 1124.0 + throughput: 889.6797153024911 estimated_peak_memory_range: - min: 3928064 - max: 3928064 + min: 3284992 + max: 3284992 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 223 - job_id: j1gln3l8p + job_id: jpv6k3zr5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:42:25Z' + timestamp: '2024-10-15T17:11:57Z' diff --git a/qai_hub_models/models/shufflenet_v2_quantized/README.md b/qai_hub_models/models/shufflenet_v2_quantized/README.md index f97d918f..76982825 100644 --- a/qai_hub_models/models/shufflenet_v2_quantized/README.md +++ b/qai_hub_models/models/shufflenet_v2_quantized/README.md @@ -6,7 +6,7 @@ ShufflenetV2 is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. 
This is based on the implementation of Shufflenet-v2Quantized found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/shufflenetv2.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/shufflenet_v2_quantized). @@ -17,11 +17,6 @@ across various devices can be found [here](https://aihub.qualcomm.com/models/s ## Example & Usage -Install the package via pip: -```bash -pip install "qai_hub_models[shufflenet_v2_quantized]" -``` - Once installed, run the following simple CLI demo: @@ -44,15 +39,19 @@ python -m qai_hub_models.models.shufflenet_v2_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Shufflenet-v2Quantized can be found +* The license for the original implementation of Shufflenet-v2Quantized can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design](https://arxiv.org/abs/1807.11164) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/shufflenetv2.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
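The quantized model's `export.py` below swaps the AIMET-based flow for Hub-side quantization: trace the FP32 model, compile the trace to ONNX, quantize that asset against imagenette calibration data, then compile the quantized target model for the device. Condensed into a standalone sketch (assumptions: AI Hub access is configured, `hub.Device(...)` stands in for the script's device-resolution helpers, and all other names mirror the diff):

```python
# Hedged sketch of the Hub-side quantization recipe (steps 2-3 of the
# refactored export script's docstring); names follow the diff below.
import qai_hub as hub
import torch

from qai_hub_models.models.shufflenet_v2_quantized import Model
from qai_hub_models.utils.input_spec import make_torch_inputs
from qai_hub_models.utils.quantization import get_calibration_data

model = Model.from_pretrained()
input_spec = model.get_input_spec()
device = hub.Device("Samsung Galaxy S23")  # illustrative device choice

# Trace the FP32 model, then compile the trace to an ONNX asset on Hub.
traced = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec))
onnx_job = hub.submit_compile_job(
    model=traced,
    input_specs=input_spec,
    device=device,
    name="shufflenet_v2_quantized",
    options="--target_runtime onnx",
)

# Quantize the ONNX asset; "imagenette" and 100 samples are the defaults
# used by the export script.
quantize_job = hub.submit_quantize_job(
    model=onnx_job.get_target_model(),
    calibration_data=get_calibration_data(input_spec, "imagenette", 100),
    weights_dtype=model.get_weights_dtype(),
    activations_dtype=model.get_activations_dtype(),
    name="shufflenet_v2_quantized",
    options=model.get_quantize_options(),
)

# The quantized target model feeds the final on-device compile (the real
# script also passes runtime-specific compile options at this point).
compile_job = hub.submit_compile_job(
    model=quantize_job.get_target_model(),
    input_specs=input_spec,
    device=device,
    name="shufflenet_v2_quantized",
)
```

The `skip_compiling` early return in the diff exists because `quantize_job.get_target_model()` is already a useful artifact on its own.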
diff --git a/qai_hub_models/models/shufflenet_v2_quantized/evaluate.py b/qai_hub_models/models/shufflenet_v2_quantized/evaluate.py index 2fb7d9af..9989c217 100644 --- a/qai_hub_models/models/shufflenet_v2_quantized/evaluate.py +++ b/qai_hub_models/models/shufflenet_v2_quantized/evaluate.py @@ -13,10 +13,8 @@ from qai_hub_models.models.shufflenet_v2_quantized import MODEL_ID, Model from qai_hub_models.utils.args import evaluate_parser, get_hub_device, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.evaluate import evaluate_on_dataset from qai_hub_models.utils.inference import compile_model_from_args -from qai_hub_models.utils.quantization_aimet import AIMETQuantizableMixin SUPPORTED_DATASETS = ["imagenette", "imagenet"] @@ -27,7 +25,7 @@ def main(): model_cls=Model, default_split_size=2500, supported_datasets=SUPPORTED_DATASETS, - supports_ort=False, + is_hub_quantized=True, ) args = parser.parse_args() args.device = None @@ -39,13 +37,7 @@ def main(): MODEL_ID, args, get_model_kwargs(Model, vars(args)) ) hub_device = get_hub_device(None, args.chipset) - - # Use Fp16 model for torch inference - for cls in Model.__mro__: - if issubclass(cls, BaseModel) and not issubclass(cls, AIMETQuantizableMixin): - torch_cls = cls - break - torch_model = torch_cls.from_pretrained(**get_model_kwargs(torch_cls, vars(args))) + torch_model = Model.from_pretrained(**get_model_kwargs(Model, vars(args))) evaluate_on_dataset( hub_model, torch_model, diff --git a/qai_hub_models/models/shufflenet_v2_quantized/export.py b/qai_hub_models/models/shufflenet_v2_quantized/export.py index 1d90130f..7afa442b 100644 --- a/qai_hub_models/models/shufflenet_v2_quantized/export.py +++ b/qai_hub_models/models/shufflenet_v2_quantized/export.py @@ -10,18 +10,20 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.shufflenet_v2_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference +from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( print_inference_metrics, print_on_target_demo_cmd, @@ -31,11 +33,14 @@ can_access_qualcomm_ai_hub, export_without_hub_access, ) +from qai_hub_models.utils.quantization import get_calibration_data def export_model( device: str = "Samsung Galaxy S23 (Family)", chipset: Optional[str] = None, + num_calibration_samples: int = 100, + skip_compiling: bool = False, skip_profiling: bool = False, skip_inferencing: bool = False, skip_downloading: bool = False, @@ -45,20 +50,19 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. 
- 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + 3. Compiles the model to an asset that can be run on device + 4. Profiles the model performance on a real device + 5. Inferences the model on sample inputs + 6. Downloads the model asset to the local directory + 7. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 5 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -66,6 +70,9 @@ def export_model( Defaults to DEFAULT_DEVICE if not specified. chipset: If set, will choose a random device with this chipset. Overrides the `device` argument. + num_calibration_samples: The number of calibration data samples + to use for quantization. + skip_compiling: If set, skips compiling model to format that can run on device. skip_profiling: If set, skips profiling of compiled model on real devices. skip_inferencing: If set, skips computing on-device outputs from sample data. skip_downloading: If set, skips downloading of compiled model. @@ -80,10 +87,11 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: - * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). + A struct of: + * A CompileJob object containing metadata about the compile job submitted to hub (None if compiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). + * A QuantizeJob object containing metadata about the quantize job submitted to hub """ model_name = "shufflenet_v2_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,33 +117,52 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) ) # Trace the model - source_model = model.convert_to_hub_source_model( - target_runtime, output_path, input_spec + source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) + + print(f"Quantizing model {model_name} with {num_calibration_samples} samples.") + # 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + onnx_compile_job = hub.submit_compile_job( + model=source_model, + input_specs=input_spec, + device=hub_device, + name=model_name, + options="--target_runtime onnx", + ) + quantize_job = hub.submit_quantize_job( + model=onnx_compile_job.get_target_model(), + calibration_data=get_calibration_data( + input_spec, "imagenette", num_calibration_samples + ), + weights_dtype=model.get_weights_dtype(), + activations_dtype=model.get_activations_dtype(), + name=model_name, + options=model.get_quantize_options(), ) + if skip_compiling: + return ExportResult(quantize_job=quantize_job) - # 2. Compile the model to an on-device asset + # 3. 
Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) print(f"Optimizing model {model_name} to run on-device") submitted_compile_job = hub.submit_compile_job( - model=source_model, + model=quantize_job.get_target_model(), input_specs=input_spec, device=hub_device, name=model_name, - calibration_data=model.get_calibration_data(target_runtime), options=model_compile_options, ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 4. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +177,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 5. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +198,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 6. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 7. Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -202,12 +229,17 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + quantize_job=quantize_job, + ) def main(): warnings.filterwarnings("ignore") - parser = export_parser(model_cls=Model, supports_onnx=False) + parser = export_parser(model_cls=Model, is_hub_quantized=True) args = parser.parse_args() export_model(**vars(args)) diff --git a/qai_hub_models/models/shufflenet_v2_quantized/model.py b/qai_hub_models/models/shufflenet_v2_quantized/model.py index e77c1af3..4ad13c24 100644 --- a/qai_hub_models/models/shufflenet_v2_quantized/model.py +++ b/qai_hub_models/models/shufflenet_v2_quantized/model.py @@ -4,92 +4,11 @@ # --------------------------------------------------------------------- from __future__ import annotations -# isort: off -# This verifies aimet is installed, and this must be included first. 
-from qai_hub_models.utils.quantization_aimet import ( - AIMETQuantizableMixin, - constrain_quantized_inputs_to_image_range, -) - -# isort: on - -import torch -from aimet_torch.cross_layer_equalization import ( - equalize_bn_folded_model, - fold_all_batch_norms, -) -from aimet_torch.model_preparer import prepare_model -from aimet_torch.quantsim import QuantizationSimModel, load_encodings_to_sim - from qai_hub_models.models.shufflenet_v2.model import ShufflenetV2 -from qai_hub_models.utils.aimet.config_loader import get_default_aimet_config -from qai_hub_models.utils.asset_loaders import CachedWebModelAsset -from qai_hub_models.utils.quantization_aimet import ( - convert_all_depthwise_to_per_tensor, - tie_observers, -) +from qai_hub_models.utils.quantization import HubQuantizableMixin MODEL_ID = __name__.split(".")[-2] -MODEL_ASSET_VERSION = 3 -DEFAULT_ENCODINGS = "shufflenet_v2_quantized_encodings.json" - - -class ShufflenetV2Quantizable( - AIMETQuantizableMixin, - ShufflenetV2, -): - """ShufflenetV2 with post train quantization support. - - Supports only 8 bit weights and activations, and only loads pre-quantized checkpoints. - Support for quantizing using your own weights & data will come at a later date.""" - - def __init__( - self, - sim_model: QuantizationSimModel, - ) -> None: - # Input is already normalized by sim_model. Disable it in the wrapper model. - ShufflenetV2.__init__(self, sim_model.model, normalize_input=False) - AIMETQuantizableMixin.__init__( - self, - sim_model, - ) - - @classmethod - def from_pretrained( - cls, - aimet_encodings: str | None = "DEFAULT", - ) -> "ShufflenetV2Quantizable": - """ - Parameters: - aimet_encodings: - if "DEFAULT": Loads the model with aimet encodings calibrated on imagenette. - elif None: Doesn't load any encodings. Used when computing encodings. - else: Interprets as a filepath and loads the encodings stored there. 
- """ - model = ShufflenetV2.from_pretrained() - input_shape = cls.get_input_spec()["image_tensor"][0] - model = prepare_model(model) - dummy_input = torch.rand(input_shape) - - pairs = fold_all_batch_norms(model, input_shape, dummy_input) - equalize_bn_folded_model(model, input_shape, pairs, dummy_input) - sim = QuantizationSimModel( - model, - quant_scheme="tf_enhanced", - default_param_bw=8, - default_output_bw=8, - config_file=get_default_aimet_config(), - dummy_input=dummy_input, - ) - convert_all_depthwise_to_per_tensor(sim.model) - tie_observers(sim) - constrain_quantized_inputs_to_image_range(sim) - if aimet_encodings: - if aimet_encodings == "DEFAULT": - aimet_encodings = CachedWebModelAsset.from_asset_store( - MODEL_ID, MODEL_ASSET_VERSION, DEFAULT_ENCODINGS - ).fetch() - load_encodings_to_sim(sim, aimet_encodings) - return cls(sim) +class ShufflenetV2Quantizable(HubQuantizableMixin, ShufflenetV2): + pass diff --git a/qai_hub_models/models/shufflenet_v2_quantized/perf.yaml b/qai_hub_models/models/shufflenet_v2_quantized/perf.yaml index c6edf3a5..321614e5 100644 --- a/qai_hub_models/models/shufflenet_v2_quantized/perf.yaml +++ b/qai_hub_models/models/shufflenet_v2_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,67 +20,77 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8775P Proxy models: - name: Shufflenet-v2Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 610.0 - throughput: 1639.344262295082 + inference_time: 615.0 + throughput: 1626.0162601626016 estimated_peak_memory_range: - min: 16384 - max: 2376456 + min: 12288 + max: 1543976 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 207 + layers_on_npu: 233 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 207 - job_id: jygzer8kg + total_layers: 233 + job_id: jp142868p job_status: Passed torchscript_onnx_qnn: - inference_time: 589.0 - throughput: 1697.792869269949 + inference_time: 601.0 + throughput: 1663.8935108153078 estimated_peak_memory_range: - min: 53248 - max: 75919760 + min: 12288 + max: 33104664 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 122 + layers_on_npu: 160 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 122 - job_id: jegn237rg + total_layers: 160 + job_id: jp8q271kp + job_status: Passed + torchscript_onnx: + inference_time: 8856.0 + throughput: 112.91779584462512 + estimated_peak_memory_range: + min: 2326528 + max: 6265288 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 210 + layers_on_gpu: 0 + layers_on_cpu: 5 + total_layers: 215 + 
job_id: j5wew9735 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -88,36 +99,51 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:41:38Z' + timestamp: '2024-10-17T17:17:47Z' - torchscript_onnx_tflite: - inference_time: 431.0 - throughput: 2320.185614849188 + inference_time: 423.0 + throughput: 2364.066193853428 estimated_peak_memory_range: min: 12288 - max: 28407328 + max: 29672160 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 207 + layers_on_npu: 233 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 207 - job_id: jz5woq16p + total_layers: 233 + job_id: jgdxnv2rp job_status: Passed torchscript_onnx_qnn: - inference_time: 430.0 - throughput: 2325.5813953488373 + inference_time: 438.0 + throughput: 2283.10502283105 estimated_peak_memory_range: min: 159744 - max: 16215744 + max: 13332544 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 122 + layers_on_npu: 160 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 122 - job_id: joprken95 + total_layers: 160 + job_id: jgkevy8wg + job_status: Passed + torchscript_onnx: + inference_time: 7974.0 + throughput: 125.4075746175069 + estimated_peak_memory_range: + min: 921600 + max: 356507072 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 210 + layers_on_gpu: 0 + layers_on_cpu: 5 + total_layers: 215 + job_id: jg9l04mwg job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -126,150 +152,173 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:41:39Z' + timestamp: '2024-10-17T17:17:49Z' - torchscript_onnx_tflite: - inference_time: 618.0 - throughput: 1618.1229773462783 + inference_time: 892.0 + throughput: 1121.0762331838564 estimated_peak_memory_range: min: 12288 - max: 1440816 + max: 22451392 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 207 + layers_on_npu: 233 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 207 - job_id: jmg9vwxl5 + total_layers: 233 + job_id: j57y2d9v5 job_status: Passed torchscript_onnx_qnn: - inference_time: 530.0 - throughput: 1886.7924528301887 + inference_time: 1210.0 + throughput: 826.4462809917355 estimated_peak_memory_range: - min: 212992 - max: 1547008 + min: 192512 + max: 7952096 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 122 + layers_on_npu: 160 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 122 - job_id: jqpye677g + total_layers: 160 + job_id: j5q602vnp job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS6490 Proxy + timestamp: '2024-10-17T17:17:33Z' + - torchscript_onnx_tflite: + inference_time: 10608.0 + throughput: 94.2684766214178 + estimated_peak_memory_range: + min: 192512 + max: 13361760 + primary_compute_unit: CPU + precision: fp32 + layer_info: + layers_on_npu: 44 + layers_on_gpu: 11 + layers_on_cpu: 178 + total_layers: 233 + job_id: jp4lnw385 + job_status: Passed + reference_device_info: + name: RB5 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:41:41Z' + chipset: QCS8250 Proxy + timestamp: '2024-10-17T17:17:18Z' - torchscript_onnx_tflite: - inference_time: 647.0 - throughput: 1545.595054095827 + inference_time: 614.0 + throughput: 1628.6644951140065 estimated_peak_memory_range: - min: 16384 - max: 28141808 + min: 12288 + max: 
7154088 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 207 + layers_on_npu: 233 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 207 - job_id: jnp10ev25 + total_layers: 233 + job_id: jpxk91x35 job_status: Passed torchscript_onnx_qnn: - inference_time: 644.0 - throughput: 1552.7950310559006 + inference_time: 546.0 + throughput: 1831.5018315018315 estimated_peak_memory_range: - min: 159744 - max: 16311600 + min: 176128 + max: 1520080 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 122 + layers_on_npu: 160 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 122 - job_id: jn5q83m45 + total_layers: 160 + job_id: jglv4klj5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:41:46Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-17T17:17:35Z' - torchscript_onnx_tflite: inference_time: 611.0 throughput: 1636.6612111292961 estimated_peak_memory_range: min: 12288 - max: 33214624 + max: 1590352 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 207 + layers_on_npu: 233 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 207 - job_id: jvgdwoze5 + total_layers: 233 + job_id: j5mnez8dp job_status: Passed torchscript_onnx_qnn: - inference_time: 529.0 - throughput: 1890.359168241966 + inference_time: 544.0 + throughput: 1838.235294117647 estimated_peak_memory_range: - min: 180224 - max: 1723400 + min: 204800 + max: 1535944 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 122 + layers_on_npu: 160 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 122 - job_id: j2p0ylv6g + total_layers: 160 + job_id: jp3jnm63g job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:41:43Z' + chipset: SA8255P Proxy + timestamp: '2024-10-17T17:17:38Z' - torchscript_onnx_tflite: - inference_time: 618.0 - throughput: 1618.1229773462783 + inference_time: 616.0 + throughput: 1623.3766233766235 estimated_peak_memory_range: - min: 16384 - max: 15518960 + min: 20480 + max: 1659664 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 207 + layers_on_npu: 233 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 207 - job_id: jz57zx7lp + total_layers: 233 + job_id: jgn60ekk5 job_status: Passed torchscript_onnx_qnn: - inference_time: 530.0 - throughput: 1886.7924528301887 + inference_time: 566.0 + throughput: 1766.7844522968198 estimated_peak_memory_range: min: 184320 - max: 1483136 + max: 1791344 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 122 + layers_on_npu: 160 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 122 - job_id: j1p8oz4xg + total_layers: 160 + job_id: jgo2zv8qp job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -277,121 +326,128 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:41:43Z' + chipset: SA8775P Proxy + timestamp: '2024-10-17T17:17:40Z' - torchscript_onnx_tflite: - inference_time: 615.0 - throughput: 1626.0162601626016 + inference_time: 645.0 + throughput: 1550.3875968992247 estimated_peak_memory_range: min: 12288 - max: 1521400 + max: 30330480 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 207 + layers_on_npu: 233 layers_on_gpu: 0 
layers_on_cpu: 0 - total_layers: 207 - job_id: jqp4qv9vg + total_layers: 233 + job_id: jprv6yw0g job_status: Passed torchscript_onnx_qnn: - inference_time: 526.0 - throughput: 1901.1406844106464 + inference_time: 649.0 + throughput: 1540.8320493066255 estimated_peak_memory_range: - min: 212992 - max: 1577352 + min: 159744 + max: 14718512 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 122 + layers_on_npu: 160 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 122 - job_id: jogkz392g + total_layers: 160 + job_id: jpv6qwdk5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:41:45Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-17T17:17:41Z' - torchscript_onnx_tflite: - inference_time: 925.0 - throughput: 1081.081081081081 + inference_time: 466.0 + throughput: 2145.922746781116 estimated_peak_memory_range: - min: 12288 - max: 21349328 + min: 8192 + max: 21476592 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 207 + layers_on_npu: 233 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 207 - job_id: j0pxvyd1g + total_layers: 233 + job_id: jp2kxmerp job_status: Passed torchscript_onnx_qnn: - inference_time: 1119.0 - throughput: 893.6550491510277 + inference_time: 382.0 + throughput: 2617.801047120419 estimated_peak_memory_range: - min: 12288 - max: 7590464 + min: 159744 + max: 10601808 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 122 + layers_on_npu: 160 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 122 - job_id: j1gln318p + total_layers: 160 + job_id: jgjvdl7vg job_status: Passed - reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot - os_name: Android - manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T11:41:47Z' - - torchscript_onnx_tflite: - inference_time: 10938.0 - throughput: 91.42439202779302 + torchscript_onnx: + inference_time: 6292.0 + throughput: 158.93197711379528 estimated_peak_memory_range: - min: 12288 - max: 9429816 - primary_compute_unit: CPU - precision: fp32 + min: 0 + max: 287486368 + primary_compute_unit: NPU + precision: int8 layer_info: - layers_on_npu: 44 - layers_on_gpu: 9 - layers_on_cpu: 154 - total_layers: 207 - job_id: jo5mr3dwg + layers_on_npu: 210 + layers_on_gpu: 0 + layers_on_cpu: 5 + total_layers: 215 + job_id: jgdxnv3rp job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T11:41:37Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-17T17:17:52Z' - torchscript_onnx_qnn: - inference_time: 661.0 - throughput: 1512.8593040847202 + inference_time: 681.0 + throughput: 1468.4287812041116 estimated_peak_memory_range: - min: 532480 - max: 532480 + min: 630784 + max: 630784 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 122 + layers_on_npu: 160 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 122 - job_id: jep28lv4p + total_layers: 160 + job_id: j56y21w6p + job_status: Passed + torchscript_onnx: + inference_time: 10337.0 + throughput: 96.73986649898423 + estimated_peak_memory_range: + min: 6778880 + max: 6778880 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 210 + layers_on_gpu: 0 + layers_on_cpu: 5 + total_layers: 215 + job_id: 
jp1428j8p job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -400,4 +456,4 @@ os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:41:40Z' + timestamp: '2024-10-17T17:17:51Z' diff --git a/qai_hub_models/models/shufflenet_v2_quantized/requirements.txt b/qai_hub_models/models/shufflenet_v2_quantized/requirements.txt deleted file mode 100644 index de5b80e8..00000000 --- a/qai_hub_models/models/shufflenet_v2_quantized/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -aimet-torch==1.32.1.post1; sys_platform == "linux" diff --git a/qai_hub_models/models/shufflenet_v2_quantized/test.py b/qai_hub_models/models/shufflenet_v2_quantized/test.py deleted file mode 100644 index 995731eb..00000000 --- a/qai_hub_models/models/shufflenet_v2_quantized/test.py +++ /dev/null @@ -1,29 +0,0 @@ -# --------------------------------------------------------------------- -# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. -# SPDX-License-Identifier: BSD-3-Clause -# --------------------------------------------------------------------- -from qai_hub_models.models._shared.imagenet_classifier.test_utils import ( - run_imagenet_classifier_test, -) -from qai_hub_models.models.shufflenet_v2_quantized.demo import main as demo_main -from qai_hub_models.models.shufflenet_v2_quantized.model import ( - MODEL_ASSET_VERSION, - MODEL_ID, - ShufflenetV2Quantizable, -) - - -def test_task(): - run_imagenet_classifier_test( - ShufflenetV2Quantizable.from_pretrained(), - MODEL_ID, - asset_version=MODEL_ASSET_VERSION, - diff_tol=0.005, - rtol=0.02, - atol=0.2, - ) - - -def test_demo(): - # Verify demo does not crash - demo_main(is_test=True) diff --git a/qai_hub_models/models/sinet/README.md b/qai_hub_models/models/sinet/README.md index 388eec6c..eb12f3f7 100644 --- a/qai_hub_models/models/sinet/README.md +++ b/qai_hub_models/models/sinet/README.md @@ -6,7 +6,7 @@ SINet is a machine learning model that is designed to segment people from close-up portrait images in real time. This is based on the implementation of SINet found -[here](https://github.com/clovaai/ext_portrait_segmentation). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/sinet). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.sinet.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of SINet can be found +* The license for the original implementation of SINet can be found [here](https://github.com/clovaai/ext_portrait_segmentation/blob/master/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [SINet: Extreme Lightweight Portrait Segmentation Networks with Spatial Squeeze Modules and Information Blocking Decoder](https://arxiv.org/abs/1911.09099) * [Source Model Implementation](https://github.com/clovaai/ext_portrait_segmentation) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/sinet/export.py b/qai_hub_models/models/sinet/export.py index 56e13737..8d0f710f 100644 --- a/qai_hub_models/models/sinet/export.py +++ b/qai_hub_models/models/sinet/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.sinet import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
""" model_name = "sinet" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/sinet/perf.yaml b/qai_hub_models/models/sinet/perf.yaml index 1b53cb2a..e6956bfb 100644 --- a/qai_hub_models/models/sinet/perf.yaml +++ b/qai_hub_models/models/sinet/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: SINet performance_metrics: - torchscript_onnx_tflite: - inference_time: 1743.0 - throughput: 573.7234652897304 + inference_time: 1753.0 + throughput: 570.4506560182544 estimated_peak_memory_range: - min: 12288 - max: 4302648 + min: 28672 + max: 7061232 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 240 - job_id: jz5woq86p + job_id: jgn6voyq5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1175.0 - throughput: 851.063829787234 + inference_time: 1189.0 + throughput: 841.0428931875525 estimated_peak_memory_range: - min: 626688 - max: 5231632 + min: 622592 + max: 5973080 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 186 - job_id: jo5mr3owg + job_id: j56y4rlyp job_status: Passed torchscript_onnx: - inference_time: 2305.0 - throughput: 433.83947939262475 + inference_time: 2281.0 + throughput: 438.4042086804033 estimated_peak_memory_range: - min: 290816 - max: 2259456 + min: 335872 + max: 2129016 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 229 - job_id: jn5q83z45 + job_id: jgdx18lzp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:40:54Z' + timestamp: '2024-10-14T23:31:46Z' - torchscript_onnx_tflite: - inference_time: 1142.0 - throughput: 875.6567425569177 + inference_time: 1164.0 + throughput: 859.106529209622 estimated_peak_memory_range: min: 12288 - max: 31739376 + max: 33021696 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 
0 layers_on_cpu: 0 total_layers: 240 - job_id: jmg9vwkl5 + job_id: jprv3oq7g job_status: Passed torchscript_onnx_qnn: - inference_time: 808.0 - throughput: 1237.6237623762377 + inference_time: 809.0 + throughput: 1236.0939431396787 estimated_peak_memory_range: min: 618496 - max: 14828816 + max: 16535200 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 186 - job_id: jegn23org + job_id: jp3j0x2ng job_status: Passed torchscript_onnx: - inference_time: 1984.0 - throughput: 504.03225806451616 + inference_time: 1539.0 + throughput: 649.772579597141 estimated_peak_memory_range: min: 0 - max: 35618128 + max: 37479840 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 229 - job_id: j1gln3o8p + job_id: j5we68n45 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:40:55Z' + timestamp: '2024-10-14T23:31:47Z' - torchscript_onnx_tflite: - inference_time: 1721.0 - throughput: 581.0575246949448 + inference_time: 1732.0 + throughput: 577.3672055427252 estimated_peak_memory_range: min: 12288 - max: 5637536 + max: 2591184 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 240 - job_id: jnp10e725 + job_id: jp2ky46qp job_status: Passed torchscript_onnx_qnn: - inference_time: 1160.0 - throughput: 862.0689655172414 + inference_time: 1157.0 + throughput: 864.304235090752 estimated_peak_memory_range: - min: 647168 - max: 1964256 + min: 634880 + max: 2772960 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 186 - job_id: jep28l44p + job_id: jpv6kexr5 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:40:49Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:31:39Z' - torchscript_onnx_tflite: - inference_time: 1893.0 - throughput: 528.2620179609086 + inference_time: 1749.0 + throughput: 571.7552887364208 estimated_peak_memory_range: - min: 12288 - max: 31468624 + min: 32768 + max: 1593152 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 240 - job_id: jvgdwo8e5 + job_id: jgkexonng job_status: Passed torchscript_onnx_qnn: - inference_time: 1335.0 - throughput: 749.0636704119851 + inference_time: 1178.0 + throughput: 848.8964346349745 estimated_peak_memory_range: - min: 622592 - max: 16541776 + min: 630784 + max: 1806512 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 186 - job_id: jogkz3o2g + job_id: jgz3d8kx5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:40:53Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:31:42Z' - torchscript_onnx_tflite: - inference_time: 1768.0 - throughput: 565.6108597285067 + inference_time: 1754.0 + throughput: 570.1254275940707 estimated_peak_memory_range: - min: 40960 - max: 23496256 + min: 28672 + max: 1526024 primary_compute_unit: 
NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 240 - job_id: jz57zxklp + job_id: jp8qy69op job_status: Passed torchscript_onnx_qnn: - inference_time: 1155.0 - throughput: 865.8008658008658 + inference_time: 1183.0 + throughput: 845.30853761623 estimated_peak_memory_range: min: 634880 - max: 2041248 + max: 2031744 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 186 - job_id: jqpye6q7g + job_id: jpedm83v5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:40:50Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:31:41Z' - torchscript_onnx_tflite: - inference_time: 1735.0 - throughput: 576.3688760806916 + inference_time: 1746.0 + throughput: 572.737686139748 estimated_peak_memory_range: min: 12288 - max: 7947984 + max: 2599136 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 240 - job_id: jqp4qvmvg + job_id: jp0z0dqn5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1162.0 - throughput: 860.5851979345955 + inference_time: 1180.0 + throughput: 847.457627118644 estimated_peak_memory_range: - min: 638976 - max: 2347704 + min: 634880 + max: 1988320 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 186 - job_id: j2p0yld6g + job_id: jgjvno4eg job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:40:51Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:31:40Z' - torchscript_onnx_tflite: - inference_time: 1742.0 - throughput: 574.052812858783 + inference_time: 1884.0 + throughput: 530.7855626326964 estimated_peak_memory_range: - min: 28672 - max: 8158144 + min: 12288 + max: 33257216 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 240 - job_id: j0pxvy31g + job_id: jpy13qwlp job_status: Passed torchscript_onnx_qnn: - inference_time: 1164.0 - throughput: 859.106529209622 + inference_time: 1316.0 + throughput: 759.8784194528876 estimated_peak_memory_range: - min: 638976 - max: 2058072 + min: 618496 + max: 18202544 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 186 - job_id: j1p8oz6xg + job_id: jg9lnke8g job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:31:44Z' + - torchscript_onnx_tflite: + inference_time: 1138.0 + throughput: 878.7346221441124 + estimated_peak_memory_range: + min: 12288 + max: 23124416 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 240 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 240 + job_id: jglvmorm5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 770.0 + throughput: 1298.7012987012988 + estimated_peak_memory_range: + min: 0 + max: 12719056 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 186 + layers_on_gpu: 0 + layers_on_cpu: 0 
+ total_layers: 186 + job_id: jp14z7x7p + job_status: Passed + torchscript_onnx: + inference_time: 1526.0 + throughput: 655.307994757536 + estimated_peak_memory_range: + min: 0 + max: 25661488 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 229 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 229 + job_id: jgdx18l6p + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:40:52Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:31:50Z' - torchscript_onnx_qnn: - inference_time: 1344.0 - throughput: 744.047619047619 + inference_time: 1350.0 + throughput: 740.7407407407408 estimated_peak_memory_range: min: 602112 max: 602112 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 186 - job_id: joprkeo95 + job_id: jgo26oqkp job_status: Passed torchscript_onnx: - inference_time: 2371.0 - throughput: 421.76296921130324 + inference_time: 2428.0 + throughput: 411.8616144975288 estimated_peak_memory_range: - min: 1720320 - max: 1720320 + min: 1765376 + max: 1765376 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 229 - job_id: jw566nr05 + job_id: jg9lnkemg job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:40:56Z' + timestamp: '2024-10-14T23:31:48Z' diff --git a/qai_hub_models/models/squeezenet1_1/README.md b/qai_hub_models/models/squeezenet1_1/README.md index 99b00954..71ce0fa4 100644 --- a/qai_hub_models/models/squeezenet1_1/README.md +++ b/qai_hub_models/models/squeezenet1_1/README.md @@ -6,7 +6,7 @@ SqueezeNet is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of SqueezeNet-1_1 found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/squeezenet.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/squeezenet1_1). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.squeezenet1_1.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of SqueezeNet-1_1 can be found +* The license for the original implementation of SqueezeNet-1_1 can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size](https://arxiv.org/abs/1602.07360) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/squeezenet.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/squeezenet1_1/export.py b/qai_hub_models/models/squeezenet1_1/export.py index cd34e8b3..bd810a79 100644 --- a/qai_hub_models/models/squeezenet1_1/export.py +++ b/qai_hub_models/models/squeezenet1_1/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.squeezenet1_1 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
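The export entry points in these hunks replace the positional 3-tuple with a keyword-constructed `ExportResult`, so skipped steps surface as `None` fields rather than tuple slots. A minimal sketch of a migrated call site, assuming `ExportResult` exposes `compile_job`, `profile_job`, and `inference_job` attributes exactly as the keyword arguments above suggest, and that the `List[str]` early-return path (used when Qualcomm AI Hub is not accessible) is unchanged:

```python
from qai_hub_models.models.squeezenet1_1.export import export_model

result = export_model(device="Samsung Galaxy S23 (Family)")

# export_model still returns a List[str] when Hub access is unavailable,
# so guard before touching job attributes.
if not isinstance(result, list):
    # Old call sites unpacked a 3-tuple:
    #   compile_job, profile_job, inference_job = export_model(...)
    # New call sites read named fields; steps skipped via the
    # skip_* options come back as None.
    print("Compile job:", result.compile_job.job_id)
    if result.profile_job is not None:
        print("Profile job:", result.profile_job.job_id)
    if result.inference_job is not None:
        print("Inference job:", result.inference_job.job_id)
```

Because the struct is built with keyword arguments, the field order in the docstring (inference before profile) is documentation only and does not affect callers.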
""" model_name = "squeezenet1_1" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/squeezenet1_1/perf.yaml b/qai_hub_models/models/squeezenet1_1/perf.yaml index 6e1bb0c3..eafa7adc 100644 --- a/qai_hub_models/models/squeezenet1_1/perf.yaml +++ b/qai_hub_models/models/squeezenet1_1/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: SqueezeNet-1_1 performance_metrics: - torchscript_onnx_tflite: - inference_time: 643.0 - throughput: 1555.2099533437015 + inference_time: 641.0 + throughput: 1560.0624024960998 estimated_peak_memory_range: min: 16384 - max: 1992976 + max: 2365544 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: jnp10e325 + job_id: jgdx1worp job_status: Passed torchscript_onnx_qnn: - inference_time: 713.0 - throughput: 1402.5245441795232 + inference_time: 710.0 + throughput: 1408.4507042253522 estimated_peak_memory_range: - min: 618496 - max: 3591720 + min: 634880 + max: 6722776 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 70 - job_id: joprke995 + job_id: jgn6v23q5 job_status: Passed torchscript_onnx: - inference_time: 658.0 - throughput: 1519.756838905775 + inference_time: 653.0 + throughput: 1531.3935681470139 estimated_peak_memory_range: min: 12288 - max: 3840744 + max: 41556344 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 71 - job_id: jw566nv05 + job_id: jp3j0kmng job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:40:16Z' + timestamp: '2024-10-15T17:11:13Z' - torchscript_onnx_tflite: - inference_time: 556.0 - throughput: 1798.5611510791366 + inference_time: 461.0 + throughput: 2169.1973969631235 estimated_peak_memory_range: - min: 12288 - max: 25694912 + min: 16384 + max: 27001168 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ 
models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: jvgdwo0e5 + job_id: j5we6oqm5 job_status: Passed torchscript_onnx_qnn: - inference_time: 510.0 - throughput: 1960.7843137254902 + inference_time: 512.0 + throughput: 1953.125 estimated_peak_memory_range: min: 618496 - max: 14093824 + max: 12746688 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 70 - job_id: jep28lj4p + job_id: jprv3ke7g job_status: Passed torchscript_onnx: - inference_time: 491.0 - throughput: 2036.6598778004072 + inference_time: 578.0 + throughput: 1730.1038062283737 estimated_peak_memory_range: min: 0 - max: 28459728 + max: 28721200 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 71 - job_id: j1p3ke8l5 + job_id: jgo26yvkp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:40:17Z' + timestamp: '2024-10-15T17:11:14Z' - torchscript_onnx_tflite: - inference_time: 642.0 - throughput: 1557.632398753894 + inference_time: 640.0 + throughput: 1562.5 estimated_peak_memory_range: - min: 20480 - max: 1352552 + min: 12288 + max: 1821560 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: jz57zx6lp + job_id: jg9lnvw8g job_status: Passed torchscript_onnx_qnn: - inference_time: 647.0 - throughput: 1545.595054095827 + inference_time: 645.0 + throughput: 1550.3875968992247 estimated_peak_memory_range: - min: 630784 - max: 1875288 + min: 667648 + max: 1987184 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 70 - job_id: j2p0ylk6g + job_id: jpy13e6lp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:40:12Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-15T17:11:07Z' - torchscript_onnx_tflite: - inference_time: 809.0 - throughput: 1236.0939431396787 + inference_time: 641.0 + throughput: 1560.0624024960998 estimated_peak_memory_range: min: 16384 - max: 27268672 + max: 6654104 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: jqp4qv8vg + job_id: jp4lrqv15 job_status: Passed torchscript_onnx_qnn: - inference_time: 893.0 - throughput: 1119.8208286674133 + inference_time: 654.0 + throughput: 1529.051987767584 estimated_peak_memory_range: - min: 618496 - max: 15241280 + min: 622592 + max: 1958368 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 70 - job_id: j1gln378p + job_id: jgkexz3ng job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:40:16Z' + chipset: SA8255P Proxy + timestamp: '2024-10-15T17:11:10Z' - torchscript_onnx_tflite: inference_time: 640.0 throughput: 1562.5 estimated_peak_memory_range: - min: 20480 - max: 72121952 + min: 16384 + max: 2603568 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,52 @@ models: layers_on_gpu: 0 
layers_on_cpu: 0 total_layers: 41 - job_id: j0pxvym1g + job_id: j57yrzx95 job_status: Passed torchscript_onnx_qnn: - inference_time: 660.0 - throughput: 1515.1515151515152 + inference_time: 644.0 + throughput: 1552.7950310559006 + estimated_peak_memory_range: + min: 323584 + max: 1672120 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 70 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 70 + job_id: jp8qyozop + job_status: Passed + reference_device_info: + name: SA8775 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8775P Proxy + timestamp: '2024-10-15T17:11:09Z' + - torchscript_onnx_tflite: + inference_time: 639.0 + throughput: 1564.9452269170579 + estimated_peak_memory_range: + min: 12288 + max: 1913520 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 41 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 41 + job_id: jgdx1wozp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 642.0 + throughput: 1557.632398753894 estimated_peak_memory_range: min: 634880 - max: 1896816 + max: 2400840 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,7 +291,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 70 - job_id: j1p8oz8xg + job_id: jp0z0yln5 job_status: Passed reference_device_info: name: SA8650 (Proxy) @@ -263,14 +299,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:40:13Z' + chipset: SA8650P Proxy + timestamp: '2024-10-15T17:11:08Z' - torchscript_onnx_tflite: - inference_time: 644.0 - throughput: 1552.7950310559006 + inference_time: 813.0 + throughput: 1230.0123001230013 estimated_peak_memory_range: min: 16384 - max: 7319112 + max: 28305552 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: jo5mr34wg + job_id: jp14z0e7p job_status: Passed torchscript_onnx_qnn: - inference_time: 654.0 - throughput: 1529.051987767584 + inference_time: 891.0 + throughput: 1122.334455667789 estimated_peak_memory_range: - min: 643072 - max: 1873984 + min: 618496 + max: 15094096 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +329,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 70 - job_id: jogkz3d2g + job_id: jglvmnkm5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:40:14Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-15T17:11:11Z' - torchscript_onnx_tflite: - inference_time: 647.0 - throughput: 1545.595054095827 + inference_time: 431.0 + throughput: 2320.185614849188 estimated_peak_memory_range: - min: 12288 - max: 1333008 + min: 8192 + max: 16302400 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +352,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 41 - job_id: jegn23xrg + job_id: j5mnxr39p job_status: Passed torchscript_onnx_qnn: - inference_time: 658.0 - throughput: 1519.756838905775 + inference_time: 391.0 + throughput: 2557.544757033248 estimated_peak_memory_range: - min: 626688 - max: 2149952 + min: 614400 + max: 9250464 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +367,34 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 70 - job_id: jn5q83w45 + job_id: j56y461yp + job_status: Passed + torchscript_onnx: + inference_time: 420.0 + 
throughput: 2380.9523809523807 + estimated_peak_memory_range: + min: 0 + max: 17425504 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 71 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 71 + job_id: jpedm9vv5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) - os: '13' - form_factor: Auto + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:40:15Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-15T17:11:16Z' - torchscript_onnx_qnn: - inference_time: 780.0 - throughput: 1282.051282051282 + inference_time: 784.0 + throughput: 1275.5102040816328 estimated_peak_memory_range: min: 602112 max: 602112 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 70 - job_id: jqpye6n7g + job_id: jp2ky8lqp job_status: Passed torchscript_onnx: - inference_time: 695.0 - throughput: 1438.8489208633093 + inference_time: 697.0 + throughput: 1434.7202295552368 estimated_peak_memory_range: - min: 2756608 - max: 2756608 + min: 2879488 + max: 2879488 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 71 - job_id: jwgoy3mx5 + job_id: jpv6k3wr5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:40:18Z' + timestamp: '2024-10-15T17:11:15Z' diff --git a/qai_hub_models/models/squeezenet1_1_quantized/README.md b/qai_hub_models/models/squeezenet1_1_quantized/README.md index daed51c8..063e6f42 100644 --- a/qai_hub_models/models/squeezenet1_1_quantized/README.md +++ b/qai_hub_models/models/squeezenet1_1_quantized/README.md @@ -6,7 +6,7 @@ SqueezeNet is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of SqueezeNet-1_1Quantized found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/squeezenet.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/squeezenet1_1_quantized). @@ -17,11 +17,6 @@ accross various devices, can be found [here](https://aihub.qualcomm.com/models/s ## Example & Usage -Install the package via pip: -```bash -pip install "qai_hub_models[squeezenet1_1_quantized]" -``` - Once installed, run the following simple CLI demo: @@ -44,15 +39,19 @@ python -m qai_hub_models.models.squeezenet1_1_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of SqueezeNet-1_1Quantized can be found +* The license for the original implementation of SqueezeNet-1_1Quantized can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). 
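In the perf.yaml files above, every updated `inference_time`/`throughput` pair moves together because `inference_time` is in microseconds and `throughput` is its reciprocal in inferences per second (for example, 641.0 µs pairs with 1e6 / 641 = 1560.06). A small sketch that recomputes throughput from a perf.yaml to sanity-check edited entries; the file path is illustrative and PyYAML is assumed to be available:

```python
import yaml  # PyYAML, assumed available in the dev environment

with open("qai_hub_models/models/squeezenet1_1/perf.yaml") as f:
    perf = yaml.safe_load(f)

for model in perf["models"]:
    for entry in model["performance_metrics"]:
        for key, stats in entry.items():
            # Skip the metadata keys that sit alongside the runtime blocks.
            if key in ("reference_device_info", "timestamp"):
                continue
            inference_time_us = stats["inference_time"]  # microseconds
            recomputed = 1e6 / inference_time_us         # inferences/second
            assert abs(recomputed - stats["throughput"]) < 1e-6 * recomputed
```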
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size](https://arxiv.org/abs/1602.07360) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/squeezenet.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/squeezenet1_1_quantized/evaluate.py b/qai_hub_models/models/squeezenet1_1_quantized/evaluate.py index bdaf6536..6d914b65 100644 --- a/qai_hub_models/models/squeezenet1_1_quantized/evaluate.py +++ b/qai_hub_models/models/squeezenet1_1_quantized/evaluate.py @@ -13,10 +13,8 @@ from qai_hub_models.models.squeezenet1_1_quantized import MODEL_ID, Model from qai_hub_models.utils.args import evaluate_parser, get_hub_device, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.evaluate import evaluate_on_dataset from qai_hub_models.utils.inference import compile_model_from_args -from qai_hub_models.utils.quantization_aimet import AIMETQuantizableMixin SUPPORTED_DATASETS = ["imagenette", "imagenet"] @@ -27,6 +25,7 @@ def main(): model_cls=Model, default_split_size=2500, supported_datasets=SUPPORTED_DATASETS, + is_hub_quantized=True, ) args = parser.parse_args() args.device = None @@ -38,13 +37,7 @@ def main(): MODEL_ID, args, get_model_kwargs(Model, vars(args)) ) hub_device = get_hub_device(None, args.chipset) - - # Use Fp16 model for torch inference - for cls in Model.__mro__: - if issubclass(cls, BaseModel) and not issubclass(cls, AIMETQuantizableMixin): - torch_cls = cls - break - torch_model = torch_cls.from_pretrained(**get_model_kwargs(torch_cls, vars(args))) + torch_model = Model.from_pretrained(**get_model_kwargs(Model, vars(args))) evaluate_on_dataset( hub_model, torch_model, diff --git a/qai_hub_models/models/squeezenet1_1_quantized/export.py b/qai_hub_models/models/squeezenet1_1_quantized/export.py index f17f90f8..8b5f0275 100644 --- a/qai_hub_models/models/squeezenet1_1_quantized/export.py +++ b/qai_hub_models/models/squeezenet1_1_quantized/export.py @@ -10,18 +10,20 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.squeezenet1_1_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference +from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( print_inference_metrics, print_on_target_demo_cmd, @@ -31,11 +33,14 @@ can_access_qualcomm_ai_hub, export_without_hub_access, ) +from qai_hub_models.utils.quantization import get_calibration_data def export_model( device: 
str = "Samsung Galaxy S23 (Family)", chipset: Optional[str] = None, + num_calibration_samples: int = 100, + skip_compiling: bool = False, skip_profiling: bool = False, skip_inferencing: bool = False, skip_downloading: bool = False, @@ -45,20 +50,19 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + 3. Compiles the model to an asset that can be run on device + 4. Profiles the model performance on a real device + 5. Inferences the model on sample inputs + 6. Downloads the model asset to the local directory + 7. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 5 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -66,6 +70,9 @@ def export_model( Defaults to DEFAULT_DEVICE if not specified. chipset: If set, will choose a random device with this chipset. Overrides the `device` argument. + num_calibration_samples: The number of calibration data samples + to use for quantization. + skip_compiling: If set, skips compiling model to format that can run on device. skip_profiling: If set, skips profiling of compiled model on real devices. skip_inferencing: If set, skips computing on-device outputs from sample data. skip_downloading: If set, skips downloading of compiled model. @@ -80,10 +87,11 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: - * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). + A struct of: + * A CompileJob object containing metadata about the compile job submitted to hub (None if compiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). + * A QuantizeJob object containing metadata about the quantize job submitted to hub """ model_name = "squeezenet1_1_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,33 +117,52 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. 
Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) ) # Trace the model - source_model = model.convert_to_hub_source_model( - target_runtime, output_path, input_spec + source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) + + print(f"Quantizing model {model_name} with {num_calibration_samples} samples.") + # 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + onnx_compile_job = hub.submit_compile_job( + model=source_model, + input_specs=input_spec, + device=hub_device, + name=model_name, + options="--target_runtime onnx", + ) + quantize_job = hub.submit_quantize_job( + model=onnx_compile_job.get_target_model(), + calibration_data=get_calibration_data( + input_spec, "imagenette", num_calibration_samples + ), + weights_dtype=model.get_weights_dtype(), + activations_dtype=model.get_activations_dtype(), + name=model_name, + options=model.get_quantize_options(), ) + if skip_compiling: + return ExportResult(quantize_job=quantize_job) - # 2. Compile the model to an on-device asset + # 3. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) print(f"Optimizing model {model_name} to run on-device") submitted_compile_job = hub.submit_compile_job( - model=source_model, + model=quantize_job.get_target_model(), input_specs=input_spec, device=hub_device, name=model_name, - calibration_data=model.get_calibration_data(target_runtime), options=model_compile_options, ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 4. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +177,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 5. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +198,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 6. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 7. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -198,12 +225,17 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + quantize_job=quantize_job, + ) def main(): warnings.filterwarnings("ignore") - parser = export_parser(model_cls=Model) + parser = export_parser(model_cls=Model, is_hub_quantized=True) args = parser.parse_args() export_model(**vars(args)) diff --git a/qai_hub_models/models/squeezenet1_1_quantized/model.py b/qai_hub_models/models/squeezenet1_1_quantized/model.py index 197d101b..457ee0c5 100644 --- a/qai_hub_models/models/squeezenet1_1_quantized/model.py +++ b/qai_hub_models/models/squeezenet1_1_quantized/model.py @@ -4,100 +4,12 @@ # --------------------------------------------------------------------- from __future__ import annotations -# isort: off -# This verifies aimet is installed, and this must be included first. -from qai_hub_models.utils.quantization_aimet import ( - AIMETQuantizableMixin, - constrain_quantized_inputs_to_image_range, -) - -# isort: on - -from typing import Optional - -import torch -from aimet_torch.cross_layer_equalization import equalize_model -from aimet_torch.model_preparer import prepare_model -from aimet_torch.quantsim import QuantizationSimModel, load_encodings_to_sim -from qai_hub import Device - -from qai_hub_models.models.common import TargetRuntime from qai_hub_models.models.squeezenet1_1.model import SqueezeNet -from qai_hub_models.utils.aimet.config_loader import get_default_aimet_config -from qai_hub_models.utils.asset_loaders import CachedWebModelAsset +from qai_hub_models.utils.quantization import HubQuantizableMixin MODEL_ID = __name__.split(".")[-2] -MODEL_ASSET_VERSION = 3 -DEFAULT_ENCODINGS = "squeezenet1_1_quantized_encodings.json" - - -class SqueezeNetQuantizable(AIMETQuantizableMixin, SqueezeNet): - """SqueezeNet with post train quantization support. - - Supports only 8 bit weights and activations, and only loads pre-quantized checkpoints. - Support for quantizing using your own weights & data will come at a later date.""" - - def __init__( - self, - sim_model: QuantizationSimModel, - ) -> None: - # Input is already normalized by sim_model. Disable it in the wrapper model. - SqueezeNet.__init__(self, sim_model.model, normalize_input=False) - AIMETQuantizableMixin.__init__( - self, - sim_model, - needs_onnx_direct_aimet_export=True, - ) - - @classmethod - def from_pretrained( - cls, - aimet_encodings: str | None = "DEFAULT", - ) -> "SqueezeNetQuantizable": - """ - Parameters: - aimet_encodings: - if "DEFAULT": Loads the model with aimet encodings calibrated on imagenette. - elif None: Doesn't load any encodings. Used when computing encodings. - else: Interprets as a filepath and loads the encodings stored there. 
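The quantized export above replaces the AIMET simulation path with Qualcomm AI Hub's own quantize job: trace the model, compile it to ONNX, quantize that ONNX with calibration data, then compile the quantized artifact for the target runtime. A condensed sketch of that recipe, using only the calls shown in the diff (`get_calibration_data`, `get_weights_dtype`, `get_activations_dtype`, `get_quantize_options`); the device name and sample count are illustrative:

```python
import qai_hub as hub
import torch

from qai_hub_models.models.squeezenet1_1_quantized import Model
from qai_hub_models.utils.input_spec import make_torch_inputs
from qai_hub_models.utils.quantization import get_calibration_data

model = Model.from_pretrained()
input_spec = model.get_input_spec()
device = hub.Device("Samsung Galaxy S23 (Family)")

# Trace to TorchScript, then compile to ONNX as the quantization source.
traced = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec))
onnx_job = hub.submit_compile_job(
    model=traced,
    input_specs=input_spec,
    device=device,
    options="--target_runtime onnx",
)

# Quantize on Hub with imagenette calibration samples.
quantize_job = hub.submit_quantize_job(
    model=onnx_job.get_target_model(),
    calibration_data=get_calibration_data(input_spec, "imagenette", 100),
    weights_dtype=model.get_weights_dtype(),
    activations_dtype=model.get_activations_dtype(),
    options=model.get_quantize_options(),  # "--range_scheme min_max" here
)

# Compile the quantized ONNX for the target runtime as usual.
compile_job = hub.submit_compile_job(
    model=quantize_job.get_target_model(),
    input_specs=input_spec,
    device=device,
)
```

With `skip_compiling=True`, the new export stops after the quantize job and returns `ExportResult(quantize_job=quantize_job)`, which is why the docstring now notes that `compile_job` can be `None`.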
- """ - model = SqueezeNet.from_pretrained() - input_shape = cls.get_input_spec()["image_tensor"][0] - - model = prepare_model(model) - equalize_model(model, input_shape) - sim = QuantizationSimModel( - model, - quant_scheme="tf_enhanced", - default_param_bw=8, - default_output_bw=8, - config_file=get_default_aimet_config(), - dummy_input=torch.rand(input_shape), - ) - constrain_quantized_inputs_to_image_range(sim) - - if aimet_encodings: - if aimet_encodings == "DEFAULT": - aimet_encodings = CachedWebModelAsset.from_asset_store( - MODEL_ID, MODEL_ASSET_VERSION, DEFAULT_ENCODINGS - ).fetch() - load_encodings_to_sim(sim, aimet_encodings) - return cls(sim) - # TODO(12424) remove this once encodings export correctly - def get_hub_compile_options( - self, - target_runtime: TargetRuntime, - other_compile_options: str = "", - device: Optional[Device] = None, - ) -> str: - compile_options = super().get_hub_compile_options( - target_runtime, other_compile_options, device - ) - if target_runtime not in [ - TargetRuntime.ONNX, - TargetRuntime.PRECOMPILED_QNN_ONNX, - ]: - compile_options += " --quantize_full_type int8" - return compile_options +class SqueezeNetQuantizable(HubQuantizableMixin, SqueezeNet): + def get_quantize_options(self) -> str: + return "--range_scheme min_max" diff --git a/qai_hub_models/models/squeezenet1_1_quantized/perf.yaml b/qai_hub_models/models/squeezenet1_1_quantized/perf.yaml index 2240e932..30462525 100644 --- a/qai_hub_models/models/squeezenet1_1_quantized/perf.yaml +++ b/qai_hub_models/models/squeezenet1_1_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,44 +20,39 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8775P Proxy models: - name: SqueezeNet-1_1Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 197.0 - throughput: 5076.1421319796955 + inference_time: 205.0 + throughput: 4878.048780487805 estimated_peak_memory_range: - min: 28672 - max: 1522728 + min: 20480 + max: 71348624 primary_compute_unit: NPU precision: int8 layer_info: @@ -64,37 +60,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 43 - job_id: jz57zxvrp + job_id: jp8q276kp job_status: Passed torchscript_onnx_qnn: - inference_time: 462.0 - throughput: 2164.5021645021643 + inference_time: 466.0 + throughput: 2145.922746781116 estimated_peak_memory_range: - min: 12288 - max: 2858688 + min: 172032 + max: 3320704 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 45 + layers_on_npu: 71 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 45 - job_id: jogkz3rog + total_layers: 71 + 
job_id: j5wew9135 job_status: Passed torchscript_onnx: - inference_time: 620.0 - throughput: 1612.9032258064517 + inference_time: 467.0 + throughput: 2141.3276231263385 estimated_peak_memory_range: - min: 77824 - max: 17339224 + min: 167936 + max: 1721136 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 73 + layers_on_npu: 47 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 73 - job_id: jygzerv6g + total_layers: 47 + job_id: jpy1zd78p job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -103,13 +99,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:39:40Z' + timestamp: '2024-10-17T17:16:33Z' - torchscript_onnx_tflite: - inference_time: 146.0 - throughput: 6849.315068493151 + inference_time: 200.0 + throughput: 5000.0 estimated_peak_memory_range: - min: 0 - max: 26175056 + min: 12288 + max: 27490512 primary_compute_unit: NPU precision: int8 layer_info: @@ -117,37 +113,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 43 - job_id: jqp4qvjlg + job_id: jgkevyowg job_status: Passed torchscript_onnx_qnn: - inference_time: 348.0 - throughput: 2873.5632183908046 + inference_time: 344.0 + throughput: 2906.9767441860463 estimated_peak_memory_range: min: 163840 - max: 14540416 + max: 12878832 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 45 + layers_on_npu: 71 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 45 - job_id: jn5q839m5 + total_layers: 71 + job_id: jg9l04xwg job_status: Passed torchscript_onnx: - inference_time: 568.0 - throughput: 1760.5633802816901 + inference_time: 437.0 + throughput: 2288.329519450801 estimated_peak_memory_range: - min: 12288 - max: 27792848 + min: 28672 + max: 31185840 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 73 + layers_on_npu: 47 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 73 - job_id: jz5woqmjp + total_layers: 47 + job_id: jp0z4rv95 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -156,13 +152,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:39:41Z' + timestamp: '2024-10-17T17:16:35Z' - torchscript_onnx_tflite: - inference_time: 203.0 - throughput: 4926.108374384236 + inference_time: 493.0 + throughput: 2028.3975659229209 estimated_peak_memory_range: min: 12288 - max: 71271896 + max: 17850320 primary_compute_unit: NPU precision: int8 layer_info: @@ -170,37 +166,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 43 - job_id: j0pxvye9g + job_id: j5q602znp job_status: Passed torchscript_onnx_qnn: - inference_time: 434.0 - throughput: 2304.147465437788 + inference_time: 997.0 + throughput: 1003.0090270812437 estimated_peak_memory_range: - min: 184320 - max: 1459560 + min: 12288 + max: 8075200 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 45 + layers_on_npu: 71 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 45 - job_id: jw566nq75 + total_layers: 71 + job_id: jp1428v8p job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:39:34Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-17T17:16:19Z' - torchscript_onnx_tflite: - inference_time: 238.0 - throughput: 4201.680672268908 + inference_time: 4154.0 + throughput: 240.73182474723157 estimated_peak_memory_range: - min: 16384 - max: 26893664 + min: 122880 + max: 7024328 
primary_compute_unit: NPU precision: int8 layer_info: @@ -208,37 +204,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 43 - job_id: jo5mr3vqg - job_status: Passed - torchscript_onnx_qnn: - inference_time: 529.0 - throughput: 1890.359168241966 - estimated_peak_memory_range: - min: 163840 - max: 14734320 - primary_compute_unit: NPU - precision: int8 - layer_info: - layers_on_npu: 45 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 45 - job_id: j7gjxek8p + job_id: jglv4koj5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: RB5 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:39:38Z' + chipset: QCS8250 Proxy + timestamp: '2024-10-17T17:16:04Z' - torchscript_onnx_tflite: - inference_time: 204.0 - throughput: 4901.9607843137255 + inference_time: 205.0 + throughput: 4878.048780487805 estimated_peak_memory_range: min: 12288 - max: 3714432 + max: 3154344 primary_compute_unit: NPU precision: int8 layer_info: @@ -246,37 +227,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 43 - job_id: jegn23rmg + job_id: j56y21r6p job_status: Passed torchscript_onnx_qnn: inference_time: 430.0 throughput: 2325.5813953488373 estimated_peak_memory_range: - min: 188416 - max: 1448928 + min: 184320 + max: 1474128 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 45 + layers_on_npu: 71 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 45 - job_id: j1p3keqz5 + total_layers: 71 + job_id: jgdxnvzrp job_status: Passed reference_device_info: - name: SA8650 (Proxy) - os: '13' - form_factor: Auto + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:39:35Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-17T17:16:21Z' - torchscript_onnx_tflite: inference_time: 205.0 throughput: 4878.048780487805 estimated_peak_memory_range: min: 12288 - max: 12581216 + max: 72500552 primary_compute_unit: NPU precision: int8 layer_info: @@ -284,37 +265,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 43 - job_id: joprke1e5 + job_id: jp3jnmx3g job_status: Passed torchscript_onnx_qnn: - inference_time: 430.0 - throughput: 2325.5813953488373 + inference_time: 429.0 + throughput: 2331.002331002331 estimated_peak_memory_range: - min: 184320 - max: 1606720 + min: 196608 + max: 1544376 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 45 + layers_on_npu: 71 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 45 - job_id: jwgoy3ed5 + total_layers: 71 + job_id: jp4lnw985 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:39:36Z' + chipset: SA8255P Proxy + timestamp: '2024-10-17T17:16:24Z' - torchscript_onnx_tflite: - inference_time: 206.0 - throughput: 4854.368932038835 + inference_time: 201.0 + throughput: 4975.124378109453 estimated_peak_memory_range: - min: 12288 - max: 3364288 + min: 20480 + max: 24320576 primary_compute_unit: NPU precision: int8 layer_info: @@ -322,37 +303,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 43 - job_id: jep28l3mp + job_id: jgo2zvoqp job_status: Passed torchscript_onnx_qnn: - inference_time: 428.0 - throughput: 2336.448598130841 + inference_time: 429.0 + throughput: 2331.002331002331 estimated_peak_memory_range: min: 184320 - max: 1461264 + max: 
1419928 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 45 + layers_on_npu: 71 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 45 - job_id: j1pv3vzm5 + total_layers: 71 + job_id: jpxk91d35 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:39:37Z' + chipset: SA8775P Proxy + timestamp: '2024-10-17T17:16:26Z' - torchscript_onnx_tflite: - inference_time: 510.0 - throughput: 1960.7843137254902 + inference_time: 235.0 + throughput: 4255.31914893617 estimated_peak_memory_range: - min: 12288 - max: 17397776 + min: 20480 + max: 27791328 primary_compute_unit: NPU precision: int8 layer_info: @@ -360,37 +341,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 43 - job_id: jqpye6v4g + job_id: jpv6qw9k5 job_status: Passed torchscript_onnx_qnn: - inference_time: 991.0 - throughput: 1009.0817356205853 + inference_time: 533.0 + throughput: 1876.172607879925 estimated_peak_memory_range: - min: 12288 - max: 7853840 + min: 163840 + max: 13579040 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 45 + layers_on_npu: 71 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 45 - job_id: jlpe9k40g + total_layers: 71 + job_id: j5mnezddp job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T11:39:39Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-17T17:16:27Z' - torchscript_onnx_tflite: - inference_time: 4128.0 - throughput: 242.24806201550388 + inference_time: 147.0 + throughput: 6802.721088435374 estimated_peak_memory_range: - min: 65536 - max: 1918952 + min: 8192 + max: 16414800 primary_compute_unit: NPU precision: int8 layer_info: @@ -398,45 +379,75 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 43 - job_id: j2p0yleeg + job_id: jgjvdlwvg + job_status: Passed + torchscript_onnx_qnn: + inference_time: 345.0 + throughput: 2898.550724637681 + estimated_peak_memory_range: + min: 159744 + max: 9814384 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 71 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 71 + job_id: jgn60e7k5 + job_status: Passed + torchscript_onnx: + inference_time: 390.0 + throughput: 2564.102564102564 + estimated_peak_memory_range: + min: 0 + max: 18980048 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 47 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 47 + job_id: jgkevy9wg job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T11:39:30Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-17T17:16:37Z' - torchscript_onnx_qnn: - inference_time: 543.0 - throughput: 1841.6206261510129 + inference_time: 553.0 + throughput: 1808.3182640144666 estimated_peak_memory_range: - min: 552960 - max: 552960 + min: 692224 + max: 692224 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 45 + layers_on_npu: 71 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 45 - job_id: j1gln3elp + total_layers: 71 + job_id: j57y2d7v5 job_status: Passed torchscript_onnx: - inference_time: 666.0 - throughput: 1501.5015015015015 + inference_time: 524.0 + 
throughput: 1908.3969465648854 estimated_peak_memory_range: - min: 3252224 - max: 3252224 + min: 1941504 + max: 1941504 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 73 + layers_on_npu: 47 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 73 - job_id: jmg9vw9v5 + total_layers: 47 + job_id: jp8q274kp job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -445,4 +456,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:39:42Z' + timestamp: '2024-10-17T17:16:36Z' diff --git a/qai_hub_models/models/squeezenet1_1_quantized/requirements.txt b/qai_hub_models/models/squeezenet1_1_quantized/requirements.txt deleted file mode 100644 index de5b80e8..00000000 --- a/qai_hub_models/models/squeezenet1_1_quantized/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -aimet-torch==1.32.1.post1; sys_platform == "linux" diff --git a/qai_hub_models/models/squeezenet1_1_quantized/test.py b/qai_hub_models/models/squeezenet1_1_quantized/test.py deleted file mode 100644 index 9c927cf5..00000000 --- a/qai_hub_models/models/squeezenet1_1_quantized/test.py +++ /dev/null @@ -1,29 +0,0 @@ -# --------------------------------------------------------------------- -# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. -# SPDX-License-Identifier: BSD-3-Clause -# --------------------------------------------------------------------- -from qai_hub_models.models._shared.imagenet_classifier.test_utils import ( - run_imagenet_classifier_test, -) -from qai_hub_models.models.squeezenet1_1_quantized.demo import main as demo_main -from qai_hub_models.models.squeezenet1_1_quantized.model import ( - MODEL_ASSET_VERSION, - MODEL_ID, - SqueezeNetQuantizable, -) - - -def test_task(): - run_imagenet_classifier_test( - SqueezeNetQuantizable.from_pretrained(), - MODEL_ID, - asset_version=MODEL_ASSET_VERSION, - diff_tol=0.005, - rtol=0.02, - atol=0.2, - ) - - -def test_demo(): - # Verify demo does not crash - demo_main(is_test=True) diff --git a/qai_hub_models/models/stable_diffusion_v1_5_quantized/README.md b/qai_hub_models/models/stable_diffusion_v1_5_quantized/README.md index c4690f12..64ea5be3 100644 --- a/qai_hub_models/models/stable_diffusion_v1_5_quantized/README.md +++ b/qai_hub_models/models/stable_diffusion_v1_5_quantized/README.md @@ -6,7 +6,7 @@ Generates high resolution images from text prompts using a latent diffusion model. This model uses CLIP ViT-L/14 as text encoder, U-Net based latent denoising, and VAE based decoder to generate the final image. This is based on the implementation of Stable-Diffusion-v1.5 found -[here](https://github.com/CompVis/stable-diffusion/tree/main). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/stable_diffusion_v1_5_quantized). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.stable_diffusion_v1_5_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Stable-Diffusion-v1.5 can be found +* The license for the original implementation of Stable-Diffusion-v1.5 can be found [here](https://github.com/CompVis/stable-diffusion/blob/main/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://github.com/CompVis/stable-diffusion/blob/main/LICENSE) +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/CompVis/stable-diffusion/blob/main/LICENSE) + ## References * [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752) * [Source Model Implementation](https://github.com/CompVis/stable-diffusion/tree/main) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/stable_diffusion_v1_5_quantized/export.py b/qai_hub_models/models/stable_diffusion_v1_5_quantized/export.py index 33fe0503..e80883a3 100644 --- a/qai_hub_models/models/stable_diffusion_v1_5_quantized/export.py +++ b/qai_hub_models/models/stable_diffusion_v1_5_quantized/export.py @@ -9,13 +9,14 @@ import warnings from pathlib import Path -from typing import Any, Dict, List, Mapping, Optional, Tuple, cast +from typing import Any, Dict, List, Mapping, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.stable_diffusion_v1_5_quantized import Model from qai_hub_models.utils.args import export_parser -from qai_hub_models.utils.base_model import BasePrecompiledModel, TargetRuntime +from qai_hub_models.utils.base_model import BasePrecompiledModel from qai_hub_models.utils.printing import print_profile_metrics_from_job from qai_hub_models.utils.qai_hub_helpers import ( can_access_qualcomm_ai_hub, @@ -36,19 +37,16 @@ def export_model( output_dir: Optional[str] = None, profile_options: str = "", **additional_model_kwargs, -) -> Mapping[str, Tuple[Optional[hub.ProfileJob], Optional[hub.InferenceJob]]] | List[ - str -]: +) -> Mapping[str, ExportResult] | List[str]: """ - This function accomplishes 5 main tasks: + This function executes the following recipe: - 1. Initialize model. - 2. Upload model assets to hub. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Summarizes the results from profiling. + 1. Initialize model + 2. Upload model assets to hub + 3. Profiles the model performance on a real device + 4. Summarizes the results from profiling - Each of the last three steps can be optionally skipped using the input options. + Each of the last 2 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -70,9 +68,8 @@ def export_model( `model_cls.from_precompiled` Returns: - A Mapping from component_name to a 2-tuple of: + A Mapping from component_name to a struct of: * A ProfileJob containing metadata about the profile job (None if profiling skipped). - * An InferenceJob containing metadata about the inference job (None if inferencing skipped). """ model_name = "stable_diffusion_v1_5_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -101,9 +98,7 @@ def export_model( component_arg, ) - target_runtime = TargetRuntime.TFLITE - # On-device perf improves with I/O in channel_last format except when using ONNX. - use_channel_last_format = target_runtime != TargetRuntime.ONNX + target_runtime = TargetRuntime.QNN # 1. 
Initialize model print("Initializing model class") @@ -123,8 +118,11 @@ def export_model( uploaded_models[component_name] = hub.upload_model( components_dict[component_name].get_target_model_path() ) + print( + f"The {component_name} model is saved here: {components_dict[component_name].get_target_model_path()}" + ) - # 3. Profile the model assets on real devices + # 3. Profiles the model performance on a real device profile_jobs: Dict[str, hub.client.ProfileJob] = {} if not skip_profiling: for component_name in components: @@ -142,31 +140,7 @@ def export_model( hub.client.ProfileJob, submitted_profile_job ) - # 4. Run inference on-device with sample inputs - inference_jobs: Dict[str, hub.client.InferenceJob] = {} - if not skip_inferencing: - for component_name in components: - print( - f"Running inference for {component_name} on a hosted device with example inputs." - ) - profile_options_all = components_dict[ - component_name - ].get_hub_profile_options(target_runtime, profile_options) - sample_inputs = components_dict[component_name].sample_inputs( - use_channel_last_format=use_channel_last_format - ) - submitted_inference_job = hub.submit_inference_job( - model=uploaded_models[component_name], - inputs=sample_inputs, - device=hub_device, - name=f"{model_name}_{component_name}", - options=profile_options_all, - ) - inference_jobs[component_name] = cast( - hub.client.InferenceJob, submitted_inference_job - ) - - # 5. Summarize the results from profiling + # 4. Summarizes the results from profiling if not skip_summary and not skip_profiling: for component_name in components: profile_job = profile_jobs[component_name] @@ -175,9 +149,8 @@ def export_model( print_profile_metrics_from_job(profile_job, profile_data) return { - component_name: ( - profile_jobs.get(component_name, None), - inference_jobs.get(component_name, None), + component_name: ExportResult( + profile_job=profile_jobs.get(component_name, None), ) for component_name in components } diff --git a/qai_hub_models/models/stable_diffusion_v1_5_quantized/perf.yaml b/qai_hub_models/models/stable_diffusion_v1_5_quantized/perf.yaml index 1e2ae7b0..12218593 100644 --- a/qai_hub_models/models/stable_diffusion_v1_5_quantized/perf.yaml +++ b/qai_hub_models/models/stable_diffusion_v1_5_quantized/perf.yaml @@ -31,7 +31,7 @@ aggregated: - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy + - QCS8550 Proxy models: - name: TextEncoder_Quantized performance_metrics: @@ -125,7 +125,7 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy + chipset: QCS8550 Proxy timestamp: '2024-10-03T16:18:34Z' - name: VAEDecoder_Quantized performance_metrics: @@ -219,7 +219,7 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy + chipset: QCS8550 Proxy timestamp: '2024-10-03T16:18:34Z' - name: UNet_Quantized performance_metrics: @@ -313,5 +313,5 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy + chipset: QCS8550 Proxy timestamp: '2024-10-03T16:18:35Z' diff --git a/qai_hub_models/models/stable_diffusion_v2_1_quantized/README.md b/qai_hub_models/models/stable_diffusion_v2_1_quantized/README.md index ca77cd00..8bb23e47 100644 --- a/qai_hub_models/models/stable_diffusion_v2_1_quantized/README.md +++ b/qai_hub_models/models/stable_diffusion_v2_1_quantized/README.md @@ -6,7 +6,7 @@ Generates high resolution images from text prompts using a latent diffusion model. 
This model uses CLIP ViT-L/14 as text encoder, U-Net based latent denoising, and VAE based decoder to generate the final image. This is based on the implementation of Stable-Diffusion-v2.1 found -[here](https://github.com/CompVis/stable-diffusion/tree/main). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/stable_diffusion_v2_1_quantized). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.stable_diffusion_v2_1_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Stable-Diffusion-v2.1 can be found +* The license for the original implementation of Stable-Diffusion-v2.1 can be found [here](https://github.com/CompVis/stable-diffusion/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://github.com/CompVis/stable-diffusion/blob/main/LICENSE) +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/CompVis/stable-diffusion/blob/main/LICENSE) + ## References * [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752) * [Source Model Implementation](https://github.com/CompVis/stable-diffusion/tree/main) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/stable_diffusion_v2_1_quantized/export.py b/qai_hub_models/models/stable_diffusion_v2_1_quantized/export.py index 6945840c..78e6b923 100644 --- a/qai_hub_models/models/stable_diffusion_v2_1_quantized/export.py +++ b/qai_hub_models/models/stable_diffusion_v2_1_quantized/export.py @@ -9,13 +9,14 @@ import warnings from pathlib import Path -from typing import Any, Dict, List, Mapping, Optional, Tuple, cast +from typing import Any, Dict, List, Mapping, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.stable_diffusion_v2_1_quantized import Model from qai_hub_models.utils.args import export_parser -from qai_hub_models.utils.base_model import BasePrecompiledModel, TargetRuntime +from qai_hub_models.utils.base_model import BasePrecompiledModel from qai_hub_models.utils.printing import print_profile_metrics_from_job from qai_hub_models.utils.qai_hub_helpers import ( can_access_qualcomm_ai_hub, @@ -36,19 +37,16 @@ def export_model( output_dir: Optional[str] = None, profile_options: str = "", **additional_model_kwargs, -) -> Mapping[str, Tuple[Optional[hub.ProfileJob], Optional[hub.InferenceJob]]] | List[ - str -]: +) -> Mapping[str, ExportResult] | List[str]: """ - This function accomplishes 5 main tasks: + This function executes the following recipe: - 1. Initialize model. - 2. Upload model assets to hub. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Summarizes the results from profiling. + 1. Initialize model + 2. Upload model assets to hub + 3. Profiles the model performance on a real device + 4. 
Summarizes the results from profiling - Each of the last three steps can be optionally skipped using the input options. + Each of the last 2 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -70,9 +68,8 @@ def export_model( `model_cls.from_precompiled` Returns: - A Mapping from component_name to a 2-tuple of: + A Mapping from component_name to a struct of: * A ProfileJob containing metadata about the profile job (None if profiling skipped). - * An InferenceJob containing metadata about the inference job (None if inferencing skipped). """ model_name = "stable_diffusion_v2_1_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -101,9 +98,7 @@ def export_model( component_arg, ) - target_runtime = TargetRuntime.TFLITE - # On-device perf improves with I/O in channel_last format except when using ONNX. - use_channel_last_format = target_runtime != TargetRuntime.ONNX + target_runtime = TargetRuntime.QNN # 1. Initialize model print("Initializing model class") @@ -123,8 +118,11 @@ def export_model( uploaded_models[component_name] = hub.upload_model( components_dict[component_name].get_target_model_path() ) + print( + f"The {component_name} model is saved here: {components_dict[component_name].get_target_model_path()}" + ) - # 3. Profile the model assets on real devices + # 3. Profiles the model performance on a real device profile_jobs: Dict[str, hub.client.ProfileJob] = {} if not skip_profiling: for component_name in components: @@ -142,31 +140,7 @@ def export_model( hub.client.ProfileJob, submitted_profile_job ) - # 4. Run inference on-device with sample inputs - inference_jobs: Dict[str, hub.client.InferenceJob] = {} - if not skip_inferencing: - for component_name in components: - print( - f"Running inference for {component_name} on a hosted device with example inputs." - ) - profile_options_all = components_dict[ - component_name - ].get_hub_profile_options(target_runtime, profile_options) - sample_inputs = components_dict[component_name].sample_inputs( - use_channel_last_format=use_channel_last_format - ) - submitted_inference_job = hub.submit_inference_job( - model=uploaded_models[component_name], - inputs=sample_inputs, - device=hub_device, - name=f"{model_name}_{component_name}", - options=profile_options_all, - ) - inference_jobs[component_name] = cast( - hub.client.InferenceJob, submitted_inference_job - ) - - # 5. Summarize the results from profiling + # 4. 
Summarizes the results from profiling if not skip_summary and not skip_profiling: for component_name in components: profile_job = profile_jobs[component_name] @@ -175,9 +149,8 @@ def export_model( print_profile_metrics_from_job(profile_job, profile_data) return { - component_name: ( - profile_jobs.get(component_name, None), - inference_jobs.get(component_name, None), + component_name: ExportResult( + profile_job=profile_jobs.get(component_name, None), ) for component_name in components } diff --git a/qai_hub_models/models/stable_diffusion_v2_1_quantized/perf.yaml b/qai_hub_models/models/stable_diffusion_v2_1_quantized/perf.yaml index c3f4cabf..d9c91a78 100644 --- a/qai_hub_models/models/stable_diffusion_v2_1_quantized/perf.yaml +++ b/qai_hub_models/models/stable_diffusion_v2_1_quantized/perf.yaml @@ -31,7 +31,7 @@ aggregated: - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy + - QCS8550 Proxy models: - name: TextEncoder_Quantized performance_metrics: @@ -125,7 +125,7 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy + chipset: QCS8550 Proxy timestamp: '2024-10-03T16:17:41Z' - name: VAEDecoder_Quantized performance_metrics: @@ -219,7 +219,7 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy + chipset: QCS8550 Proxy timestamp: '2024-10-03T16:17:42Z' - name: UNet_Quantized performance_metrics: @@ -313,5 +313,5 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy + chipset: QCS8550 Proxy timestamp: '2024-10-04T14:15:28Z' diff --git a/qai_hub_models/models/swin_base/README.md b/qai_hub_models/models/swin_base/README.md index 6ffedb98..bd54515a 100644 --- a/qai_hub_models/models/swin_base/README.md +++ b/qai_hub_models/models/swin_base/README.md @@ -6,7 +6,7 @@ SwinBase is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of Swin-Base found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/swin_transformer.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/swin_base). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.swin_base.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Swin-Base can be found +* The license for the original implementation of Swin-Base can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/swin_transformer.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/swin_base/export.py b/qai_hub_models/models/swin_base/export.py index a42c833c..498bdf4f 100644 --- a/qai_hub_models/models/swin_base/export.py +++ b/qai_hub_models/models/swin_base/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.swin_base import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
""" model_name = "swin_base" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/swin_base/perf.yaml b/qai_hub_models/models/swin_base/perf.yaml index b8e8a2ed..e97091c4 100644 --- a/qai_hub_models/models/swin_base/perf.yaml +++ b/qai_hub_models/models/swin_base/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: Swin-Base performance_metrics: - torchscript_onnx_tflite: - inference_time: 28070.0 - throughput: 35.62522265764161 + inference_time: 25234.0 + throughput: 39.629071887136405 estimated_peak_memory_range: - min: 188416 - max: 4159936 + min: 0 + max: 3319560 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,37 +56,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1568 - job_id: jqp4qvxlg + job_id: jgkexowwg job_status: Passed torchscript_onnx_qnn: - inference_time: 31138.0 - throughput: 32.115100520264626 + inference_time: 28507.0 + throughput: 35.07910337811766 estimated_peak_memory_range: - min: 36864 - max: 52033760 + min: 57344 + max: 51000304 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1255 + layers_on_npu: 1264 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1255 - job_id: j2p0yl1eg + total_layers: 1264 + job_id: jgz3d86o5 job_status: Passed torchscript_onnx: - inference_time: 63662.0 - throughput: 15.707957651346172 + inference_time: 46693.0 + throughput: 21.416486411239372 estimated_peak_memory_range: - min: 81920 - max: 236996344 + min: 98304 + max: 237346048 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1141 + layers_on_npu: 1150 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1141 - job_id: j1pv3v1m5 + total_layers: 1150 + job_id: jpxko3ql5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:35:33Z' + timestamp: '2024-10-14T23:26:30Z' - torchscript_onnx_tflite: - inference_time: 19898.0 - throughput: 50.256307166549405 + inference_time: 18148.0 + throughput: 55.10249063257659 estimated_peak_memory_range: min: 49152 - max: 
559514784 + max: 597627248 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,37 +109,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1568 - job_id: j0pxvy79g + job_id: j5q6qzxnp job_status: Passed torchscript_onnx_qnn: - inference_time: 22060.0 - throughput: 45.33091568449683 + inference_time: 23321.0 + throughput: 42.87980789846061 estimated_peak_memory_range: - min: 118784 - max: 166813088 + min: 0 + max: 202385200 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1255 + layers_on_npu: 1264 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1255 - job_id: j1p8oz38g + total_layers: 1264 + job_id: j5we68k35 job_status: Passed torchscript_onnx: - inference_time: 45400.0 - throughput: 22.026431718061673 + inference_time: 38588.0 + throughput: 25.91479216336685 estimated_peak_memory_range: - min: 720896 - max: 809851216 + min: 688128 + max: 873983520 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1141 + layers_on_npu: 1150 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1141 - job_id: j7gjxe08p + total_layers: 1150 + job_id: j5mnxo79p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:35:34Z' + timestamp: '2024-10-14T23:26:31Z' - torchscript_onnx_tflite: - inference_time: 28072.0 - throughput: 35.62268452550584 + inference_time: 25144.0 + throughput: 39.770919503658924 estimated_peak_memory_range: - min: 241664 - max: 3370624 + min: 86016 + max: 2766208 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,22 +162,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1568 - job_id: jo5mr3wqg + job_id: jglvmo9j5 job_status: Passed torchscript_onnx_qnn: - inference_time: 29176.0 - throughput: 34.27474636687688 + inference_time: 26838.0 + throughput: 37.26060064088233 estimated_peak_memory_range: - min: 708608 - max: 2011296 + min: 749568 + max: 1914960 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1255 + layers_on_npu: 1264 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1255 - job_id: jn5q837m5 + total_layers: 1264 + job_id: jp14z798p job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:35:28Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:26:22Z' - torchscript_onnx_tflite: - inference_time: 35085.0 - throughput: 28.502208921191393 + inference_time: 25529.0 + throughput: 39.17113870500216 estimated_peak_memory_range: - min: 262144 - max: 529877072 + min: 24576 + max: 3714200 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,37 +200,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1568 - job_id: jegn239mg + job_id: jpv6ke8k5 job_status: Passed torchscript_onnx_qnn: - inference_time: 38873.0 - throughput: 25.72479613099066 + inference_time: 27083.0 + throughput: 36.9235313665399 estimated_peak_memory_range: - min: 663552 - max: 161477040 + min: 684032 + max: 2412728 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1255 + layers_on_npu: 1264 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1255 - job_id: jwgoy31d5 + total_layers: 1264 + job_id: jg9lnkr8g job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: 
Qcs8450 Proxy - timestamp: '2024-09-25T11:35:32Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:26:26Z' - torchscript_onnx_tflite: - inference_time: 28202.0 - throughput: 35.458478122119 + inference_time: 25456.0 + throughput: 39.28346951602766 estimated_peak_memory_range: - min: 81920 - max: 2815344 + min: 61440 + max: 2515168 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,37 +238,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1568 - job_id: joprke4e5 + job_id: jgo26o7qp job_status: Passed torchscript_onnx_qnn: - inference_time: 29234.0 - throughput: 34.206745570226445 + inference_time: 27124.0 + throughput: 36.86771862557145 estimated_peak_memory_range: - min: 708608 - max: 1891152 + min: 684032 + max: 2341568 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1255 + layers_on_npu: 1264 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1255 - job_id: j1gln30lp + total_layers: 1264 + job_id: j5we68km5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:35:29Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:26:24Z' - torchscript_onnx_tflite: - inference_time: 28179.0 - throughput: 35.48741970971291 + inference_time: 25633.0 + throughput: 39.01221082198728 estimated_peak_memory_range: - min: 61440 - max: 2713416 + min: 73728 + max: 2992616 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,37 +276,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1568 - job_id: jep28l7mp + job_id: jp3j0xl3g job_status: Passed torchscript_onnx_qnn: - inference_time: 29089.0 - throughput: 34.37725600742549 + inference_time: 27232.0 + throughput: 36.72150411280846 estimated_peak_memory_range: - min: 720896 - max: 1994368 + min: 704512 + max: 2553296 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1255 + layers_on_npu: 1264 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1255 - job_id: jw566n375 + total_layers: 1264 + job_id: jgdx18krp job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:35:30Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:26:23Z' - torchscript_onnx_tflite: - inference_time: 28273.0 - throughput: 35.369433735365895 + inference_time: 32539.0 + throughput: 30.732351946894497 estimated_peak_memory_range: - min: 122880 - max: 3287760 + min: 110592 + max: 565050224 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,60 +314,113 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1568 - job_id: jqpye644g + job_id: j56y4r96p job_status: Passed torchscript_onnx_qnn: - inference_time: 28811.0 - throughput: 34.70896532574364 + inference_time: 35739.0 + throughput: 27.980637398919946 estimated_peak_memory_range: - min: 741376 - max: 2410512 + min: 655360 + max: 199566896 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1255 + layers_on_npu: 1264 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1255 - job_id: j1p3ke4z5 + total_layers: 1264 + job_id: j57yrkm95 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:26:28Z' + - torchscript_onnx_tflite: + inference_time: 
16379.0 + throughput: 61.05378838756945 + estimated_peak_memory_range: + min: 24576 + max: 282381088 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 1568 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 1568 + job_id: jpedm8qo5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 15042.0 + throughput: 66.48052120728626 + estimated_peak_memory_range: + min: 614400 + max: 216465408 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 1264 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 1264 + job_id: jp4lrm715 + job_status: Passed + torchscript_onnx: + inference_time: 29175.0 + throughput: 34.27592116538132 + estimated_peak_memory_range: + min: 663552 + max: 346042352 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 1150 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 1150 + job_id: jp2ky41qp + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:35:31Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:26:34Z' - torchscript_onnx_qnn: - inference_time: 29617.0 - throughput: 33.76439207212074 + inference_time: 27571.0 + throughput: 36.26999383410105 estimated_peak_memory_range: min: 602112 max: 602112 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1255 + layers_on_npu: 1264 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1255 - job_id: jogkz3log + total_layers: 1264 + job_id: jg9lnkrwg job_status: Passed torchscript_onnx: - inference_time: 65960.0 - throughput: 15.160703456640388 + inference_time: 51987.0 + throughput: 19.235578125300556 estimated_peak_memory_range: - min: 207126528 - max: 207126528 + min: 207286272 + max: 207286272 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1141 + layers_on_npu: 1150 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1141 - job_id: jlpe9kr0g + total_layers: 1150 + job_id: jgn6vo4q5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:35:35Z' + timestamp: '2024-10-14T23:26:32Z' diff --git a/qai_hub_models/models/swin_small/README.md b/qai_hub_models/models/swin_small/README.md index 8594f83c..e5ba3f12 100644 --- a/qai_hub_models/models/swin_small/README.md +++ b/qai_hub_models/models/swin_small/README.md @@ -6,7 +6,7 @@ SwinSmall is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of Swin-Small found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/swin_transformer.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/swin_small). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.swin_small.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. 
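Reviewer note on the `export.py` refactor above: the function can also be driven from Python rather than the CLI. The sketch below is a minimal illustration, not the repository's own example. It assumes AI Hub credentials are configured, that `device` accepts a device-name string as it does on the CLI, and that the `skip_*` flags referenced in the function body are keyword parameters with `False` defaults; the device and option choices are purely illustrative.

```python
# Minimal sketch: exercise the refactored export_model(), which this patch
# changes to return an ExportResult struct instead of a positional 3-tuple.
from qai_hub_models.models.swin_small.export import export_model

result = export_model(
    device="Samsung Galaxy S24",  # illustrative; any AI Hub-supported device
    skip_inferencing=True,        # the last four recipe steps are individually skippable
    skip_downloading=True,
)

# export_model can also return List[str], so guard before field access.
if not isinstance(result, list) and result.profile_job is not None:
    result.profile_job.wait()                          # block until profiling finishes
    profile_data = result.profile_job.download_profile()
    print(profile_data)                                # same raw data the summary step
                                                       # feeds to print_profile_metrics_from_job
```

The named fields (`compile_job`, `inference_job`, `profile_job`) remove the positional coupling of the old 3-tuple return, which is what the repeated `return ExportResult(...)` hunks in this patch migrate every per-model `export.py` toward.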
+ ## License -- The license for the original implementation of Swin-Small can be found +* The license for the original implementation of Swin-Small can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/swin_transformer.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/swin_small/export.py b/qai_hub_models/models/swin_small/export.py index d9bcea8f..3f0ff5bd 100644 --- a/qai_hub_models/models/swin_small/export.py +++ b/qai_hub_models/models/swin_small/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.swin_small import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. 
- * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "swin_small" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/swin_small/perf.yaml b/qai_hub_models/models/swin_small/perf.yaml index 55d355eb..c577f36b 100644 --- a/qai_hub_models/models/swin_small/perf.yaml +++ b/qai_hub_models/models/swin_small/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: Swin-Small performance_metrics: - torchscript_onnx_tflite: - inference_time: 21002.0 - throughput: 47.614512903533 + inference_time: 18699.0 + throughput: 53.47879565752179 estimated_peak_memory_range: - min: 20480 - max: 4495792 + min: 106496 + max: 4718936 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,37 +56,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1563 - job_id: jo5mr3zqg + job_id: jgn6vowk5 job_status: Passed torchscript_onnx_qnn: - inference_time: 23699.0 - throughput: 42.19587324359678 + inference_time: 21583.0 + throughput: 46.33276189593661 estimated_peak_memory_range: - min: 36864 - max: 38709952 + min: 16384 + max: 40130776 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1246 + layers_on_npu: 1255 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1246 - job_id: jogkz3yog + total_layers: 1255 + job_id: j56y4r06p job_status: Passed torchscript_onnx: - inference_time: 54120.0 - throughput: 18.477457501847745 + inference_time: 34575.0 + throughput: 28.922631959508315 estimated_peak_memory_range: - min: 94208 - max: 136459248 + min: 69632 + max: 136549584 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1136 + layers_on_npu: 1145 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1136 - job_id: jlpe9kv0g + total_layers: 1145 + job_id: j57yrk1v5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:34:44Z' + timestamp: '2024-10-14T23:25:30Z' - torchscript_onnx_tflite: - inference_time: 14445.0 - throughput: 69.22810661128418 + inference_time: 12959.0 + throughput: 77.16644802839726 estimated_peak_memory_range: - min: 24576 - max: 
524959744 + min: 16384 + max: 551676848 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,37 +109,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1563 - job_id: jegn23emg + job_id: jprv3o70g job_status: Passed torchscript_onnx_qnn: - inference_time: 16066.0 - throughput: 62.24324660774306 + inference_time: 14588.0 + throughput: 68.54949273375377 estimated_peak_memory_range: min: 0 - max: 138258560 + max: 164090432 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1246 + layers_on_npu: 1255 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1246 - job_id: jn5q832m5 + total_layers: 1255 + job_id: jgo26o9qp job_status: Passed torchscript_onnx: - inference_time: 38307.0 - throughput: 26.104889445793198 + inference_time: 23854.0 + throughput: 41.92169028255219 estimated_peak_memory_range: min: 0 - max: 761041680 + max: 820872432 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1136 + layers_on_npu: 1145 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1136 - job_id: jygzer76g + total_layers: 1145 + job_id: jp4lrm685 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:34:45Z' + timestamp: '2024-10-14T23:25:31Z' - torchscript_onnx_tflite: - inference_time: 20989.0 - throughput: 47.64400400209634 + inference_time: 18642.0 + throughput: 53.642313056538995 estimated_peak_memory_range: - min: 65536 - max: 2823888 + min: 278528 + max: 3571760 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,22 +162,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1563 - job_id: joprkeye5 + job_id: jp2ky4zrp job_status: Passed torchscript_onnx_qnn: - inference_time: 21626.0 - throughput: 46.24063627115509 + inference_time: 20236.0 + throughput: 49.4168808064835 estimated_peak_memory_range: - min: 692224 - max: 2500920 + min: 679936 + max: 1922600 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1246 + layers_on_npu: 1255 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1246 - job_id: jw566n175 + total_layers: 1255 + job_id: jgjvno6vg job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:34:39Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:25:21Z' - torchscript_onnx_tflite: - inference_time: 26590.0 - throughput: 37.608123354644604 + inference_time: 18785.0 + throughput: 53.23396326856535 estimated_peak_memory_range: - min: 69632 - max: 509872816 + min: 45056 + max: 3508088 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,37 +200,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1563 - job_id: jep28lmmp + job_id: jgkexokwg job_status: Passed torchscript_onnx_qnn: - inference_time: 28655.0 - throughput: 34.897923573547374 + inference_time: 20621.0 + throughput: 48.49425343096843 estimated_peak_memory_range: - min: 0 - max: 133381584 + min: 671744 + max: 2429504 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1246 + layers_on_npu: 1255 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1246 - job_id: j7gjxel8p + total_layers: 1255 + job_id: j5we68035 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: 
'2024-09-25T11:34:43Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:25:26Z' - torchscript_onnx_tflite: - inference_time: 20977.0 - throughput: 47.67125899795013 + inference_time: 18670.0 + throughput: 53.56186395286556 estimated_peak_memory_range: - min: 20480 - max: 3102680 + min: 24576 + max: 5234240 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,37 +238,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1563 - job_id: jqpye6d4g + job_id: jp8qy6kkp job_status: Passed torchscript_onnx_qnn: - inference_time: 22011.0 - throughput: 45.43182953977556 + inference_time: 20685.0 + throughput: 48.344210780759006 estimated_peak_memory_range: - min: 651264 - max: 2418872 + min: 675840 + max: 2083696 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1246 + layers_on_npu: 1255 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1246 - job_id: j1p3kemz5 + total_layers: 1255 + job_id: jgz3d8qo5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:34:40Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:25:25Z' - torchscript_onnx_tflite: - inference_time: 21013.0 - throughput: 47.58958739827726 + inference_time: 18664.0 + throughput: 53.57908272610373 estimated_peak_memory_range: - min: 32768 - max: 3470144 + min: 61440 + max: 3426808 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,37 +276,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1563 - job_id: j2p0ylreg + job_id: jp0z0dx95 job_status: Passed torchscript_onnx_qnn: - inference_time: 21986.0 - throughput: 45.483489493313925 + inference_time: 20596.0 + throughput: 48.55311711011847 estimated_peak_memory_range: min: 671744 - max: 1938984 + max: 1923944 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1246 + layers_on_npu: 1255 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1246 - job_id: jwgoy3vd5 + total_layers: 1255 + job_id: jpedm80o5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:34:41Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:25:22Z' - torchscript_onnx_tflite: - inference_time: 21092.0 - throughput: 47.41134079271762 + inference_time: 24220.0 + throughput: 41.28819157720892 estimated_peak_memory_range: - min: 57344 - max: 2855080 + min: 69632 + max: 534668128 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,60 +314,113 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1563 - job_id: j1p8oz78g + job_id: jpy13qy8p job_status: Passed torchscript_onnx_qnn: - inference_time: 21966.0 - throughput: 45.524902121460435 + inference_time: 26468.0 + throughput: 37.7814719661478 estimated_peak_memory_range: - min: 696320 - max: 1905664 + min: 643072 + max: 162107152 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1246 + layers_on_npu: 1255 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1246 - job_id: j1pv3vwm5 + total_layers: 1255 + job_id: jp14z7k8p job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:25:28Z' + - torchscript_onnx_tflite: + inference_time: 11752.0 + throughput: 85.0918992511913 + 
estimated_peak_memory_range: + min: 2113536 + max: 243191392 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 1563 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 1563 + job_id: jglvmoqj5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 12483.0 + throughput: 80.10894816951054 + estimated_peak_memory_range: + min: 618496 + max: 170383712 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 1255 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 1255 + job_id: jgdx18yrp + job_status: Passed + torchscript_onnx: + inference_time: 20333.0 + throughput: 49.181134116952734 + estimated_peak_memory_range: + min: 0 + max: 327920288 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 1145 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 1145 + job_id: jgn6vodk5 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:34:42Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:25:34Z' - torchscript_onnx_qnn: - inference_time: 22652.0 - throughput: 44.14621225498852 + inference_time: 21141.0 + throughput: 47.30145215458115 estimated_peak_memory_range: min: 602112 max: 602112 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1246 + layers_on_npu: 1255 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1246 - job_id: j1gln3klp + total_layers: 1255 + job_id: jpv6keyk5 job_status: Passed torchscript_onnx: - inference_time: 57017.0 - throughput: 17.53862882999807 + inference_time: 37889.0 + throughput: 26.392884478344637 estimated_peak_memory_range: - min: 123432960 - max: 123432960 + min: 123691008 + max: 123691008 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 1136 + layers_on_npu: 1145 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 1136 - job_id: jz5woq9jp + total_layers: 1145 + job_id: jpxko3835 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:34:46Z' + timestamp: '2024-10-14T23:25:32Z' diff --git a/qai_hub_models/models/swin_tiny/README.md b/qai_hub_models/models/swin_tiny/README.md index 08b4cf3a..33acf7ff 100644 --- a/qai_hub_models/models/swin_tiny/README.md +++ b/qai_hub_models/models/swin_tiny/README.md @@ -6,7 +6,7 @@ SwinTiny is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of Swin-Tiny found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/swin_transformer.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/swin_tiny). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.swin_tiny.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. 
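A note on the large `perf.yaml` updates in this patch: the regenerated numbers stay internally consistent. `inference_time` is recorded in microseconds and `throughput` in inferences per second, so `throughput == 1e6 / inference_time` for every metric block; for example, the Swin-Tiny hunk further down records 11939.0 µs against 83.75910880308234 inf/s, and 1e6 / 11939.0 is exactly that value. The sketch below checks this invariant; it assumes the file layout visible in these hunks and that PyYAML is installed.

```python
# Sketch: verify throughput == 1e6 / inference_time for every metric block
# in a perf.yaml, assuming the structure visible in this patch.
import yaml

with open("qai_hub_models/models/swin_tiny/perf.yaml") as f:
    perf = yaml.safe_load(f)

for model in perf["models"]:
    for entry in model["performance_metrics"]:
        for runtime, stats in entry.items():
            # Skip reference_device_info and any block without a usable time.
            if not isinstance(stats, dict):
                continue
            t_us = stats.get("inference_time")
            if not isinstance(t_us, (int, float)):
                continue
            expected = 1e6 / t_us  # microseconds -> inferences per second
            assert abs(stats["throughput"] - expected) < 1e-6 * expected, runtime
```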
+ ## License -- The license for the original implementation of Swin-Tiny can be found +* The license for the original implementation of Swin-Tiny can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/swin_transformer.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/swin_tiny/export.py b/qai_hub_models/models/swin_tiny/export.py index 89d35fad..623d8b83 100644 --- a/qai_hub_models/models/swin_tiny/export.py +++ b/qai_hub_models/models/swin_tiny/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.swin_tiny import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. 
- * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "swin_tiny" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/swin_tiny/perf.yaml b/qai_hub_models/models/swin_tiny/perf.yaml index 1fee5940..4e15f1cc 100644 --- a/qai_hub_models/models/swin_tiny/perf.yaml +++ b/qai_hub_models/models/swin_tiny/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: Swin-Tiny performance_metrics: - torchscript_onnx_tflite: - inference_time: 13488.0 - throughput: 74.13997627520759 + inference_time: 11939.0 + throughput: 83.75910880308234 estimated_peak_memory_range: - min: 20480 - max: 3269384 + min: 40960 + max: 2902584 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,37 +56,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 837 - job_id: joprkeee5 + job_id: j57yrknv5 job_status: Passed torchscript_onnx_qnn: - inference_time: 14968.0 - throughput: 66.80919294494923 + inference_time: 13291.0 + throughput: 75.23888345496952 estimated_peak_memory_range: - min: 40960 - max: 24819936 + min: 12288 + max: 24774456 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 700 + layers_on_npu: 709 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 700 - job_id: j1gln33lp + total_layers: 709 + job_id: jp8qy6rkp job_status: Passed torchscript_onnx: - inference_time: 32582.0 - throughput: 30.69179301454791 + inference_time: 19804.0 + throughput: 50.494849525348414 estimated_peak_memory_range: - min: 36864 - max: 69358280 + min: 53248 + max: 69055400 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 614 + layers_on_npu: 623 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 614 - job_id: jz5woqqjp + total_layers: 623 + job_id: jgz3d80o5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:33:58Z' + timestamp: '2024-10-14T23:24:35Z' - torchscript_onnx_tflite: - inference_time: 11114.0 - throughput: 89.97660608241857 + inference_time: 8121.0 + throughput: 123.13754463735994 estimated_peak_memory_range: - min: 49152 - max: 323779760 + min: 
20480 + max: 342709040 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,37 +109,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 837 - job_id: jep28llmp + job_id: jp4lrm485 job_status: Passed torchscript_onnx_qnn: - inference_time: 10043.0 - throughput: 99.57184108334162 + inference_time: 10599.0 + throughput: 94.34852344560808 estimated_peak_memory_range: - min: 618496 - max: 93102960 + min: 638976 + max: 107634384 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 700 + layers_on_npu: 709 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 700 - job_id: jw566nn75 + total_layers: 709 + job_id: jgkexo0wg job_status: Passed torchscript_onnx: - inference_time: 22617.0 - throughput: 44.21452889419463 + inference_time: 16514.0 + throughput: 60.55468087683178 estimated_peak_memory_range: - min: 45056 - max: 436578464 + min: 0 + max: 492780176 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 614 + layers_on_npu: 623 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 614 - job_id: jmg9vwwv5 + total_layers: 623 + job_id: j5we68r35 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:33:59Z' + timestamp: '2024-10-14T23:24:36Z' - torchscript_onnx_tflite: - inference_time: 13351.0 - throughput: 74.90075649764063 + inference_time: 11848.0 + throughput: 84.40243079000675 estimated_peak_memory_range: - min: 24576 - max: 3192536 + min: 20480 + max: 3173096 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,22 +162,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 837 - job_id: jqpye664g + job_id: jpxko3r35 job_status: Passed torchscript_onnx_qnn: - inference_time: 13321.0 - throughput: 75.06943923128894 + inference_time: 12224.0 + throughput: 81.80628272251309 estimated_peak_memory_range: - min: 634880 - max: 1876344 + min: 647168 + max: 2486752 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 700 + layers_on_npu: 709 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 700 - job_id: jwgoy33d5 + total_layers: 709 + job_id: jglvmo8j5 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:33:53Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:24:28Z' - torchscript_onnx_tflite: - inference_time: 16739.0 - throughput: 59.740725252404566 + inference_time: 11874.0 + throughput: 84.21761832575375 estimated_peak_memory_range: - min: 45056 - max: 315094992 + min: 36864 + max: 2407856 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,37 +200,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 837 - job_id: j2p0ylleg + job_id: jp2ky4drp job_status: Passed torchscript_onnx_qnn: - inference_time: 18022.0 - throughput: 55.487737210076574 + inference_time: 12370.0 + throughput: 80.84074373484236 estimated_peak_memory_range: - min: 626688 - max: 92351424 + min: 655360 + max: 2007808 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 700 + layers_on_npu: 709 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 700 - job_id: jygzerr6g + total_layers: 709 + job_id: jgo26owqp job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: 
'2024-09-25T11:33:57Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:24:31Z' - torchscript_onnx_tflite: - inference_time: 13413.0 - throughput: 74.55453664355475 + inference_time: 11920.0 + throughput: 83.89261744966443 estimated_peak_memory_range: - min: 20480 - max: 8742792 + min: 45056 + max: 3093240 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,37 +238,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 837 - job_id: j1p8ozz8g + job_id: jprv3od0g job_status: Passed torchscript_onnx_qnn: - inference_time: 13566.0 - throughput: 73.71369600471768 + inference_time: 12533.0 + throughput: 79.78935609989627 estimated_peak_memory_range: - min: 626688 - max: 1883504 + min: 45056 + max: 1710104 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 700 + layers_on_npu: 709 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 700 - job_id: j1pv3vvm5 + total_layers: 709 + job_id: jp3j0x73g job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:33:54Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:24:30Z' - torchscript_onnx_tflite: - inference_time: 13446.0 - throughput: 74.37156031533542 + inference_time: 11837.0 + throughput: 84.48086508405846 estimated_peak_memory_range: - min: 32768 - max: 2457664 + min: 36864 + max: 3066768 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,37 +276,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 837 - job_id: jogkz33og + job_id: jgn6voqk5 job_status: Passed torchscript_onnx_qnn: - inference_time: 13533.0 - throughput: 73.89344565137073 + inference_time: 12459.0 + throughput: 80.26326350429409 estimated_peak_memory_range: - min: 40960 - max: 1667992 + min: 684032 + max: 1817448 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 700 + layers_on_npu: 709 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 700 - job_id: j7gjxee8p + total_layers: 709 + job_id: j56y4rm6p job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:33:55Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:24:29Z' - torchscript_onnx_tflite: - inference_time: 13404.0 - throughput: 74.60459564309161 + inference_time: 15225.0 + throughput: 65.68144499178982 estimated_peak_memory_range: - min: 40960 - max: 3005544 + min: 24576 + max: 333978560 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,60 +314,113 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 837 - job_id: jn5q833m5 + job_id: j5mnxokdp job_status: Passed torchscript_onnx_qnn: - inference_time: 13441.0 - throughput: 74.39922624804701 + inference_time: 16315.0 + throughput: 61.29328838492185 estimated_peak_memory_range: - min: 651264 - max: 1889152 + min: 0 + max: 109370064 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 700 + layers_on_npu: 709 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 700 - job_id: jlpe9kk0g + total_layers: 709 + job_id: jgjvno8vg job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:24:33Z' + - torchscript_onnx_tflite: + inference_time: 7384.0 + throughput: 135.42795232936078 + 
estimated_peak_memory_range: + min: 16384 + max: 163683760 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 837 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 837 + job_id: jp0z0d995 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 7646.0 + throughput: 130.78733978550875 + estimated_peak_memory_range: + min: 614400 + max: 110912592 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 709 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 709 + job_id: jpedm8no5 + job_status: Passed + torchscript_onnx: + inference_time: 12008.0 + throughput: 83.27781479013991 + estimated_peak_memory_range: + min: 53248 + max: 222424144 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 623 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 623 + job_id: jgdx18mrp + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:33:56Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:24:39Z' - torchscript_onnx_qnn: - inference_time: 13969.0 - throughput: 71.58708568974157 + inference_time: 12872.0 + throughput: 77.68800497203232 estimated_peak_memory_range: min: 602112 max: 602112 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 700 + layers_on_npu: 709 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 700 - job_id: j1p3keez5 + total_layers: 709 + job_id: j5q6qz1np job_status: Passed torchscript_onnx: - inference_time: 33942.0 - throughput: 29.46202345177067 + inference_time: 22104.0 + throughput: 45.24068041983352 estimated_peak_memory_range: - min: 67137536 - max: 67137536 + min: 67313664 + max: 67313664 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 614 + layers_on_npu: 623 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 614 - job_id: jnp10eel5 + total_layers: 623 + job_id: jg9lnkqwg job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:34:00Z' + timestamp: '2024-10-14T23:24:37Z' diff --git a/qai_hub_models/models/trocr/README.md b/qai_hub_models/models/trocr/README.md index 6051cbb9..49968d83 100644 --- a/qai_hub_models/models/trocr/README.md +++ b/qai_hub_models/models/trocr/README.md @@ -6,7 +6,7 @@ End-to-end text recognition approach with pre-trained image transformer and text transformer models for both image understanding and wordpiece-level text generation. This is based on the implementation of TrOCR found -[here](https://huggingface.co/microsoft/trocr-small-stage1). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/trocr). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.trocr.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of TrOCR can be found +* The license for the original implementation of TrOCR can be found [here](https://github.com/microsoft/unilm/blob/master/LICENSE).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models](https://arxiv.org/abs/2109.10282) * [Source Model Implementation](https://huggingface.co/microsoft/trocr-small-stage1) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/trocr/export.py b/qai_hub_models/models/trocr/export.py index 8b99469e..897086eb 100644 --- a/qai_hub_models/models/trocr/export.py +++ b/qai_hub_models/models/trocr/export.py @@ -10,14 +10,15 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Mapping, Optional, Tuple, cast +from typing import Any, Dict, List, Mapping, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.trocr import Model from qai_hub_models.utils.args import export_parser, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel, TargetRuntime +from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -45,20 +46,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Mapping[ - str, Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] -] | List[str]: +) -> Mapping[str, ExportResult] | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -83,10 +82,10 @@ def export_model( `model_cls.from_pretrained` Returns: - A Mapping from component_name to a 3-tuple of: + A Mapping from component_name to a struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). 
+ * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "trocr" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -118,7 +117,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) components_dict: Dict[str, BaseModel] = {} if "TrOCREncoder" in components: @@ -135,7 +134,7 @@ def export_model( component.to("cpu"), make_torch_inputs(input_spec) ) - # 2. Compile the models to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = component.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -151,7 +150,7 @@ def export_model( hub.client.CompileJob, submitted_compile_job ) - # 3. Profile the model assets on real devices + # 3. Profiles the model performance on a real device profile_jobs: Dict[str, hub.client.ProfileJob] = {} if not skip_profiling: for component_name in components: @@ -169,7 +168,7 @@ def export_model( hub.client.ProfileJob, submitted_profile_job ) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_jobs: Dict[str, hub.client.InferenceJob] = {} if not skip_inferencing: for component_name in components: @@ -193,14 +192,14 @@ def export_model( hub.client.InferenceJob, submitted_inference_job ) - # 5. Download the model assets to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) for component_name, compile_job in compile_jobs.items(): target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / component_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: for component_name in components: profile_job = profile_jobs[component_name] @@ -225,10 +224,10 @@ def export_model( ) return { - component_name: ( - compile_jobs[component_name], - profile_jobs.get(component_name, None), - inference_jobs.get(component_name, None), + component_name: ExportResult( + compile_job=compile_jobs[component_name], + inference_job=inference_jobs.get(component_name, None), + profile_job=profile_jobs.get(component_name, None), ) for component_name in components } diff --git a/qai_hub_models/models/trocr/perf.yaml b/qai_hub_models/models/trocr/perf.yaml index f8af380e..31c92d4b 100644 --- a/qai_hub_models/models/trocr/perf.yaml +++ b/qai_hub_models/models/trocr/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: TrOCREncoder performance_metrics: - torchscript_onnx_tflite: - inference_time: 66632.0 - throughput: 15.007804058110217 + inference_time: 50652.0 + throughput: 19.74255705598989 estimated_peak_memory_range: - min: 7770112 - max: 9877904 + min: 7196672 + max: 9363616 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 591 - job_id: j1gln36ep + job_id: jgz3d8mk5 job_status: Passed torchscript_onnx_qnn: - inference_time: 68321.0 - throughput: 14.63678810321863 + inference_time: 52890.0 + throughput: 18.907165815844206 estimated_peak_memory_range: - min: 98304 - max: 22307152 + min: 258048 + max: 23201496 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 443 - job_id: j0pxvynjg + job_id: j56y4r80p job_status: Passed torchscript_onnx: - inference_time: 56119.0 - throughput: 17.819276893743652 + inference_time: 39309.0 + throughput: 25.43946678877611 estimated_peak_memory_range: - min: 1912832 - max: 120345624 + min: 73728 + max: 186476328 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 396 - job_id: jlpe9kw7g + job_id: jprv3ox0g job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:33:09Z' + timestamp: '2024-10-14T23:23:36Z' - torchscript_onnx_tflite: - inference_time: 51027.0 - throughput: 19.59746800713348 + inference_time: 40349.0 + throughput: 24.78376167934769 estimated_peak_memory_range: - min: 7266304 - max: 307619920 + min: 5361664 + max: 321212896 
primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 591 - job_id: j1p3kevx5 + job_id: jg9lnkzlg job_status: Passed torchscript_onnx_qnn: - inference_time: 54054.0 - throughput: 18.5000185000185 + inference_time: 42073.0 + throughput: 23.76821239274594 estimated_peak_memory_range: min: 1802240 - max: 58552016 + max: 67186704 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,7 +124,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 443 - job_id: jegn23mvg + job_id: jgo26olxp + job_status: Passed + torchscript_onnx: + inference_time: 31228.0 + throughput: 32.0225438708851 + estimated_peak_memory_range: + min: 0 + max: 364484256 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 396 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 396 + job_id: jpy13q88p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -135,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:32:57Z' + timestamp: '2024-10-14T23:23:38Z' - torchscript_onnx_tflite: - inference_time: 65665.0 - throughput: 15.228812914033352 + inference_time: 50061.0 + throughput: 19.97562973172729 estimated_peak_memory_range: min: 7188480 - max: 9768312 + max: 8843600 primary_compute_unit: NPU precision: fp16 layer_info: @@ -149,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 591 - job_id: j1pv3v075 + job_id: jgdx18dep job_status: Passed torchscript_onnx_qnn: - inference_time: 48577.0 - throughput: 20.585873973279536 + inference_time: 36086.0 + throughput: 27.71157789724547 estimated_peak_memory_range: - min: 1933312 - max: 3796736 + min: 1843200 + max: 3281984 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 443 - job_id: j2p0yl22g + job_id: jgz3d8lk5 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -172,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:33:00Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:23:23Z' - torchscript_onnx_tflite: - inference_time: 75863.0 - throughput: 13.181656406944096 + inference_time: 50179.0 + throughput: 19.928655413619243 estimated_peak_memory_range: - min: 7282688 - max: 299682560 + min: 7118848 + max: 9955776 primary_compute_unit: NPU precision: fp16 layer_info: @@ -187,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 591 - job_id: jlpe9ke7g + job_id: jpy13qo7p job_status: Passed torchscript_onnx_qnn: - inference_time: 77027.0 - throughput: 12.982460695600244 + inference_time: 36899.0 + throughput: 27.101005447302096 estimated_peak_memory_range: - min: 57344 - max: 53780128 + min: 1884160 + max: 3777416 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 443 - job_id: j1pv3vr75 + job_id: jg9lnkowg job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:33:07Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:23:29Z' - torchscript_onnx_tflite: - inference_time: 66452.0 - throughput: 15.048456028411485 + inference_time: 51951.0 + throughput: 19.24890762449231 estimated_peak_memory_range: - min: 7294976 - max: 
735597352 + min: 7122944 + max: 9553192 primary_compute_unit: NPU precision: fp16 layer_info: @@ -225,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 591 - job_id: jz5woq2zp + job_id: jprv3ol9g job_status: Passed torchscript_onnx_qnn: - inference_time: 48980.0 - throughput: 20.41649652919559 + inference_time: 37124.0 + throughput: 26.936752505117983 estimated_peak_memory_range: - min: 1810432 - max: 4960864 + min: 1916928 + max: 3660680 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 443 - job_id: jogkz3qyg + job_id: jgdx186ep job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:33:02Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:23:27Z' - torchscript_onnx_tflite: - inference_time: 65975.0 - throughput: 15.157256536566882 + inference_time: 51056.0 + throughput: 19.586336571607646 estimated_peak_memory_range: - min: 7229440 - max: 9665952 + min: 7188480 + max: 9399120 primary_compute_unit: NPU precision: fp16 layer_info: @@ -263,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 591 - job_id: jnp10eyk5 + job_id: j5mnxo0wp job_status: Passed torchscript_onnx_qnn: - inference_time: 49173.0 - throughput: 20.336363451487603 + inference_time: 37072.0 + throughput: 26.974536037980148 estimated_peak_memory_range: - min: 1871872 - max: 3306144 + min: 1912832 + max: 6938936 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 443 - job_id: j1gln32ep + job_id: jg9lnkolg job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:33:04Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:23:25Z' - torchscript_onnx_tflite: - inference_time: 65985.0 - throughput: 15.154959460483443 + inference_time: 60938.0 + throughput: 16.410121763103483 estimated_peak_memory_range: - min: 7180288 - max: 9713720 + min: 7118848 + max: 310706544 primary_compute_unit: NPU precision: fp16 layer_info: @@ -301,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 591 - job_id: jz57zx0qp + job_id: jp4lrmyv5 job_status: Passed torchscript_onnx_qnn: - inference_time: 49624.0 - throughput: 20.15153957762373 + inference_time: 60192.0 + throughput: 16.61350345560872 estimated_peak_memory_range: - min: 1941504 - max: 3772984 + min: 1785856 + max: 66791168 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,22 +329,75 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 443 - job_id: j1p3ke1x5 + job_id: jp4lrme85 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:33:06Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:23:32Z' + - torchscript_onnx_tflite: + inference_time: 36174.0 + throughput: 27.6441643169127 + estimated_peak_memory_range: + min: 2641920 + max: 125164800 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 591 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 591 + job_id: j5q6qzl4p + job_status: Passed + torchscript_onnx_qnn: + inference_time: 33016.0 + throughput: 
30.28834504482675 + estimated_peak_memory_range: + min: 1810432 + max: 69495072 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 443 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 443 + job_id: j5mnxo9dp + job_status: Passed + torchscript_onnx: + inference_time: 23693.0 + throughput: 42.20655889925295 + estimated_peak_memory_range: + min: 5943296 + max: 217240080 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 396 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 396 + job_id: j56y4ro6p + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:23:43Z' - torchscript_onnx_qnn: - inference_time: 47461.0 - throughput: 21.0699311013253 + inference_time: 33885.0 + throughput: 29.511583296443852 estimated_peak_memory_range: - min: 1777664 - max: 1777664 + min: 1773568 + max: 1773568 primary_compute_unit: NPU precision: fp16 layer_info: @@ -339,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 443 - job_id: jep28l9xp + job_id: jgjvnorxg job_status: Passed torchscript_onnx: - inference_time: 55221.0 - throughput: 18.109052715452453 + inference_time: 35659.0 + throughput: 28.043411200538433 estimated_peak_memory_range: - min: 114429952 - max: 114429952 + min: 114479104 + max: 114479104 primary_compute_unit: NPU precision: fp16 layer_info: @@ -354,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 396 - job_id: jnp10ewk5 + job_id: jp8qy6jkp job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -363,15 +429,15 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:33:12Z' + timestamp: '2024-10-14T23:23:39Z' - name: TrOCRDecoder performance_metrics: - torchscript_onnx_tflite: - inference_time: 2710.0 - throughput: 369.00369003690037 + inference_time: 2600.0 + throughput: 384.61538461538464 estimated_peak_memory_range: min: 12288 - max: 2321944 + max: 2023104 primary_compute_unit: NPU precision: fp16 layer_info: @@ -379,37 +445,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 399 - job_id: jw566nev5 + job_id: j5we68l65 job_status: Passed torchscript_onnx_qnn: - inference_time: 3068.0 - throughput: 325.94524119947846 + inference_time: 3012.0 + throughput: 332.00531208499336 estimated_peak_memory_range: - min: 24576 - max: 283198712 + min: 3383296 + max: 275704168 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 356 + layers_on_npu: 375 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 356 - job_id: jo5mr3qyg + total_layers: 375 + job_id: jp3j0xzlg job_status: Passed torchscript_onnx: - inference_time: 3024.0 - throughput: 330.6878306878307 + inference_time: 2843.0 + throughput: 351.74111853675697 estimated_peak_memory_range: - min: 720896 - max: 3353256 + min: 704512 + max: 3211728 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 376 + layers_on_npu: 395 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 376 - job_id: jygzerjzg + total_layers: 395 + job_id: jp2ky4orp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -418,13 +484,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:33:09Z' + timestamp: '2024-10-14T23:23:36Z' - torchscript_onnx_tflite: - inference_time: 1917.0 - throughput: 521.6484089723526 + inference_time: 1851.0 + throughput: 
540.2485143165857 estimated_peak_memory_range: min: 12288 - max: 196663088 + max: 198757456 primary_compute_unit: NPU precision: fp16 layer_info: @@ -432,37 +498,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 399 - job_id: jwgoy3k45 + job_id: jp14z7n2p job_status: Passed torchscript_onnx_qnn: - inference_time: 2190.0 - throughput: 456.62100456621005 + inference_time: 2471.0 + throughput: 404.6944556859571 estimated_peak_memory_range: min: 0 - max: 51016704 + max: 53666960 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 356 + layers_on_npu: 375 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 356 - job_id: joprke2v5 + total_layers: 375 + job_id: jpv6kelj5 job_status: Passed torchscript_onnx: - inference_time: 2259.0 - throughput: 442.67374944665784 + inference_time: 2148.0 + throughput: 465.54934823091247 estimated_peak_memory_range: min: 0 - max: 151491264 + max: 155889552 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 376 + layers_on_npu: 395 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 376 - job_id: jmg9vwyq5 + total_layers: 395 + job_id: jp0z0do95 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -471,13 +537,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:33:11Z' + timestamp: '2024-10-14T23:23:38Z' - torchscript_onnx_tflite: - inference_time: 2676.0 - throughput: 373.69207772795215 + inference_time: 2562.0 + throughput: 390.32006245121 estimated_peak_memory_range: min: 12288 - max: 1689544 + max: 2224376 primary_compute_unit: NPU precision: fp16 layer_info: @@ -485,22 +551,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 399 - job_id: j7gjxez7p + job_id: j57yrkel5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2709.0 - throughput: 369.139904023625 + inference_time: 2631.0 + throughput: 380.08361839604714 estimated_peak_memory_range: - min: 1744896 - max: 3108000 + min: 188416 + max: 1513696 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 356 + layers_on_npu: 375 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 356 - job_id: j1p8ozmzg + total_layers: 375 + job_id: j5we68y65 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -508,14 +574,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:33:01Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:23:24Z' - torchscript_onnx_tflite: - inference_time: 2819.0 - throughput: 354.735721887194 + inference_time: 2608.0 + throughput: 383.4355828220859 estimated_peak_memory_range: - min: 16384 - max: 193892048 + min: 12288 + max: 2024384 primary_compute_unit: NPU precision: fp16 layer_info: @@ -523,37 +589,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 399 - job_id: jygzerozg + job_id: jp0z0dm65 job_status: Passed torchscript_onnx_qnn: - inference_time: 3418.0 - throughput: 292.5687536571094 + inference_time: 2607.0 + throughput: 383.5826620636747 estimated_peak_memory_range: - min: 0 - max: 44338144 + min: 1265664 + max: 3372392 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 356 + layers_on_npu: 375 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 356 - job_id: j7gjxe27p + total_layers: 375 + job_id: jp14z7o8p job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: 
'2024-09-25T11:33:08Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:23:29Z' - torchscript_onnx_tflite: - inference_time: 2708.0 - throughput: 369.2762186115214 + inference_time: 2604.0 + throughput: 384.0245775729647 estimated_peak_memory_range: min: 12288 - max: 1956704 + max: 1654136 primary_compute_unit: NPU precision: fp16 layer_info: @@ -561,37 +627,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 399 - job_id: jmg9vwjq5 + job_id: jp2ky4r4p job_status: Passed torchscript_onnx_qnn: - inference_time: 2768.0 - throughput: 361.271676300578 + inference_time: 2613.0 + throughput: 382.70187523918867 estimated_peak_memory_range: - min: 1347584 - max: 3406960 + min: 1286144 + max: 3420960 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 356 + layers_on_npu: 375 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 356 - job_id: jn5q83r75 + total_layers: 375 + job_id: j5we68y35 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:33:03Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:23:27Z' - torchscript_onnx_tflite: - inference_time: 2656.0 - throughput: 376.50602409638554 + inference_time: 2573.0 + throughput: 388.65137971239795 estimated_peak_memory_range: - min: 12288 - max: 2222080 + min: 16384 + max: 2053744 primary_compute_unit: NPU precision: fp16 layer_info: @@ -599,37 +665,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 399 - job_id: jvgdwoek5 + job_id: jgn6vozr5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2832.0 - throughput: 353.1073446327684 + inference_time: 2658.0 + throughput: 376.2227238525207 estimated_peak_memory_range: - min: 1912832 - max: 3317016 + min: 1323008 + max: 2779128 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 356 + layers_on_npu: 375 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 356 - job_id: jw566nzv5 + total_layers: 375 + job_id: jp14z7o2p job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:33:04Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:23:26Z' - torchscript_onnx_tflite: - inference_time: 2707.0 - throughput: 369.4126339120798 + inference_time: 2814.0 + throughput: 355.36602700781805 estimated_peak_memory_range: - min: 16384 - max: 2148800 + min: 12288 + max: 198674720 primary_compute_unit: NPU precision: fp16 layer_info: @@ -637,60 +703,113 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 399 - job_id: jqp4qvkqg + job_id: jpxko3l15 job_status: Passed torchscript_onnx_qnn: - inference_time: 2796.0 - throughput: 357.653791130186 + inference_time: 3375.0 + throughput: 296.2962962962963 estimated_peak_memory_range: - min: 1351680 - max: 2756632 + min: 4358144 + max: 54921552 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 356 + layers_on_npu: 375 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 356 - job_id: jwgoy3n45 + total_layers: 375 + job_id: jpxko3035 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:23:33Z' + - torchscript_onnx_tflite: + inference_time: 2104.0 + throughput: 475.2851711026616 + estimated_peak_memory_range: 
+ min: 12288 + max: 28373760 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 399 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 399 + job_id: jglvmoy85 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 2016.0 + throughput: 496.031746031746 + estimated_peak_memory_range: + min: 0 + max: 46974240 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 375 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 375 + job_id: jgn6vo1k5 + job_status: Passed + torchscript_onnx: + inference_time: 2078.0 + throughput: 481.23195380173246 + estimated_peak_memory_range: + min: 0 + max: 36669456 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 395 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 395 + job_id: jp3j0xo3g + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:33:06Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:23:44Z' - torchscript_onnx_qnn: - inference_time: 3022.0 - throughput: 330.90668431502314 + inference_time: 2793.0 + throughput: 358.03795202291445 estimated_peak_memory_range: - min: 7397376 - max: 7397376 + min: 7385088 + max: 7385088 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 356 + layers_on_npu: 375 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 356 - job_id: jqpye6jrg + total_layers: 375 + job_id: jpedm8715 job_status: Passed torchscript_onnx: - inference_time: 2984.0 - throughput: 335.1206434316354 + inference_time: 2881.0 + throughput: 347.1017007983339 estimated_peak_memory_range: - min: 72138752 - max: 72138752 + min: 71094272 + max: 71094272 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 376 + layers_on_npu: 395 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 376 - job_id: jvgdwoqk5 + total_layers: 395 + job_id: jgkexo6wg job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -699,4 +818,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:33:13Z' + timestamp: '2024-10-14T23:23:40Z' diff --git a/qai_hub_models/models/unet_segmentation/README.md b/qai_hub_models/models/unet_segmentation/README.md index f8474142..f29f695b 100644 --- a/qai_hub_models/models/unet_segmentation/README.md +++ b/qai_hub_models/models/unet_segmentation/README.md @@ -6,7 +6,7 @@ UNet is a machine learning model that produces a segmentation mask for an image. The most basic use case will label each pixel in the image as being in the foreground or the background. More advanced usage will assign a class label to each pixel. This version of the model was trained on the data from Kaggle's Carvana Image Masking Challenge (see https://www.kaggle.com/c/carvana-image-masking-challenge) and is used for vehicle segmentation. This is based on the implementation of Unet-Segmentation found -[here](https://github.com/milesial/Pytorch-UNet). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/unet_segmentation). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.unet_segmentation.export Additional options are documented with the `--help` option.
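For multi-component models such as TrOCR earlier in this diff, the same refactor yields a `Mapping[str, ExportResult]` keyed by component name rather than a single struct. A minimal editorial sketch of consuming it, assuming the component names `"TrOCREncoder"` and `"TrOCRDecoder"` that appear in the TrOCR `export.py` hunk above and an illustrative device name:

```python
# Sketch (assumed usage, not taken from this diff) of iterating the
# per-component mapping returned by the TrOCR export.
from qai_hub_models.models.trocr.export import export_model

results = export_model(device="Samsung Galaxy S24")  # illustrative device name
for component_name, export_result in results.items():
    # Each component ("TrOCREncoder", "TrOCRDecoder") gets its own jobs.
    print(component_name, export_result.compile_job)
    if export_result.profile_job is not None:
        assert export_result.profile_job.wait().success
```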
Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Unet-Segmentation can be found +* The license for the original implementation of Unet-Segmentation can be found [here](https://github.com/milesial/Pytorch-UNet/blob/master/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://github.com/milesial/Pytorch-UNet/blob/master/LICENSE) +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/milesial/Pytorch-UNet/blob/master/LICENSE) + ## References * [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/abs/1505.04597) * [Source Model Implementation](https://github.com/milesial/Pytorch-UNet) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/unet_segmentation/export.py b/qai_hub_models/models/unet_segmentation/export.py index 5eb8aa5d..c3722b9a 100644 --- a/qai_hub_models/models/unet_segmentation/export.py +++ b/qai_hub_models/models/unet_segmentation/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.unet_segmentation import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. 
- * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "unet_segmentation" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/unet_segmentation/perf.yaml b/qai_hub_models/models/unet_segmentation/perf.yaml index e4b18965..fc3c3c51 100644 --- a/qai_hub_models/models/unet_segmentation/perf.yaml +++ b/qai_hub_models/models/unet_segmentation/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: Unet-Segmentation performance_metrics: - torchscript_onnx_tflite: - inference_time: 156677.0 - throughput: 6.382557746191209 + inference_time: 153929.0 + throughput: 6.496501633870161 estimated_peak_memory_range: - min: 6684672 - max: 9129000 + min: 6717440 + max: 463282184 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 32 - job_id: j1p3kenx5 + job_id: jgo26o8xp job_status: Passed torchscript_onnx_qnn: - inference_time: 157042.0 - throughput: 6.367723284216961 + inference_time: 151064.0 + throughput: 6.619710851030027 estimated_peak_memory_range: - min: 9863168 - max: 29829824 + min: 9973760 + max: 31622512 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 52 - job_id: jmg9vw0q5 + job_id: j57yrk4l5 job_status: Passed torchscript_onnx: - inference_time: 160699.0 - throughput: 6.222814080983702 + inference_time: 155224.0 + throughput: 6.442302736690203 estimated_peak_memory_range: - min: 36864 - max: 58894448 + min: 17252352 + max: 18880624 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 53 - job_id: joprke8v5 + job_id: jgkexo42g job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:32:07Z' + timestamp: '2024-10-14T23:22:22Z' - torchscript_onnx_tflite: - inference_time: 133225.0 - throughput: 7.5060987051979735 + inference_time: 132249.0 + throughput: 7.561493848724754 estimated_peak_memory_range: - min: 6701056 - max: 345142176 + min: 6791168 + max: 
410391120 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 32 - job_id: jwgoy3z45 + job_id: jpv6ke7j5 job_status: Passed torchscript_onnx_qnn: - inference_time: 131838.0 - throughput: 7.585066521033389 + inference_time: 132978.0 + throughput: 7.520040909022545 estimated_peak_memory_range: - min: 9969664 - max: 80217024 + min: 9846784 + max: 101051712 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 52 - job_id: jnp10e2k5 + job_id: jp4lrm1v5 job_status: Passed torchscript_onnx: - inference_time: 139885.0 - throughput: 7.14872931336455 + inference_time: 134367.0 + throughput: 7.442303541792255 estimated_peak_memory_range: - min: 892928 - max: 350482928 + min: 380928 + max: 421141680 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 53 - job_id: jep28l0xp + job_id: j5q6qzy4p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:32:08Z' + timestamp: '2024-10-14T23:22:23Z' - torchscript_onnx_tflite: - inference_time: 157105.0 - throughput: 6.365169790904172 + inference_time: 142642.0 + throughput: 7.010557900197698 estimated_peak_memory_range: - min: 24576 - max: 472223368 + min: 6688768 + max: 463253000 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 32 - job_id: j1pv3vq75 + job_id: jgjvnoqxg job_status: Passed torchscript_onnx_qnn: - inference_time: 138754.0 - throughput: 7.206999437854044 + inference_time: 136843.0 + throughput: 7.307644526939631 estimated_peak_memory_range: - min: 10022912 - max: 11385152 + min: 10121216 + max: 11336896 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 52 - job_id: jz57zx2qp + job_id: j5mnxomwp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:32:02Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:22:14Z' - torchscript_onnx_tflite: - inference_time: 310193.0 - throughput: 3.2237993765172006 + inference_time: 147599.0 + throughput: 6.775113652531521 estimated_peak_memory_range: - min: 7114752 - max: 349415680 + min: 6701056 + max: 462994672 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 32 - job_id: j7gjxed7p + job_id: jg9lnkmlg job_status: Passed torchscript_onnx_qnn: - inference_time: 279701.0 - throughput: 3.5752464238597645 + inference_time: 136006.0 + throughput: 7.352616796317809 estimated_peak_memory_range: - min: 7778304 - max: 79320304 + min: 10055680 + max: 11787144 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 52 - job_id: jegn23lvg + job_id: jp2ky4w4p job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:32:06Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:22:18Z' - torchscript_onnx_tflite: - 
inference_time: 151563.0 - throughput: 6.597916378007825 + inference_time: 145119.0 + throughput: 6.890896436717453 estimated_peak_memory_range: - min: 6709248 - max: 463143104 + min: 6684672 + max: 463339832 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 32 - job_id: jlpe9ko7g + job_id: j5we68765 job_status: Passed torchscript_onnx_qnn: - inference_time: 139417.0 - throughput: 7.172726425041422 + inference_time: 143044.0 + throughput: 6.990855960403792 estimated_peak_memory_range: - min: 10117120 - max: 18549048 + min: 10100736 + max: 11361408 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 52 - job_id: jqp4qvnqg + job_id: jprv3o09g job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:32:03Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:22:16Z' - torchscript_onnx_tflite: - inference_time: 154551.0 - throughput: 6.470356063694185 + inference_time: 157280.0 + throughput: 6.358087487283825 estimated_peak_memory_range: - min: 6684672 - max: 463394632 + min: 6627328 + max: 478999472 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 32 - job_id: jygzer2zg + job_id: jgz3d8nk5 job_status: Passed torchscript_onnx_qnn: - inference_time: 140581.0 - throughput: 7.11333679515724 + inference_time: 139062.0 + throughput: 7.191037091369317 estimated_peak_memory_range: - min: 11464704 - max: 12719960 + min: 10096640 + max: 11424328 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 52 - job_id: j0pxvy9jg + job_id: jgn6vonr5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:32:04Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:22:15Z' - torchscript_onnx_tflite: - inference_time: 157023.0 - throughput: 6.368493787534311 + inference_time: 380675.0 + throughput: 2.6269127208248504 estimated_peak_memory_range: - min: 6709248 - max: 463057768 + min: 167936 + max: 406578064 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 32 - job_id: jz5woqwzp + job_id: jpedm8y15 job_status: Passed torchscript_onnx_qnn: - inference_time: 143814.0 - throughput: 6.953425952967027 + inference_time: 269680.0 + throughput: 3.708098487095817 estimated_peak_memory_range: - min: 10104832 - max: 18668648 + min: 4374528 + max: 99862544 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 52 - job_id: jo5mr3eyg + job_id: jp0z0dj65 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:22:20Z' + - torchscript_onnx_tflite: + inference_time: 102802.0 + throughput: 9.727437209392813 + estimated_peak_memory_range: + min: 5791744 + max: 124288656 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 32 + layers_on_gpu: 0 + layers_on_cpu: 0 + 
total_layers: 32 + job_id: jgdx183ep + job_status: Passed + torchscript_onnx_qnn: + inference_time: 102598.0 + throughput: 9.746778689643072 + estimated_peak_memory_range: + min: 9932800 + max: 115505984 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 52 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 52 + job_id: jp8qy6xxp + job_status: Passed + torchscript_onnx: + inference_time: 104486.0 + throughput: 9.570660184139502 + estimated_peak_memory_range: + min: 25743360 + max: 149037744 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 53 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 53 + job_id: jp3j0x9lg + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:32:05Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:22:25Z' - torchscript_onnx_qnn: - inference_time: 135619.0 - throughput: 7.373598094662253 + inference_time: 135807.0 + throughput: 7.36339069414684 estimated_peak_memory_range: min: 9850880 max: 9850880 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 52 - job_id: jvgdwonk5 + job_id: jpxko3415 job_status: Passed torchscript_onnx: - inference_time: 147147.0 - throughput: 6.795925163272102 + inference_time: 147497.0 + throughput: 6.779798911164295 estimated_peak_memory_range: - min: 56770560 - max: 56770560 + min: 56721408 + max: 56721408 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 53 - job_id: jqpye6rrg + job_id: jglvmox85 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:32:09Z' + timestamp: '2024-10-14T23:22:23Z' diff --git a/qai_hub_models/models/vit/README.md b/qai_hub_models/models/vit/README.md index 06e0a6df..cb79f499 100644 --- a/qai_hub_models/models/vit/README.md +++ b/qai_hub_models/models/vit/README.md @@ -6,7 +6,7 @@ VIT is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of VIT found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/vit). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.vit.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of VIT can be found +* The license for the original implementation of VIT can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). 
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/vit/export.py b/qai_hub_models/models/vit/export.py index 60de5f21..a7ddef2e 100644 --- a/qai_hub_models/models/vit/export.py +++ b/qai_hub_models/models/vit/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.vit import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
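In practice this means callers select jobs by field name rather than by tuple position. A minimal downstream sketch (assuming AI Hub access is configured, that `ExportResult` from `qai_hub_models.models.common` exposes the three optional job fields named above, and that each hub job exposes a `job_id`):

```python
from qai_hub_models.models.common import ExportResult
from qai_hub_models.models.vit.export import export_model

# With defaults, this submits compile, profile, and inference jobs.
result = export_model()
# Without Hub access the function returns a List[str] instead.
assert isinstance(result, ExportResult)

print(f"compile job: {result.compile_job.job_id}")
if result.profile_job is not None:  # None when profiling was skipped
    print(f"profile job: {result.profile_job.job_id}")
if result.inference_job is not None:  # None when inferencing was skipped
    print(f"inference job: {result.inference_job.job_id}")
```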
""" model_name = "vit" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -122,7 +120,7 @@ def export_model( model.to("cpu"), make_torch_inputs(input_spec), check_trace=False ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -136,7 +134,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -151,7 +149,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -172,13 +170,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -199,7 +197,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/vit/perf.yaml b/qai_hub_models/models/vit/perf.yaml index 9346d460..5e73a690 100644 --- a/qai_hub_models/models/vit/perf.yaml +++ b/qai_hub_models/models/vit/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: VIT performance_metrics: - torchscript_onnx_tflite: - inference_time: 19822.0 - throughput: 50.44899606497831 + inference_time: 19821.0 + throughput: 50.45154129458655 estimated_peak_memory_range: - min: 49152 - max: 3157824 + min: 90112 + max: 3029272 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1579 - job_id: j1pv3v675 + job_id: jglvmo185 job_status: Passed torchscript_onnx: - inference_time: 20353.0 - throughput: 49.13280597454921 + inference_time: 15505.0 + throughput: 64.49532408900355 estimated_peak_memory_range: - min: 53248 - max: 202758152 + min: 61440 + max: 202556824 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,7 +71,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 976 - job_id: jqpye6zrg + job_id: jpy13qm7p job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -82,13 +80,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:31:07Z' + timestamp: '2024-10-14T23:21:14Z' - torchscript_onnx_tflite: - inference_time: 14467.0 - throughput: 69.12283127116886 + inference_time: 16903.0 + throughput: 59.161095663491686 estimated_peak_memory_range: - min: 36864 - max: 385785792 + min: 0 + max: 400784160 primary_compute_unit: NPU precision: fp16 layer_info: @@ -96,14 +94,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1579 - job_id: j7gjxev7p + job_id: j56y4rd0p job_status: Passed torchscript_onnx: - inference_time: 14334.0 - throughput: 69.76419701409237 + inference_time: 11372.0 + throughput: 87.93527963418924 estimated_peak_memory_range: - min: 651264 - max: 144984496 + min: 0 + max: 156306496 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,7 +109,7 @@ models: 
layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 976 - job_id: j2p0yl42g + job_id: jp0z0d665 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -120,13 +118,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:31:08Z' + timestamp: '2024-10-14T23:21:15Z' - torchscript_onnx_tflite: - inference_time: 19190.0 - throughput: 52.11047420531527 + inference_time: 19788.0 + throughput: 50.53567818880129 estimated_peak_memory_range: min: 53248 - max: 2811560 + max: 2828712 primary_compute_unit: NPU precision: fp16 layer_info: @@ -134,7 +132,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1579 - job_id: jlpe9kd7g + job_id: jp3j0xwlg job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -142,14 +140,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:30:53Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:20:56Z' - torchscript_onnx_tflite: - inference_time: 23008.0 - throughput: 43.46314325452017 + inference_time: 19830.0 + throughput: 50.42864346949067 estimated_peak_memory_range: - min: 98304 - max: 370568720 + min: 49152 + max: 3033144 primary_compute_unit: NPU precision: fp16 layer_info: @@ -157,22 +155,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1579 - job_id: jygzer3zg + job_id: jpedm8l15 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:30:54Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:21:00Z' - torchscript_onnx_tflite: - inference_time: 19280.0 - throughput: 51.86721991701245 + inference_time: 20031.0 + throughput: 49.9226199390944 estimated_peak_memory_range: - min: 32768 - max: 1765157752 + min: 65536 + max: 2686480 primary_compute_unit: NPU precision: fp16 layer_info: @@ -180,22 +178,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1579 - job_id: jz5woqezp + job_id: jgjvnowxg job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:30:55Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:20:59Z' - torchscript_onnx_tflite: - inference_time: 19169.0 - throughput: 52.16756220981794 + inference_time: 20358.0 + throughput: 49.12073877591119 estimated_peak_memory_range: - min: 53248 - max: 2801568 + min: 57344 + max: 3355672 primary_compute_unit: NPU precision: fp16 layer_info: @@ -203,22 +201,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1579 - job_id: jmg9vwlq5 + job_id: jpv6ke9j5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:30:56Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:20:58Z' - torchscript_onnx_tflite: - inference_time: 19709.0 - throughput: 50.73824141255264 + inference_time: 24972.0 + throughput: 40.04485023226013 estimated_peak_memory_range: - min: 45056 - max: 2867072 + min: 53248 + max: 385918016 primary_compute_unit: NPU precision: fp16 layer_info: @@ -226,22 +224,60 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1579 - job_id: jnp10e4k5 + job_id: jgo26o4xp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 
(Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:20:57Z' + - torchscript_onnx_tflite: + inference_time: 11489.0 + throughput: 87.03977717817043 + estimated_peak_memory_range: + min: 40960 + max: 216772384 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 1579 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 1579 + job_id: j5we68465 + job_status: Passed + torchscript_onnx: + inference_time: 9010.0 + throughput: 110.98779134295228 + estimated_peak_memory_range: + min: 647168 + max: 117624416 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 976 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 976 + job_id: jgdxx2x6p + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:30:57Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-16T09:35:49Z' - torchscript_onnx: - inference_time: 21231.0 - throughput: 47.10093730865244 + inference_time: 21624.0 + throughput: 46.24491305956345 estimated_peak_memory_range: - min: 179093504 - max: 179093504 + min: 179056640 + max: 179056640 primary_compute_unit: NPU precision: fp16 layer_info: @@ -249,7 +285,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 976 - job_id: j1p8oz2zg + job_id: jp8qy61xp job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -258,4 +294,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:31:09Z' + timestamp: '2024-10-14T23:21:16Z' diff --git a/qai_hub_models/models/vit_quantized/README.md b/qai_hub_models/models/vit_quantized/README.md new file mode 100644 index 00000000..a4560b1a --- /dev/null +++ b/qai_hub_models/models/vit_quantized/README.md @@ -0,0 +1,59 @@ +[![Qualcomm® AI Hub Models](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/quic-logo.jpg)](../../README.md) + + +# [VITQuantized: Imagenet classifier and general purpose backbone](https://aihub.qualcomm.com/models/vit_quantized) + +VIT is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. + +This is based on the implementation of VITQuantized found +[here]({source_repo}). This repository contains scripts for optimized on-device +export suitable to run on Qualcomm® devices. More details on model performance +across various devices can be found [here](https://aihub.qualcomm.com/models/vit_quantized). + +[Sign up](https://myaccount.qualcomm.com/signup) to start using Qualcomm AI Hub and run these models on a hosted Qualcomm® device. + + + + +## Example & Usage + + +Once installed, run the following simple CLI demo: + +```bash +python -m qai_hub_models.models.vit_quantized.demo +``` +More details on the CLI tool can be found with the `--help` option. See +[demo.py](demo.py) for sample usage of the model including pre/post processing +scripts. Please refer to our [general instructions on using +models](../../../#getting-started) for more usage instructions. + +## Export for on-device deployment + +This repository contains export scripts that produce a model optimized for +on-device deployment. 
This can be run as follows: + +```bash +python -m qai_hub_models.models.vit_quantized.export +``` +Additional options are documented with the `--help` option. Note that the above +script requires access to Deployment instructions for Qualcomm® AI Hub. + + +## License +* The license for the original implementation of VITQuantized can be found + [here](https://github.com/pytorch/vision/blob/main/LICENSE). +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + + +## References +* [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929) +* [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py) + + + +## Community +* Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. +* For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). + + diff --git a/qai_hub_models/models/vit_quantized/__init__.py b/qai_hub_models/models/vit_quantized/__init__.py new file mode 100644 index 00000000..e86d7aee --- /dev/null +++ b/qai_hub_models/models/vit_quantized/__init__.py @@ -0,0 +1,10 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. +# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +from qai_hub_models.models._shared.imagenet_classifier.app import ( # noqa: F401 + ImagenetClassifierApp as App, +) + +from .model import MODEL_ID # noqa: F401 +from .model import VITQuantizable as Model # noqa: F401 diff --git a/qai_hub_models/models/vit_quantized/conftest.py b/qai_hub_models/models/vit_quantized/conftest.py new file mode 100644 index 00000000..28e56480 --- /dev/null +++ b/qai_hub_models/models/vit_quantized/conftest.py @@ -0,0 +1,37 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. +# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +# THIS FILE WAS AUTO-GENERATED. DO NOT EDIT MANUALLY. + +import inspect + +import pytest + +from qai_hub_models.models.vit_quantized import Model + + +# Instantiate the model only once for all tests. +# Mock from_pretrained to always return the initialized model. +# This speeds up tests and limits memory leaks. 
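With the autouse, module-scoped fixture defined just below, any test in this module can call `Model.from_pretrained()` repeatedly and hit the cache. A hypothetical test (not part of the repo) illustrating the effect:

```python
from qai_hub_models.models.vit_quantized import Model


def test_from_pretrained_is_cached():
    # Both calls share one instance because the autouse fixture below
    # monkeypatches Model.from_pretrained with a memoizing wrapper
    # keyed on the stringified call arguments.
    a = Model.from_pretrained()
    b = Model.from_pretrained()
    assert a is b
```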
+@pytest.fixture(scope="module", autouse=True) +def cached_from_pretrained(): + with pytest.MonkeyPatch.context() as mp: + pretrained_cache = {} + from_pretrained = Model.from_pretrained + sig = inspect.signature(from_pretrained) + + def _cached_from_pretrained(*args, **kwargs): + cache_key = str(args) + str(kwargs) + model = pretrained_cache.get(cache_key, None) + if model: + return model + else: + model = from_pretrained(*args, **kwargs) + pretrained_cache[cache_key] = model + return model + + _cached_from_pretrained.__signature__ = sig + + mp.setattr(Model, "from_pretrained", _cached_from_pretrained) + yield mp diff --git a/qai_hub_models/models/vit_quantized/demo.py b/qai_hub_models/models/vit_quantized/demo.py new file mode 100644 index 00000000..71c37648 --- /dev/null +++ b/qai_hub_models/models/vit_quantized/demo.py @@ -0,0 +1,14 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. +# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +from qai_hub_models.models._shared.imagenet_classifier.demo import imagenet_demo +from qai_hub_models.models.vit_quantized.model import MODEL_ID, VITQuantizable + + +def main(is_test: bool = False): + imagenet_demo(VITQuantizable, MODEL_ID, is_test) + + +if __name__ == "__main__": + main() diff --git a/qai_hub_models/models/vit_quantized/evaluate.py b/qai_hub_models/models/vit_quantized/evaluate.py new file mode 100644 index 00000000..e27bad7a --- /dev/null +++ b/qai_hub_models/models/vit_quantized/evaluate.py @@ -0,0 +1,56 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. +# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +# THIS FILE WAS AUTO-GENERATED. DO NOT EDIT MANUALLY. 
+ + +from __future__ import annotations + +import warnings + +import qai_hub as hub + +from qai_hub_models.models.vit_quantized import MODEL_ID, Model +from qai_hub_models.utils.args import evaluate_parser, get_hub_device, get_model_kwargs +from qai_hub_models.utils.evaluate import evaluate_on_dataset +from qai_hub_models.utils.inference import compile_model_from_args + +SUPPORTED_DATASETS = ["imagenette", "imagenet"] + + +def main(): + warnings.filterwarnings("ignore") + parser = evaluate_parser( + model_cls=Model, + default_split_size=2500, + supported_datasets=SUPPORTED_DATASETS, + supports_tflite=False, + is_hub_quantized=True, + ) + args = parser.parse_args() + args.device = None + + if args.hub_model_id is not None: + hub_model = hub.get_model(args.hub_model_id) + else: + hub_model = compile_model_from_args( + MODEL_ID, args, get_model_kwargs(Model, vars(args)) + ) + hub_device = get_hub_device(None, args.chipset) + torch_model = Model.from_pretrained(**get_model_kwargs(Model, vars(args))) + evaluate_on_dataset( + hub_model, + torch_model, + hub_device, + args.dataset_name, + args.split_size, + args.num_samples, + args.seed, + args.profile_options, + args.use_cache, + ) + + +if __name__ == "__main__": + main() diff --git a/qai_hub_models/models/vit_quantized/export.py b/qai_hub_models/models/vit_quantized/export.py new file mode 100644 index 00000000..9d8eb31a --- /dev/null +++ b/qai_hub_models/models/vit_quantized/export.py @@ -0,0 +1,250 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. +# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +# THIS FILE WAS AUTO-GENERATED. DO NOT EDIT MANUALLY. + + +from __future__ import annotations + +import os +import warnings +from pathlib import Path +from typing import Any, Dict, List, Optional, cast + +import qai_hub as hub +import torch + +from qai_hub_models.models.common import ExportResult, TargetRuntime +from qai_hub_models.models.vit_quantized import Model +from qai_hub_models.utils.args import ( + export_parser, + get_input_spec_kwargs, + get_model_kwargs, +) +from qai_hub_models.utils.compare import torch_inference +from qai_hub_models.utils.input_spec import make_torch_inputs +from qai_hub_models.utils.printing import ( + print_inference_metrics, + print_on_target_demo_cmd, + print_profile_metrics_from_job, +) +from qai_hub_models.utils.qai_hub_helpers import ( + can_access_qualcomm_ai_hub, + export_without_hub_access, +) +from qai_hub_models.utils.quantization import get_calibration_data + + +def export_model( + device: str = "Samsung Galaxy S23 (Family)", + chipset: Optional[str] = None, + num_calibration_samples: int = 100, + skip_compiling: bool = False, + skip_profiling: bool = False, + skip_inferencing: bool = False, + skip_downloading: bool = False, + skip_summary: bool = False, + output_dir: Optional[str] = None, + target_runtime: TargetRuntime = TargetRuntime.QNN, + compile_options: str = "", + profile_options: str = "", + **additional_model_kwargs, +) -> ExportResult | List[str]: + """ + This function executes the following recipe: + + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + 3. Compiles the model to an asset that can be run on device + 4. Profiles the model performance on a real device + 5. Inferences the model on sample inputs + 6. 
Downloads the model asset to the local directory + 7. Summarizes the results from profiling and inference + + Each of the last 5 steps can be optionally skipped using the input options. + + Parameters: + device: Device for which to export the model. + Full list of available devices can be found by running `hub.get_devices()`. + Defaults to DEFAULT_DEVICE if not specified. + chipset: If set, will choose a random device with this chipset. + Overrides the `device` argument. + num_calibration_samples: The number of calibration data samples + to use for quantization. + skip_compiling: If set, skips compiling model to format that can run on device. + skip_profiling: If set, skips profiling of compiled model on real devices. + skip_inferencing: If set, skips computing on-device outputs from sample data. + skip_downloading: If set, skips downloading of compiled model. + skip_summary: If set, skips waiting for and summarizing results + from profiling and inference. + output_dir: Directory to store generated assets (e.g. compiled model). + Defaults to `<cwd>/build/<model_name>`. + target_runtime: Which on-device runtime to target. Default is QNN. + compile_options: Additional options to pass when submitting the compile job. + profile_options: Additional options to pass when submitting the profile job. + **additional_model_kwargs: Additional optional kwargs used to customize + `model_cls.from_pretrained` and `model.get_input_spec` + + Returns: + A struct of: + * A CompileJob object containing metadata about the compile job submitted to hub (None if compiling skipped). + * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). + * A QuantizeJob object containing metadata about the quantize job submitted to hub. + """ + model_name = "vit_quantized" + output_path = Path(output_dir or Path.cwd() / "build" / model_name) + if chipset: + hub_device = hub.Device(attributes=f"chipset:{chipset}") + else: + hub_device = hub.Device(name=device) + if not can_access_qualcomm_ai_hub(): + return export_without_hub_access( + "vit_quantized", + "VITQuantized", + device, + skip_profiling, + skip_inferencing, + skip_downloading, + skip_summary, + output_path, + target_runtime, + compile_options, + profile_options, + ) + + # On-device perf improves with I/O in channel_last format except when using ONNX. + use_channel_last_format = target_runtime != TargetRuntime.ONNX + + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) + input_spec = model.get_input_spec( + **get_input_spec_kwargs(model, additional_model_kwargs) + ) + + # Trace the model + source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) + + print(f"Quantizing model {model_name} with {num_calibration_samples} samples.") + # 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. 
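Before the full implementation below: step 2 is really two Hub jobs chained together, a compile job whose only role is converting the traced TorchScript to ONNX, and a quantize job that calibrates that ONNX asset. Reduced to its essentials (a sketch, not the repo's code: the int8 dtypes and the `QuantizeDtype` enum are assumptions here, whereas the real code asks the model via `get_weights_dtype()`/`get_activations_dtype()`):

```python
import qai_hub as hub


def quantize_via_hub(traced_model, input_spec, device, calibration_data):
    # Stage 1: convert the traced TorchScript model to ONNX on AI Hub.
    to_onnx = hub.submit_compile_job(
        model=traced_model,
        input_specs=input_spec,
        device=device,
        options="--target_runtime onnx",
    )
    # Stage 2: calibrate and quantize the ONNX asset.
    return hub.submit_quantize_job(
        model=to_onnx.get_target_model(),
        calibration_data=calibration_data,
        weights_dtype=hub.QuantizeDtype.INT8,   # assumed dtype
        activations_dtype=hub.QuantizeDtype.INT8,
    )
```

The device-ready binary then comes from a second compile job that starts from `quantize_job.get_target_model()`, as the code below shows.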
+ onnx_compile_job = hub.submit_compile_job( + model=source_model, + input_specs=input_spec, + device=hub_device, + name=model_name, + options="--target_runtime onnx", + ) + quantize_job = hub.submit_quantize_job( + model=onnx_compile_job.get_target_model(), + calibration_data=get_calibration_data( + input_spec, "imagenette", num_calibration_samples + ), + weights_dtype=model.get_weights_dtype(), + activations_dtype=model.get_activations_dtype(), + name=model_name, + options=model.get_quantize_options(), + ) + if skip_compiling: + return ExportResult(quantize_job=quantize_job) + + # 3. Compiles the model to an asset that can be run on device + model_compile_options = model.get_hub_compile_options( + target_runtime, compile_options, hub_device + ) + print(f"Optimizing model {model_name} to run on-device") + submitted_compile_job = hub.submit_compile_job( + model=quantize_job.get_target_model(), + input_specs=input_spec, + device=hub_device, + name=model_name, + options=model_compile_options, + ) + compile_job = cast(hub.client.CompileJob, submitted_compile_job) + + # 4. Profiles the model performance on a real device + profile_job: Optional[hub.client.ProfileJob] = None + if not skip_profiling: + profile_options_all = model.get_hub_profile_options( + target_runtime, profile_options + ) + print(f"Profiling model {model_name} on a hosted device.") + submitted_profile_job = hub.submit_profile_job( + model=compile_job.get_target_model(), + device=hub_device, + name=model_name, + options=profile_options_all, + ) + profile_job = cast(hub.client.ProfileJob, submitted_profile_job) + + # 5. Inferences the model on sample inputs + inference_job: Optional[hub.client.InferenceJob] = None + if not skip_inferencing: + profile_options_all = model.get_hub_profile_options( + target_runtime, profile_options + ) + print( + f"Running inference for {model_name} on a hosted device with example inputs." + ) + sample_inputs = model.sample_inputs( + input_spec, use_channel_last_format=use_channel_last_format + ) + submitted_inference_job = hub.submit_inference_job( + model=compile_job.get_target_model(), + inputs=sample_inputs, + device=hub_device, + name=model_name, + options=profile_options_all, + ) + inference_job = cast(hub.client.InferenceJob, submitted_inference_job) + + # 6. Downloads the model asset to the local directory + if not skip_downloading: + os.makedirs(output_path, exist_ok=True) + target_model: hub.Model = compile_job.get_target_model() # type: ignore + target_model.download(str(output_path / model_name)) + + # 7. 
Summarizes the results from profiling and inference + if not skip_summary and not skip_profiling: + assert profile_job is not None and profile_job.wait().success + profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore + print_profile_metrics_from_job(profile_job, profile_data) + + if not skip_summary and not skip_inferencing: + sample_inputs = model.sample_inputs(use_channel_last_format=False) + torch_out = torch_inference( + model, sample_inputs, return_channel_last_output=use_channel_last_format + ) + assert inference_job is not None and inference_job.wait().success + inference_result: hub.client.DatasetEntries = inference_job.download_output_data() # type: ignore + + print_inference_metrics( + inference_job, + inference_result, + torch_out, + model.get_output_names(), + metrics="psnr,top1,top5", + ) + + if not skip_summary: + print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) + + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + quantize_job=quantize_job, + ) + + +def main(): + warnings.filterwarnings("ignore") + parser = export_parser( + model_cls=Model, supports_tflite=False, is_hub_quantized=True + ) + args = parser.parse_args() + export_model(**vars(args)) + + +if __name__ == "__main__": + main() diff --git a/qai_hub_models/models/vit_quantized/info.yaml b/qai_hub_models/models/vit_quantized/info.yaml new file mode 100644 index 00000000..48b07d44 --- /dev/null +++ b/qai_hub_models/models/vit_quantized/info.yaml @@ -0,0 +1,46 @@ +name: VITQuantized +# id must match with the model dir name in qai_hub_models +id: vit_quantized +status: public +headline: Imagenet classifier and general purpose backbone. +domain: Computer Vision +description: VIT is a machine learning model that can classify images from the Imagenet + dataset. It can also be used as a backbone in building more complex models for specific + use cases. +use_case: Image Classification +tags: + - backbone + - quantized +research_paper: https://arxiv.org/abs/2010.11929 +research_paper_title: 'An Image is Worth 16x16 Words: Transformers for Image Recognition + at Scale' +license: https://github.com/pytorch/vision/blob/main/LICENSE +deploy_license: https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf +source_repo: + https://github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py +technical_details: + Model checkpoint: Imagenet + Input resolution: 224x224 + Number of parameters: 86.6M + Model size: 85.9 MB +applicable_scenarios: + - Medical Imaging + - Anomaly Detection + - Inventory Management +related_models: + - mobilenet_v2 + - densenet121 + - googlenet +form_factors: + - Phone + - Tablet + - IoT + - XR +has_static_banner: true +has_animated_banner: true +license_type: bsd-3-clause +deploy_license_type: AI Model Hub License +dataset: + - imagenet-1k + - imagenet-22k +labels_file: imagenet_labels.txt diff --git a/qai_hub_models/models/vit_quantized/model.py b/qai_hub_models/models/vit_quantized/model.py new file mode 100644 index 00000000..0212fbb8 --- /dev/null +++ b/qai_hub_models/models/vit_quantized/model.py @@ -0,0 +1,14 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. 
+# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +from __future__ import annotations + +from qai_hub_models.models.vit.model import VIT +from qai_hub_models.utils.quantization import HubQuantizableMixin + +MODEL_ID = __name__.split(".")[-2] + + +class VITQuantizable(HubQuantizableMixin, VIT): + pass diff --git a/qai_hub_models/models/vit_quantized/perf.yaml b/qai_hub_models/models/vit_quantized/perf.yaml new file mode 100644 index 00000000..ec5e76d7 --- /dev/null +++ b/qai_hub_models/models/vit_quantized/perf.yaml @@ -0,0 +1,313 @@ +aggregated: + supported_oses: + - Android + supported_devices: + - Snapdragon 8 Elite QRD + - Samsung Galaxy S24 + - Samsung Galaxy S24 Ultra + - Samsung Galaxy S24+ + - Snapdragon 8 Gen 3 QRD + - Samsung Galaxy S23 + - Samsung Galaxy S23 Ultra + - Samsung Galaxy S23+ + - Samsung Galaxy S22 5G + - Samsung Galaxy S22 Ultra 5G + - Samsung Galaxy S22+ 5G + - Samsung Galaxy Tab S8 + - Xiaomi 12 + - Xiaomi 12 Pro + - Samsung Galaxy S21 + - Samsung Galaxy S21 Ultra + - Samsung Galaxy S21+ + - Snapdragon X Elite CRD + - Snapdragon X Plus 8-Core CRD + - QCS6490 (Proxy) + - RB3 Gen 2 (Proxy) + - QCS8450 (Proxy) + - XR2 Gen 2 (Proxy) + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8775 (Proxy) + supported_chipsets: + - Snapdragon® 8 Elite + - Snapdragon® 8 Gen 3 + - Snapdragon® 8 Gen 2 + - Snapdragon® 8 Gen 1 + - Snapdragon® 888 + - Snapdragon® X Elite + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8775P Proxy +models: +- name: VITQuantized + performance_metrics: + - torchscript_onnx_qnn: + inference_time: 5499.0 + throughput: 181.8512456810329 + estimated_peak_memory_range: + min: 12288 + max: 31586800 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 903 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 903 + job_id: jp8q278kp + job_status: Passed + torchscript_onnx: + inference_time: 43244.0 + throughput: 23.124595319581907 + estimated_peak_memory_range: + min: 360448 + max: 5032392 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 1654 + layers_on_gpu: 0 + layers_on_cpu: 25 + total_layers: 1679 + job_id: j5wew9835 + job_status: Passed + reference_device_info: + name: Samsung Galaxy S23 + os: '13' + form_factor: Phone + os_name: Android + manufacturer: Samsung + chipset: Snapdragon® 8 Gen 2 + timestamp: '2024-10-17T17:15:24Z' + - torchscript_onnx_qnn: + inference_time: 3591.0 + throughput: 278.473962684489 + estimated_peak_memory_range: + min: 163840 + max: 59568672 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 903 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 903 + job_id: jgkevydwg + job_status: Passed + torchscript_onnx: + inference_time: 32761.0 + throughput: 30.52409877598364 + estimated_peak_memory_range: + min: 221184 + max: 799841776 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 1654 + layers_on_gpu: 0 + layers_on_cpu: 25 + total_layers: 1679 + job_id: jg9l04kwg + job_status: Passed + reference_device_info: + name: Samsung Galaxy S24 + os: '14' + form_factor: Phone + os_name: Android + manufacturer: Samsung + chipset: Snapdragon® 8 Gen 3 + timestamp: '2024-10-17T17:15:26Z' + - torchscript_onnx_qnn: + inference_time: 22428.0 + throughput: 44.58712323880863 + estimated_peak_memory_range: + min: 253952 + max: 8408304 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 902 + layers_on_gpu: 0 + 
layers_on_cpu: 0 + total_layers: 902 + job_id: j5q602wnp + job_status: Passed + reference_device_info: + name: RB3 Gen 2 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS6490 Proxy + timestamp: '2024-10-17T17:15:10Z' + - torchscript_onnx_qnn: + inference_time: 4939.0 + throughput: 202.47013565499088 + estimated_peak_memory_range: + min: 184320 + max: 1447040 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 903 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 903 + job_id: jglv4k7j5 + job_status: Passed + reference_device_info: + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8550 Proxy + timestamp: '2024-10-17T17:15:12Z' + - torchscript_onnx_qnn: + inference_time: 4931.0 + throughput: 202.7986209693774 + estimated_peak_memory_range: + min: 229376 + max: 1524040 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 903 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 903 + job_id: jp3jnm83g + job_status: Passed + reference_device_info: + name: SA8255 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8255P Proxy + timestamp: '2024-10-17T17:15:15Z' + - torchscript_onnx_qnn: + inference_time: 4937.0 + throughput: 202.55215718047398 + estimated_peak_memory_range: + min: 217088 + max: 1425256 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 903 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 903 + job_id: jgo2zvmqp + job_status: Passed + reference_device_info: + name: SA8775 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8775P Proxy + timestamp: '2024-10-17T17:15:17Z' + - torchscript_onnx_qnn: + inference_time: 6251.0 + throughput: 159.97440409534474 + estimated_peak_memory_range: + min: 163840 + max: 60155424 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 903 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 903 + job_id: jpv6qwek5 + job_status: Passed + reference_device_info: + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-17T17:15:18Z' + - torchscript_onnx_qnn: + inference_time: 3394.0 + throughput: 294.6375957572186 + estimated_peak_memory_range: + min: 159744 + max: 75826496 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 903 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 903 + job_id: jgjvdlovg + job_status: Passed + torchscript_onnx: + inference_time: 27705.0 + throughput: 36.094567767550984 + estimated_peak_memory_range: + min: 0 + max: 339511920 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 1654 + layers_on_gpu: 0 + layers_on_cpu: 25 + total_layers: 1679 + job_id: jgdxnv8rp + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-17T17:15:29Z' + - torchscript_onnx_qnn: + inference_time: 5356.0 + throughput: 186.70649738610905 + estimated_peak_memory_range: + min: 180224 + max: 180224 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 903 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 903 + job_id: j56y21v6p + job_status: Passed + torchscript_onnx: + inference_time: 58341.0 + throughput: 17.140604377710357 + 
estimated_peak_memory_range: + min: 239259648 + max: 239259648 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 1654 + layers_on_gpu: 0 + layers_on_cpu: 25 + total_layers: 1679 + job_id: jp142878p + job_status: Passed + reference_device_info: + name: Snapdragon X Elite CRD + os: '11' + form_factor: Compute + os_name: Windows + manufacturer: Qualcomm + chipset: Snapdragon® X Elite + timestamp: '2024-10-17T17:15:28Z' diff --git a/qai_hub_models/models/whisper_base_en/README.md b/qai_hub_models/models/whisper_base_en/README.md index 9c784e92..e92925a2 100644 --- a/qai_hub_models/models/whisper_base_en/README.md +++ b/qai_hub_models/models/whisper_base_en/README.md @@ -6,7 +6,7 @@ OpenAI’s Whisper ASR (Automatic Speech Recognition) model is a state-of-the-art system designed for transcribing spoken language into written text. It exhibits robust performance in realistic, noisy environments, making it highly reliable for real-world applications. Specifically, it excels in long-form transcription, capable of accurately transcribing audio clips up to 30 seconds long. Time to the first token is the encoder's latency, while time to each additional token is decoder's latency, where we assume a mean decoded length specified below. This is based on the implementation of Whisper-Base-En found -[here](https://github.com/openai/whisper/tree/main). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance accross various devices, can be found [here](https://aihub.qualcomm.com/models/whisper_base_en). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.whisper_base_en.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Whisper-Base-En can be found +* The license for the original implementation of Whisper-Base-En can be found [here](https://github.com/openai/whisper/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Robust Speech Recognition via Large-Scale Weak Supervision](https://cdn.openai.com/papers/whisper.pdf) * [Source Model Implementation](https://github.com/openai/whisper/tree/main) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). 
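The Whisper export that follows differs from the single-model exports above: the model is split into components (e.g. `"WhisperEncoder"`), and `export_model` returns a mapping from component name to `ExportResult` rather than a single struct. A consumption sketch under the same assumptions as earlier (Hub access configured, no steps skipped):

```python
from qai_hub_models.models.whisper_base_en.export import export_model

# One ExportResult per exported component.
results = export_model()

for component_name, result in results.items():
    # Each component is compiled, profiled, and inferenced independently;
    # skipped steps leave the corresponding field as None.
    print(component_name, result.compile_job.job_id)
```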
diff --git a/qai_hub_models/models/whisper_base_en/export.py b/qai_hub_models/models/whisper_base_en/export.py index 978aef9f..7f5cb094 100644 --- a/qai_hub_models/models/whisper_base_en/export.py +++ b/qai_hub_models/models/whisper_base_en/export.py @@ -10,14 +10,15 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Mapping, Optional, Tuple, cast +from typing import Any, Dict, List, Mapping, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.whisper_base_en import Model from qai_hub_models.utils.args import export_parser, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel, TargetRuntime +from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -45,20 +46,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Mapping[ - str, Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] -] | List[str]: +) -> Mapping[str, ExportResult] | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -83,10 +82,10 @@ def export_model( `model_cls.from_pretrained` Returns: - A Mapping from component_name to a 3-tuple of: + A Mapping from component_name to a struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "whisper_base_en" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -118,7 +117,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) components_dict: Dict[str, BaseModel] = {} if "WhisperEncoder" in components: @@ -135,7 +134,7 @@ def export_model( component.to("cpu"), make_torch_inputs(input_spec) ) - # 2. Compile the models to an on-device asset + # 2. 
Compiles the model to an asset that can be run on device model_compile_options = component.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -151,7 +150,7 @@ def export_model( hub.client.CompileJob, submitted_compile_job ) - # 3. Profile the model assets on real devices + # 3. Profiles the model performance on a real device profile_jobs: Dict[str, hub.client.ProfileJob] = {} if not skip_profiling: for component_name in components: @@ -169,7 +168,7 @@ def export_model( hub.client.ProfileJob, submitted_profile_job ) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_jobs: Dict[str, hub.client.InferenceJob] = {} if not skip_inferencing: for component_name in components: @@ -193,14 +192,14 @@ def export_model( hub.client.InferenceJob, submitted_inference_job ) - # 5. Download the model assets to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) for component_name, compile_job in compile_jobs.items(): target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / component_name)) - # 6. Summarize the results from profiling and inference + # 6. Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: for component_name in components: profile_job = profile_jobs[component_name] @@ -225,10 +224,10 @@ def export_model( ) return { - component_name: ( - compile_jobs[component_name], - profile_jobs.get(component_name, None), - inference_jobs.get(component_name, None), + component_name: ExportResult( + compile_job=compile_jobs[component_name], + inference_job=inference_jobs.get(component_name, None), + profile_job=profile_jobs.get(component_name, None), ) for component_name in components } diff --git a/qai_hub_models/models/whisper_base_en/perf.yaml b/qai_hub_models/models/whisper_base_en/perf.yaml index ec42e6f5..b3ab9930 100644 --- a/qai_hub_models/models/whisper_base_en/perf.yaml +++ b/qai_hub_models/models/whisper_base_en/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: WhisperEncoder performance_metrics: - torchscript_onnx_tflite: - inference_time: 205173.0 - throughput: 4.873935654301492 + inference_time: 204008.0 + throughput: 4.901768558095761 estimated_peak_memory_range: - min: 36311040 - max: 118811624 + min: 22716416 + max: 98255680 primary_compute_unit: GPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 408 layers_on_cpu: 11 total_layers: 419 - job_id: jvgdwql65 + job_id: 
jp0z0dke5 job_status: Passed torchscript_onnx_qnn: - inference_time: 376271.0 - throughput: 2.657658974515708 + inference_time: 300732.0 + throughput: 3.3252197970285837 estimated_peak_memory_range: - min: 126976 - max: 83512096 + min: 57344 + max: 88122224 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 531 - job_id: jw566n4n5 + job_id: jpxko3395 job_status: Passed torchscript_onnx: - inference_time: 412243.0 - throughput: 2.425753742331586 + inference_time: 282733.0 + throughput: 3.5369058440295262 estimated_peak_memory_range: - min: 37031936 - max: 172609304 + min: 12832768 + max: 149000496 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 380 - job_id: j0pxvyojg + job_id: jp14z77lp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:30:08Z' + timestamp: '2024-10-14T23:20:07Z' - torchscript_onnx_tflite: - inference_time: 166323.0 - throughput: 6.012397563776507 + inference_time: 166502.0 + throughput: 6.005933862656304 estimated_peak_memory_range: min: 40566784 - max: 78581776 + max: 79279712 primary_compute_unit: GPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 408 layers_on_cpu: 11 total_layers: 419 - job_id: jqp4qd02g + job_id: jgkexodog job_status: Passed torchscript_onnx_qnn: - inference_time: 271385.0 - throughput: 3.6848020340107226 + inference_time: 222410.0 + throughput: 4.496200710399712 estimated_peak_memory_range: - min: 606208 - max: 170092720 + min: 0 + max: 304439280 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 531 - job_id: jwgoy3615 + job_id: jgn6voom5 job_status: Passed torchscript_onnx: - inference_time: 324335.0 - throughput: 3.083231843618481 + inference_time: 226140.0 + throughput: 4.422039444591846 estimated_peak_memory_range: min: 0 - max: 831425248 + max: 1028259808 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 380 - job_id: jegn236vg + job_id: j5we68165 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:30:10Z' + timestamp: '2024-10-14T23:20:09Z' - torchscript_onnx_tflite: - inference_time: 197960.0 - throughput: 5.0515255607193374 + inference_time: 198111.0 + throughput: 5.047675293143743 estimated_peak_memory_range: - min: 4059136 - max: 68986424 + min: 29540352 + max: 106596056 primary_compute_unit: GPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 408 layers_on_cpu: 11 total_layers: 419 - job_id: jo5mr6y7g + job_id: jglvmo7l5 job_status: Passed torchscript_onnx_qnn: - inference_time: 235151.0 - throughput: 4.252586635821238 + inference_time: 226448.0 + throughput: 4.416024871052074 estimated_peak_memory_range: - min: 299008 - max: 11327888 + min: 663552 + max: 2063088 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 531 - job_id: jygzerd4g + job_id: jp0z0dde5 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: 
Qcs8550 Proxy - timestamp: '2024-09-25T11:29:59Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:19:53Z' - torchscript_onnx_tflite: - inference_time: 291293.0 - throughput: 3.4329695529930344 + inference_time: 195152.0 + throughput: 5.1242108715257855 estimated_peak_memory_range: - min: 40312832 - max: 86790544 + min: 35758080 + max: 117749792 primary_compute_unit: GPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 408 layers_on_cpu: 11 total_layers: 419 - job_id: joprk2jk5 + job_id: j5we68xj5 job_status: Passed torchscript_onnx_qnn: - inference_time: 439323.0 - throughput: 2.2762295623038176 + inference_time: 238198.0 + throughput: 4.198188062032427 estimated_peak_memory_range: - min: 495616 - max: 180370032 + min: 163840 + max: 10740728 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 531 - job_id: jvgdwo1k5 + job_id: jp3j0xxzg job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:30:07Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:19:59Z' - torchscript_onnx_tflite: - inference_time: 204428.0 - throughput: 4.89169781047606 + inference_time: 198204.0 + throughput: 5.045306855562956 estimated_peak_memory_range: - min: 16384 - max: 82544840 + min: 39964672 + max: 109288784 primary_compute_unit: GPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 408 layers_on_cpu: 11 total_layers: 419 - job_id: jqpyej00g + job_id: jpedm8205 job_status: Passed torchscript_onnx_qnn: - inference_time: 239045.0 - throughput: 4.183312765378903 + inference_time: 235704.0 + throughput: 4.242609374469674 estimated_peak_memory_range: - min: 745472 - max: 1974112 + min: 262144 + max: 11261928 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 531 - job_id: jmg9vwnm5 + job_id: jglvmool5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:30:01Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:19:57Z' - torchscript_onnx_tflite: - inference_time: 201134.0 - throughput: 4.9718098382173075 + inference_time: 211852.0 + throughput: 4.720276419387119 estimated_peak_memory_range: - min: 15388672 - max: 202327768 + min: 16916480 + max: 87093520 primary_compute_unit: GPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 408 layers_on_cpu: 11 total_layers: 419 - job_id: j1p8ozyqg + job_id: jpv6ke4m5 job_status: Passed torchscript_onnx_qnn: - inference_time: 237663.0 - throughput: 4.207638547018257 + inference_time: 228823.0 + throughput: 4.37019005956569 estimated_peak_memory_range: - min: 659456 - max: 32487016 + min: 708608 + max: 1994360 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 531 - job_id: jvgdwo165 + job_id: jgkexooog job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:30:03Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:19:55Z' - torchscript_onnx_tflite: - inference_time: 196903.0 - throughput: 
5.078642783502537 + inference_time: 286641.0 + throughput: 3.488684451980003 estimated_peak_memory_range: - min: 25333760 - max: 69670832 + min: 41242624 + max: 87531568 primary_compute_unit: GPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 408 layers_on_cpu: 11 total_layers: 419 - job_id: jn5q83qe5 + job_id: jp3j0x8zg job_status: Passed torchscript_onnx_qnn: - inference_time: 241292.0 - throughput: 4.144356215705452 + inference_time: 327803.0 + throughput: 3.050612715563921 estimated_peak_memory_range: - min: 303104 - max: 11376744 + min: 598016 + max: 315411264 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 531 - job_id: jmg9vwnq5 + job_id: jpedm8805 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:20:03Z' + - torchscript_onnx_tflite: + inference_time: 167912.0 + throughput: 5.955500500262042 + estimated_peak_memory_range: + min: 40484864 + max: 61780944 + primary_compute_unit: GPU + precision: fp16 + layer_info: + layers_on_npu: 0 + layers_on_gpu: 408 + layers_on_cpu: 11 + total_layers: 419 + job_id: j57yrkkr5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 196114.0 + throughput: 5.099075027789959 + estimated_peak_memory_range: + min: 0 + max: 321859152 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 531 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 531 + job_id: j5we688j5 + job_status: Passed + torchscript_onnx: + inference_time: 197760.0 + throughput: 5.05663430420712 + estimated_peak_memory_range: + min: 82677760 + max: 746041552 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 380 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 380 + job_id: jpxko3d15 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:30:05Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:20:15Z' - torchscript_onnx_qnn: - inference_time: 196372.0 - throughput: 5.092375695109283 + inference_time: 179530.0 + throughput: 5.570099704784716 estimated_peak_memory_range: min: 483328 max: 483328 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 531 - job_id: j7gjxen1p + job_id: jp2ky44mp job_status: Passed torchscript_onnx: - inference_time: 413624.0 - throughput: 2.417654681546525 + inference_time: 308550.0 + throughput: 3.2409658078107277 estimated_peak_memory_range: - min: 139689984 - max: 139689984 + min: 139694080 + max: 139694080 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 380 - job_id: jep28lkxp + job_id: jp14z7v2p job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,15 +429,15 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:30:12Z' + timestamp: '2024-10-14T23:20:10Z' - name: WhisperDecoder performance_metrics: - torchscript_onnx_tflite: - inference_time: 14243.0 - throughput: 70.20992768377448 + inference_time: 14594.0 + throughput: 68.52131012744964 estimated_peak_memory_range: - min: 5775360 - max: 8650080 + min: 3473408 + max: 5768536 primary_compute_unit: NPU precision: fp16 
layer_info: @@ -394,14 +445,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 983 - job_id: jz57zl3np + job_id: jp8qy688p job_status: Passed torchscript_onnx_qnn: - inference_time: 4268.0 - throughput: 234.30178069353326 + inference_time: 4038.0 + throughput: 247.64735017335315 estimated_peak_memory_range: - min: 3080192 - max: 148081824 + min: 9445376 + max: 218468648 primary_compute_unit: NPU precision: fp16 layer_info: @@ -409,14 +460,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 821 - job_id: j1p3ke0m5 + job_id: j5mnxooqp job_status: Passed torchscript_onnx: - inference_time: 17358.0 - throughput: 57.61032377001959 + inference_time: 32412.0 + throughput: 30.852770578797976 estimated_peak_memory_range: - min: 32768 - max: 1813739296 + min: 98304 + max: 122110128 primary_compute_unit: NPU precision: fp16 layer_info: @@ -424,7 +475,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 844 - job_id: jo5mr3xyg + job_id: jgdx188lp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -433,13 +484,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:30:09Z' + timestamp: '2024-10-14T23:20:07Z' - torchscript_onnx_tflite: - inference_time: 11488.0 - throughput: 87.04735376044569 + inference_time: 12726.0 + throughput: 78.57928650007858 estimated_peak_memory_range: min: 5758976 - max: 98507808 + max: 104316656 primary_compute_unit: NPU precision: fp16 layer_info: @@ -447,14 +498,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 983 - job_id: j0pxv628g + job_id: j5q6qzwmp job_status: Passed torchscript_onnx_qnn: - inference_time: 3153.0 - throughput: 317.1582619727244 + inference_time: 3126.0 + throughput: 319.8976327575176 estimated_peak_memory_range: - min: 21233664 - max: 57602656 + min: 21217280 + max: 59772608 primary_compute_unit: NPU precision: fp16 layer_info: @@ -462,14 +513,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 821 - job_id: j1pv3vkz5 + job_id: jprv3ooeg job_status: Passed torchscript_onnx: - inference_time: 16181.0 - throughput: 61.80087757246153 + inference_time: 14349.0 + throughput: 69.69126768415917 estimated_peak_memory_range: - min: 41398272 - max: 439256992 + min: 39747584 + max: 460106720 primary_compute_unit: NPU precision: fp16 layer_info: @@ -477,7 +528,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 844 - job_id: joprkevv5 + job_id: jg9lnkxlg job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -486,13 +537,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:30:10Z' + timestamp: '2024-10-14T23:20:09Z' - torchscript_onnx_tflite: - inference_time: 13974.0 - throughput: 71.56147130385001 + inference_time: 14054.0 + throughput: 71.15411982353778 estimated_peak_memory_range: - min: 5771264 - max: 8346400 + min: 5779456 + max: 8021800 primary_compute_unit: NPU precision: fp16 layer_info: @@ -500,14 +551,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 983 - job_id: jegn2m8jg + job_id: j56y4rv7p job_status: Passed torchscript_onnx_qnn: - inference_time: 4059.0 - throughput: 246.3661000246366 + inference_time: 4057.0 + throughput: 246.48755237860487 estimated_peak_memory_range: - min: 19922944 - max: 21078744 + min: 21327872 + max: 22539000 primary_compute_unit: NPU precision: fp16 layer_info: @@ -515,7 +566,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 821 - job_id: jz5woq64p + job_id: jp8qy668p job_status: Passed 
reference_device_info: name: QCS8550 (Proxy) @@ -523,14 +574,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:30:00Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:19:54Z' - torchscript_onnx_tflite: - inference_time: 16543.0 - throughput: 60.44852807834129 + inference_time: 14752.0 + throughput: 67.78741865509761 estimated_peak_memory_range: - min: 16384 - max: 85309280 + min: 5828608 + max: 7813864 primary_compute_unit: NPU precision: fp16 layer_info: @@ -538,14 +589,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 983 - job_id: jep289n6p + job_id: jg9lnk8vg job_status: Passed torchscript_onnx_qnn: - inference_time: 4873.0 - throughput: 205.21239482864766 + inference_time: 4109.0 + throughput: 243.3682161109759 estimated_peak_memory_range: - min: 18452480 - max: 55006240 + min: 21270528 + max: 22576728 primary_compute_unit: NPU precision: fp16 layer_info: @@ -553,22 +604,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 821 - job_id: jz57zxrqp + job_id: jgo26oodp job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:30:07Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:19:59Z' - torchscript_onnx_tflite: - inference_time: 14162.0 - throughput: 70.61149555147578 + inference_time: 14219.0 + throughput: 70.3284337857796 estimated_peak_memory_range: - min: 5783552 - max: 8083872 + min: 5795840 + max: 8087320 primary_compute_unit: NPU precision: fp16 layer_info: @@ -576,14 +627,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 983 - job_id: j2p0yl00g + job_id: jgz3d8w65 job_status: Passed torchscript_onnx_qnn: - inference_time: 4142.0 - throughput: 241.42926122646065 + inference_time: 4185.0 + throughput: 238.94862604540023 estimated_peak_memory_range: - min: 21258240 - max: 24414552 + min: 21303296 + max: 22545288 primary_compute_unit: NPU precision: fp16 layer_info: @@ -591,22 +642,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 821 - job_id: jnp10ezn5 + job_id: j56y4rr7p job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:30:02Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:19:57Z' - torchscript_onnx_tflite: - inference_time: 14307.0 - throughput: 69.89585517578807 + inference_time: 14446.0 + throughput: 69.22331441229406 estimated_peak_memory_range: - min: 5758976 - max: 7751376 + min: 5754880 + max: 8122424 primary_compute_unit: NPU precision: fp16 layer_info: @@ -614,14 +665,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 983 - job_id: jogkz3xvg + job_id: jgjvno18g job_status: Passed torchscript_onnx_qnn: - inference_time: 4065.0 - throughput: 246.00246002460025 + inference_time: 4201.0 + throughput: 238.03856224708403 estimated_peak_memory_range: - min: 21262336 - max: 22551160 + min: 21295104 + max: 24611560 primary_compute_unit: NPU precision: fp16 layer_info: @@ -629,22 +680,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 821 - job_id: jz5woq6zp + job_id: j5q6qzzmp job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:30:04Z' + chipset: 
SA8650P Proxy + timestamp: '2024-10-14T23:19:55Z' - torchscript_onnx_tflite: - inference_time: 14282.0 - throughput: 70.01820473323065 + inference_time: 16093.0 + throughput: 62.138818119679364 estimated_peak_memory_range: - min: 5767168 - max: 8360160 + min: 5758976 + max: 96105728 primary_compute_unit: NPU precision: fp16 layer_info: @@ -652,14 +703,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 983 - job_id: j1gln3m2p + job_id: jgo26omdp job_status: Passed torchscript_onnx_qnn: - inference_time: 4024.0 - throughput: 248.5089463220676 + inference_time: 4775.0 + throughput: 209.4240837696335 estimated_peak_memory_range: - min: 21315584 - max: 22656496 + min: 21213184 + max: 61196176 primary_compute_unit: NPU precision: fp16 layer_info: @@ -667,19 +718,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 821 - job_id: jnp10ezk5 + job_id: jgz3d8865 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:20:03Z' + - torchscript_onnx_tflite: + inference_time: 9185.0 + throughput: 108.87316276537834 + estimated_peak_memory_range: + min: 4014080 + max: 56241248 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 983 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 983 + job_id: jp4lrmml5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 2586.0 + throughput: 386.69760247486465 + estimated_peak_memory_range: + min: 21209088 + max: 57488704 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 821 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 821 + job_id: jg9lnkkvg + job_status: Passed + torchscript_onnx: + inference_time: 12009.0 + throughput: 83.27088017320344 + estimated_peak_memory_range: + min: 30158848 + max: 290867904 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 844 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 844 + job_id: j5mnxodwp + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:30:05Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:20:15Z' - torchscript_onnx_qnn: - inference_time: 3663.0 - throughput: 273.000273000273 + inference_time: 3678.0 + throughput: 271.8868950516585 estimated_peak_memory_range: min: 21229568 max: 21229568 @@ -690,14 +794,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 821 - job_id: jlpe9km8g + job_id: jpy13qq4p job_status: Passed torchscript_onnx: - inference_time: 14264.0 - throughput: 70.10656197420079 + inference_time: 14577.0 + throughput: 68.60122110173562 estimated_peak_memory_range: - min: 112201728 - max: 112201728 + min: 112259072 + max: 112259072 primary_compute_unit: NPU precision: fp16 layer_info: @@ -705,7 +809,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 844 - job_id: jqpye61rg + job_id: jgdx18zep job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -714,4 +818,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:30:12Z' + timestamp: '2024-10-14T23:20:11Z' diff --git a/qai_hub_models/models/whisper_small_en/README.md b/qai_hub_models/models/whisper_small_en/README.md index a7227ed0..cb644d65 100644 --- a/qai_hub_models/models/whisper_small_en/README.md +++ 
b/qai_hub_models/models/whisper_small_en/README.md @@ -6,7 +6,7 @@ OpenAI’s Whisper ASR (Automatic Speech Recognition) model is a state-of-the-art system designed for transcribing spoken language into written text. It exhibits robust performance in realistic, noisy environments, making it highly reliable for real-world applications. Specifically, it excels in long-form transcription, capable of accurately transcribing audio clips up to 30 seconds long. Time to the first token is the encoder's latency, while time to each additional token is decoder's latency, where we assume a mean decoded length specified below. This is based on the implementation of Whisper-Small-En found -[here](https://github.com/openai/whisper/tree/main). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/whisper_small_en). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.whisper_small_en.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Whisper-Small-En can be found +* The license for the original implementation of Whisper-Small-En can be found [here](https://github.com/openai/whisper/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Robust Speech Recognition via Large-Scale Weak Supervision](https://cdn.openai.com/papers/whisper.pdf) * [Source Model Implementation](https://github.com/openai/whisper/tree/main) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback, please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
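The common thread across the three export.py diffs in this change (whisper_base_en above, whisper_small_en and whisper_tiny_en below) is the return type: `export_model` now yields a `Mapping` from component name to an `ExportResult` instead of a positional `(CompileJob, ProfileJob, InferenceJob)` tuple. A minimal sketch of what this means at a call site, assuming `ExportResult` exposes its constructor keywords (`compile_job`, `inference_job`, `profile_job`) as attributes; the device name is purely illustrative:

```python
# Minimal sketch of a call site under the new return type. Assumption:
# ExportResult exposes compile_job / profile_job / inference_job as
# attributes, inferred from the keyword arguments used to construct it
# in these diffs.
from qai_hub_models.models.whisper_small_en.export import export_model

# Returns Mapping[str, ExportResult] here; the List[str] branch of the
# signature is not shown in this sketch.
results = export_model(device="Samsung Galaxy S24")

# Old call sites unpacked each component positionally:
#   compile_job, profile_job, inference_job = results["WhisperEncoder"]
encoder = results["WhisperEncoder"]
print(encoder.compile_job)     # always present
print(encoder.profile_job)     # None if profiling was skipped
print(encoder.inference_job)   # None if inferencing was skipped
```

Named fields make the skipped-job `None`s self-describing, and they explain why the `ExportResult` constructor can list `inference_job` before `profile_job` without breaking callers; positional unpacking would have silently swapped the two.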
diff --git a/qai_hub_models/models/whisper_small_en/export.py b/qai_hub_models/models/whisper_small_en/export.py index b36a7bf0..bc0f7e93 100644 --- a/qai_hub_models/models/whisper_small_en/export.py +++ b/qai_hub_models/models/whisper_small_en/export.py @@ -10,14 +10,15 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Mapping, Optional, Tuple, cast +from typing import Any, Dict, List, Mapping, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.whisper_small_en import Model from qai_hub_models.utils.args import export_parser, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel, TargetRuntime +from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -45,20 +46,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Mapping[ - str, Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] -] | List[str]: +) -> Mapping[str, ExportResult] | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -83,10 +82,10 @@ def export_model( `model_cls.from_pretrained` Returns: - A Mapping from component_name to a 3-tuple of: + A Mapping from component_name to a struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "whisper_small_en" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -118,7 +117,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) components_dict: Dict[str, BaseModel] = {} if "WhisperEncoder" in components: @@ -135,7 +134,7 @@ def export_model( component.to("cpu"), make_torch_inputs(input_spec) ) - # 2. Compile the models to an on-device asset + # 2. 
Compiles the model to an asset that can be run on device model_compile_options = component.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -151,7 +150,7 @@ def export_model( hub.client.CompileJob, submitted_compile_job ) - # 3. Profile the model assets on real devices + # 3. Profiles the model performance on a real device profile_jobs: Dict[str, hub.client.ProfileJob] = {} if not skip_profiling: for component_name in components: @@ -169,7 +168,7 @@ def export_model( hub.client.ProfileJob, submitted_profile_job ) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_jobs: Dict[str, hub.client.InferenceJob] = {} if not skip_inferencing: for component_name in components: @@ -193,14 +192,14 @@ def export_model( hub.client.InferenceJob, submitted_inference_job ) - # 5. Download the model assets to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) for component_name, compile_job in compile_jobs.items(): target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / component_name)) - # 6. Summarize the results from profiling and inference + # 6. Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: for component_name in components: profile_job = profile_jobs[component_name] @@ -225,10 +224,10 @@ def export_model( ) return { - component_name: ( - compile_jobs[component_name], - profile_jobs.get(component_name, None), - inference_jobs.get(component_name, None), + component_name: ExportResult( + compile_job=compile_jobs[component_name], + inference_job=inference_jobs.get(component_name, None), + profile_job=profile_jobs.get(component_name, None), ) for component_name in components } diff --git a/qai_hub_models/models/whisper_small_en/perf.yaml b/qai_hub_models/models/whisper_small_en/perf.yaml index b7267838..fc159e91 100644 --- a/qai_hub_models/models/whisper_small_en/perf.yaml +++ b/qai_hub_models/models/whisper_small_en/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: WhisperEncoder performance_metrics: - torchscript_onnx_tflite: - inference_time: 707067.0 - throughput: 1.414293129222549 + inference_time: 696399.0 + throughput: 1.4359584089006445 estimated_peak_memory_range: - min: 18771968 - max: 438129848 + min: 84676608 + max: 495897016 primary_compute_unit: GPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 900 layers_on_cpu: 11 total_layers: 911 - job_id: j0pxv6v8g + job_id: 
jg9lnk4vg job_status: Passed torchscript_onnx_qnn: - inference_time: 1196020.0 - throughput: 0.8361064196250899 + inference_time: 854116.0 + throughput: 1.170801155814901 estimated_peak_memory_range: - min: 53248 - max: 212393776 + min: 49152 + max: 216681440 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,7 +71,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1329 - job_id: j1pv3roz5 + job_id: jgo26o1dp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -82,13 +80,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:28:19Z' + timestamp: '2024-10-14T23:18:17Z' - torchscript_onnx_tflite: - inference_time: 550315.0 - throughput: 1.817141091920082 + inference_time: 549997.0 + throughput: 1.818191735591285 estimated_peak_memory_range: - min: 116359168 - max: 206759088 + min: 116482048 + max: 206660112 primary_compute_unit: GPU precision: fp16 layer_info: @@ -96,14 +94,14 @@ models: layers_on_gpu: 900 layers_on_cpu: 11 total_layers: 911 - job_id: jegn2m2jg + job_id: jgdx18vlp job_status: Passed torchscript_onnx_qnn: - inference_time: 915241.0 - throughput: 1.0926083949473417 + inference_time: 693407.0 + throughput: 1.4421544633959564 estimated_peak_memory_range: min: 0 - max: 467783184 + max: 879347504 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,7 +109,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1329 - job_id: jlpe9w18g + job_id: jgjvno08g + job_status: Passed + torchscript_onnx: + inference_time: 861217.0 + throughput: 1.1611475388897339 + estimated_peak_memory_range: + min: 120385536 + max: 4516540736 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 884 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 884 + job_id: j56y4rq7p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -120,13 +133,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:28:21Z' + timestamp: '2024-10-14T23:18:19Z' - torchscript_onnx_tflite: - inference_time: 709967.0 - throughput: 1.4085161704698952 + inference_time: 689836.0 + throughput: 1.4496199096596871 estimated_peak_memory_range: - min: 31993856 - max: 434173424 + min: 41590784 + max: 441984352 primary_compute_unit: GPU precision: fp16 layer_info: @@ -134,14 +147,14 @@ models: layers_on_gpu: 900 layers_on_cpu: 11 total_layers: 911 - job_id: jep28986p + job_id: jp4lrmxl5 job_status: Passed torchscript_onnx_qnn: - inference_time: 769338.0 - throughput: 1.2998188052585469 + inference_time: 694520.0 + throughput: 1.4398433450440593 estimated_peak_memory_range: - min: 884736 - max: 2663424 + min: 995328 + max: 2386016 primary_compute_unit: NPU precision: fp16 layer_info: @@ -149,7 +162,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1329 - job_id: jvgdwq965 + job_id: jg9lnk3vg job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -157,14 +170,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:28:25Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:18:02Z' - torchscript_onnx_tflite: - inference_time: 981977.0 - throughput: 1.0183537903637254 + inference_time: 683626.0 + throughput: 1.462788132692437 estimated_peak_memory_range: - min: 93597696 - max: 197478064 + min: 87117824 + max: 489389752 primary_compute_unit: GPU precision: fp16 layer_info: @@ -172,14 +185,14 @@ models: layers_on_gpu: 900 layers_on_cpu: 11 total_layers: 911 
- job_id: j2p0y2q0g + job_id: jp8qy638p job_status: Passed torchscript_onnx_qnn: - inference_time: 1274997.0 - throughput: 0.7843155709385983 + inference_time: 712687.0 + throughput: 1.4031405090874396 estimated_peak_memory_range: - min: 0 - max: 580725648 + min: 1200128 + max: 32224872 primary_compute_unit: NPU precision: fp16 layer_info: @@ -187,22 +200,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1329 - job_id: jqpyejw0g + job_id: j5mnxovqp job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:28:33Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:18:09Z' - torchscript_onnx_tflite: - inference_time: 696264.0 - throughput: 1.4362368297082715 + inference_time: 673361.0 + throughput: 1.485087493929705 estimated_peak_memory_range: - min: 93286400 - max: 506950840 + min: 9539584 + max: 234029624 primary_compute_unit: GPU precision: fp16 layer_info: @@ -210,14 +223,14 @@ models: layers_on_gpu: 900 layers_on_cpu: 11 total_layers: 911 - job_id: jogkzqnvg + job_id: jpy13q44p job_status: Passed torchscript_onnx_qnn: - inference_time: 783882.0 - throughput: 1.2757022102816495 + inference_time: 695416.0 + throughput: 1.4379881969928792 estimated_peak_memory_range: - min: 1531904 - max: 2961680 + min: 872448 + max: 2680560 primary_compute_unit: NPU precision: fp16 layer_info: @@ -225,22 +238,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1329 - job_id: jqp4qdo2g + job_id: jp4lrmjl5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:28:27Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:18:07Z' - torchscript_onnx_tflite: - inference_time: 689203.0 - throughput: 1.4509513162304866 + inference_time: 674483.0 + throughput: 1.4826170563231393 estimated_peak_memory_range: - min: 97169408 - max: 507461688 + min: 115093504 + max: 498595456 primary_compute_unit: GPU precision: fp16 layer_info: @@ -248,14 +261,14 @@ models: layers_on_gpu: 900 layers_on_cpu: 11 total_layers: 911 - job_id: j1gln2z2p + job_id: jprv3o4eg job_status: Passed torchscript_onnx_qnn: - inference_time: 788680.0 - throughput: 1.2679413703910332 + inference_time: 691155.0 + throughput: 1.446853455447765 estimated_peak_memory_range: - min: 1351680 - max: 2577264 + min: 6815744 + max: 8259624 primary_compute_unit: NPU precision: fp16 layer_info: @@ -263,22 +276,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1329 - job_id: jo5mr627g + job_id: jgdx18rlp job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:28:28Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:18:04Z' - torchscript_onnx_tflite: - inference_time: 697737.0 - throughput: 1.4332047748650278 + inference_time: 986018.0 + throughput: 1.014180268514368 estimated_peak_memory_range: - min: 114737152 - max: 499696096 + min: 114905088 + max: 216859760 primary_compute_unit: GPU precision: fp16 layer_info: @@ -286,14 +299,37 @@ models: layers_on_gpu: 900 layers_on_cpu: 11 total_layers: 911 - job_id: j1p3k13m5 + job_id: j5mnxowqp + job_status: Passed + reference_device_info: + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr + os_name: Android + 
manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:18:13Z' + - torchscript_onnx_tflite: + inference_time: 536818.0 + throughput: 1.8628287427023684 + estimated_peak_memory_range: + min: 114487296 + max: 143590224 + primary_compute_unit: GPU + precision: fp16 + layer_info: + layers_on_npu: 0 + layers_on_gpu: 900 + layers_on_cpu: 11 + total_layers: 911 + job_id: j56y4r37p job_status: Passed torchscript_onnx_qnn: - inference_time: 804200.0 - throughput: 1.2434717731907485 + inference_time: 551075.0 + throughput: 1.8146350315292836 estimated_peak_memory_range: - min: 589824 - max: 1792216 + min: 0 + max: 953714048 primary_compute_unit: NPU precision: fp16 layer_info: @@ -301,19 +337,34 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1329 - job_id: joprk2qk5 + job_id: jp8qy6w8p + job_status: Passed + torchscript_onnx: + inference_time: 635179.0 + throughput: 1.5743593538199467 + estimated_peak_memory_range: + min: 123006976 + max: 2907786160 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 884 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 884 + job_id: j5we68mj5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) - os: '13' - form_factor: Auto + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:28:31Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:18:24Z' - torchscript_onnx_qnn: - inference_time: 641243.0 - throughput: 1.5594712145005871 + inference_time: 526589.0 + throughput: 1.8990142217175063 estimated_peak_memory_range: min: 483328 max: 483328 @@ -324,14 +375,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 1329 - job_id: jmg9vy1m5 + job_id: jgz3d8x65 job_status: Passed torchscript_onnx: - inference_time: 1647041.0 - throughput: 0.6071494273670176 + inference_time: 1357587.0 + throughput: 0.7366010428797565 estimated_peak_memory_range: - min: 470417408 - max: 470417408 + min: 470306816 + max: 470306816 primary_compute_unit: NPU precision: fp16 layer_info: @@ -339,7 +390,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 884 - job_id: jw566zln5 + job_id: jgo26oedp job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -348,15 +399,15 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:28:39Z' + timestamp: '2024-10-14T23:18:21Z' - name: WhisperDecoder performance_metrics: - torchscript_onnx_tflite: - inference_time: 25831.0 - throughput: 38.7131740931439 + inference_time: 26563.0 + throughput: 37.64635018634943 estimated_peak_memory_range: - min: 16760832 - max: 20516296 + min: 16764928 + max: 19759256 primary_compute_unit: NPU precision: fp16 layer_info: @@ -364,14 +415,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 2573 - job_id: jo5mr6r7g + job_id: jp14z78lp job_status: Passed torchscript_onnx_qnn: - inference_time: 12110.0 - throughput: 82.57638315441784 + inference_time: 11991.0 + throughput: 83.39588024351598 estimated_peak_memory_range: - min: 58085376 - max: 126345112 + min: 63504384 + max: 132995256 primary_compute_unit: NPU precision: fp16 layer_info: @@ -379,7 +430,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 2255 - job_id: j7gjx2m1p + job_id: jpv6ke1m5 + job_status: Passed + torchscript_onnx: + inference_time: 56190.0 + throughput: 17.796760989499912 + estimated_peak_memory_range: + min: 127217664 + max: 129732616 + primary_compute_unit: NPU + precision: 
fp16 + layer_info: + layers_on_npu: 2302 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 2302 + job_id: jglvmoel5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -388,13 +454,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:28:19Z' + timestamp: '2024-10-14T23:18:17Z' - torchscript_onnx_tflite: - inference_time: 19573.0 - throughput: 51.090788330863944 + inference_time: 21257.0 + throughput: 47.043326904078654 estimated_peak_memory_range: - min: 13574144 - max: 1162869504 + min: 16773120 + max: 1182597392 primary_compute_unit: NPU precision: fp16 layer_info: @@ -402,14 +468,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 2573 - job_id: joprk2kk5 + job_id: j57yrkjr5 job_status: Passed torchscript_onnx_qnn: - inference_time: 9345.0 - throughput: 107.00909577314071 + inference_time: 9770.0 + throughput: 102.35414534288638 estimated_peak_memory_range: - min: 57933824 - max: 144391216 + min: 41037824 + max: 140272512 primary_compute_unit: NPU precision: fp16 layer_info: @@ -417,14 +483,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 2255 - job_id: jygzej94g + job_id: jpedm8r05 job_status: Passed torchscript_onnx: - inference_time: 49160.0 - throughput: 20.34174125305126 + inference_time: 47118.0 + throughput: 21.223311685555416 estimated_peak_memory_range: - min: 90808320 - max: 1527605184 + min: 120029184 + max: 1668877984 primary_compute_unit: NPU precision: fp16 layer_info: @@ -432,7 +498,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 2302 - job_id: j1gln2r2p + job_id: jp3j0xqzg job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -441,13 +507,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:28:37Z' + timestamp: '2024-10-14T23:18:19Z' - torchscript_onnx_tflite: - inference_time: 25314.0 - throughput: 39.50383187169155 + inference_time: 24890.0 + throughput: 40.17677782241864 estimated_peak_memory_range: - min: 16809984 - max: 19070928 + min: 16441344 + max: 19909576 primary_compute_unit: NPU precision: fp16 layer_info: @@ -455,14 +521,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 2573 - job_id: jqpyeje0g + job_id: jpxko3795 job_status: Passed torchscript_onnx_qnn: - inference_time: 11967.0 - throughput: 83.56313194618534 + inference_time: 12378.0 + throughput: 80.78849571820973 estimated_peak_memory_range: - min: 65486848 - max: 66707888 + min: 63721472 + max: 65005912 primary_compute_unit: NPU precision: fp16 layer_info: @@ -470,7 +536,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 2255 - job_id: jz57zlwnp + job_id: jp14z7dlp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -478,14 +544,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:28:25Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:18:02Z' - torchscript_onnx_tflite: - inference_time: 28257.0 - throughput: 35.38946101850869 + inference_time: 25507.0 + throughput: 39.20492413847179 estimated_peak_memory_range: - min: 16764928 - max: 1146454064 + min: 14761984 + max: 17803144 primary_compute_unit: NPU precision: fp16 layer_info: @@ -493,14 +559,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 2573 - job_id: j1p8om9qg + job_id: jgkexolog job_status: Passed torchscript_onnx_qnn: - inference_time: 15726.0 - throughput: 63.588960956377974 + inference_time: 12827.0 + throughput: 77.96055196070787 
estimated_peak_memory_range: - min: 56090624 - max: 153402384 + min: 63680512 + max: 69340872 primary_compute_unit: NPU precision: fp16 layer_info: @@ -508,22 +574,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 2255 - job_id: j2p0y270g + job_id: jgn6vorm5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:28:33Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:18:09Z' - torchscript_onnx_tflite: - inference_time: 25687.0 - throughput: 38.930198154708606 + inference_time: 24833.0 + throughput: 40.26899689928724 estimated_peak_memory_range: - min: 16797696 - max: 19702104 + min: 16093184 + max: 19219080 primary_compute_unit: NPU precision: fp16 layer_info: @@ -531,14 +597,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 2573 - job_id: jn5q8rke5 + job_id: jp0z0d1e5 job_status: Passed torchscript_onnx_qnn: - inference_time: 12374.0 - throughput: 80.81461128171973 + inference_time: 12718.0 + throughput: 78.62871520679352 estimated_peak_memory_range: - min: 63692800 - max: 64958336 + min: 63713280 + max: 69556520 primary_compute_unit: NPU precision: fp16 layer_info: @@ -546,22 +612,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 2255 - job_id: j0pxv6j8g + job_id: jpxko3e95 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:28:27Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:18:07Z' - torchscript_onnx_tflite: - inference_time: 25128.0 - throughput: 39.79624323463865 + inference_time: 25027.0 + throughput: 39.95684660566588 estimated_peak_memory_range: - min: 16789504 - max: 19879008 + min: 16818176 + max: 19555432 primary_compute_unit: NPU precision: fp16 layer_info: @@ -569,14 +635,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 2573 - job_id: jw566zjn5 + job_id: jp2ky47mp job_status: Passed torchscript_onnx_qnn: - inference_time: 12549.0 - throughput: 79.6876245119133 + inference_time: 12546.0 + throughput: 79.70667941973538 estimated_peak_memory_range: - min: 63705088 - max: 65038120 + min: 63721472 + max: 65071592 primary_compute_unit: NPU precision: fp16 layer_info: @@ -584,22 +650,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 2255 - job_id: jegn2myjg + job_id: j57yrkvr5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:28:29Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:18:05Z' - torchscript_onnx_tflite: - inference_time: 25007.0 - throughput: 39.98880313512217 + inference_time: 27823.0 + throughput: 35.94148725874277 estimated_peak_memory_range: - min: 16793600 - max: 19730264 + min: 16879616 + max: 1157599856 primary_compute_unit: NPU precision: fp16 layer_info: @@ -607,14 +673,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 2573 - job_id: jwgoyn015 + job_id: jgn6vo9m5 job_status: Passed torchscript_onnx_qnn: - inference_time: 12702.0 - throughput: 78.72775940796726 + inference_time: 14222.0 + throughput: 70.3135986499789 estimated_peak_memory_range: - min: 63696896 - max: 65109016 + min: 59482112 + max: 167489648 primary_compute_unit: NPU precision: fp16 layer_info: @@ -622,22 +688,60 @@ models: 
layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 2255 - job_id: jep28966p + job_id: jp0z0dee5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:28:31Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:18:13Z' + - torchscript_onnx_tflite: + inference_time: 15389.0 + throughput: 64.98148027812074 + estimated_peak_memory_range: + min: 15761408 + max: 275112272 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 2573 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 2573 + job_id: jp3j0x4zg + job_status: Passed + torchscript_onnx_qnn: + inference_time: 7560.0 + throughput: 132.27513227513228 + estimated_peak_memory_range: + min: 63893504 + max: 204320480 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 2255 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 2255 + job_id: jgkexorog + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:18:25Z' - torchscript_onnx_qnn: - inference_time: 10421.0 - throughput: 95.96008060646771 + inference_time: 10849.0 + throughput: 92.17439395335975 estimated_peak_memory_range: - min: 63696896 - max: 63696896 + min: 63692800 + max: 63692800 primary_compute_unit: NPU precision: fp16 layer_info: @@ -645,14 +749,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 2255 - job_id: jnp10wln5 + job_id: j5we68dj5 job_status: Passed torchscript_onnx: - inference_time: 52241.0 - throughput: 19.142053176623726 + inference_time: 49274.0 + throughput: 20.29467873523562 estimated_peak_memory_range: - min: 242700288 - max: 242700288 + min: 243027968 + max: 243027968 primary_compute_unit: NPU precision: fp16 layer_info: @@ -660,7 +764,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 2302 - job_id: j1p3k12m5 + job_id: jpv6kezm5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -669,4 +773,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:28:39Z' + timestamp: '2024-10-14T23:18:21Z' diff --git a/qai_hub_models/models/whisper_tiny_en/README.md b/qai_hub_models/models/whisper_tiny_en/README.md index e13d6b04..7de45b00 100644 --- a/qai_hub_models/models/whisper_tiny_en/README.md +++ b/qai_hub_models/models/whisper_tiny_en/README.md @@ -6,7 +6,7 @@ OpenAI’s Whisper ASR (Automatic Speech Recognition) model is a state-of-the-art system designed for transcribing spoken language into written text. It exhibits robust performance in realistic, noisy environments, making it highly reliable for real-world applications. Specifically, it excels in long-form transcription, capable of accurately transcribing audio clips up to 30 seconds long. Time to the first token is the encoder's latency, while time to each additional token is decoder's latency, where we assume a mean decoded length specified below. This is based on the implementation of Whisper-Tiny-En found -[here](https://github.com/openai/whisper/tree/main). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. 
More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/whisper_tiny_en). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.whisper_tiny_en.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Whisper-Tiny-En can be found +* The license for the original implementation of Whisper-Tiny-En can be found [here](https://github.com/openai/whisper/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Robust Speech Recognition via Large-Scale Weak Supervision](https://cdn.openai.com/papers/whisper.pdf) * [Source Model Implementation](https://github.com/openai/whisper/tree/main) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback, please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/whisper_tiny_en/export.py b/qai_hub_models/models/whisper_tiny_en/export.py index fa76ff82..b6accf36 100644 --- a/qai_hub_models/models/whisper_tiny_en/export.py +++ b/qai_hub_models/models/whisper_tiny_en/export.py @@ -10,14 +10,15 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Mapping, Optional, Tuple, cast +from typing import Any, Dict, List, Mapping, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.whisper_tiny_en import Model from qai_hub_models.utils.args import export_parser, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel, TargetRuntime +from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -45,20 +46,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Mapping[ - str, Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] -] | List[str]: +) -> Mapping[str, ExportResult] | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6.
Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -83,10 +82,10 @@ def export_model( `model_cls.from_pretrained` Returns: - A Mapping from component_name to a 3-tuple of: + A Mapping from component_name to a struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "whisper_tiny_en" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -118,7 +117,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) components_dict: Dict[str, BaseModel] = {} if "WhisperEncoder" in components: @@ -135,7 +134,7 @@ def export_model( component.to("cpu"), make_torch_inputs(input_spec) ) - # 2. Compile the models to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = component.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -151,7 +150,7 @@ def export_model( hub.client.CompileJob, submitted_compile_job ) - # 3. Profile the model assets on real devices + # 3. Profiles the model performance on a real device profile_jobs: Dict[str, hub.client.ProfileJob] = {} if not skip_profiling: for component_name in components: @@ -169,7 +168,7 @@ def export_model( hub.client.ProfileJob, submitted_profile_job ) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_jobs: Dict[str, hub.client.InferenceJob] = {} if not skip_inferencing: for component_name in components: @@ -193,14 +192,14 @@ def export_model( hub.client.InferenceJob, submitted_inference_job ) - # 5. Download the model assets to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) for component_name, compile_job in compile_jobs.items(): target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / component_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: for component_name in components: profile_job = profile_jobs[component_name] @@ -225,10 +224,10 @@ def export_model( ) return { - component_name: ( - compile_jobs[component_name], - profile_jobs.get(component_name, None), - inference_jobs.get(component_name, None), + component_name: ExportResult( + compile_job=compile_jobs[component_name], + inference_job=inference_jobs.get(component_name, None), + profile_job=profile_jobs.get(component_name, None), ) for component_name in components } @@ -236,7 +235,9 @@ def export_model( def main(): warnings.filterwarnings("ignore") - parser = export_parser(model_cls=Model, components=ALL_COMPONENTS) + parser = export_parser( + model_cls=Model, components=ALL_COMPONENTS, supports_onnx=False + ) args = parser.parse_args() export_model(**vars(args)) diff --git a/qai_hub_models/models/whisper_tiny_en/perf.yaml b/qai_hub_models/models/whisper_tiny_en/perf.yaml index 43570e96..b3f4556c 100644 --- a/qai_hub_models/models/whisper_tiny_en/perf.yaml +++ b/qai_hub_models/models/whisper_tiny_en/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: WhisperEncoder performance_metrics: - torchscript_onnx_tflite: - inference_time: 98932.0 - throughput: 10.107952937371124 + inference_time: 103909.0 + throughput: 9.62380544514912 estimated_peak_memory_range: - min: 14483456 - max: 56238536 + min: 20807680 + max: 91655600 primary_compute_unit: GPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 260 layers_on_cpu: 11 total_layers: 271 - job_id: jep2891qp + job_id: jgjvno27g job_status: Passed torchscript_onnx_qnn: - inference_time: 185163.0 - throughput: 5.400646997510302 + inference_time: 135518.0 + throughput: 7.3790935521480545 estimated_peak_memory_range: - min: 20480 - max: 52464640 + min: 16384 + max: 56853496 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,7 +71,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 313 - job_id: jz5wo3jmp + job_id: j5q6qz37p job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -82,13 +80,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:26:38Z' + timestamp: '2024-10-14T23:16:00Z' - torchscript_onnx_tflite: - inference_time: 83641.0 - throughput: 11.955858968687606 + inference_time: 84725.0 + throughput: 11.802891708468575 estimated_peak_memory_range: - min: 22597632 - max: 49880416 + min: 22761472 + max: 52150160 primary_compute_unit: GPU precision: fp16 layer_info: @@ -96,14 +94,14 
@@ models: layers_on_gpu: 260 layers_on_cpu: 11 total_layers: 271 - job_id: j2p0y2wng + job_id: jgz3d8jz5 job_status: Passed torchscript_onnx_qnn: - inference_time: 144419.0 - throughput: 6.924296664566297 + inference_time: 113054.0 + throughput: 8.845330550002654 estimated_peak_memory_range: min: 12288 - max: 114966512 + max: 195753504 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,7 +109,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 313 - job_id: jnp10wr75 + job_id: j56y4rnvp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -120,13 +118,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:26:40Z' + timestamp: '2024-10-14T23:16:02Z' - torchscript_onnx_tflite: - inference_time: 99583.0 - throughput: 10.04187461715353 + inference_time: 98695.0 + throughput: 10.132225543340594 estimated_peak_memory_range: - min: 20795392 - max: 53352520 + min: 20750336 + max: 104191960 primary_compute_unit: GPU precision: fp16 layer_info: @@ -134,14 +132,14 @@ models: layers_on_gpu: 260 layers_on_cpu: 11 total_layers: 271 - job_id: jogkzq1ng + job_id: jg9lnkyqg job_status: Passed torchscript_onnx_qnn: - inference_time: 152977.0 - throughput: 6.536930388228296 + inference_time: 102692.0 + throughput: 9.737856892455108 estimated_peak_memory_range: - min: 737280 - max: 1978448 + min: 274432 + max: 6031896 primary_compute_unit: NPU precision: fp16 layer_info: @@ -149,7 +147,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 313 - job_id: jnp10wrn5 + job_id: jgjvnoe7g job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -157,14 +155,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:26:43Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:16:06Z' - torchscript_onnx_tflite: - inference_time: 137234.0 - throughput: 7.286823964906656 + inference_time: 100209.0 + throughput: 9.979143589897115 estimated_peak_memory_range: - min: 102400 - max: 35425632 + min: 18350080 + max: 65172280 primary_compute_unit: GPU precision: fp16 layer_info: @@ -172,14 +170,14 @@ models: layers_on_gpu: 260 layers_on_cpu: 11 total_layers: 271 - job_id: j1gln2jmp + job_id: jprv3oevg job_status: Passed torchscript_onnx_qnn: - inference_time: 208922.0 - throughput: 4.7864753352925975 + inference_time: 105009.0 + throughput: 9.522993267243761 estimated_peak_memory_range: - min: 114688 - max: 124039776 + min: 241664 + max: 6122504 primary_compute_unit: NPU precision: fp16 layer_info: @@ -187,22 +185,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 313 - job_id: jep28926p + job_id: jgdx18okp job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:26:51Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:16:13Z' - torchscript_onnx_tflite: - inference_time: 97343.0 - throughput: 10.272952343774078 + inference_time: 105883.0 + throughput: 9.44438672874777 estimated_peak_memory_range: - min: 16031744 - max: 164484336 + min: 6832128 + max: 53498920 primary_compute_unit: GPU precision: fp16 layer_info: @@ -210,14 +208,14 @@ models: layers_on_gpu: 260 layers_on_cpu: 11 total_layers: 271 - job_id: j1p3k1yn5 + job_id: j5mnxo6yp job_status: Passed torchscript_onnx_qnn: - inference_time: 155393.0 - throughput: 6.435296313218742 + inference_time: 103640.0 + throughput: 
9.648784253184099 estimated_peak_memory_range: - min: 94208 - max: 4624184 + min: 704512 + max: 2003848 primary_compute_unit: NPU precision: fp16 layer_info: @@ -225,22 +223,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 313 - job_id: jz57zlqnp + job_id: jg9lnkwqg job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:26:45Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:16:11Z' - torchscript_onnx_tflite: - inference_time: 96250.0 - throughput: 10.38961038961039 + inference_time: 101593.0 + throughput: 9.843197858120146 estimated_peak_memory_range: - min: 14434304 - max: 60693616 + min: 20385792 + max: 119246136 primary_compute_unit: GPU precision: fp16 layer_info: @@ -248,14 +246,14 @@ models: layers_on_gpu: 260 layers_on_cpu: 11 total_layers: 271 - job_id: j1pv3rjr5 + job_id: jp4lrmdq5 job_status: Passed torchscript_onnx_qnn: - inference_time: 157869.0 - throughput: 6.334365834964433 + inference_time: 104738.0 + throughput: 9.547633141744162 estimated_peak_memory_range: - min: 163840 - max: 5613952 + min: 176128 + max: 10930688 primary_compute_unit: NPU precision: fp16 layer_info: @@ -263,22 +261,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 313 - job_id: j0pxv6w8g + job_id: jgz3d8rz5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:26:47Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:16:09Z' - torchscript_onnx_tflite: - inference_time: 112458.0 - throughput: 8.892208646783688 + inference_time: 150469.0 + throughput: 6.6458871927107905 estimated_peak_memory_range: - min: 6311936 - max: 59157856 + min: 20463616 + max: 55366064 primary_compute_unit: GPU precision: fp16 layer_info: @@ -286,14 +284,14 @@ models: layers_on_gpu: 260 layers_on_cpu: 11 total_layers: 271 - job_id: jlpe9wjvg + job_id: jgdx18qkp job_status: Passed torchscript_onnx_qnn: - inference_time: 154185.0 - throughput: 6.48571521224503 + inference_time: 180709.0 + throughput: 5.533758694918349 estimated_peak_memory_range: - min: 229376 - max: 6149440 + min: 106496 + max: 204613008 primary_compute_unit: NPU precision: fp16 layer_info: @@ -301,22 +299,60 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 313 - job_id: jegn2mjjg + job_id: j5mnxo3yp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:26:49Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:16:17Z' + - torchscript_onnx_tflite: + inference_time: 77729.0 + throughput: 12.86521118244156 + estimated_peak_memory_range: + min: 21049344 + max: 41626208 + primary_compute_unit: GPU + precision: fp16 + layer_info: + layers_on_npu: 0 + layers_on_gpu: 260 + layers_on_cpu: 11 + total_layers: 271 + job_id: jp8qy6zzp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 100989.0 + throughput: 9.902068542118448 + estimated_peak_memory_range: + min: 0 + max: 204384016 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 313 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 313 + job_id: jprv3oyvg + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + 
os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:16:18Z' - torchscript_onnx_qnn: - inference_time: 148682.0 - throughput: 6.725763710469324 + inference_time: 95580.0 + throughput: 10.462439840970914 estimated_peak_memory_range: - min: 491520 - max: 491520 + min: 520192 + max: 520192 primary_compute_unit: NPU precision: fp16 layer_info: @@ -324,7 +360,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 313 - job_id: jz5wo3j4p + job_id: jgo26o34p job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -333,15 +369,15 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:26:41Z' + timestamp: '2024-10-14T23:16:05Z' - name: WhisperDecoder performance_metrics: - torchscript_onnx_tflite: - inference_time: 3793.0 - throughput: 263.6435539151068 + inference_time: 3760.0 + throughput: 265.9574468085106 estimated_peak_memory_range: - min: 7077888 - max: 9559432 + min: 2981888 + max: 5642752 primary_compute_unit: NPU precision: fp16 layer_info: @@ -349,14 +385,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 557 - job_id: jqpyejllg + job_id: jpedm8w75 job_status: Passed torchscript_onnx_qnn: - inference_time: 2387.0 - throughput: 418.93590280687056 + inference_time: 2356.0 + throughput: 424.44821731748726 estimated_peak_memory_range: - min: 16384 - max: 148387760 + min: 2781184 + max: 139470032 primary_compute_unit: NPU precision: fp16 layer_info: @@ -364,22 +400,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 447 - job_id: jmg9vy685 - job_status: Passed - torchscript_onnx: - inference_time: 5347.0 - throughput: 187.02075930428276 - estimated_peak_memory_range: - min: 36864 - max: 79008392 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 462 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 462 - job_id: j1p8omoqg + job_id: jglvmo3e5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -388,13 +409,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:26:53Z' + timestamp: '2024-10-14T23:16:01Z' - torchscript_onnx_tflite: inference_time: 2891.0 throughput: 345.9010722933241 estimated_peak_memory_range: - min: 184320 - max: 227930880 + min: 2994176 + max: 231538704 primary_compute_unit: NPU precision: fp16 layer_info: @@ -402,14 +423,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 557 - job_id: j1p8omnog + job_id: j5we683z5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1717.0 - throughput: 582.4111822947001 + inference_time: 1618.0 + throughput: 618.0469715698393 estimated_peak_memory_range: - min: 4624384 - max: 26259952 + min: 4628480 + max: 28286880 primary_compute_unit: NPU precision: fp16 layer_info: @@ -417,22 +438,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 447 - job_id: jvgdwqjz5 - job_status: Passed - torchscript_onnx: - inference_time: 4145.0 - throughput: 241.25452352231605 - estimated_peak_memory_range: - min: 995328 - max: 401520432 - primary_compute_unit: NPU - precision: fp16 - layer_info: - layers_on_npu: 462 - layers_on_gpu: 0 - layers_on_cpu: 0 - total_layers: 462 - job_id: jn5q8r8e5 + job_id: jp3j0xexg job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -441,13 +447,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:26:54Z' + timestamp: '2024-10-14T23:16:03Z' - torchscript_onnx_tflite: - inference_time: 
4198.0 - throughput: 238.20867079561697 + inference_time: 3718.0 + throughput: 268.9618074233459 estimated_peak_memory_range: min: 2985984 - max: 5534288 + max: 5145512 primary_compute_unit: NPU precision: fp16 layer_info: @@ -455,14 +461,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 557 - job_id: jn5q8rno5 + job_id: jp14z7wkp job_status: Passed torchscript_onnx_qnn: - inference_time: 2228.0 - throughput: 448.8330341113106 + inference_time: 2284.0 + throughput: 437.82837127845886 estimated_peak_memory_range: - min: 10661888 - max: 12492272 + min: 10674176 + max: 11965008 primary_compute_unit: NPU precision: fp16 layer_info: @@ -470,7 +476,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 447 - job_id: jvgdwqj65 + job_id: jpedm8k75 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -478,14 +484,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:26:44Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:16:07Z' - torchscript_onnx_tflite: - inference_time: 4213.0 - throughput: 237.36055067647757 + inference_time: 3651.0 + throughput: 273.8975623116954 estimated_peak_memory_range: - min: 2973696 - max: 226398544 + min: 2977792 + max: 5085032 primary_compute_unit: NPU precision: fp16 layer_info: @@ -493,14 +499,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 557 - job_id: jw566zky5 + job_id: jp2ky4lxp job_status: Passed torchscript_onnx_qnn: - inference_time: 2639.0 - throughput: 378.931413414172 + inference_time: 2233.0 + throughput: 447.82803403493057 estimated_peak_memory_range: - min: 7266304 - max: 29957504 + min: 10657792 + max: 12672312 primary_compute_unit: NPU precision: fp16 layer_info: @@ -508,22 +514,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 447 - job_id: jqpyej90g + job_id: j57yrkxq5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:26:51Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:16:13Z' - torchscript_onnx_tflite: - inference_time: 3777.0 - throughput: 264.76039184537996 + inference_time: 3786.0 + throughput: 264.1310089804543 estimated_peak_memory_range: - min: 2977792 - max: 4900008 + min: 2981888 + max: 4973504 primary_compute_unit: NPU precision: fp16 layer_info: @@ -531,14 +537,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 557 - job_id: jwgoynjk5 + job_id: jgn6vo3v5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2162.0 - throughput: 462.53469010175763 + inference_time: 2226.0 + throughput: 449.23629829290206 estimated_peak_memory_range: - min: 2265088 - max: 3560360 + min: 10694656 + max: 11899816 primary_compute_unit: NPU precision: fp16 layer_info: @@ -546,22 +552,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 447 - job_id: jqp4qdz2g + job_id: jp14z7ekp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:26:45Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:16:11Z' - torchscript_onnx_tflite: - inference_time: 3795.0 - throughput: 263.5046113306983 + inference_time: 3644.0 + throughput: 274.423710208562 estimated_peak_memory_range: - min: 2998272 - max: 8673408 + min: 2981888 + max: 5035200 primary_compute_unit: NPU precision: fp16 
layer_info: @@ -569,14 +575,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 557 - job_id: j7gjx2jep + job_id: jpxko36j5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2170.0 - throughput: 460.8294930875576 + inference_time: 2297.0 + throughput: 435.35045711798 estimated_peak_memory_range: - min: 10661888 - max: 12549080 + min: 10674176 + max: 12067440 primary_compute_unit: NPU precision: fp16 layer_info: @@ -584,22 +590,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 447 - job_id: jo5mr6j7g + job_id: j5we68qz5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:26:47Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:16:09Z' - torchscript_onnx_tflite: - inference_time: 3796.0 - throughput: 263.43519494204423 + inference_time: 4266.0 + throughput: 234.4116268166901 estimated_peak_memory_range: - min: 2994176 - max: 5048120 + min: 2973696 + max: 228556224 primary_compute_unit: NPU precision: fp16 layer_info: @@ -607,14 +613,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 557 - job_id: jygzej1xg + job_id: j57yrklq5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2193.0 - throughput: 455.99635202918375 + inference_time: 2741.0 + throughput: 364.8303538854433 estimated_peak_memory_range: - min: 4648960 - max: 5961392 + min: 10637312 + max: 37286256 primary_compute_unit: NPU precision: fp16 layer_info: @@ -622,22 +628,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 447 - job_id: joprk2zk5 + job_id: jgn6voev5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:26:49Z' - - torchscript_onnx_qnn: - inference_time: 2061.0 - throughput: 485.201358563804 + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:16:17Z' + - torchscript_onnx_tflite: + inference_time: 2429.0 + throughput: 411.6920543433512 estimated_peak_memory_range: - min: 10629120 - max: 10629120 + min: 1028096 + max: 32313024 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 557 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 557 + job_id: jgkexo3yg + job_status: Passed + torchscript_onnx_qnn: + inference_time: 1394.0 + throughput: 717.3601147776184 + estimated_peak_memory_range: + min: 10620928 + max: 35570128 primary_compute_unit: NPU precision: fp16 layer_info: @@ -645,22 +666,30 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 447 - job_id: jmg9vy6m5 + job_id: jp2ky4mxp job_status: Passed - torchscript_onnx: - inference_time: 4503.0 - throughput: 222.0741727737064 + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:16:19Z' + - torchscript_onnx_qnn: + inference_time: 2056.0 + throughput: 486.38132295719845 estimated_peak_memory_range: - min: 77918208 - max: 77918208 + min: 10629120 + max: 10629120 primary_compute_unit: NPU precision: fp16 layer_info: - layers_on_npu: 462 + layers_on_npu: 447 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 462 - job_id: jw566z6n5 + total_layers: 447 + job_id: jpv6kev75 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -669,4 +698,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: 
Snapdragon® X Elite - timestamp: '2024-09-25T11:26:56Z' + timestamp: '2024-10-14T23:16:05Z' diff --git a/qai_hub_models/models/wideresnet50/README.md index 4e152e5b..30dcddb5 100644 --- a/qai_hub_models/models/wideresnet50/README.md +++ b/qai_hub_models/models/wideresnet50/README.md @@ -6,7 +6,7 @@ WideResNet50 is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of WideResNet50 found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/wideresnet50). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.wideresnet50.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of WideResNet50 can be found +* The license for the original implementation of WideResNet50 can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Wide Residual Networks](https://arxiv.org/abs/1605.07146) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
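For contrast with the per-component mapping returned by the Whisper export above, the single-model `export_model` in the `export.py` diff that follows now returns one `ExportResult`. A hedged sketch of using the skip flags from the signature shown (per the return annotation, a plain list of messages comes back when AI Hub access is unavailable):

```python
# Sketch: compile-only export of WideResNet50, skipping the optional steps.
from qai_hub_models.models.wideresnet50.export import export_model

result = export_model(
    skip_profiling=True,    # step 3 of the recipe
    skip_inferencing=True,  # step 4
    skip_downloading=True,  # step 5
    skip_summary=True,      # step 6
)

if not isinstance(result, list):
    print(result.compile_job)  # jobs for skipped steps come back as None
    assert result.profile_job is None and result.inference_job is None
```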
diff --git a/qai_hub_models/models/wideresnet50/export.py index 16cb1e89..368a28e6 100644 --- a/qai_hub_models/models/wideresnet50/export.py +++ b/qai_hub_models/models/wideresnet50/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.wideresnet50 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "wideresnet50" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) ) @@ -120,7 +118,7 @@ # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6.
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/wideresnet50/perf.yaml b/qai_hub_models/models/wideresnet50/perf.yaml index 706e9b87..46060da9 100644 --- a/qai_hub_models/models/wideresnet50/perf.yaml +++ b/qai_hub_models/models/wideresnet50/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: WideResNet50 performance_metrics: - torchscript_onnx_tflite: - inference_time: 4868.0 - throughput: 205.42317173377157 + inference_time: 4887.0 + throughput: 204.62451401677922 estimated_peak_memory_range: - min: 28672 - max: 2161232 + min: 24576 + max: 2241528 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 79 - job_id: j2p0y28ng + job_id: j56y4revp job_status: Passed torchscript_onnx_qnn: - inference_time: 5710.0 - throughput: 175.13134851138355 + inference_time: 5677.0 + throughput: 176.14937466971992 estimated_peak_memory_range: - min: 618496 - max: 216999120 + min: 622592 + max: 362802736 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: jwgoynxk5 + job_id: jp14z7ykp job_status: Passed torchscript_onnx: - inference_time: 5515.0 - throughput: 181.32366273798732 + inference_time: 5217.0 + throughput: 191.68104274487254 estimated_peak_memory_range: - min: 16384 - max: 169289768 + min: 634880 + max: 2575704 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 128 - job_id: jvgdwqkz5 + job_id: jp0z0d225 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:25:55Z' + timestamp: '2024-10-14T23:15:13Z' - torchscript_onnx_tflite: - inference_time: 4001.0 - throughput: 249.93751562109472 + inference_time: 3989.0 + throughput: 250.68939583855604 estimated_peak_memory_range: - min: 16384 - max: 104623104 + min: 12288 + max: 105887760 primary_compute_unit: NPU 
precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 79 - job_id: j1p8omdog + job_id: jp3j0xvxg job_status: Passed torchscript_onnx_qnn: - inference_time: 4546.0 - throughput: 219.9736031676199 + inference_time: 4603.0 + throughput: 217.24961981316534 estimated_peak_memory_range: - min: 647168 - max: 27068912 + min: 618496 + max: 28515632 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: j1pv3r8r5 + job_id: jgdx18ekp job_status: Passed torchscript_onnx: - inference_time: 4538.0 - throughput: 220.36139268400177 + inference_time: 4293.0 + throughput: 232.93733985557884 estimated_peak_memory_range: - min: 638976 - max: 106582320 + min: 0 + max: 111389648 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 128 - job_id: jz57zlm9p + job_id: jp8qy6mzp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:25:56Z' + timestamp: '2024-10-14T23:15:14Z' - torchscript_onnx_tflite: - inference_time: 4855.0 - throughput: 205.97322348094747 + inference_time: 4847.0 + throughput: 206.31318341242005 estimated_peak_memory_range: - min: 24576 - max: 19454792 + min: 16384 + max: 2006712 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 79 - job_id: jogkzqwng + job_id: jgo26ok4p job_status: Passed torchscript_onnx_qnn: - inference_time: 4904.0 - throughput: 203.9151712887439 + inference_time: 5026.0 + throughput: 198.96538002387584 estimated_peak_memory_range: min: 634880 - max: 2287808 + max: 1895920 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: jlpe9wqvg + job_id: jp4lrmkq5 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:25:50Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:15:06Z' - torchscript_onnx_tflite: - inference_time: 7110.0 - throughput: 140.64697609001408 + inference_time: 4872.0 + throughput: 205.2545155993432 estimated_peak_memory_range: - min: 32768 - max: 94466032 + min: 28672 + max: 2231992 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 79 - job_id: jn5q8rxo5 + job_id: jgz3d8oz5 job_status: Passed torchscript_onnx_qnn: - inference_time: 7278.0 - throughput: 137.40038472107722 + inference_time: 5033.0 + throughput: 198.68865487780647 estimated_peak_memory_range: - min: 618496 - max: 22598112 + min: 659456 + max: 1918008 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: jnp10w975 + job_id: jgn6vomv5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:25:55Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:15:09Z' - torchscript_onnx_tflite: - inference_time: 4870.0 - throughput: 205.3388090349076 + inference_time: 4860.0 + throughput: 
205.76131687242798 estimated_peak_memory_range: - min: 36864 - max: 2265576 + min: 16384 + max: 2013600 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 79 - job_id: j1gln2dmp + job_id: jpedm8e75 job_status: Passed torchscript_onnx_qnn: - inference_time: 4903.0 - throughput: 203.95676116663267 + inference_time: 5031.0 + throughput: 198.76764062810574 estimated_peak_memory_range: - min: 626688 - max: 1802184 + min: 655360 + max: 2032392 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: jygzej6xg + job_id: j5mnxoqyp job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:25:51Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:15:08Z' - torchscript_onnx_tflite: - inference_time: 4870.0 - throughput: 205.3388090349076 + inference_time: 4878.0 + throughput: 205.0020500205002 estimated_peak_memory_range: - min: 16384 - max: 24783952 + min: 28672 + max: 1515456 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 79 - job_id: jw566zxy5 + job_id: jgjvnoz7g job_status: Passed torchscript_onnx_qnn: - inference_time: 4915.0 - throughput: 203.4587995930824 + inference_time: 5029.0 + throughput: 198.8466892026248 estimated_peak_memory_range: - min: 327680 - max: 1578048 + min: 643072 + max: 2388064 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: jz5wo3kmp + job_id: jpxko3nj5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:25:52Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:15:07Z' - torchscript_onnx_tflite: - inference_time: 4855.0 - throughput: 205.97322348094747 + inference_time: 7138.0 + throughput: 140.09526478005043 estimated_peak_memory_range: - min: 16384 - max: 2315176 + min: 24576 + max: 94402656 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 79 - job_id: j1p3k1dn5 + job_id: jpv6ke075 job_status: Passed torchscript_onnx_qnn: - inference_time: 4847.0 - throughput: 206.31318341242005 + inference_time: 7222.0 + throughput: 138.46579894765992 estimated_peak_memory_range: - min: 634880 - max: 1843304 + min: 638976 + max: 25920800 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: jmg9vyr85 + job_id: jp2ky49xp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:15:11Z' + - torchscript_onnx_tflite: + inference_time: 3063.0 + throughput: 326.47730982696703 + estimated_peak_memory_range: + min: 12288 + max: 33038080 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 79 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 79 + job_id: jg9lnkjqg + job_status: Passed + torchscript_onnx_qnn: + inference_time: 4069.0 + throughput: 245.7606291472106 + 
estimated_peak_memory_range: + min: 0 + max: 26973264 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 126 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 126 + job_id: jpy13qjrp + job_status: Passed + torchscript_onnx: + inference_time: 3520.0 + throughput: 284.09090909090907 + estimated_peak_memory_range: + min: 0 + max: 38708432 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 128 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 128 + job_id: jglvmo2e5 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:25:53Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:15:17Z' - torchscript_onnx_qnn: - inference_time: 4696.0 - throughput: 212.94718909710392 + inference_time: 4938.0 + throughput: 202.5111381125962 estimated_peak_memory_range: min: 602112 max: 602112 @@ -354,14 +405,14 @@ layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 126 - job_id: j7gjx29ep + job_id: j57yrk0q5 job_status: Passed torchscript_onnx: - inference_time: 5118.0 - throughput: 195.38882375928097 + inference_time: 4711.0 + throughput: 212.26915729144557 estimated_peak_memory_range: - min: 139382784 - max: 139382784 + min: 139501568 + max: 139501568 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 128 - job_id: jqp4qd71g + job_id: jgkexoqyg job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:25:57Z' + timestamp: '2024-10-14T23:15:15Z'
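The metrics in the perf.yaml entries above are related in a simple way: `throughput` is inferences per second, i.e. 1,000,000 divided by `inference_time`, which is consistent with `inference_time` being recorded in microseconds. A quick check against the Samsung Galaxy S23 TFLite entry for WideResNet50 above:

```python
# Sanity check: throughput = 1e6 / inference_time (microseconds).
inference_time_us = 4887.0  # WideResNet50, torchscript_onnx_tflite, Galaxy S23
throughput = 1_000_000 / inference_time_us
print(throughput)  # 204.62451401677922, matching the recorded value
```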
diff --git a/qai_hub_models/models/wideresnet50_quantized/README.md index 1f6f16ad..a64ee173 100644 --- a/qai_hub_models/models/wideresnet50_quantized/README.md +++ b/qai_hub_models/models/wideresnet50_quantized/README.md @@ -6,7 +6,7 @@ WideResNet50 is a machine learning model that can classify images from the Imagenet dataset. It can also be used as a backbone in building more complex models for specific use cases. This is based on the implementation of WideResNet50-Quantized found -[here](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/wideresnet50_quantized). @@ -17,11 +17,6 @@ across various devices can be found [here](https://aihub.qualcomm.com/models/w ## Example & Usage -Install the package via pip: -```bash -pip install "qai_hub_models[wideresnet50_quantized]" -``` - Once installed, run the following simple CLI demo: @@ -44,15 +39,19 @@ python -m qai_hub_models.models.wideresnet50_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of WideResNet50-Quantized can be found +* The license for the original implementation of WideResNet50-Quantized can be found [here](https://github.com/pytorch/vision/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Wide Residual Networks](https://arxiv.org/abs/1605.07146) * [Source Model Implementation](https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/wideresnet50_quantized/evaluate.py index 232037a3..865f4679 100644 --- a/qai_hub_models/models/wideresnet50_quantized/evaluate.py +++ b/qai_hub_models/models/wideresnet50_quantized/evaluate.py @@ -13,10 +13,8 @@ from qai_hub_models.models.wideresnet50_quantized import MODEL_ID, Model from qai_hub_models.utils.args import evaluate_parser, get_hub_device, get_model_kwargs -from qai_hub_models.utils.base_model import BaseModel from qai_hub_models.utils.evaluate import evaluate_on_dataset from qai_hub_models.utils.inference import compile_model_from_args -from qai_hub_models.utils.quantization_aimet import AIMETQuantizableMixin SUPPORTED_DATASETS = ["imagenette", "imagenet"] @@ -27,6 +25,7 @@ def main(): model_cls=Model, default_split_size=2500, supported_datasets=SUPPORTED_DATASETS, + is_hub_quantized=True, ) args = parser.parse_args() args.device = None @@ -38,13 +37,7 @@ def main(): MODEL_ID, args, get_model_kwargs(Model, vars(args)) ) hub_device = get_hub_device(None, args.chipset) - - # Use Fp16 model for torch inference - for cls in Model.__mro__: - if issubclass(cls, BaseModel) and not issubclass(cls, AIMETQuantizableMixin): - torch_cls = cls - break - torch_model = torch_cls.from_pretrained(**get_model_kwargs(torch_cls, vars(args))) + torch_model = Model.from_pretrained(**get_model_kwargs(Model, vars(args))) evaluate_on_dataset( hub_model, torch_model, diff --git a/qai_hub_models/models/wideresnet50_quantized/export.py index 7588c26d..47865485 100644 --- a/qai_hub_models/models/wideresnet50_quantized/export.py +++ b/qai_hub_models/models/wideresnet50_quantized/export.py @@ -10,18 +10,20 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.wideresnet50_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference +from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( print_inference_metrics, print_on_target_demo_cmd, @@ -31,11 +33,14 @@ can_access_qualcomm_ai_hub, export_without_hub_access, ) +from qai_hub_models.utils.quantization import get_calibration_data def export_model( device: str = "Samsung Galaxy S23 (Family)", chipset: Optional[str] = None, +
num_calibration_samples: int = 100, + skip_compiling: bool = False, skip_profiling: bool = False, skip_inferencing: bool = False, skip_downloading: bool = False, @@ -45,20 +50,19 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + 3. Compiles the model to an asset that can be run on device + 4. Profiles the model performance on a real device + 5. Inferences the model on sample inputs + 6. Downloads the model asset to the local directory + 7. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 5 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -66,6 +70,9 @@ def export_model( Defaults to DEFAULT_DEVICE if not specified. chipset: If set, will choose a random device with this chipset. Overrides the `device` argument. + num_calibration_samples: The number of calibration data samples + to use for quantization. + skip_compiling: If set, skips compiling model to format that can run on device. skip_profiling: If set, skips profiling of compiled model on real devices. skip_inferencing: If set, skips computing on-device outputs from sample data. skip_downloading: If set, skips downloading of compiled model. @@ -80,10 +87,11 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: - * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). + A struct of: + * A CompileJob object containing metadata about the compile job submitted to hub (None if compiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). + * A QuantizeJob object containing metadata about the quantize job submitted to hub """ model_name = "wideresnet50_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,33 +117,52 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. 
Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) ) # Trace the model - source_model = model.convert_to_hub_source_model( - target_runtime, output_path, input_spec + source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) + + print(f"Quantizing model {model_name} with {num_calibration_samples} samples.") + # 2. Converts the PyTorch model to ONNX and quantizes the ONNX model. + onnx_compile_job = hub.submit_compile_job( + model=source_model, + input_specs=input_spec, + device=hub_device, + name=model_name, + options="--target_runtime onnx", + ) + quantize_job = hub.submit_quantize_job( + model=onnx_compile_job.get_target_model(), + calibration_data=get_calibration_data( + input_spec, "imagenette", num_calibration_samples + ), + weights_dtype=model.get_weights_dtype(), + activations_dtype=model.get_activations_dtype(), + name=model_name, + options=model.get_quantize_options(), ) + if skip_compiling: + return ExportResult(quantize_job=quantize_job) - # 2. Compile the model to an on-device asset + # 3. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) print(f"Optimizing model {model_name} to run on-device") submitted_compile_job = hub.submit_compile_job( - model=source_model, + model=quantize_job.get_target_model(), input_specs=input_spec, device=hub_device, name=model_name, - calibration_data=model.get_calibration_data(target_runtime), options=model_compile_options, ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 4. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +177,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 5. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +198,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 6. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 7. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -198,12 +225,17 @@ if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + quantize_job=quantize_job, + ) def main(): warnings.filterwarnings("ignore") - parser = export_parser(model_cls=Model) + parser = export_parser(model_cls=Model, is_hub_quantized=True) args = parser.parse_args() export_model(**vars(args))
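The export path above now quantizes on AI Hub rather than loading AIMET pre-quantized checkpoints (the `model.py` diff that follows shrinks the quantizable class to a `HubQuantizableMixin` subclass). A condensed sketch of the job chain, using only calls that appear in the diff; the device name is illustrative and error handling is omitted:

```python
# Sketch: trace -> compile to ONNX on Hub -> quantize -> compile for device.
import qai_hub as hub
import torch

from qai_hub_models.models.wideresnet50_quantized import Model
from qai_hub_models.utils.input_spec import make_torch_inputs
from qai_hub_models.utils.quantization import get_calibration_data

model = Model.from_pretrained()
input_spec = model.get_input_spec()
traced = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec))

device = hub.Device("Samsung Galaxy S23")
onnx_job = hub.submit_compile_job(
    model=traced,
    input_specs=input_spec,
    device=device,
    options="--target_runtime onnx",
)
quantize_job = hub.submit_quantize_job(
    model=onnx_job.get_target_model(),
    calibration_data=get_calibration_data(input_spec, "imagenette", 100),
    weights_dtype=model.get_weights_dtype(),
    activations_dtype=model.get_activations_dtype(),
    options=model.get_quantize_options(),
)
# With skip_compiling=True, export_model stops here and returns
# ExportResult(quantize_job=quantize_job); otherwise the quantized
# model feeds the regular device compile step below.
compile_job = hub.submit_compile_job(
    model=quantize_job.get_target_model(),
    input_specs=input_spec,
    device=device,
)
```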
- """ - model = WideResNet50.from_pretrained() - input_shape = cls.get_input_spec()["image_tensor"][0] - model = prepare_model(model) - dummy_input = torch.rand(input_shape) - - pairs = fold_all_batch_norms(model, input_shape, dummy_input) - equalize_bn_folded_model(model, input_shape, pairs, dummy_input) - sim = QuantizationSimModel( - model, - quant_scheme="tf_enhanced", - default_param_bw=8, - default_output_bw=8, - config_file=get_default_aimet_config(), - dummy_input=dummy_input, - ) - constrain_quantized_inputs_to_image_range(sim) - if aimet_encodings: - if aimet_encodings == "DEFAULT": - aimet_encodings = CachedWebModelAsset.from_asset_store( - MODEL_ID, MODEL_ASSET_VERSION, DEFAULT_ENCODINGS - ).fetch() - load_encodings_to_sim(sim, aimet_encodings) - return cls(sim) +class WideResNet50Quantizable(HubQuantizableMixin, WideResNet50): + pass diff --git a/qai_hub_models/models/wideresnet50_quantized/perf.yaml b/qai_hub_models/models/wideresnet50_quantized/perf.yaml index b5dc4abc..93a078e8 100644 --- a/qai_hub_models/models/wideresnet50_quantized/perf.yaml +++ b/qai_hub_models/models/wideresnet50_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,44 +20,39 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8775P Proxy models: - name: WideResNet50-Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 1778.0 - throughput: 562.429696287964 + inference_time: 1779.0 + throughput: 562.1135469364812 estimated_peak_memory_range: - min: 12288 - max: 2215384 + min: 36864 + max: 611655808 primary_compute_unit: NPU precision: int8 layer_info: @@ -64,29 +60,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: jqpyejylg + job_id: j57y2dvl5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2038.0 - throughput: 490.6771344455348 + inference_time: 2025.0 + throughput: 493.82716049382714 estimated_peak_memory_range: - min: 16384 - max: 671168640 + min: 12288 + max: 145082312 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: j1pv3ryr5 + total_layers: 127 + job_id: j5q60294p job_status: Passed torchscript_onnx: - inference_time: 2795.0 - throughput: 357.78175313059035 + inference_time: 2468.0 + throughput: 405.1863857374392 estimated_peak_memory_range: min: 12288 - max: 85652800 + max: 87343304 primary_compute_unit: NPU precision: int8 layer_info: @@ -94,7 +90,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: jqp4qd61g + job_id: jp142832p job_status: 
Passed reference_device_info: name: Samsung Galaxy S23 @@ -103,13 +99,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:25:12Z' + timestamp: '2024-10-17T17:14:16Z' - torchscript_onnx_tflite: - inference_time: 1345.0 - throughput: 743.4944237918215 + inference_time: 1403.0 + throughput: 712.7583749109052 estimated_peak_memory_range: - min: 16384 - max: 58881312 + min: 12288 + max: 61732864 primary_compute_unit: NPU precision: int8 layer_info: @@ -117,29 +113,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: j2p0y2xng + job_id: jp4lnwjv5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1674.0 - throughput: 597.3715651135007 + inference_time: 1682.0 + throughput: 594.5303210463734 estimated_peak_memory_range: min: 167936 - max: 19094240 + max: 21864528 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: j7gjx26ep + total_layers: 127 + job_id: jglv4ke85 job_status: Passed torchscript_onnx: - inference_time: 2047.0 - throughput: 488.5197850512946 + inference_time: 1830.0 + throughput: 546.448087431694 estimated_peak_memory_range: - min: 24576 - max: 89421392 + min: 32768 + max: 92240000 primary_compute_unit: NPU precision: int8 layer_info: @@ -147,7 +143,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: j0pxv68lg + job_id: jgdxnv0ep job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -156,13 +152,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:25:13Z' + timestamp: '2024-10-17T17:14:17Z' - torchscript_onnx_tflite: - inference_time: 1772.0 - throughput: 564.3340857787811 + inference_time: 7792.0 + throughput: 128.33675564681724 estimated_peak_memory_range: - min: 28672 - max: 622465936 + min: 12288 + max: 30620560 primary_compute_unit: NPU precision: int8 layer_info: @@ -170,37 +166,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: j1p8omkog + job_id: jpxk91e15 job_status: Passed torchscript_onnx_qnn: - inference_time: 1882.0 - throughput: 531.3496280552604 + inference_time: 9311.0 + throughput: 107.3998496402105 estimated_peak_memory_range: - min: 180224 - max: 1470720 + min: 200704 + max: 8266832 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jygzejqxg + total_layers: 127 + job_id: j56y21q0p job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:25:06Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-17T17:14:02Z' - torchscript_onnx_tflite: - inference_time: 2175.0 - throughput: 459.7701149425287 + inference_time: 23600.0 + throughput: 42.3728813559322 estimated_peak_memory_range: - min: 20480 - max: 61780080 + min: 196608 + max: 2337856 primary_compute_unit: NPU precision: int8 layer_info: @@ -208,37 +204,60 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: jogkzqkng + job_id: j5mnezvwp + job_status: Passed + reference_device_info: + name: RB5 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8250 Proxy + timestamp: '2024-10-17T17:13:47Z' + - torchscript_onnx_tflite: + inference_time: 1768.0 + throughput: 565.6108597285067 + 
estimated_peak_memory_range: + min: 12288 + max: 2217072 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 82 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 82 + job_id: jgn60err5 job_status: Passed torchscript_onnx_qnn: - inference_time: 2463.0 - throughput: 406.00893219650834 + inference_time: 1920.0 + throughput: 520.8333333333334 estimated_peak_memory_range: - min: 167936 - max: 22571312 + min: 204800 + max: 1626464 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jvgdwqyz5 + total_layers: 127 + job_id: jp3jnmqlg job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:25:11Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-17T17:14:03Z' - torchscript_onnx_tflite: - inference_time: 1772.0 - throughput: 564.3340857787811 + inference_time: 1779.0 + throughput: 562.1135469364812 estimated_peak_memory_range: - min: 24576 - max: 1592920 + min: 12288 + max: 28815656 primary_compute_unit: NPU precision: int8 layer_info: @@ -246,37 +265,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: jn5q8rdo5 + job_id: jprv6y19g job_status: Passed torchscript_onnx_qnn: - inference_time: 1887.0 - throughput: 529.9417064122946 + inference_time: 1916.0 + throughput: 521.9206680584551 estimated_peak_memory_range: - min: 180224 - max: 1435328 + min: 176128 + max: 1507952 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jz5wo30mp + total_layers: 127 + job_id: jpv6qwzj5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:25:07Z' + chipset: SA8255P Proxy + timestamp: '2024-10-17T17:14:06Z' - torchscript_onnx_tflite: - inference_time: 1773.0 - throughput: 564.0157924421884 + inference_time: 1775.0 + throughput: 563.3802816901408 estimated_peak_memory_range: - min: 16384 - max: 16234344 + min: 32768 + max: 1598432 primary_compute_unit: NPU precision: int8 layer_info: @@ -284,22 +303,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: j1gln29mp + job_id: jp2kxm34p job_status: Passed torchscript_onnx_qnn: - inference_time: 1890.0 - throughput: 529.1005291005291 + inference_time: 1924.0 + throughput: 519.7505197505197 estimated_peak_memory_range: - min: 192512 - max: 1639712 + min: 184320 + max: 1685552 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jmg9vy785 + total_layers: 127 + job_id: jgjvdlkxg job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -307,14 +326,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:25:08Z' + chipset: SA8775P Proxy + timestamp: '2024-10-17T17:14:08Z' - torchscript_onnx_tflite: - inference_time: 1774.0 - throughput: 563.6978579481398 + inference_time: 2167.0 + throughput: 461.4674665436087 estimated_peak_memory_range: min: 12288 - max: 87019352 + max: 64212992 primary_compute_unit: NPU precision: int8 layer_info: @@ -322,37 +341,37 @@ models: layers_on_gpu: 0 
layers_on_cpu: 0 total_layers: 82 - job_id: jw566z9y5 + job_id: jpy1zdv7p job_status: Passed torchscript_onnx_qnn: - inference_time: 1928.0 - throughput: 518.6721991701245 + inference_time: 2463.0 + throughput: 406.00893219650834 estimated_peak_memory_range: - min: 16384 - max: 1728576 + min: 167936 + max: 23012416 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jnp10wk75 + total_layers: 127 + job_id: jpedov415 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:25:10Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-17T17:14:10Z' - torchscript_onnx_tflite: - inference_time: 7772.0 - throughput: 128.66700977869274 + inference_time: 1248.0 + throughput: 801.2820512820513 estimated_peak_memory_range: - min: 12288 - max: 31462320 + min: 8192 + max: 24951424 primary_compute_unit: NPU precision: int8 layer_info: @@ -360,75 +379,67 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 82 - job_id: j1p3k1ln5 + job_id: jp0z4re65 job_status: Passed torchscript_onnx_qnn: - inference_time: 10252.0 - throughput: 97.54194303550527 + inference_time: 1443.0 + throughput: 693.000693000693 estimated_peak_memory_range: - min: 163840 - max: 7742096 + min: 159744 + max: 19173072 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jz57zl19p + total_layers: 127 + job_id: jgz327vk5 job_status: Passed - reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot - os_name: Android - manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T11:25:12Z' - - torchscript_onnx_tflite: - inference_time: 24100.0 - throughput: 41.49377593360996 + torchscript_onnx: + inference_time: 1674.0 + throughput: 597.3715651135007 estimated_peak_memory_range: - min: 176128 - max: 2286552 + min: 0 + max: 42505296 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 82 + layers_on_npu: 147 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 82 - job_id: jwgoyn7k5 + total_layers: 147 + job_id: jg9l048wg job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T11:25:02Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-17T17:14:20Z' - torchscript_onnx_qnn: - inference_time: 1822.0 - throughput: 548.847420417124 + inference_time: 1840.0 + throughput: 543.4782608695652 estimated_peak_memory_range: - min: 253952 - max: 253952 + min: 323584 + max: 323584 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 78 + layers_on_npu: 127 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 78 - job_id: jlpe9w0vg + total_layers: 127 + job_id: jgo2zvexp job_status: Passed torchscript_onnx: - inference_time: 2640.0 - throughput: 378.7878787878788 + inference_time: 2614.0 + throughput: 382.55547054322875 estimated_peak_memory_range: - min: 72720384 - max: 72720384 + min: 73981952 + max: 73981952 primary_compute_unit: NPU precision: int8 layer_info: @@ -436,7 +447,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 147 - job_id: jo5mr619g + job_id: j5wew9x35 job_status: Passed reference_device_info: 
name: Snapdragon X Elite CRD @@ -445,4 +456,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:25:14Z' + timestamp: '2024-10-17T17:14:19Z' diff --git a/qai_hub_models/models/wideresnet50_quantized/requirements.txt b/qai_hub_models/models/wideresnet50_quantized/requirements.txt deleted file mode 100644 index de5b80e8..00000000 --- a/qai_hub_models/models/wideresnet50_quantized/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -aimet-torch==1.32.1.post1; sys_platform == "linux" diff --git a/qai_hub_models/models/wideresnet50_quantized/test.py b/qai_hub_models/models/wideresnet50_quantized/test.py deleted file mode 100644 index fbe14f34..00000000 --- a/qai_hub_models/models/wideresnet50_quantized/test.py +++ /dev/null @@ -1,30 +0,0 @@ -# --------------------------------------------------------------------- -# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. -# SPDX-License-Identifier: BSD-3-Clause -# --------------------------------------------------------------------- -from qai_hub_models.models._shared.imagenet_classifier.test_utils import ( - run_imagenet_classifier_test, ) -from qai_hub_models.models.wideresnet50_quantized.demo import main as demo_main -from qai_hub_models.models.wideresnet50_quantized.model import ( - MODEL_ASSET_VERSION, - MODEL_ID, - WideResNet50Quantizable, ) - - -def test_task(): - run_imagenet_classifier_test( - WideResNet50Quantizable.from_pretrained(), - MODEL_ID, - probability_threshold=0.4, - asset_version=MODEL_ASSET_VERSION, - diff_tol=0.005, - rtol=0.02, - atol=0.2, ) - - -def test_demo(): - # Verify demo does not crash - demo_main(is_test=True) diff --git a/qai_hub_models/models/xlsr/README.md b/qai_hub_models/models/xlsr/README.md index 1b42556e..53aa6944 100644 --- a/qai_hub_models/models/xlsr/README.md +++ b/qai_hub_models/models/xlsr/README.md @@ -6,7 +6,7 @@ XLSR is designed for lightweight real-time upscaling of images. This is based on the implementation of XLSR found -[here](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/xlsr). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/xlsr). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.xlsr.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of XLSR can be found +* The license for the original implementation of XLSR can be found [here](https://github.com/quic/aimet-model-zoo/blob/develop/LICENSE.pdf).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Extremely Lightweight Quantization Robust Real-Time Single-Image Super Resolution for Mobile Devices](https://arxiv.org/abs/2105.10288) * [Source Model Implementation](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/xlsr) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/xlsr/export.py b/qai_hub_models/models/xlsr/export.py index 63330580..5b37b6bc 100644 --- a/qai_hub_models/models/xlsr/export.py +++ b/qai_hub_models/models/xlsr/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.xlsr import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
""" model_name = "xlsr" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -197,7 +195,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/xlsr/perf.yaml b/qai_hub_models/models/xlsr/perf.yaml index e58340ef..93e2f4cc 100644 --- a/qai_hub_models/models/xlsr/perf.yaml +++ b/qai_hub_models/models/xlsr/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: XLSR performance_metrics: - torchscript_onnx_tflite: - inference_time: 2483.0 - throughput: 402.7386226339106 + inference_time: 2580.0 + throughput: 387.5968992248062 estimated_peak_memory_range: - min: 24576 - max: 72773184 + min: 225280 + max: 9489008 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 16 - job_id: joprk2d75 + job_id: jp4lrmr25 job_status: Passed torchscript_onnx_qnn: - inference_time: 1448.0 - throughput: 690.6077348066299 + inference_time: 1375.0 + throughput: 727.2727272727273 estimated_peak_memory_range: - min: 217088 - max: 3042344 + min: 28672 + max: 3234336 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 21 - job_id: j1gln2qmp + job_id: jgkexoevg job_status: Passed torchscript_onnx: - inference_time: 1547.0 - throughput: 646.4124111182934 + inference_time: 1509.0 + throughput: 662.6905235255136 estimated_peak_memory_range: - min: 12288 - max: 30840088 + min: 229376 + max: 1899672 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 23 - job_id: jz5wo3rmp + job_id: j5we68e45 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:22:32Z' + timestamp: '2024-10-14T23:11:36Z' - torchscript_onnx_tflite: - inference_time: 1754.0 - throughput: 570.1254275940707 + inference_time: 1793.0 + throughput: 557.7244841048522 estimated_peak_memory_range: - min: 16384 - max: 22748944 + min: 20480 + max: 25105472 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: 
layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 16 - job_id: jep289dqp + job_id: jpxko3o85 job_status: Passed torchscript_onnx_qnn: - inference_time: 1088.0 - throughput: 919.1176470588235 + inference_time: 1080.0 + throughput: 925.925925925926 estimated_peak_memory_range: - min: 212992 - max: 11680624 + min: 208896 + max: 14304032 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 21 - job_id: jw566z0y5 + job_id: j5q6qz6ep job_status: Passed torchscript_onnx: - inference_time: 1048.0 - throughput: 954.1984732824427 + inference_time: 1084.0 + throughput: 922.509225092251 estimated_peak_memory_range: min: 0 - max: 23772800 + max: 24023888 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 23 - job_id: jmg9vyq85 + job_id: jg9lnklmg job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:22:33Z' + timestamp: '2024-10-14T23:11:37Z' - torchscript_onnx_tflite: - inference_time: 2474.0 - throughput: 404.2037186742118 + inference_time: 2467.0 + throughput: 405.35062829347385 estimated_peak_memory_range: min: 28672 - max: 1382728 + max: 1363664 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 16 - job_id: jqpyej2lg + job_id: j5mnxox7p job_status: Passed torchscript_onnx_qnn: - inference_time: 1330.0 - throughput: 751.8796992481203 + inference_time: 1359.0 + throughput: 735.8351729212657 estimated_peak_memory_range: min: 229376 - max: 1476848 + max: 1472576 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 21 - job_id: jwgoyn9k5 + job_id: j56y4rynp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:22:28Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:11:29Z' - torchscript_onnx_tflite: - inference_time: 4543.0 - throughput: 220.1188641866608 + inference_time: 2551.0 + throughput: 392.0031360250882 estimated_peak_memory_range: - min: 6336512 - max: 30139312 + min: 16384 + max: 92452376 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 16 - job_id: j2p0y29ng + job_id: jpy13q30p job_status: Passed torchscript_onnx_qnn: - inference_time: 1564.0 - throughput: 639.386189258312 + inference_time: 1344.0 + throughput: 744.047619047619 estimated_peak_memory_range: - min: 208896 - max: 15461264 + min: 221184 + max: 1455936 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 21 - job_id: jygzej0xg + job_id: jpv6ke6z5 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:22:31Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:11:32Z' - torchscript_onnx_tflite: - inference_time: 2578.0 - throughput: 387.8975950349108 + inference_time: 2600.0 + throughput: 384.61538461538464 estimated_peak_memory_range: - min: 28672 - max: 1410784 + min: 1933312 + max: 3268800 
primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 16 - job_id: j1p8omrog + job_id: jp2ky4y6p job_status: Passed torchscript_onnx_qnn: - inference_time: 1349.0 - throughput: 741.2898443291327 + inference_time: 1364.0 + throughput: 733.1378299120234 estimated_peak_memory_range: - min: 225280 - max: 4978872 + min: 229376 + max: 1519016 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 21 - job_id: j1pv3rnr5 + job_id: jgo26o21p job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:22:29Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:11:31Z' - torchscript_onnx_tflite: - inference_time: 2424.0 - throughput: 412.54125412541254 + inference_time: 2451.0 + throughput: 407.9967360261118 estimated_peak_memory_range: - min: 16384 - max: 1506344 + min: 24576 + max: 32218040 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 16 - job_id: jogkzq0ng + job_id: jprv3o3kg job_status: Passed torchscript_onnx_qnn: - inference_time: 1352.0 - throughput: 739.6449704142012 + inference_time: 1341.0 + throughput: 745.7121551081283 estimated_peak_memory_range: - min: 20480 - max: 4007744 + min: 233472 + max: 1715304 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 21 - job_id: j7gjx28ep + job_id: jp3j0xjmg job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:22:30Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:11:30Z' - torchscript_onnx_tflite: - inference_time: 2574.0 - throughput: 388.5003885003885 + inference_time: 3255.0 + throughput: 307.21966205837174 estimated_peak_memory_range: - min: 6316032 - max: 23092696 + min: 6328320 + max: 31083824 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 16 - job_id: jn5q8r1o5 + job_id: jgn6vovj5 job_status: Passed torchscript_onnx_qnn: - inference_time: 1462.0 - throughput: 683.9945280437756 + inference_time: 1541.0 + throughput: 648.9292667099286 estimated_peak_memory_range: - min: 221184 - max: 1485168 + min: 204800 + max: 14941584 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,22 +329,75 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 21 - job_id: jlpe9wnvg + job_id: jpedm8d85 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:11:34Z' + - torchscript_onnx_tflite: + inference_time: 1913.0 + throughput: 522.7391531625718 + estimated_peak_memory_range: + min: 20480 + max: 17227600 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 13 + layers_on_gpu: 0 + layers_on_cpu: 3 + total_layers: 16 + job_id: jp8qy6qqp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 686.0 + throughput: 1457.725947521866 + estimated_peak_memory_range: + min: 0 + max: 9899072 + primary_compute_unit: NPU + precision: fp16 + layer_info: + 
layers_on_npu: 21 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 21 + job_id: jgz3d8345 + job_status: Passed + torchscript_onnx: + inference_time: 1057.0 + throughput: 946.073793755913 + estimated_peak_memory_range: + min: 0 + max: 15333728 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 23 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 23 + job_id: j57yrkyn5 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:22:30Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:11:40Z' - torchscript_onnx_qnn: - inference_time: 1459.0 - throughput: 685.4009595613434 + inference_time: 1500.0 + throughput: 666.6666666666666 estimated_peak_memory_range: - min: 212992 - max: 212992 + min: 237568 + max: 237568 primary_compute_unit: NPU precision: fp16 layer_info: @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 21 - job_id: j1p3k1rn5 + job_id: jglvmov25 job_status: Passed torchscript_onnx: - inference_time: 1501.0 - throughput: 666.2225183211193 + inference_time: 1516.0 + throughput: 659.6306068601583 estimated_peak_memory_range: - min: 8970240 - max: 8970240 + min: 8962048 + max: 8962048 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 23 - job_id: jnp10wm75 + job_id: jp14z74np job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:22:34Z' + timestamp: '2024-10-14T23:11:38Z' diff --git a/qai_hub_models/models/xlsr_quantized/README.md b/qai_hub_models/models/xlsr_quantized/README.md index d1f27eab..dbc4c468 100644 --- a/qai_hub_models/models/xlsr_quantized/README.md +++ b/qai_hub_models/models/xlsr_quantized/README.md @@ -6,7 +6,7 @@ XLSR is designed for lightweight real-time upscaling of images. This is based on the implementation of XLSR-Quantized found -[here](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/xlsr). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/xlsr_quantized). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.xlsr_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of XLSR-Quantized can be found +* The license for the original implementation of XLSR-Quantized can be found [here](https://github.com/quic/aimet-model-zoo/blob/develop/LICENSE.pdf).
-- The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) +* The license for the compiled assets for on-device deployment can be found [here](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/Qualcomm+AI+Hub+Proprietary+License.pdf) + ## References * [Extremely Lightweight Quantization Robust Real-Time Single-Image Super Resolution for Mobile Devices](https://arxiv.org/abs/2105.10288) * [Source Model Implementation](https://github.com/quic/aimet-model-zoo/tree/develop/aimet_zoo_torch/xlsr) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/xlsr_quantized/export.py b/qai_hub_models/models/xlsr_quantized/export.py index 5411d38f..7c924ecd 100644 --- a/qai_hub_models/models/xlsr_quantized/export.py +++ b/qai_hub_models/models/xlsr_quantized/export.py @@ -10,17 +10,17 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.xlsr_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.printing import ( print_inference_metrics, @@ -45,20 +45,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -80,10 +78,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
""" model_name = "xlsr_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,7 +107,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( target_runtime, output_path, input_spec ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -135,7 +133,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +148,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +169,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -198,7 +196,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/xlsr_quantized/perf.yaml b/qai_hub_models/models/xlsr_quantized/perf.yaml index cc192a0d..7f345b62 100644 --- a/qai_hub_models/models/xlsr_quantized/perf.yaml +++ b/qai_hub_models/models/xlsr_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,44 +20,41 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: XLSR-Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 1060.0 - throughput: 943.3962264150944 + inference_time: 1076.0 + throughput: 929.368029739777 estimated_peak_memory_range: min: 12288 - max: 3525304 + max: 1327904 primary_compute_unit: NPU precision: int8 layer_info: @@ -64,29 +62,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 19 - job_id: jqpyej88g + job_id: jpedm2385 job_status: Passed torchscript_onnx_qnn: - inference_time: 654.0 - throughput: 1529.051987767584 + inference_time: 652.0 + throughput: 1533.7423312883436 estimated_peak_memory_range: - min: 20480 - max: 3091504 + min: 0 + max: 3226576 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 16 + layers_on_npu: 21 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 16 - job_id: j7gjx2yvp + total_layers: 21 + job_id: jprv39jkg job_status: Passed torchscript_onnx: - inference_time: 765.0 - throughput: 1307.18954248366 + inference_time: 678.0 + throughput: 1474.9262536873157 estimated_peak_memory_range: - min: 69632 - max: 1491648 + min: 65536 + max: 1331032 primary_compute_unit: NPU precision: int8 layer_info: @@ -94,7 +92,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 19 - job_id: jnp10w175 + job_id: jpv6kekz5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -103,13 +101,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:21:57Z' + timestamp: '2024-10-14T23:10:54Z' - torchscript_onnx_tflite: - inference_time: 916.0 - throughput: 1091.703056768559 + inference_time: 878.0 + 
throughput: 1138.9521640091116 estimated_peak_memory_range: min: 20480 - max: 23306096 + max: 23583056 primary_compute_unit: NPU precision: int8 layer_info: @@ -117,29 +115,29 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 19 - job_id: j2p0y2o9g + job_id: jgz3dwk45 job_status: Passed torchscript_onnx_qnn: - inference_time: 447.0 - throughput: 2237.136465324385 + inference_time: 454.0 + throughput: 2202.643171806167 estimated_peak_memory_range: min: 12288 - max: 14237008 + max: 16088080 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 16 + layers_on_npu: 21 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 16 - job_id: jlpe9wxog + total_layers: 21 + job_id: jp2kyjn6p job_status: Passed torchscript_onnx: - inference_time: 721.0 - throughput: 1386.9625520110958 + inference_time: 499.0 + throughput: 2004.0080160320642 estimated_peak_memory_range: min: 0 - max: 25230096 + max: 25219312 primary_compute_unit: NPU precision: int8 layer_info: @@ -147,7 +145,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 19 - job_id: jvgdwq4z5 + job_id: jgjvnon1g job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -156,13 +154,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:21:58Z' + timestamp: '2024-10-14T23:10:55Z' - torchscript_onnx_tflite: - inference_time: 1980.0 - throughput: 505.050505050505 + inference_time: 2437.0 + throughput: 410.3405826836274 estimated_peak_memory_range: - min: 28672 - max: 1428296 + min: 12288 + max: 17008928 primary_compute_unit: NPU precision: int8 layer_info: @@ -170,37 +168,60 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 19 - job_id: j1p8omjkg + job_id: jpxkom285 job_status: Passed torchscript_onnx_qnn: - inference_time: 432.0 - throughput: 2314.814814814815 + inference_time: 1076.0 + throughput: 929.368029739777 estimated_peak_memory_range: - min: 77824 - max: 1286280 + min: 12288 + max: 7595984 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 16 + layers_on_npu: 21 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 16 - job_id: jz5wo3z3p + total_layers: 21 + job_id: jp3j0x0mg job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS6490 Proxy + timestamp: '2024-10-14T23:10:52Z' + - torchscript_onnx_tflite: + inference_time: 16048.0 + throughput: 62.31306081754736 + estimated_peak_memory_range: + min: 4354048 + max: 29172840 + primary_compute_unit: GPU + precision: int8 + layer_info: + layers_on_npu: 5 + layers_on_gpu: 9 + layers_on_cpu: 5 + total_layers: 19 + job_id: j5mnx4y7p + job_status: Passed + reference_device_info: + name: RB5 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:21:52Z' + chipset: QCS8250 Proxy + timestamp: '2024-10-14T23:10:40Z' - torchscript_onnx_tflite: - inference_time: 1492.0 - throughput: 670.2412868632708 + inference_time: 1060.0 + throughput: 943.3962264150944 estimated_peak_memory_range: - min: 806912 - max: 24163360 + min: 24576 + max: 12948776 primary_compute_unit: NPU precision: int8 layer_info: @@ -208,37 +229,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 19 - job_id: jogkzq6wg + job_id: j5we6xn45 job_status: Passed torchscript_onnx_qnn: - inference_time: 710.0 - throughput: 1408.4507042253522 + inference_time: 426.0 + throughput: 2347.417840375587 
estimated_peak_memory_range: - min: 61440 - max: 15521888 + min: 81920 + max: 1717840 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 16 + layers_on_npu: 21 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 16 - job_id: jz5wo3zmp + total_layers: 21 + job_id: jp0z0d005 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:21:55Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:10:46Z' - torchscript_onnx_tflite: - inference_time: 1067.0 - throughput: 937.207122774133 + inference_time: 1054.0 + throughput: 948.7666034155598 estimated_peak_memory_range: - min: 28672 - max: 3048136 + min: 24576 + max: 3075568 primary_compute_unit: NPU precision: int8 layer_info: @@ -246,37 +267,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 19 - job_id: jn5q8r4n5 + job_id: j57yr63n5 job_status: Passed torchscript_onnx_qnn: - inference_time: 432.0 - throughput: 2314.814814814815 + inference_time: 429.0 + throughput: 2331.002331002331 estimated_peak_memory_range: - min: 86016 - max: 1312984 + min: 32768 + max: 1894288 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 16 + layers_on_npu: 21 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 16 - job_id: jmg9vy2w5 + total_layers: 21 + job_id: j5q6qzqep job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:21:53Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:10:49Z' - torchscript_onnx_tflite: - inference_time: 1071.0 - throughput: 933.7068160597572 + inference_time: 1065.0 + throughput: 938.9671361502348 estimated_peak_memory_range: - min: 24576 - max: 5816184 + min: 49152 + max: 1291680 primary_compute_unit: NPU precision: int8 layer_info: @@ -284,22 +305,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 19 - job_id: j1gln2wjp + job_id: jgdx10l6p job_status: Passed torchscript_onnx_qnn: - inference_time: 432.0 - throughput: 2314.814814814815 + inference_time: 433.0 + throughput: 2309.4688221709007 estimated_peak_memory_range: - min: 73728 - max: 1408024 + min: 81920 + max: 1307672 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 16 + layers_on_npu: 21 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 16 - job_id: jnp10w185 + total_layers: 21 + job_id: jgkexoxvg job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -307,14 +328,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:21:54Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:10:48Z' - torchscript_onnx_tflite: - inference_time: 1068.0 - throughput: 936.3295880149813 + inference_time: 1077.0 + throughput: 928.5051067780872 estimated_peak_memory_range: - min: 24576 - max: 1419600 + min: 1605632 + max: 15182384 primary_compute_unit: NPU precision: int8 layer_info: @@ -322,37 +343,37 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 19 - job_id: jw566zo65 + job_id: jp14z3xnp job_status: Passed torchscript_onnx_qnn: - inference_time: 430.0 - throughput: 2325.5813953488373 + inference_time: 424.0 + throughput: 2358.490566037736 estimated_peak_memory_range: min: 69632 - max: 1456560 + max: 1396200 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 16 + 
layers_on_npu: 21 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 16 - job_id: jvgdwq4r5 + total_layers: 21 + job_id: jp8qy6yqp job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:21:54Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:10:47Z' - torchscript_onnx_tflite: - inference_time: 2344.0 - throughput: 426.6211604095563 + inference_time: 1399.0 + throughput: 714.7962830593281 estimated_peak_memory_range: - min: 1642496 - max: 18693376 + min: 16384 + max: 24102912 primary_compute_unit: NPU precision: int8 layer_info: @@ -360,75 +381,105 @@ models: layers_on_gpu: 0 layers_on_cpu: 3 total_layers: 19 - job_id: j1p3k1o35 + job_id: jg9ln8emg job_status: Passed torchscript_onnx_qnn: - inference_time: 1118.0 - throughput: 894.4543828264758 + inference_time: 716.0 + throughput: 1396.6480446927374 estimated_peak_memory_range: - min: 61440 - max: 8045296 + min: 12288 + max: 13691328 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 16 + layers_on_npu: 21 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 16 - job_id: jmg9vy285 + total_layers: 21 + job_id: j56y4r4np job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T11:21:57Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:10:51Z' - torchscript_onnx_tflite: - inference_time: 14013.0 - throughput: 71.36230642974381 + inference_time: 854.0 + throughput: 1170.96018735363 estimated_peak_memory_range: - min: 4333568 - max: 10909736 - primary_compute_unit: GPU + min: 12288 + max: 16457856 + primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 5 - layers_on_gpu: 9 - layers_on_cpu: 5 + layers_on_npu: 16 + layers_on_gpu: 0 + layers_on_cpu: 3 total_layers: 19 - job_id: jwgoyndq5 + job_id: jgn6vx8j5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 404.0 + throughput: 2475.2475247524753 + estimated_peak_memory_range: + min: 57344 + max: 10604272 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 21 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 21 + job_id: jgo26o61p + job_status: Passed + torchscript_onnx: + inference_time: 381.0 + throughput: 2624.6719160104985 + estimated_peak_memory_range: + min: 20480 + max: 16581872 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 19 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 19 + job_id: j5we68645 job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T11:21:47Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:10:58Z' - torchscript_onnx_qnn: - inference_time: 556.0 - throughput: 1798.5611510791366 + inference_time: 536.0 + throughput: 1865.6716417910447 estimated_peak_memory_range: - min: 49152 - max: 49152 + min: 131072 + max: 131072 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 16 + layers_on_npu: 21 layers_on_gpu: 0 layers_on_cpu: 0 - total_layers: 16 - job_id: jygzejyog + total_layers: 21 + job_id: jpy13n00p job_status: Passed torchscript_onnx: - inference_time: 782.0 - throughput: 1278.772378516624 + inference_time: 
794.0 + throughput: 1259.4458438287154 estimated_peak_memory_range: - min: 3330048 - max: 3330048 + min: 3387392 + max: 3387392 primary_compute_unit: NPU precision: int8 layer_info: @@ -436,7 +487,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 19 - job_id: jz57zln9p + job_id: jpedm8m85 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -445,4 +496,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:21:59Z' + timestamp: '2024-10-14T23:10:56Z' diff --git a/qai_hub_models/models/yolonas/README.md b/qai_hub_models/models/yolonas/README.md index dd81f9b2..f1467c01 100644 --- a/qai_hub_models/models/yolonas/README.md +++ b/qai_hub_models/models/yolonas/README.md @@ -6,7 +6,7 @@ YoloNAS is a machine learning model that predicts bounding boxes and classes of objects in an image. This is based on the implementation of Yolo-NAS found -[here](https://github.com/Deci-AI/super-gradients). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/yolonas). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.yolonas.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Yolo-NAS can be found +* The license for the original implementation of Yolo-NAS can be found [here](https://github.com/Deci-AI/super-gradients/blob/master/YOLONAS.md#license). -- The license for the compiled assets for on-device deployment can be found [here](https://github.com/Deci-AI/super-gradients/blob/master/LICENSE.YOLONAS.md) +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/Deci-AI/super-gradients/blob/master/LICENSE.YOLONAS.md) + ## References * [A Next-Generation, Object Detection Foundational Model generated by Deci’s Neural Architecture Search Technology](https://github.com/Deci-AI/super-gradients/blob/master/YOLONAS.md) * [Source Model Implementation](https://github.com/Deci-AI/super-gradients) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
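Every `export.py` in this patch gets the same migration: the positional `(compile_job, profile_job, inference_job)` return value is replaced by the `ExportResult` struct imported from `qai_hub_models.models.common`. A minimal caller-side sketch of the new contract, assuming only names visible in this patch — the `.wait().success` check and the job fields mirror the scripts' own summary code, while the device string, the `skip_inferencing` flag, and the `job_id` attribute are illustrative:

    from qai_hub_models.models.yolonas.export import export_model

    # Runs compile/profile/download/summary; on-device inference is skipped here.
    result = export_model(device="Samsung Galaxy S23", skip_inferencing=True)

    # Old callers unpacked a tuple: compile_job, profile_job, inference_job = ...
    # New callers read named fields; skipped steps come back as None.
    print(result.compile_job.job_id)
    if result.profile_job is not None:
        assert result.profile_job.wait().success
    # Hub-quantized models (e.g. wideresnet50_quantized earlier in this patch)
    # also populate result.quantize_job.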
diff --git a/qai_hub_models/models/yolonas/export.py b/qai_hub_models/models/yolonas/export.py index 61906a98..eabeed46 100644 --- a/qai_hub_models/models/yolonas/export.py +++ b/qai_hub_models/models/yolonas/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.yolonas import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "yolonas" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. 
Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -201,7 +199,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/yolonas/perf.yaml b/qai_hub_models/models/yolonas/perf.yaml index 7f549a6b..7f57acd5 100644 --- a/qai_hub_models/models/yolonas/perf.yaml +++ b/qai_hub_models/models/yolonas/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: Yolo-NAS performance_metrics: - torchscript_onnx_tflite: - inference_time: 10909.0 - throughput: 91.66743056192135 + inference_time: 10860.0 + throughput: 92.08103130755065 estimated_peak_memory_range: - min: 217088 - max: 4691760 + min: 32768 + max: 7163776 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 201 - job_id: j1p8omekg + job_id: jpedm21v5 job_status: Passed torchscript_onnx_qnn: - inference_time: 
15011.0 - throughput: 66.61781360335753 + inference_time: 15235.0 + throughput: 65.63833278634722 estimated_peak_memory_range: - min: 6328320 - max: 24035528 + min: 4931584 + max: 24696952 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 289 - job_id: j1pv3r2k5 + job_id: jgdx1096p job_status: Passed torchscript_onnx: - inference_time: 9947.0 - throughput: 100.53282396702524 + inference_time: 7751.0 + throughput: 129.01561088891756 estimated_peak_memory_range: - min: 16384 - max: 26342544 + min: 28672 + max: 26587000 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 290 - job_id: jz57zlovp + job_id: jp8qy8vqp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:21:17Z' + timestamp: '2024-10-14T23:10:06Z' - torchscript_onnx_tflite: - inference_time: 9064.0 - throughput: 110.32656663724624 + inference_time: 8986.0 + throughput: 111.28421989761851 estimated_peak_memory_range: - min: 237568 - max: 103485776 + min: 163840 + max: 112014672 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 201 - job_id: jogkzq2wg + job_id: jgz3dw9x5 job_status: Passed torchscript_onnx_qnn: - inference_time: 10848.0 - throughput: 92.18289085545723 + inference_time: 10889.0 + throughput: 91.8357975939021 estimated_peak_memory_range: min: 4952064 - max: 33809264 + max: 38833264 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 289 - job_id: j7gjx23vp + job_id: j57yr6wn5 job_status: Passed torchscript_onnx: - inference_time: 7273.0 - throughput: 137.49484394335212 + inference_time: 6026.0 + throughput: 165.94756057085962 estimated_peak_memory_range: - min: 1060864 - max: 106214624 + min: 4116480 + max: 119044512 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 290 - job_id: jqp4qde8g + job_id: jgkexdmvg job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:21:18Z' + timestamp: '2024-10-14T23:10:07Z' - torchscript_onnx_tflite: - inference_time: 10800.0 - throughput: 92.5925925925926 + inference_time: 10765.0 + throughput: 92.89363678588016 estimated_peak_memory_range: - min: 0 - max: 25968968 + min: 241664 + max: 311233456 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 201 - job_id: jn5q8rln5 + job_id: j5we6xvm5 job_status: Passed torchscript_onnx_qnn: - inference_time: 10116.0 - throughput: 98.85330170027679 + inference_time: 9611.0 + throughput: 104.04744563520966 estimated_peak_memory_range: - min: 5001216 - max: 6202472 + min: 4988928 + max: 6329160 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 289 - job_id: jygzejzog + job_id: jpxkomj85 job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:21:12Z' + chipset: QCS8550 Proxy + 
timestamp: '2024-10-14T23:09:58Z' - torchscript_onnx_tflite: - inference_time: 13791.0 - throughput: 72.51105793633529 + inference_time: 10662.0 + throughput: 93.79103357719002 estimated_peak_memory_range: - min: 217088 - max: 100659936 + min: 245760 + max: 4395224 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 201 - job_id: j1gln2yjp + job_id: j5we6xv45 job_status: Passed torchscript_onnx_qnn: - inference_time: 18123.0 - throughput: 55.17850245544336 + inference_time: 9606.0 + throughput: 104.10160316468874 estimated_peak_memory_range: - min: 4952064 - max: 32193984 + min: 4976640 + max: 6826496 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 289 - job_id: jvgdwq6r5 + job_id: jprv39qkg job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:21:16Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:10:01Z' - torchscript_onnx_tflite: - inference_time: 10934.0 - throughput: 91.45783793671117 + inference_time: 10844.0 + throughput: 92.21689413500553 estimated_peak_memory_range: - min: 237568 - max: 6140328 + min: 12288 + max: 80955216 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 201 - job_id: jw566z865 + job_id: jgdx109zp job_status: Passed torchscript_onnx_qnn: - inference_time: 10106.0 - throughput: 98.95111814763507 + inference_time: 9491.0 + throughput: 105.36297545042672 estimated_peak_memory_range: - min: 4960256 - max: 6588608 + min: 4993024 + max: 6543456 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 289 - job_id: jz5wo3y3p + job_id: jgn6vxyj5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:21:13Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:10:00Z' - torchscript_onnx_tflite: - inference_time: 10964.0 - throughput: 91.20758847136082 + inference_time: 10664.0 + throughput: 93.7734433608402 estimated_peak_memory_range: - min: 40960 - max: 334395264 + min: 16384 + max: 5471048 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 201 - job_id: j1p3k1z35 + job_id: jp14z3l7p job_status: Passed torchscript_onnx_qnn: - inference_time: 10226.0 - throughput: 97.78994719342852 + inference_time: 9508.0 + throughput: 105.17458981909971 estimated_peak_memory_range: - min: 4980736 - max: 6388816 + min: 4960256 + max: 6244192 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 289 - job_id: jmg9vyow5 + job_id: j5mnx427p job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:21:14Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:09:59Z' - torchscript_onnx_tflite: - inference_time: 10838.0 - throughput: 92.26794611551946 + inference_time: 13889.0 + throughput: 71.99942400460796 estimated_peak_memory_range: - min: 
253952 - max: 7244464 + min: 233472 + max: 107888992 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 201 - job_id: jwgoynlq5 + job_id: jg9ln818g job_status: Passed torchscript_onnx_qnn: - inference_time: 10013.0 - throughput: 99.87016878058525 + inference_time: 18405.0 + throughput: 54.333061668024996 estimated_peak_memory_range: - min: 4993024 - max: 6298952 + min: 4956160 + max: 39240272 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 289 - job_id: jnp10wo85 + job_id: jpy13nw0p job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:10:03Z' + - torchscript_onnx_tflite: + inference_time: 7633.0 + throughput: 131.0100877767588 + estimated_peak_memory_range: + min: 212992 + max: 56645456 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 201 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 201 + job_id: jp14z3lnp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 10025.0 + throughput: 99.75062344139651 + estimated_peak_memory_range: + min: 4931584 + max: 33649952 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 289 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 289 + job_id: jp0z0k705 + job_status: Passed + torchscript_onnx: + inference_time: 4362.0 + throughput: 229.25263640531867 + estimated_peak_memory_range: + min: 5369856 + max: 61783152 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 290 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 290 + job_id: j56y4vlnp + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:21:15Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:10:09Z' - torchscript_onnx_qnn: - inference_time: 10718.0 - throughput: 93.3009889904833 + inference_time: 10223.0 + throughput: 97.81864423359092 estimated_peak_memory_range: min: 4923392 max: 4923392 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 289 - job_id: jlpe9w6og + job_id: jp4lr8o25 job_status: Passed torchscript_onnx: - inference_time: 10102.0 - throughput: 98.99029895070284 + inference_time: 8286.0 + throughput: 120.68549360366885 estimated_peak_memory_range: - min: 22188032 - max: 22188032 + min: 22249472 + max: 22249472 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 290 - job_id: j0pxv603g + job_id: j5q6qwoep job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:21:19Z' + timestamp: '2024-10-14T23:10:08Z' diff --git a/qai_hub_models/models/yolonas_quantized/README.md b/qai_hub_models/models/yolonas_quantized/README.md index 10c5859a..0894542c 100644 --- a/qai_hub_models/models/yolonas_quantized/README.md +++ b/qai_hub_models/models/yolonas_quantized/README.md @@ -6,7 +6,7 @@ YoloNAS is a machine learning model that predicts bounding boxes and classes of objects in an image. 
This model is post-training quantized to int8 using samples from the COCO dataset. This is based on the implementation of Yolo-NAS-Quantized found -[here](https://github.com/Deci-AI/super-gradients). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/yolonas_quantized). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.yolonas_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to the deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Yolo-NAS-Quantized can be found +* The license for the original implementation of Yolo-NAS-Quantized can be found [here](https://github.com/Deci-AI/super-gradients/blob/master/YOLONAS.md#license). -- The license for the compiled assets for on-device deployment can be found [here](https://github.com/Deci-AI/super-gradients/blob/master/LICENSE.YOLONAS.md) +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/Deci-AI/super-gradients/blob/master/LICENSE.YOLONAS.md) + ## References * [YOLO-NAS by Deci Achieves SOTA Performance on Object Detection Using Neural Architecture Search](https://deci.ai/blog/yolo-nas-object-detection-foundation-model/) * [Source Model Implementation](https://github.com/Deci-AI/super-gradients) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/yolonas_quantized/export.py b/qai_hub_models/models/yolonas_quantized/export.py index fb81e239..ba9b2ab8 100644 --- a/qai_hub_models/models/yolonas_quantized/export.py +++ b/qai_hub_models/models/yolonas_quantized/export.py @@ -10,17 +10,17 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.yolonas_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.printing import ( print_inference_metrics, @@ -45,20 +45,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2.
Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -80,10 +78,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "yolonas_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,7 +107,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( target_runtime, output_path, input_spec ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -135,7 +133,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +148,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +169,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -202,7 +200,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/yolonas_quantized/perf.yaml b/qai_hub_models/models/yolonas_quantized/perf.yaml index 6c49be97..5ef63664 100644 --- a/qai_hub_models/models/yolonas_quantized/perf.yaml +++ b/qai_hub_models/models/yolonas_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,41 +20,38 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: Yolo-NAS-Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 4789.0 - throughput: 208.81186051367717 + inference_time: 4715.0 + throughput: 212.08907741251326 estimated_peak_memory_range: - min: 81920 - max: 14038736 + min: 32768 + max: 2619144 primary_compute_unit: NPU precision: int8 layer_info: @@ -61,7 +59,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 204 - job_id: j1p8omxkg + job_id: jglvm7nm5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -70,13 +68,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:20:18Z' + timestamp: '2024-10-14T23:08:53Z' - torchscript_onnx_tflite: - inference_time: 3765.0 - throughput: 265.6042496679947 + inference_time: 3058.0 + throughput: 327.01111837802483 estimated_peak_memory_range: - min: 86016 - max: 80907840 + min: 12288 + max: 83450752 primary_compute_unit: NPU precision: int8 layer_info: @@ -84,7 +82,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 204 - job_id: jogkzq4wg + job_id: j56y4v6yp job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -93,13 +91,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:20:19Z' + timestamp: '2024-10-14T23:08:54Z' - torchscript_onnx_tflite: - inference_time: 4701.0 - throughput: 212.72069772388852 + inference_time: 13608.0 + throughput: 73.4861845972957 estimated_peak_memory_range: - min: 81920 - max: 4319104 + min: 69632 + max: 69619264 primary_compute_unit: NPU precision: int8 layer_info: @@ -107,22 +105,30 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 204 - 
job_id: jn5q8ryn5 + job_id: j5we6xom5 job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:20:20Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-14T23:09:01Z' + - reference_device_info: + name: RB5 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8250 Proxy + timestamp: '2024-10-14T23:09:02Z' - torchscript_onnx_tflite: - inference_time: 5255.0 - throughput: 190.29495718363464 + inference_time: 4692.0 + throughput: 213.12872975277068 estimated_peak_memory_range: - min: 135168 - max: 83142816 + min: 86016 + max: 1489008 primary_compute_unit: NPU precision: int8 layer_info: @@ -130,22 +136,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 204 - job_id: j1gln2xjp + job_id: jp3j08kng job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:20:21Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:08:55Z' - torchscript_onnx_tflite: - inference_time: 4728.0 - throughput: 211.50592216582064 + inference_time: 4704.0 + throughput: 212.58503401360545 estimated_peak_memory_range: - min: 61440 - max: 12883904 + min: 98304 + max: 4233816 primary_compute_unit: NPU precision: int8 layer_info: @@ -153,22 +159,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 204 - job_id: jw566z765 + job_id: jpedm29v5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:20:22Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:08:59Z' - torchscript_onnx_tflite: - inference_time: 4728.0 - throughput: 211.50592216582064 + inference_time: 4696.0 + throughput: 212.94718909710392 estimated_peak_memory_range: - min: 110592 - max: 6947216 + min: 98304 + max: 7726400 primary_compute_unit: NPU precision: int8 layer_info: @@ -176,7 +182,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 204 - job_id: j1p3k1935 + job_id: jgjvn1xeg job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -184,14 +190,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:20:23Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:08:58Z' - torchscript_onnx_tflite: - inference_time: 4772.0 - throughput: 209.55574182732607 + inference_time: 4690.0 + throughput: 213.21961620469082 estimated_peak_memory_range: - min: 32768 - max: 191541224 + min: 65536 + max: 18267320 primary_compute_unit: NPU precision: int8 layer_info: @@ -199,22 +205,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 204 - job_id: jwgoynrq5 + job_id: jpv6k43r5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:20:24Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:08:57Z' - torchscript_onnx_tflite: - inference_time: 13953.0 - throughput: 71.66917508779474 + inference_time: 5202.0 + throughput: 192.23375624759709 estimated_peak_memory_range: - min: 69632 - max: 69253424 + min: 110592 + max: 85722432 primary_compute_unit: NPU precision: int8 layer_info: @@ -222,13 
+228,36 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 204 - job_id: j1pv3rlk5 + job_id: jgo26mykp job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:08:56Z' + - torchscript_onnx_tflite: + inference_time: 3157.0 + throughput: 316.75641431738995 + estimated_peak_memory_range: + min: 61440 + max: 57573888 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 204 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 204 + job_id: jp14z307p + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T11:20:24Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:09:03Z' diff --git a/qai_hub_models/models/yolov11_det/README.md b/qai_hub_models/models/yolov11_det/README.md index e0bf22f0..99e7da65 100644 --- a/qai_hub_models/models/yolov11_det/README.md +++ b/qai_hub_models/models/yolov11_det/README.md @@ -1,14 +1,14 @@ [![Qualcomm® AI Hub Models](https://qaihub-public-assets.s3.us-west-2.amazonaws.com/qai-hub-models/quic-logo.jpg)](../../README.md) -# [YOLOv11-Detection: Real-time object detection optimized for mobile and edge by Ultralytics](#) +# [YOLOv11-Detection: Real-time object detection optimized for mobile and edge by Ultralytics](https://aihub.qualcomm.com/models/yolov11_det) Ultralytics YOLOv11 is a machine learning model that predicts bounding boxes and classes of objects in an image. This is based on the implementation of YOLOv11-Detection found -[here](https://github.com/ultralytics/ultralytics/tree/main/ultralytics/models/yolo/detect). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance -accross various devices, can be found [here](#). +across various devices can be found [here](https://aihub.qualcomm.com/models/yolov11_det). [Sign up](https://myaccount.qualcomm.com/signup) to start using Qualcomm AI Hub and run these models on a hosted Qualcomm® device. @@ -44,15 +44,19 @@ python -m qai_hub_models.models.yolov11_det.export Additional options are documented with the `--help` option. Note that the above script requires access to the deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of YOLOv11-Detection can be found +* The license for the original implementation of YOLOv11-Detection can be found [here](https://github.com/ultralytics/ultralytics/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://github.com/ultralytics/ultralytics/blob/main/LICENSE) +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/ultralytics/ultralytics/blob/main/LICENSE) + ## References * [Ultralytics YOLOv11 Docs: Object Detection](https://docs.ultralytics.com/tasks/detect/) * [Source Model Implementation](https://github.com/ultralytics/ultralytics/tree/main/ultralytics/models/yolo/detect) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI.
* For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/yolov11_det/export.py b/qai_hub_models/models/yolov11_det/export.py index c878152c..7a559ed8 100644 --- a/qai_hub_models/models/yolov11_det/export.py +++ b/qai_hub_models/models/yolov11_det/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.yolov11_det import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "yolov11_det" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -122,7 +120,7 @@ def export_model( model.to("cpu"), make_torch_inputs(input_spec), check_trace=False ) - # 2. Compile the model to an on-device asset + # 2. 
Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -136,7 +134,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -151,7 +149,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -172,13 +170,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -203,7 +201,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/yolov11_det/perf.yaml b/qai_hub_models/models/yolov11_det/perf.yaml new file mode 100644 index 00000000..c37c9416 --- /dev/null +++ b/qai_hub_models/models/yolov11_det/perf.yaml @@ -0,0 +1,432 @@ +aggregated: + supported_oses: + - Android + supported_devices: + - Snapdragon 8 Elite QRD + - Samsung Galaxy S24 + - Samsung Galaxy S24 Ultra + - Samsung Galaxy S24+ + - Snapdragon 8 Gen 3 QRD + - Samsung Galaxy S23 + - Samsung Galaxy S23 Ultra + - Samsung Galaxy S23+ + - Samsung Galaxy S22 5G + - Samsung Galaxy S22 Ultra 5G + - Samsung Galaxy S22+ 5G + - Samsung Galaxy Tab S8 + - Xiaomi 12 + - Xiaomi 12 Pro + - Samsung Galaxy S21 + - Samsung Galaxy S21 Ultra + - Samsung Galaxy S21+ + - Snapdragon X Elite CRD + - Snapdragon X Plus 8-Core CRD + - QCS8450 (Proxy) + - XR2 Gen 2 (Proxy) + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) + supported_chipsets: + - Snapdragon® 8 Elite + - Snapdragon® 8 Gen 3 + - Snapdragon® 8 Gen 2 + - Snapdragon® 8 Gen 1 + - Snapdragon® 888 + - Snapdragon® X Elite + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy +models: +- name: YOLOv11-Detection + performance_metrics: + - torchscript_onnx_tflite: + inference_time: 5441.0 + throughput: 183.7897445322551 + estimated_peak_memory_range: + min: 32768 + max: 95551352 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 382 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 382 + job_id: jp2kyj1qp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 5576.0 + throughput: 179.3400286944046 + estimated_peak_memory_range: + min: 6307840 + max: 17406200 + 
primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 374 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 374 + job_id: jgo26mjkp + job_status: Passed + torchscript_onnx: + inference_time: 6016.0 + throughput: 166.22340425531914 + estimated_peak_memory_range: + min: 651264 + max: 5849392 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 376 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 376 + job_id: jp4lr8z15 + job_status: Passed + reference_device_info: + name: Samsung Galaxy S23 + os: '13' + form_factor: Phone + os_name: Android + manufacturer: Samsung + chipset: Snapdragon® 8 Gen 2 + timestamp: '2024-10-14T23:07:45Z' + - torchscript_onnx_tflite: + inference_time: 3961.0 + throughput: 252.46149962130775 + estimated_peak_memory_range: + min: 12288 + max: 101388816 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 382 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 382 + job_id: jpy13nllp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 3960.0 + throughput: 252.5252525252525 + estimated_peak_memory_range: + min: 4931584 + max: 52566064 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 374 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 374 + job_id: jpv6k4jr5 + job_status: Passed + torchscript_onnx: + inference_time: 4285.0 + throughput: 233.37222870478413 + estimated_peak_memory_range: + min: 5361664 + max: 126557536 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 376 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 376 + job_id: jpxkomwl5 + job_status: Passed + reference_device_info: + name: Samsung Galaxy S24 + os: '14' + form_factor: Phone + os_name: Android + manufacturer: Samsung + chipset: Snapdragon® 8 Gen 3 + timestamp: '2024-10-14T23:07:46Z' + - torchscript_onnx_tflite: + inference_time: 5440.0 + throughput: 183.8235294117647 + estimated_peak_memory_range: + min: 65536 + max: 3419960 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 382 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 382 + job_id: jp0z0kwn5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 5374.0 + throughput: 186.08113137327874 + estimated_peak_memory_range: + min: 4960256 + max: 6210912 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 374 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 374 + job_id: jpedm2jv5 + job_status: Passed + reference_device_info: + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:07:38Z' + - torchscript_onnx_tflite: + inference_time: 5435.0 + throughput: 183.99264029438822 + estimated_peak_memory_range: + min: 253952 + max: 10299528 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 382 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 382 + job_id: jglvm7jm5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 5407.0 + throughput: 184.945441094877 + estimated_peak_memory_range: + min: 4964352 + max: 6375432 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 374 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 374 + job_id: jg9ln868g + job_status: Passed + reference_device_info: + name: SA8255 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:07:41Z' + - torchscript_onnx_tflite: + 
inference_time: 5518.0 + throughput: 181.2250815512867 + estimated_peak_memory_range: + min: 217088 + max: 5084256 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 382 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 382 + job_id: j5q6qwnop + job_status: Passed + torchscript_onnx_qnn: + inference_time: 5402.0 + throughput: 185.11662347278786 + estimated_peak_memory_range: + min: 4956160 + max: 6286088 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 374 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 374 + job_id: j5we6xjm5 + job_status: Passed + reference_device_info: + name: SA8775 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:07:40Z' + - torchscript_onnx_tflite: + inference_time: 5531.0 + throughput: 180.7991321641656 + estimated_peak_memory_range: + min: 229376 + max: 2242640 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 382 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 382 + job_id: jgkexd1ng + job_status: Passed + torchscript_onnx_qnn: + inference_time: 5350.0 + throughput: 186.9158878504673 + estimated_peak_memory_range: + min: 4972544 + max: 6175480 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 374 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 374 + job_id: jgz3dw1x5 + job_status: Passed + reference_device_info: + name: SA8650 (Proxy) + os: '13' + form_factor: Auto + os_name: Android + manufacturer: Qualcomm + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:07:39Z' + - torchscript_onnx_tflite: + inference_time: 9143.0 + throughput: 109.37329104232747 + estimated_peak_memory_range: + min: 262144 + max: 95935840 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 382 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 382 + job_id: jp8qy8nop + job_status: Passed + torchscript_onnx_qnn: + inference_time: 8619.0 + throughput: 116.0227404571296 + estimated_peak_memory_range: + min: 4931584 + max: 39126160 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 374 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 374 + job_id: jgdx10jzp + job_status: Passed + reference_device_info: + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:07:43Z' + - torchscript_onnx_tflite: + inference_time: 3848.0 + throughput: 259.87525987525987 + estimated_peak_memory_range: + min: 8192 + max: 66587568 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 382 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 382 + job_id: jp3j08yng + job_status: Passed + torchscript_onnx_qnn: + inference_time: 4086.0 + throughput: 244.73813020068528 + estimated_peak_memory_range: + min: 4927488 + max: 51834944 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 374 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 374 + job_id: j57yr6q95 + job_status: Passed + torchscript_onnx: + inference_time: 3299.0 + throughput: 303.12215822976657 + estimated_peak_memory_range: + min: 0 + max: 77494720 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 376 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 376 + job_id: jprv39z7g + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone + os_name: Android + manufacturer: 
Qualcomm + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:07:49Z' + - torchscript_onnx_qnn: + inference_time: 5700.0 + throughput: 175.43859649122808 + estimated_peak_memory_range: + min: 4923392 + max: 4923392 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 374 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 374 + job_id: jgjvn1jeg + job_status: Passed + torchscript_onnx: + inference_time: 6775.0 + throughput: 147.60147601476015 + estimated_peak_memory_range: + min: 4931584 + max: 4931584 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 376 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 376 + job_id: j5mnx4j9p + job_status: Passed + reference_device_info: + name: Snapdragon X Elite CRD + os: '11' + form_factor: Compute + os_name: Windows + manufacturer: Qualcomm + chipset: Snapdragon® X Elite + timestamp: '2024-10-14T23:07:47Z' diff --git a/qai_hub_models/models/yolov6/README.md b/qai_hub_models/models/yolov6/README.md index 82404fcf..a167d0bc 100644 --- a/qai_hub_models/models/yolov6/README.md +++ b/qai_hub_models/models/yolov6/README.md @@ -6,7 +6,7 @@ YoloV6 is a machine learning model that predicts bounding boxes and classes of objects in an image. This is based on the implementation of Yolo-v6 found -[here](https://github.com/meituan/YOLOv6/). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/yolov6). @@ -39,15 +39,19 @@ python -m qai_hub_models.models.yolov6.export Additional options are documented with the `--help` option. Note that the above script requires access to the deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Yolo-v6 can be found +* The license for the original implementation of Yolo-v6 can be found [here](https://github.com/meituan/YOLOv6/blob/47625514e7480706a46ff3c0cd0252907ac12f22/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://github.com/meituan/YOLOv6/blob/47625514e7480706a46ff3c0cd0252907ac12f22/LICENSE) +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/meituan/YOLOv6/blob/47625514e7480706a46ff3c0cd0252907ac12f22/LICENSE) + ## References * [YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications](https://arxiv.org/abs/2209.02976) * [Source Model Implementation](https://github.com/meituan/YOLOv6/) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
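An aside before the yolov6 export.py and perf.yaml diffs: the perf.yaml files updated throughout this patch share one schema, so they are easy to consume programmatically. The sketch below is not part of the patch; it assumes PyYAML is installed and a repo checkout at the shown path. The microsecond unit for `inference_time` is inferred from the data itself, since throughput ≈ 1e6 / inference_time in every entry (e.g. 1e6 / 6222 ≈ 160.72 for Yolo-v6 below):

```python
# Sketch (not from the diff): summarize per-device TFLite latency from a
# perf.yaml file. inference_time is treated as microseconds because
# throughput == 1e6 / inference_time throughout these files.
import yaml  # requires PyYAML

with open("qai_hub_models/models/yolov6/perf.yaml") as f:
    perf = yaml.safe_load(f)

for model in perf["models"]:
    for entry in model["performance_metrics"]:
        device = entry["reference_device_info"]["name"]
        tflite = entry.get("torchscript_onnx_tflite")
        if not tflite:
            continue  # some entries carry only QNN metrics, or none at all
        latency_ms = tflite["inference_time"] / 1000.0
        print(f"{model['name']} on {device}: {latency_ms:.2f} ms "
              f"({tflite['throughput']:.1f} inferences/s)")
```

The guard around missing `torchscript_onnx_tflite` matters: entries such as the Snapdragon X Elite CRD report only QNN/ONNX numbers, and the RB5 proxy entry in the quantized perf.yaml carries device info with no metrics at all.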
diff --git a/qai_hub_models/models/yolov6/export.py b/qai_hub_models/models/yolov6/export.py index e7cd9712..ebb4d8a5 100644 --- a/qai_hub_models/models/yolov6/export.py +++ b/qai_hub_models/models/yolov6/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.yolov6 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "yolov6" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. 
Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -201,7 +199,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/yolov6/perf.yaml b/qai_hub_models/models/yolov6/perf.yaml index 8e835feb..5a00fece 100644 --- a/qai_hub_models/models/yolov6/perf.yaml +++ b/qai_hub_models/models/yolov6/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: Yolo-v6 performance_metrics: - torchscript_onnx_tflite: - inference_time: 6324.0 - throughput: 158.12776723592663 + inference_time: 6222.0 + throughput: 160.72002571520412 estimated_peak_memory_range: - min: 245760 - max: 4293104 + min: 32768 + max: 2385000 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 182 - job_id: jn5q8rvn5 + job_id: jpxkom8l5 job_status: Passed torchscript_onnx_qnn: - inference_time: 
5225.0 - throughput: 191.38755980861245 + inference_time: 5257.0 + throughput: 190.22256039566292 estimated_peak_memory_range: - min: 4239360 - max: 14478112 + min: 6316032 + max: 19646608 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 228 - job_id: jlpe9wzog + job_id: j5q6qwxop job_status: Passed torchscript_onnx: - inference_time: 6501.0 - throughput: 153.82248884786955 + inference_time: 6076.0 + throughput: 164.58196181698486 estimated_peak_memory_range: - min: 45056 - max: 9677944 + min: 12288 + max: 10443168 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 228 - job_id: j0pxv643g + job_id: jg9ln8r8g job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:19:12Z' + timestamp: '2024-10-14T23:06:58Z' - torchscript_onnx_tflite: - inference_time: 4643.0 - throughput: 215.37798836958862 + inference_time: 5104.0 + throughput: 195.92476489028212 estimated_peak_memory_range: - min: 221184 - max: 86093616 + min: 12288 + max: 95823856 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 182 - job_id: j1gln2ljp + job_id: j5mnx419p job_status: Passed torchscript_onnx_qnn: - inference_time: 4085.0 - throughput: 244.79804161566707 + inference_time: 4097.0 + throughput: 244.081034903588 estimated_peak_memory_range: min: 4931584 - max: 48104320 + max: 53770832 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 228 - job_id: jygzejmog + job_id: jglvm7dm5 job_status: Passed torchscript_onnx: - inference_time: 4857.0 - throughput: 205.88840848260244 + inference_time: 4477.0 + throughput: 223.36385972749608 estimated_peak_memory_range: min: 0 - max: 101204496 + max: 110342112 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 228 - job_id: jo5mr6mdg + job_id: jp14z397p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:19:12Z' + timestamp: '2024-10-14T23:06:59Z' - torchscript_onnx_tflite: - inference_time: 6190.0 - throughput: 161.55088852988692 + inference_time: 6174.0 + throughput: 161.96954972465176 estimated_peak_memory_range: - min: 245760 - max: 4091080 + min: 217088 + max: 3534712 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 182 - job_id: jw566zw65 + job_id: jgn6vxdq5 job_status: Passed torchscript_onnx_qnn: - inference_time: 4911.0 - throughput: 203.62451639177357 + inference_time: 5340.0 + throughput: 187.26591760299627 estimated_peak_memory_range: min: 5001216 - max: 6447752 + max: 6300376 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 228 - job_id: jmg9vymw5 + job_id: jp3j08dng job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:19:06Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:06:51Z' - 
torchscript_onnx_tflite: - inference_time: 7039.0 - throughput: 142.06563432305725 + inference_time: 6358.0 + throughput: 157.28216420257942 estimated_peak_memory_range: - min: 12288 - max: 71987488 + min: 229376 + max: 4300968 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 182 - job_id: j1p3k1635 + job_id: jp0z0k8n5 job_status: Passed torchscript_onnx_qnn: - inference_time: 6957.0 - throughput: 143.74011786689664 + inference_time: 5372.0 + throughput: 186.15040953090096 estimated_peak_memory_range: - min: 4931584 - max: 42068976 + min: 5066752 + max: 6361888 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 228 - job_id: jqp4qd18g + job_id: jgjvn19eg job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:19:10Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:06:54Z' - torchscript_onnx_tflite: - inference_time: 6362.0 - throughput: 157.18327569946558 + inference_time: 6324.0 + throughput: 158.12776723592663 estimated_peak_memory_range: - min: 274432 - max: 4106232 + min: 233472 + max: 4026880 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 182 - job_id: jwgoyn8q5 + job_id: jpy13nklp job_status: Passed torchscript_onnx_qnn: - inference_time: 4900.0 - throughput: 204.08163265306123 + inference_time: 5321.0 + throughput: 187.93459875963165 estimated_peak_memory_range: - min: 4997120 - max: 6228768 + min: 5001216 + max: 6267464 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 228 - job_id: jnp10wj85 + job_id: jpv6k48r5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:19:07Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:06:53Z' - torchscript_onnx_tflite: - inference_time: 6281.0 - throughput: 159.2103168285305 + inference_time: 6286.0 + throughput: 159.0836780146357 estimated_peak_memory_range: - min: 253952 - max: 4016480 + min: 0 + max: 2409872 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 182 - job_id: j1pv3rdk5 + job_id: jp2kyjqqp job_status: Passed torchscript_onnx_qnn: - inference_time: 4883.0 - throughput: 204.7921359819783 + inference_time: 5354.0 + throughput: 186.77624206200971 estimated_peak_memory_range: - min: 4964352 - max: 6584096 + min: 5009408 + max: 6207208 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 228 - job_id: jvgdwq3r5 + job_id: jgo26mxkp job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:19:08Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:06:52Z' - torchscript_onnx_tflite: - inference_time: 6361.0 - throughput: 157.2079861656972 + inference_time: 7785.0 + throughput: 128.45215157353886 estimated_peak_memory_range: - min: 233472 - max: 218946696 + min: 217088 + max: 
78221968 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 182 - job_id: j7gjx27vp + job_id: jprv39m7g job_status: Passed torchscript_onnx_qnn: - inference_time: 4924.0 - throughput: 203.08692120227457 + inference_time: 6945.0 + throughput: 143.98848092152627 estimated_peak_memory_range: - min: 5005312 - max: 6275136 + min: 4931584 + max: 49288960 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 228 - job_id: jz57zl4vp + job_id: jgz3dw6x5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:06:56Z' + - torchscript_onnx_tflite: + inference_time: 4370.0 + throughput: 228.83295194508008 + estimated_peak_memory_range: + min: 212992 + max: 62186016 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 182 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 182 + job_id: jgkexdwng + job_status: Passed + torchscript_onnx_qnn: + inference_time: 3412.0 + throughput: 293.08323563892145 + estimated_peak_memory_range: + min: 4927488 + max: 50714384 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 228 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 228 + job_id: j5we6xkm5 + job_status: Passed + torchscript_onnx: + inference_time: 4075.0 + throughput: 245.39877300613497 + estimated_peak_memory_range: + min: 5337088 + max: 74489872 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 228 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 228 + job_id: jp4lr8715 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:19:09Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:07:02Z' - torchscript_onnx_qnn: - inference_time: 5218.0 - throughput: 191.64430816404752 + inference_time: 5728.0 + throughput: 174.58100558659217 estimated_peak_memory_range: min: 4923392 max: 4923392 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 228 - job_id: jz5wo373p + job_id: j56y4vxyp job_status: Passed torchscript_onnx: - inference_time: 6544.0 - throughput: 152.8117359413203 + inference_time: 6407.0 + throughput: 156.07928827844546 estimated_peak_memory_range: - min: 6971392 - max: 6971392 + min: 8302592 + max: 8302592 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 228 - job_id: jegn2mnkg + job_id: jgdx10kzp job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:19:13Z' + timestamp: '2024-10-14T23:07:00Z' diff --git a/qai_hub_models/models/yolov7/README.md b/qai_hub_models/models/yolov7/README.md index fe6030f0..861ce55b 100644 --- a/qai_hub_models/models/yolov7/README.md +++ b/qai_hub_models/models/yolov7/README.md @@ -6,7 +6,7 @@ YoloV7 is a machine learning model that predicts bounding boxes and classes of objects in an image. This is based on the implementation of Yolo-v7 found -[here](https://github.com/WongKinYiu/yolov7/). 
This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/yolov7). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.yolov7.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Yolo-v7 can be found +* The license for the original implementation of Yolo-v7 can be found [here](https://github.com/WongKinYiu/yolov7/blob/main/LICENSE.md). -- The license for the compiled assets for on-device deployment can be found [here](https://github.com/WongKinYiu/yolov7/blob/main/LICENSE.md) +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/WongKinYiu/yolov7/blob/main/LICENSE.md) + ## References * [YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors](https://arxiv.org/abs/2207.02696) * [Source Model Implementation](https://github.com/WongKinYiu/yolov7/) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/yolov7/export.py b/qai_hub_models/models/yolov7/export.py index 4a10d175..5b4b87a8 100644 --- a/qai_hub_models/models/yolov7/export.py +++ b/qai_hub_models/models/yolov7/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.yolov7 import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options.
+ Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "yolov7" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( # Trace the model source_model = torch.jit.trace(model.to("cpu"), make_torch_inputs(input_spec)) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -134,7 +132,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -149,7 +147,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -170,13 +168,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -201,7 +199,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/yolov7/perf.yaml b/qai_hub_models/models/yolov7/perf.yaml index e24a7b92..46404a95 100644 --- a/qai_hub_models/models/yolov7/perf.yaml +++ b/qai_hub_models/models/yolov7/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: Yolo-v7 performance_metrics: - torchscript_onnx_tflite: - inference_time: 17219.0 - throughput: 58.07538184563563 + inference_time: 17188.0 + throughput: 58.180125669071444 estimated_peak_memory_range: - min: 45056 - max: 2664960 + min: 663552 + max: 3387440 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jlpe9wl1g + job_id: jp4lr8285 job_status: Passed torchscript_onnx_qnn: - inference_time: 10503.0 - throughput: 95.21089212605922 + inference_time: 10527.0 + throughput: 94.99382540134891 estimated_peak_memory_range: - min: 5009408 - max: 19070104 + min: 4984832 + max: 22130440 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 221 - job_id: jmg9vydw5 + job_id: j5q6qwdnp job_status: Passed torchscript_onnx: - inference_time: 13692.0 - throughput: 73.03534910896875 + inference_time: 12235.0 + throughput: 81.73273395995096 estimated_peak_memory_range: - min: 57344 - max: 11066416 + min: 53248 + max: 13030160 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 222 - job_id: joprk2w05 + job_id: jg9ln87wg job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:18:28Z' + timestamp: '2024-10-14T23:06:11Z' - torchscript_onnx_tflite: - inference_time: 11637.0 - throughput: 85.93280054996993 + inference_time: 11658.0 + throughput: 85.7780065191285 estimated_peak_memory_range: - min: 307200 - max: 91891552 + min: 638976 + max: 105670416 primary_compute_unit: NPU precision: fp16 layer_info: @@ 
-111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jygzej4kg + job_id: jpxkomz35 job_status: Passed torchscript_onnx_qnn: - inference_time: 8517.0 - throughput: 117.41223435481977 + inference_time: 7221.0 + throughput: 138.48497438027974 estimated_peak_memory_range: - min: 4931584 - max: 71172528 + min: 4956160 + max: 79005712 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 221 - job_id: jnp10w685 + job_id: jglvm7qj5 job_status: Passed torchscript_onnx: - inference_time: 9230.0 - throughput: 108.34236186348862 + inference_time: 8179.0 + throughput: 122.26433549333659 estimated_peak_memory_range: - min: 5140480 - max: 112479088 + min: 479232 + max: 124426000 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 222 - job_id: jep289erp + job_id: jp14z3k8p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:18:30Z' + timestamp: '2024-10-14T23:06:12Z' - torchscript_onnx_tflite: - inference_time: 17137.0 - throughput: 58.35327070082278 + inference_time: 17172.0 + throughput: 58.23433496389471 estimated_peak_memory_range: - min: 655360 - max: 2642992 + min: 618496 + max: 9138064 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jz5wo346p + job_id: jgn6vxwk5 job_status: Passed torchscript_onnx_qnn: - inference_time: 10323.0 - throughput: 96.8710646130001 + inference_time: 10322.0 + throughput: 96.8804495252858 estimated_peak_memory_range: - min: 4964352 - max: 6128368 + min: 5005312 + max: 6319184 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 221 - job_id: jz57zl9vp + job_id: jp3j08r3g job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:18:23Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:06:03Z' - torchscript_onnx_tflite: - inference_time: 19520.0 - throughput: 51.22950819672131 + inference_time: 17145.0 + throughput: 58.326042578011084 estimated_peak_memory_range: - min: 647168 - max: 97617328 + min: 77824 + max: 2396616 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jmg9vydl5 + job_id: jp0z0kx95 job_status: Passed torchscript_onnx_qnn: - inference_time: 12670.0 - throughput: 78.92659826361484 + inference_time: 10334.0 + throughput: 96.76795045480937 estimated_peak_memory_range: - min: 4952064 - max: 56412288 + min: 4993024 + max: 6236288 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 221 - job_id: jegn2mkkg + job_id: jgjvn16vg job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:18:27Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:06:07Z' - torchscript_onnx_tflite: - inference_time: 17221.0 - throughput: 58.06863712908658 + inference_time: 17142.0 + throughput: 
58.33625014584062 estimated_peak_memory_range: - min: 626688 - max: 3193392 + min: 12288 + max: 1619720 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jnp10w625 + job_id: jpy13ny8p job_status: Passed torchscript_onnx_qnn: - inference_time: 10449.0 - throughput: 95.70293808019906 + inference_time: 10477.0 + throughput: 95.44716999140975 estimated_peak_memory_range: - min: 4993024 - max: 6624200 + min: 5009408 + max: 6255424 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 221 - job_id: jqp4qd38g + job_id: jpv6k4yk5 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:18:24Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:06:05Z' - torchscript_onnx_tflite: - inference_time: 17170.0 - throughput: 58.241118229470004 + inference_time: 17156.0 + throughput: 58.28864537188156 estimated_peak_memory_range: - min: 16384 - max: 249429656 + min: 49152 + max: 211980424 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jvgdwq2e5 + job_id: jp2kyjzrp job_status: Passed torchscript_onnx_qnn: - inference_time: 10449.0 - throughput: 95.70293808019906 + inference_time: 10469.0 + throughput: 95.52010698251982 estimated_peak_memory_range: - min: 4997120 - max: 6179960 + min: 4968448 + max: 6197136 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 221 - job_id: j0pxv6x3g + job_id: jgo26m9qp job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:18:25Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:06:04Z' - torchscript_onnx_tflite: - inference_time: 17204.0 - throughput: 58.126017205301096 + inference_time: 19533.0 + throughput: 51.19541289100496 estimated_peak_memory_range: - min: 827392 - max: 2628400 + min: 634880 + max: 107445984 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 215 - job_id: jz5wo343p + job_id: jprv3970g job_status: Passed torchscript_onnx_qnn: - inference_time: 10542.0 - throughput: 94.8586605957124 + inference_time: 12605.0 + throughput: 79.33359777865927 estimated_peak_memory_range: - min: 5001216 - max: 6655832 + min: 4931584 + max: 63855088 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 221 - job_id: jo5mr68dg + job_id: jgz3dwqo5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:06:09Z' + - torchscript_onnx_tflite: + inference_time: 12241.0 + throughput: 81.6926721673066 + estimated_peak_memory_range: + min: 614400 + max: 72754016 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 215 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 215 + job_id: jgkexdkwg + job_status: Passed + torchscript_onnx_qnn: + inference_time: 5825.0 + throughput: 
171.67381974248926 + estimated_peak_memory_range: + min: 4927488 + max: 73396624 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 221 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 221 + job_id: j5we6x035 + job_status: Passed + torchscript_onnx: + inference_time: 8115.0 + throughput: 123.22858903265558 + estimated_peak_memory_range: + min: 6352896 + max: 90064304 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 222 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 222 + job_id: jg9ln878g + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:18:26Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:06:15Z' - torchscript_onnx_qnn: - inference_time: 10922.0 - throughput: 91.55832265152902 + inference_time: 10949.0 + throughput: 91.33254178463787 estimated_peak_memory_range: min: 4923392 max: 4923392 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 221 - job_id: jvgdwq2r5 + job_id: j56y4v06p job_status: Passed torchscript_onnx: - inference_time: 14009.0 - throughput: 71.38268256121064 + inference_time: 14157.0 + throughput: 70.63643427279791 estimated_peak_memory_range: - min: 9863168 - max: 9863168 + min: 9900032 + max: 9900032 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 222 - job_id: jqpyejm8g + job_id: jgdx10yrp job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:18:31Z' + timestamp: '2024-10-14T23:06:13Z' diff --git a/qai_hub_models/models/yolov7_quantized/README.md b/qai_hub_models/models/yolov7_quantized/README.md index bb4f1089..a271f87c 100644 --- a/qai_hub_models/models/yolov7_quantized/README.md +++ b/qai_hub_models/models/yolov7_quantized/README.md @@ -6,7 +6,7 @@ YoloV7 is a machine learning model that predicts bounding boxes and classes of objects in an image. This model is post-training quantized to int8 using samples from the COCO dataset. This is based on the implementation of Yolo-v7-Quantized found -[here](https://github.com/WongKinYiu/yolov7/). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/yolov7_quantized). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.yolov7_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of Yolo-v7-Quantized can be found +* The license for the original implementation of Yolo-v7-Quantized can be found [here](https://github.com/WongKinYiu/yolov7/blob/main/LICENSE.md).
-- The license for the compiled assets for on-device deployment can be found [here](https://github.com/WongKinYiu/yolov7/blob/main/LICENSE.md) +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/WongKinYiu/yolov7/blob/main/LICENSE.md) + ## References * [YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors](https://arxiv.org/abs/2207.02696) * [Source Model Implementation](https://github.com/WongKinYiu/yolov7/) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/yolov7_quantized/export.py b/qai_hub_models/models/yolov7_quantized/export.py index 9784d91a..25748e89 100644 --- a/qai_hub_models/models/yolov7_quantized/export.py +++ b/qai_hub_models/models/yolov7_quantized/export.py @@ -10,17 +10,17 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.yolov7_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.printing import ( print_inference_metrics, @@ -45,20 +45,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -80,10 +78,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). 
""" model_name = "yolov7_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,7 +107,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( target_runtime, output_path, input_spec ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -135,7 +133,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +148,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +169,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -202,7 +200,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/yolov7_quantized/perf.yaml b/qai_hub_models/models/yolov7_quantized/perf.yaml index 73911737..eabd9a87 100644 --- a/qai_hub_models/models/yolov7_quantized/perf.yaml +++ b/qai_hub_models/models/yolov7_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,44 +20,41 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: Yolo-v7-Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 4430.0 - throughput: 225.73363431151242 + inference_time: 4395.0 + throughput: 227.53128555176337 estimated_peak_memory_range: - min: 200704 - max: 150294072 + min: 368640 + max: 2121736 primary_compute_unit: NPU precision: int8 layer_info: @@ -64,14 +62,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 229 - job_id: jygzej8kg + job_id: jgo26mdqp job_status: Passed torchscript_onnx_qnn: - inference_time: 4818.0 - throughput: 207.55500207555002 + inference_time: 4817.0 + throughput: 207.59809009757112 estimated_peak_memory_range: - min: 16384 - max: 10251656 + min: 20480 + max: 10430648 primary_compute_unit: NPU precision: int8 layer_info: @@ -79,14 +77,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 221 - job_id: jegn2m7rg + job_id: jgn6vxqk5 job_status: Passed torchscript_onnx: - inference_time: 8991.0 - throughput: 111.22233344455567 + inference_time: 7451.0 + throughput: 134.21017313112333 estimated_peak_memory_range: - min: 12288 - max: 9416680 + min: 49152 + max: 12716312 primary_compute_unit: NPU precision: int8 layer_info: @@ -94,7 +92,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 253 - job_id: jw566zd05 + job_id: jpv6k4nk5 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -103,13 +101,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:17:52Z' + timestamp: '2024-10-14T23:05:23Z' - torchscript_onnx_tflite: - inference_time: 2831.0 - throughput: 353.2320734722713 + inference_time: 
2825.0 + throughput: 353.98230088495575 estimated_peak_memory_range: min: 12288 - max: 68718304 + max: 76290288 primary_compute_unit: NPU precision: int8 layer_info: @@ -117,14 +115,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 229 - job_id: jz5wo316p + job_id: jpv6k4mk5 job_status: Passed torchscript_onnx_qnn: - inference_time: 3205.0 - throughput: 312.01248049922 + inference_time: 3164.0 + throughput: 316.05562579013906 estimated_peak_memory_range: min: 1245184 - max: 49689920 + max: 59792256 primary_compute_unit: NPU precision: int8 layer_info: @@ -132,14 +130,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 221 - job_id: joprk2n95 + job_id: jprv39d0g job_status: Passed torchscript_onnx: - inference_time: 6372.0 - throughput: 156.9365976145637 + inference_time: 5362.0 + throughput: 186.4975755315181 estimated_peak_memory_range: - min: 90112 - max: 111436464 + min: 311296 + max: 128002480 primary_compute_unit: NPU precision: int8 layer_info: @@ -147,7 +145,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 253 - job_id: j1p3k1wl5 + job_id: jgjvn18vg job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -156,13 +154,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:17:53Z' + timestamp: '2024-10-14T23:05:24Z' - torchscript_onnx_tflite: - inference_time: 4381.0 - throughput: 228.2583884957772 + inference_time: 9943.0 + throughput: 100.57326762546515 estimated_peak_memory_range: - min: 180224 - max: 150514552 + min: 159744 + max: 73956800 primary_compute_unit: NPU precision: int8 layer_info: @@ -170,14 +168,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 229 - job_id: jmg9vyxl5 + job_id: jp4lr8485 job_status: Passed torchscript_onnx_qnn: - inference_time: 3511.0 - throughput: 284.8191398461977 + inference_time: 13531.0 + throughput: 73.90436774813392 estimated_peak_memory_range: - min: 1269760 - max: 2524184 + min: 1732608 + max: 9872768 primary_compute_unit: NPU precision: int8 layer_info: @@ -185,22 +183,45 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 221 - job_id: jqpyej77g + job_id: jp3j0873g job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS6490 Proxy + timestamp: '2024-10-14T23:05:21Z' + - torchscript_onnx_tflite: + inference_time: 96943.0 + throughput: 10.315339942027789 + estimated_peak_memory_range: + min: 3944448 + max: 35907808 + primary_compute_unit: GPU + precision: int8 + layer_info: + layers_on_npu: 33 + layers_on_gpu: 127 + layers_on_cpu: 69 + total_layers: 229 + job_id: jpxkomr35 + job_status: Passed + reference_device_info: + name: RB5 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:17:46Z' + chipset: QCS8250 Proxy + timestamp: '2024-10-14T23:05:10Z' - torchscript_onnx_tflite: - inference_time: 5021.0 - throughput: 199.16351324437363 + inference_time: 4372.0 + throughput: 228.72827081427263 estimated_peak_memory_range: - min: 180224 - max: 72680320 + min: 176128 + max: 4489448 primary_compute_unit: NPU precision: int8 layer_info: @@ -208,14 +229,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 229 - job_id: jnp10wv25 + job_id: jgjvn1yvg job_status: Passed torchscript_onnx_qnn: - inference_time: 4649.0 - throughput: 215.10002151000216 + inference_time: 3741.0 + throughput: 267.30820636193533 
estimated_peak_memory_range: - min: 1273856 - max: 52987264 + min: 1265664 + max: 3092536 primary_compute_unit: NPU precision: int8 layer_info: @@ -223,22 +244,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 221 - job_id: jn5q8rm45 + job_id: jpy13n28p job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:17:50Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:05:15Z' - torchscript_onnx_tflite: - inference_time: 4392.0 - throughput: 227.68670309653916 + inference_time: 4375.0 + throughput: 228.57142857142858 estimated_peak_memory_range: - min: 180224 - max: 2400480 + min: 0 + max: 1559680 primary_compute_unit: NPU precision: int8 layer_info: @@ -246,14 +267,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 229 - job_id: jvgdwqze5 + job_id: jp14z318p job_status: Passed torchscript_onnx_qnn: - inference_time: 3526.0 - throughput: 283.60748723766307 + inference_time: 3786.0 + throughput: 264.1310089804543 estimated_peak_memory_range: - min: 1294336 - max: 2490600 + min: 1282048 + max: 2529512 primary_compute_unit: NPU precision: int8 layer_info: @@ -261,22 +282,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 221 - job_id: j2p0y2v6g + job_id: j5q6qw1np job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:17:47Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:05:18Z' - torchscript_onnx_tflite: - inference_time: 4372.0 - throughput: 228.72827081427263 + inference_time: 4360.0 + throughput: 229.3577981651376 estimated_peak_memory_range: - min: 192512 - max: 1697224 + min: 176128 + max: 1892640 primary_compute_unit: NPU precision: int8 layer_info: @@ -284,14 +305,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 229 - job_id: jz57zl7lp + job_id: jg9ln82wg job_status: Passed torchscript_onnx_qnn: - inference_time: 3550.0 - throughput: 281.6901408450704 + inference_time: 3741.0 + throughput: 267.30820636193533 estimated_peak_memory_range: - min: 1269760 - max: 2601032 + min: 1286144 + max: 2711568 primary_compute_unit: NPU precision: int8 layer_info: @@ -299,7 +320,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 221 - job_id: j1p8om4xg + job_id: jp8qy8rkp job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -307,14 +328,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:17:48Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:05:17Z' - torchscript_onnx_tflite: - inference_time: 4416.0 - throughput: 226.44927536231884 + inference_time: 4402.0 + throughput: 227.1694684234439 estimated_peak_memory_range: - min: 167936 - max: 11549496 + min: 823296 + max: 2827872 primary_compute_unit: NPU precision: int8 layer_info: @@ -322,14 +343,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 229 - job_id: jqp4qd9vg + job_id: j5we6xz35 job_status: Passed torchscript_onnx_qnn: - inference_time: 3529.0 - throughput: 283.36639274582035 + inference_time: 3749.0 + throughput: 266.7377967457989 estimated_peak_memory_range: - min: 1273856 - max: 2607304 + min: 1306624 + max: 2952528 primary_compute_unit: NPU precision: int8 layer_info: @@ -337,22 +358,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 
total_layers: 221 - job_id: jogkzq92g + job_id: jp0z0k995 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:17:49Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:05:16Z' - torchscript_onnx_tflite: - inference_time: 9988.0 - throughput: 100.1201441730076 + inference_time: 5008.0 + throughput: 199.68051118210863 estimated_peak_memory_range: - min: 159744 - max: 74328688 + min: 163840 + max: 81691760 primary_compute_unit: NPU precision: int8 layer_info: @@ -360,14 +381,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 229 - job_id: j0pxv6d1g + job_id: jpedm2xo5 job_status: Passed torchscript_onnx_qnn: - inference_time: 13418.0 - throughput: 74.52675510508273 + inference_time: 4645.0 + throughput: 215.28525296017222 estimated_peak_memory_range: - min: 1282048 - max: 9008416 + min: 1245184 + max: 62789088 primary_compute_unit: NPU precision: int8 layer_info: @@ -375,42 +396,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 221 - job_id: j1gln218p + job_id: j56y4vm6p job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T11:17:51Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:05:20Z' - torchscript_onnx_tflite: - inference_time: 95801.0 - throughput: 10.438304401832966 + inference_time: 2893.0 + throughput: 345.66194262011754 estimated_peak_memory_range: - min: 3633152 - max: 54027720 - primary_compute_unit: GPU + min: 8192 + max: 54127424 + primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 33 - layers_on_gpu: 127 - layers_on_cpu: 69 + layers_on_npu: 229 + layers_on_gpu: 0 + layers_on_cpu: 0 total_layers: 229 - job_id: jo5mr6dwg + job_id: j5mnx4kdp + job_status: Passed + torchscript_onnx_qnn: + inference_time: 3348.0 + throughput: 298.6857825567503 + estimated_peak_memory_range: + min: 1241088 + max: 51929040 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 221 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 221 + job_id: jgo26mwqp + job_status: Passed + torchscript_onnx: + inference_time: 5191.0 + throughput: 192.64110961279138 + estimated_peak_memory_range: + min: 1585152 + max: 94095936 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 253 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 253 + job_id: j5we6xr35 job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T11:17:42Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:05:27Z' - torchscript_onnx_qnn: - inference_time: 3883.0 - throughput: 257.53283543651816 + inference_time: 4189.0 + throughput: 238.72045834328003 estimated_peak_memory_range: min: 1232896 max: 1232896 @@ -421,14 +472,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 221 - job_id: jep289v4p + job_id: jp2kyjdrp job_status: Passed torchscript_onnx: - inference_time: 9351.0 - throughput: 106.94043417816276 + inference_time: 9178.0 + throughput: 108.95619960775768 estimated_peak_memory_range: - min: 6844416 - max: 6844416 + min: 8142848 + max: 8142848 primary_compute_unit: NPU precision: int8 layer_info: @@ 
-436,7 +487,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 253 - job_id: jwgoyn4x5 + job_id: jpedm2no5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -445,4 +496,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:17:54Z' + timestamp: '2024-10-14T23:05:25Z' diff --git a/qai_hub_models/models/yolov8_det/README.md b/qai_hub_models/models/yolov8_det/README.md index fc6ab59b..141d16c8 100644 --- a/qai_hub_models/models/yolov8_det/README.md +++ b/qai_hub_models/models/yolov8_det/README.md @@ -6,7 +6,7 @@ Ultralytics YOLOv8 is a machine learning model that predicts bounding boxes and classes of objects in an image. This is based on the implementation of YOLOv8-Detection found -[here](https://github.com/ultralytics/ultralytics/tree/main/ultralytics/models/yolo/detect). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/yolov8_det). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.yolov8_det.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of YOLOv8-Detection can be found +* The license for the original implementation of YOLOv8-Detection can be found [here](https://github.com/ultralytics/ultralytics/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://github.com/ultralytics/ultralytics/blob/main/LICENSE) +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/ultralytics/ultralytics/blob/main/LICENSE) + ## References * [Ultralytics YOLOv8 Docs: Object Detection](https://docs.ultralytics.com/tasks/detect/) * [Source Model Implementation](https://github.com/ultralytics/ultralytics/tree/main/ultralytics/models/yolo/detect) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
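Every export.py file touched by this change makes the same API move: `export_model` now returns a named `ExportResult` instead of a positional 3-tuple. Below is a minimal caller-side sketch of the migration, assuming the keyword arguments shown in the docstrings above (the device string is an illustrative placeholder, and the `List[str]` branch of the return annotation is ignored here).

```python
# Hedged sketch of consuming the new ExportResult API; not example code from
# the repository itself. Keyword names follow the export.py docstrings above;
# the device string is a hypothetical placeholder.
from qai_hub_models.models.yolov8_det.export import export_model

result = export_model(
    device="Samsung Galaxy S23",  # any device listed in perf.yaml
    skip_inferencing=True,        # each of the last 4 recipe steps is optional
    skip_summary=True,
)

# Old callers unpacked positionally:
#     compile_job, profile_job, inference_job = export_model(...)
# New callers read named fields, so field order can no longer silently swap:
print(result.compile_job)           # hub.CompileJob metadata
if result.profile_job is not None:  # None when skip_profiling=True
    assert result.profile_job.wait().success  # same check the summary step runs
```

Note that the old tuple was ordered (compile, profile, inference) while the `ExportResult` constructor in these hunks passes (compile, inference, profile); with named fields that reordering is harmless, which is exactly the fragility the struct removes.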
diff --git a/qai_hub_models/models/yolov8_det/export.py b/qai_hub_models/models/yolov8_det/export.py index 985b6c19..9a73f192 100644 --- a/qai_hub_models/models/yolov8_det/export.py +++ b/qai_hub_models/models/yolov8_det/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.yolov8_det import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "yolov8_det" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -122,7 +120,7 @@ def export_model( model.to("cpu"), make_torch_inputs(input_spec), check_trace=False ) - # 2. Compile the model to an on-device asset + # 2. 
Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -136,7 +134,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -151,7 +149,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -172,13 +170,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -203,7 +201,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/yolov8_det/perf.yaml b/qai_hub_models/models/yolov8_det/perf.yaml index 2261ac50..cb83042b 100644 --- a/qai_hub_models/models/yolov8_det/perf.yaml +++ b/qai_hub_models/models/yolov8_det/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: YOLOv8-Detection performance_metrics: - torchscript_onnx_tflite: - inference_time: 5248.0 - throughput: 190.5487804878049 + inference_time: 5198.0 + throughput: 192.3816852635629 estimated_peak_memory_range: - min: 40960 - max: 5407336 + min: 16384 + max: 171349048 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 290 - job_id: jmg9vykl5 + job_id: j5q6qwlnp job_status: Passed torchscript_onnx_qnn: 
- inference_time: 5281.0 - throughput: 189.3580761219466 + inference_time: 5304.0 + throughput: 188.5369532428356 estimated_peak_memory_range: - min: 4227072 - max: 16782792 + min: 4968448 + max: 20730640 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 285 - job_id: jegn2morg + job_id: j5we6xy35 job_status: Passed torchscript_onnx: - inference_time: 6367.0 - throughput: 157.0598397989634 + inference_time: 6063.0 + throughput: 164.93485073396008 estimated_peak_memory_range: - min: 5300224 - max: 10908312 + min: 4960256 + max: 10882952 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 286 - job_id: j1gln2o8p + job_id: jp2kyjorp job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:17:00Z' + timestamp: '2024-10-14T23:04:23Z' - torchscript_onnx_tflite: - inference_time: 3835.0 - throughput: 260.7561929595828 + inference_time: 3840.0 + throughput: 260.4166666666667 estimated_peak_memory_range: min: 12288 - max: 87381344 + max: 96494496 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 290 - job_id: jnp10w725 + job_id: jglvm7yj5 job_status: Passed torchscript_onnx_qnn: - inference_time: 3815.0 - throughput: 262.12319790301444 + inference_time: 3826.0 + throughput: 261.3695765812859 estimated_peak_memory_range: - min: 0 - max: 43800208 + min: 4931584 + max: 55705856 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 285 - job_id: joprk2o95 + job_id: jg9ln8owg job_status: Passed torchscript_onnx: - inference_time: 4524.0 - throughput: 221.04332449160034 + inference_time: 5039.0 + throughput: 198.45207382417146 estimated_peak_memory_range: - min: 4128768 - max: 109267504 + min: 5382144 + max: 119552640 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 286 - job_id: jw566zr05 + job_id: jpy13n88p job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:17:01Z' + timestamp: '2024-10-14T23:04:24Z' - torchscript_onnx_tflite: - inference_time: 5172.0 - throughput: 193.34880123743233 + inference_time: 5145.0 + throughput: 194.3634596695821 estimated_peak_memory_range: - min: 225280 - max: 167035944 + min: 229376 + max: 2187816 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 290 - job_id: jvgdwq8e5 + job_id: j56y4v86p job_status: Passed torchscript_onnx_qnn: - inference_time: 5039.0 - throughput: 198.45207382417146 + inference_time: 4996.0 + throughput: 200.160128102482 estimated_peak_memory_range: min: 4993024 - max: 6751880 + max: 6358000 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 285 - job_id: jqpyejq7g + job_id: jgdx106rp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:16:55Z' + chipset: QCS8550 Proxy + timestamp: 
'2024-10-14T23:04:16Z' - torchscript_onnx_tflite: - inference_time: 8713.0 - throughput: 114.77103179157581 + inference_time: 5198.0 + throughput: 192.3816852635629 estimated_peak_memory_range: - min: 217088 - max: 81988800 + min: 36864 + max: 4522160 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 290 - job_id: jz57zlklp + job_id: jgjvn13vg job_status: Passed torchscript_onnx_qnn: - inference_time: 7883.0 - throughput: 126.85525815045034 + inference_time: 5109.0 + throughput: 195.73302016050107 estimated_peak_memory_range: - min: 4145152 - max: 33283632 + min: 4956160 + max: 6524256 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 285 - job_id: jn5q8rz45 + job_id: jpxkom035 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:16:59Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:04:19Z' - torchscript_onnx_tflite: - inference_time: 5251.0 - throughput: 190.43991620643686 + inference_time: 5234.0 + throughput: 191.05846388995033 estimated_peak_memory_range: - min: 45056 - max: 167626888 + min: 229376 + max: 1799032 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 290 - job_id: jqp4qdmvg + job_id: jpv6k42k5 job_status: Passed torchscript_onnx_qnn: - inference_time: 5024.0 - throughput: 199.04458598726114 + inference_time: 5085.0 + throughput: 196.65683382497542 estimated_peak_memory_range: - min: 5013504 - max: 6329568 + min: 4988928 + max: 6640168 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 285 - job_id: j2p0y2d6g + job_id: jp4lr8e85 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:16:56Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:04:18Z' - torchscript_onnx_tflite: - inference_time: 5191.0 - throughput: 192.64110961279138 + inference_time: 5156.0 + throughput: 193.9487975174554 estimated_peak_memory_range: - min: 225280 - max: 2186664 + min: 258048 + max: 16794984 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 290 - job_id: j0pxv631g + job_id: jgo26mlqp job_status: Passed torchscript_onnx_qnn: - inference_time: 5095.0 - throughput: 196.27085377821393 + inference_time: 5069.0 + throughput: 197.27756954034325 estimated_peak_memory_range: - min: 4964352 - max: 6242344 + min: 4993024 + max: 6330696 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 285 - job_id: j1p8om6xg + job_id: j57yr6ov5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:16:57Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:04:17Z' - torchscript_onnx_tflite: - inference_time: 5180.0 - throughput: 193.05019305019306 + inference_time: 8670.0 + throughput: 115.34025374855824 estimated_peak_memory_range: - min: 225280 - max: 
16609440 + min: 245760 + max: 86445504 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 290 - job_id: jo5mr6owg + job_id: jp3j08z3g job_status: Passed torchscript_onnx_qnn: - inference_time: 5125.0 - throughput: 195.1219512195122 + inference_time: 7878.0 + throughput: 126.93577050012694 estimated_peak_memory_range: - min: 4972544 - max: 6195568 + min: 4931584 + max: 41834944 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 285 - job_id: jogkzqo2g + job_id: jgn6vx1k5 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:04:21Z' + - torchscript_onnx_tflite: + inference_time: 3025.0 + throughput: 330.57851239669424 + estimated_peak_memory_range: + min: 8192 + max: 60656016 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 290 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 290 + job_id: jgz3dwzo5 + job_status: Passed + torchscript_onnx_qnn: + inference_time: 3595.0 + throughput: 278.1641168289291 + estimated_peak_memory_range: + min: 4927488 + max: 51255872 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 285 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 285 + job_id: jprv39x0g + job_status: Passed + torchscript_onnx: + inference_time: 4011.0 + throughput: 249.3143854400399 + estimated_peak_memory_range: + min: 5365760 + max: 76348800 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 286 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 286 + job_id: jgkexd6wg + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:16:58Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:04:27Z' - torchscript_onnx_qnn: - inference_time: 5442.0 - throughput: 183.75597206909225 + inference_time: 5524.0 + throughput: 181.02824040550325 estimated_peak_memory_range: min: 4923392 max: 4923392 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 285 - job_id: jep28944p + job_id: jp14z3o8p job_status: Passed torchscript_onnx: - inference_time: 6486.0 - throughput: 154.17823003391922 + inference_time: 6702.0 + throughput: 149.20919128618323 estimated_peak_memory_range: - min: 4931584 - max: 4931584 + min: 5443584 + max: 5443584 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 286 - job_id: j1p3k1xl5 + job_id: jp0z0ko95 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:17:02Z' + timestamp: '2024-10-14T23:04:25Z' diff --git a/qai_hub_models/models/yolov8_det_quantized/README.md b/qai_hub_models/models/yolov8_det_quantized/README.md index 025830a7..81e68ba0 100644 --- a/qai_hub_models/models/yolov8_det_quantized/README.md +++ b/qai_hub_models/models/yolov8_det_quantized/README.md @@ -6,7 +6,7 @@ Ultralytics YOLOv8 is a machine learning model that predicts bounding boxes and classes of objects in an image. 
This model is post-training quantized to int8 using samples from the COCO dataset. This is based on the implementation of YOLOv8-Detection-Quantized found -[here](https://github.com/ultralytics/ultralytics/tree/main/ultralytics/models/yolo/detect). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/yolov8_det_quantized). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.yolov8_det_quantized.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of YOLOv8-Detection-Quantized can be found +* The license for the original implementation of YOLOv8-Detection-Quantized can be found [here](https://github.com/ultralytics/ultralytics/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://github.com/ultralytics/ultralytics/blob/main/LICENSE) +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/ultralytics/ultralytics/blob/main/LICENSE) + ## References * [Ultralytics YOLOv8 Docs: Object Detection](https://docs.ultralytics.com/tasks/detect/) * [Source Model Implementation](https://github.com/ultralytics/ultralytics/tree/main/ultralytics/models/yolo/detect) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com). diff --git a/qai_hub_models/models/yolov8_det_quantized/export.py b/qai_hub_models/models/yolov8_det_quantized/export.py index 69f20f6a..3682d38a 100644 --- a/qai_hub_models/models/yolov8_det_quantized/export.py +++ b/qai_hub_models/models/yolov8_det_quantized/export.py @@ -10,17 +10,17 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.yolov8_det_quantized import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.printing import ( print_inference_metrics, @@ -45,20 +45,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2.
Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -80,10 +78,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "yolov8_det_quantized" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -109,7 +107,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -120,7 +118,7 @@ def export_model( target_runtime, output_path, input_spec ) - # 2. Compile the model to an on-device asset + # 2. Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -135,7 +133,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -150,7 +148,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -171,13 +169,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. 
Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -202,7 +200,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/yolov8_det_quantized/perf.yaml b/qai_hub_models/models/yolov8_det_quantized/perf.yaml index 82f67d77..6a15ee80 100644 --- a/qai_hub_models/models/yolov8_det_quantized/perf.yaml +++ b/qai_hub_models/models/yolov8_det_quantized/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,44 +20,41 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS6490 (Proxy) - RB3 Gen 2 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) - QCS8250 (Proxy) - RB5 (Proxy) - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Sa8775p Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Qcs8250 Proxy - - Qcs6490 Proxy + - Snapdragon® X Plus 8-Core + - QCS6490 Proxy + - QCS8250 Proxy + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: YOLOv8-Detection-Quantized performance_metrics: - torchscript_onnx_tflite: - inference_time: 1913.0 - throughput: 522.7391531625718 + inference_time: 1915.0 + throughput: 522.1932114882507 estimated_peak_memory_range: - min: 12288 - max: 108295224 + min: 16384 + max: 1454752 primary_compute_unit: NPU precision: int8 layer_info: @@ -64,14 +62,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 278 - job_id: jnp10w325 + job_id: jpy13nx7p job_status: Passed torchscript_onnx_qnn: - inference_time: 2242.0 - throughput: 446.03033006244425 + inference_time: 2265.0 + throughput: 441.5011037527594 estimated_peak_memory_range: - min: 2113536 - max: 12544800 + min: 12288 + max: 9036560 primary_compute_unit: NPU precision: int8 layer_info: @@ -79,14 +77,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 273 - job_id: jqpyejn7g + job_id: jpedm2z15 job_status: Passed torchscript_onnx: - inference_time: 6310.0 - throughput: 158.47860538827257 + inference_time: 5638.0 + throughput: 177.367860943597 estimated_peak_memory_range: - min: 6213632 - max: 12150408 + min: 7311360 + max: 11149928 primary_compute_unit: NPU precision: int8 layer_info: @@ -94,7 +92,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 8 total_layers: 331 - job_id: j1pv3r4j5 + job_id: jp4lr8y85 job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -103,13 +101,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:16:21Z' + timestamp: '2024-10-14T23:03:33Z' - torchscript_onnx_tflite: - inference_time: 1646.0 - throughput: 
607.5334143377886 + inference_time: 1273.0 + throughput: 785.5459544383347 estimated_peak_memory_range: min: 12288 - max: 55371904 + max: 59792512 primary_compute_unit: NPU precision: int8 layer_info: @@ -117,14 +115,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 278 - job_id: jvgdwq0e5 + job_id: jp0z0kj65 job_status: Passed torchscript_onnx_qnn: - inference_time: 1494.0 - throughput: 669.3440428380187 + inference_time: 1489.0 + throughput: 671.591672263264 estimated_peak_memory_range: min: 1245184 - max: 28324336 + max: 32420816 primary_compute_unit: NPU precision: int8 layer_info: @@ -132,14 +130,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 273 - job_id: j2p0y2k6g + job_id: jgz3dwmk5 job_status: Passed torchscript_onnx: - inference_time: 5579.0 - throughput: 179.24359204158452 + inference_time: 3976.0 + throughput: 251.50905432595573 estimated_peak_memory_range: - min: 0 - max: 123878256 + min: 3457024 + max: 158278112 primary_compute_unit: NPU precision: int8 layer_info: @@ -147,7 +145,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 8 total_layers: 331 - job_id: j7gjx21xp + job_id: jpxkoml35 job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -156,13 +154,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:16:22Z' + timestamp: '2024-10-14T23:03:34Z' - torchscript_onnx_tflite: - inference_time: 1922.0 - throughput: 520.2913631633714 + inference_time: 4734.0 + throughput: 211.23785382340515 estimated_peak_memory_range: - min: 16384 - max: 1718064 + min: 12288 + max: 44329568 primary_compute_unit: NPU precision: int8 layer_info: @@ -170,14 +168,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 278 - job_id: jz57zl6lp + job_id: jgo26mrxp job_status: Passed torchscript_onnx_qnn: - inference_time: 1950.0 - throughput: 512.8205128205128 + inference_time: 5781.0 + throughput: 172.9804532087874 estimated_peak_memory_range: - min: 1282048 - max: 3053760 + min: 1245184 + max: 8681616 primary_compute_unit: NPU precision: int8 layer_info: @@ -185,22 +183,45 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 273 - job_id: jogkzqd2g + job_id: jgdx10drp job_status: Passed reference_device_info: - name: QCS8550 (Proxy) + name: RB3 Gen 2 (Proxy) os: '12' form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: '2024-09-25T11:16:15Z' + chipset: QCS6490 Proxy + timestamp: '2024-10-14T23:03:32Z' - torchscript_onnx_tflite: - inference_time: 2100.0 - throughput: 476.1904761904762 + inference_time: 46023.0 + throughput: 21.72826630163179 + estimated_peak_memory_range: + min: 2912256 + max: 16478472 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 277 + layers_on_gpu: 1 + layers_on_cpu: 0 + total_layers: 278 + job_id: jpv6k4dj5 + job_status: Passed + reference_device_info: + name: RB5 (Proxy) + os: '12' + form_factor: Iot + os_name: Android + manufacturer: Qualcomm + chipset: QCS8250 Proxy + timestamp: '2024-10-14T23:03:20Z' + - torchscript_onnx_tflite: + inference_time: 1891.0 + throughput: 528.8207297726071 estimated_peak_memory_range: min: 12288 - max: 55966704 + max: 6466008 primary_compute_unit: NPU precision: int8 layer_info: @@ -208,14 +229,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 278 - job_id: jqp4qd8vg + job_id: jp8qy8xxp job_status: Passed torchscript_onnx_qnn: - inference_time: 2486.0 - throughput: 402.2526146419952 + inference_time: 1941.0 + throughput: 515.1983513652756 
estimated_peak_memory_range: - min: 1245184 - max: 29299664 + min: 1261568 + max: 2423696 primary_compute_unit: NPU precision: int8 layer_info: @@ -223,22 +244,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 273 - job_id: j1p3k18l5 + job_id: jg9ln8zlg job_status: Passed reference_device_info: - name: QCS8450 (Proxy) - os: '13' - form_factor: Xr + name: QCS8550 (Proxy) + os: '12' + form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:16:19Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:03:25Z' - torchscript_onnx_tflite: - inference_time: 1915.0 - throughput: 522.1932114882507 + inference_time: 1897.0 + throughput: 527.1481286241434 estimated_peak_memory_range: - min: 12288 - max: 4475880 + min: 16384 + max: 9354536 primary_compute_unit: NPU precision: int8 layer_info: @@ -246,14 +267,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 278 - job_id: j0pxv6m1g + job_id: j56y4v70p job_status: Passed torchscript_onnx_qnn: - inference_time: 1956.0 - throughput: 511.2474437627812 + inference_time: 1970.0 + throughput: 507.61421319796955 estimated_peak_memory_range: - min: 1273856 - max: 2472528 + min: 1261568 + max: 2413648 primary_compute_unit: NPU precision: int8 layer_info: @@ -261,22 +282,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 273 - job_id: jn5q8rw45 + job_id: j5we6xl35 job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8255 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:16:16Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:03:28Z' - torchscript_onnx_tflite: - inference_time: 1909.0 - throughput: 523.8344683080147 + inference_time: 1919.0 + throughput: 521.1047420531527 estimated_peak_memory_range: - min: 16384 - max: 2940456 + min: 12288 + max: 2250360 primary_compute_unit: NPU precision: int8 layer_info: @@ -284,14 +305,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 278 - job_id: jo5mr64wg + job_id: jglvm7x85 job_status: Passed torchscript_onnx_qnn: - inference_time: 1961.0 - throughput: 509.94390617032127 + inference_time: 1953.0 + throughput: 512.0327700972862 estimated_peak_memory_range: - min: 1273856 - max: 2520880 + min: 4788224 + max: 5988272 primary_compute_unit: NPU precision: int8 layer_info: @@ -299,7 +320,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 273 - job_id: j1gln278p + job_id: jgdx10dep job_status: Passed reference_device_info: name: SA8775 (Proxy) @@ -307,14 +328,14 @@ models: form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:16:17Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:03:27Z' - torchscript_onnx_tflite: - inference_time: 1915.0 - throughput: 522.1932114882507 + inference_time: 1919.0 + throughput: 521.1047420531527 estimated_peak_memory_range: min: 12288 - max: 1786760 + max: 3498696 primary_compute_unit: NPU precision: int8 layer_info: @@ -322,14 +343,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 278 - job_id: jegn2mxrg + job_id: j5q6qwy4p job_status: Passed torchscript_onnx_qnn: - inference_time: 1951.0 - throughput: 512.557662737058 + inference_time: 1979.0 + throughput: 505.3057099545225 estimated_peak_memory_range: - min: 1245184 - max: 3175872 + min: 1265664 + max: 2441912 primary_compute_unit: NPU precision: int8 layer_info: @@ -337,22 +358,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 273 - job_id: 
jw566zv05 + job_id: jp14z3n2p job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:16:18Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:03:26Z' - torchscript_onnx_tflite: - inference_time: 4477.0 - throughput: 223.36385972749608 + inference_time: 2091.0 + throughput: 478.24007651841225 estimated_peak_memory_range: - min: 94208 - max: 44556032 + min: 12288 + max: 60267504 primary_compute_unit: NPU precision: int8 layer_info: @@ -360,14 +381,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 278 - job_id: joprk2995 + job_id: jgkexd42g job_status: Passed torchscript_onnx_qnn: - inference_time: 6094.0 - throughput: 164.09583196586806 + inference_time: 2512.0 + throughput: 398.0891719745223 estimated_peak_memory_range: - min: 1540096 - max: 9478256 + min: 1245184 + max: 34234320 primary_compute_unit: NPU precision: int8 layer_info: @@ -375,42 +396,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 273 - job_id: jwgoynmx5 + job_id: jp14z3n8p job_status: Passed reference_device_info: - name: RB3 Gen 2 (Proxy) - os: '12' - form_factor: Iot + name: QCS8450 (Proxy) + os: '13' + form_factor: Xr os_name: Android manufacturer: Qualcomm - chipset: Qcs6490 Proxy - timestamp: '2024-09-25T11:16:20Z' + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:03:30Z' - torchscript_onnx_tflite: - inference_time: 46612.0 - throughput: 21.453702909122114 + inference_time: 1206.0 + throughput: 829.1873963515754 estimated_peak_memory_range: - min: 2920448 - max: 25415328 + min: 8192 + max: 39328128 primary_compute_unit: NPU precision: int8 layer_info: - layers_on_npu: 277 - layers_on_gpu: 1 + layers_on_npu: 278 + layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 278 - job_id: jep289j4p + job_id: jgjvn17xg + job_status: Passed + torchscript_onnx_qnn: + inference_time: 1487.0 + throughput: 672.4949562878278 + estimated_peak_memory_range: + min: 1241088 + max: 28062400 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 273 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 273 + job_id: j57yr6ev5 + job_status: Passed + torchscript_onnx: + inference_time: 3993.0 + throughput: 250.4382669671926 + estimated_peak_memory_range: + min: 6139904 + max: 127875488 + primary_compute_unit: NPU + precision: int8 + layer_info: + layers_on_npu: 323 + layers_on_gpu: 0 + layers_on_cpu: 8 + total_layers: 331 + job_id: jprv39l0g job_status: Passed reference_device_info: - name: RB5 (Proxy) - os: '12' - form_factor: Iot + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Qcs8250 Proxy - timestamp: '2024-09-25T11:16:11Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:03:37Z' - torchscript_onnx_qnn: - inference_time: 2223.0 - throughput: 449.842555105713 + inference_time: 2263.0 + throughput: 441.8912947414936 estimated_peak_memory_range: min: 1232896 max: 1232896 @@ -421,14 +472,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 273 - job_id: j1p8om8xg + job_id: j5we6xl65 job_status: Passed torchscript_onnx: - inference_time: 6233.0 - throughput: 160.43638697256537 + inference_time: 6306.0 + throughput: 158.5791309863622 estimated_peak_memory_range: - min: 7827456 - max: 7827456 + min: 7704576 + max: 7704576 primary_compute_unit: NPU precision: int8 layer_info: @@ -436,7 +487,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 8 total_layers: 331 - job_id: 
jlpe9w21g + job_id: j5mnx40dp job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -445,4 +496,4 @@ os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:16:23Z' + timestamp: '2024-10-14T23:03:35Z' diff --git a/qai_hub_models/models/yolov8_seg/README.md b/qai_hub_models/models/yolov8_seg/README.md index 7f827999..678d1662 100644 --- a/qai_hub_models/models/yolov8_seg/README.md +++ b/qai_hub_models/models/yolov8_seg/README.md @@ -6,7 +6,7 @@ Ultralytics YOLOv8 is a machine learning model that predicts bounding boxes, segmentation masks and classes of objects in an image. This is based on the implementation of YOLOv8-Segmentation found -[here](https://github.com/ultralytics/ultralytics/tree/main/ultralytics/models/yolo/segment). This repository contains scripts for optimized on-device +[here]({source_repo}). This repository contains scripts for optimized on-device export suitable to run on Qualcomm® devices. More details on model performance across various devices can be found [here](https://aihub.qualcomm.com/models/yolov8_seg). @@ -44,15 +44,19 @@ python -m qai_hub_models.models.yolov8_seg.export Additional options are documented with the `--help` option. Note that the above script requires access to Deployment instructions for Qualcomm® AI Hub. + ## License -- The license for the original implementation of YOLOv8-Segmentation can be found +* The license for the original implementation of YOLOv8-Segmentation can be found [here](https://github.com/ultralytics/ultralytics/blob/main/LICENSE). -- The license for the compiled assets for on-device deployment can be found [here](https://github.com/ultralytics/ultralytics/blob/main/LICENSE) +* The license for the compiled assets for on-device deployment can be found [here](https://github.com/ultralytics/ultralytics/blob/main/LICENSE) + ## References * [Ultralytics YOLOv8 Docs: Instance Segmentation](https://docs.ultralytics.com/tasks/segment/) * [Source Model Implementation](https://github.com/ultralytics/ultralytics/tree/main/ultralytics/models/yolo/segment) + + ## Community * Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions and learn more about on-device AI. * For questions or feedback please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).
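The export scripts throughout this diff (including the yolov8_seg/export.py diff that follows) replace the positional 3-tuple return value of `export_model` with an `ExportResult` struct imported from `qai_hub_models.models.common`. As a minimal sketch of the idea, such a struct could be a plain dataclass with optional job fields; the real definition lives in `qai_hub_models/models/common.py` and may differ in detail:

```python
# Hypothetical sketch of the ExportResult struct returned by export_model().
# The actual class lives in qai_hub_models.models.common; the field names
# below match the keyword arguments used in the export scripts in this diff.
from dataclasses import dataclass
from typing import Optional

import qai_hub as hub


@dataclass
class ExportResult:
    compile_job: Optional[hub.CompileJob] = None
    profile_job: Optional[hub.ProfileJob] = None
    inference_job: Optional[hub.InferenceJob] = None


# Callers that previously unpacked the tuple positionally:
#     compile_job, profile_job, inference_job = export_model(...)
# now read named fields, which stays correct even if fields are reordered
# (note the Returns docstrings list inference_job before profile_job):
#     result = export_model(...)
#     if result.profile_job is not None:
#         print(result.profile_job.download_profile())
```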
diff --git a/qai_hub_models/models/yolov8_seg/export.py b/qai_hub_models/models/yolov8_seg/export.py index dd920257..464045bf 100644 --- a/qai_hub_models/models/yolov8_seg/export.py +++ b/qai_hub_models/models/yolov8_seg/export.py @@ -10,18 +10,18 @@ import os import warnings from pathlib import Path -from typing import Any, Dict, List, Optional, Tuple, cast +from typing import Any, Dict, List, Optional, cast import qai_hub as hub import torch +from qai_hub_models.models.common import ExportResult, TargetRuntime from qai_hub_models.models.yolov8_seg import Model from qai_hub_models.utils.args import ( export_parser, get_input_spec_kwargs, get_model_kwargs, ) -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import torch_inference from qai_hub_models.utils.input_spec import make_torch_inputs from qai_hub_models.utils.printing import ( @@ -47,20 +47,18 @@ def export_model( compile_options: str = "", profile_options: str = "", **additional_model_kwargs, -) -> Tuple[hub.CompileJob, Optional[hub.ProfileJob], Optional[hub.InferenceJob]] | List[ - str -]: +) -> ExportResult | List[str]: """ - This function accomplishes 6 main tasks: + This function executes the following recipe: - 1. Instantiates a PyTorch model and converts it to a traced TorchScript format. - 2. Compiles the model to an asset that can be run on device. - 3. Profiles the model performance on real devices. - 4. Inferences the model on sample inputs. - 5. Downloads the model asset to the local directory. - 6. Summarizes the results from profiling and inference. + 1. Instantiates a PyTorch model and converts it to a traced TorchScript format + 2. Compiles the model to an asset that can be run on device + 3. Profiles the model performance on a real device + 4. Inferences the model on sample inputs + 5. Downloads the model asset to the local directory + 6. Summarizes the results from profiling and inference - Each of the last four steps can be optionally skipped using the input options. + Each of the last 4 steps can be optionally skipped using the input options. Parameters: device: Device for which to export the model. @@ -82,10 +80,10 @@ def export_model( `model_cls.from_pretrained` and `model.get_input_spec` Returns: - A 3-tuple of: + A struct of: * A CompileJob object containing metadata about the compile job submitted to hub. - * A ProfileJob containing metadata about the profile job (None if profiling skipped). * An InferenceJob containing metadata about the inference job (None if inferencing skipped). + * A ProfileJob containing metadata about the profile job (None if profiling skipped). """ model_name = "yolov8_seg" output_path = Path(output_dir or Path.cwd() / "build" / model_name) @@ -111,7 +109,7 @@ def export_model( # On-device perf improves with I/O in channel_last format except when using ONNX. use_channel_last_format = target_runtime != TargetRuntime.ONNX - # 1. Initialize PyTorch model + # 1. Instantiates a PyTorch model and converts it to a traced TorchScript format model = Model.from_pretrained(**get_model_kwargs(Model, additional_model_kwargs)) input_spec = model.get_input_spec( **get_input_spec_kwargs(model, additional_model_kwargs) @@ -122,7 +120,7 @@ def export_model( model.to("cpu"), make_torch_inputs(input_spec), check_trace=False ) - # 2. Compile the model to an on-device asset + # 2. 
Compiles the model to an asset that can be run on device model_compile_options = model.get_hub_compile_options( target_runtime, compile_options, hub_device ) @@ -136,7 +134,7 @@ def export_model( ) compile_job = cast(hub.client.CompileJob, submitted_compile_job) - # 3. Profile the model asset on real devices + # 3. Profiles the model performance on a real device profile_job: Optional[hub.client.ProfileJob] = None if not skip_profiling: profile_options_all = model.get_hub_profile_options( @@ -151,7 +149,7 @@ def export_model( ) profile_job = cast(hub.client.ProfileJob, submitted_profile_job) - # 4. Run inference on-device with sample inputs + # 4. Inferences the model on sample inputs inference_job: Optional[hub.client.InferenceJob] = None if not skip_inferencing: profile_options_all = model.get_hub_profile_options( @@ -172,13 +170,13 @@ def export_model( ) inference_job = cast(hub.client.InferenceJob, submitted_inference_job) - # 5. Download the model asset to a local file + # 5. Downloads the model asset to the local directory if not skip_downloading: os.makedirs(output_path, exist_ok=True) target_model: hub.Model = compile_job.get_target_model() # type: ignore target_model.download(str(output_path / model_name)) - # 6. Summarize the results from profiling and inference + # 6. Summarizes the results from profiling and inference if not skip_summary and not skip_profiling: assert profile_job is not None and profile_job.wait().success profile_data: Dict[str, Any] = profile_job.download_profile() # type: ignore @@ -203,7 +201,11 @@ def export_model( if not skip_summary: print_on_target_demo_cmd(compile_job, Path(__file__).parent, hub_device) - return (compile_job, profile_job, inference_job) + return ExportResult( + compile_job=compile_job, + inference_job=inference_job, + profile_job=profile_job, + ) def main(): diff --git a/qai_hub_models/models/yolov8_seg/perf.yaml b/qai_hub_models/models/yolov8_seg/perf.yaml index 810ec6ac..aad3a905 100644 --- a/qai_hub_models/models/yolov8_seg/perf.yaml +++ b/qai_hub_models/models/yolov8_seg/perf.yaml @@ -2,6 +2,7 @@ aggregated: supported_oses: - Android supported_devices: + - Snapdragon 8 Elite QRD - Samsung Galaxy S24 - Samsung Galaxy S24 Ultra - Samsung Galaxy S24+ @@ -19,38 +20,35 @@ aggregated: - Samsung Galaxy S21 Ultra - Samsung Galaxy S21+ - Snapdragon X Elite CRD - - QCS8550 (Proxy) - - SA8775 (Proxy) - - SA8650 (Proxy) - - SA8255 (Proxy) + - Snapdragon X Plus 8-Core CRD - QCS8450 (Proxy) - XR2 Gen 2 (Proxy) - - Google Pixel 5a 5G - - Google Pixel 4 - - Google Pixel 4a - - Google Pixel 3 - - Google Pixel 3a - - Google Pixel 3a XL + - QCS8550 (Proxy) + - SA8255 (Proxy) + - SA8650 (Proxy) + - SA8775 (Proxy) supported_chipsets: + - Snapdragon® 8 Elite - Snapdragon® 8 Gen 3 - Snapdragon® 8 Gen 2 - Snapdragon® 8 Gen 1 - Snapdragon® 888 - Snapdragon® X Elite - - Qcs8550 Proxy - - Qcs8450 Proxy - - Sa8650p Proxy - - Sa8255p Proxy - - Sa8775p Proxy + - Snapdragon® X Plus 8-Core + - QCS8450 Proxy + - QCS8550 Proxy + - SA8255P Proxy + - SA8650P Proxy + - SA8775P Proxy models: - name: YOLOv8-Segmentation performance_metrics: - torchscript_onnx_tflite: - inference_time: 6418.0 - throughput: 155.8117793705204 + inference_time: 6541.0 + throughput: 152.88182235132243 estimated_peak_memory_range: - min: 4235264 - max: 6944008 + min: 4571136 + max: 6416656 primary_compute_unit: NPU precision: fp16 layer_info: @@ -58,14 +56,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 338 - job_id: jz57zlvlp + job_id: j5mnx48wp job_status: Passed 
torchscript_onnx_qnn: - inference_time: 6398.0 - throughput: 156.29884338855894 + inference_time: 6409.0 + throughput: 156.03058199407084 estimated_peak_memory_range: - min: 7303168 - max: 17949456 + min: 4210688 + max: 15170608 primary_compute_unit: NPU precision: fp16 layer_info: @@ -73,14 +71,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 333 - job_id: jqpyejv7g + job_id: jglvm7l85 job_status: Passed torchscript_onnx: - inference_time: 7635.0 - throughput: 130.97576948264572 + inference_time: 7616.0 + throughput: 131.30252100840337 estimated_peak_memory_range: - min: 14888960 - max: 22845744 + min: 13791232 + max: 22564368 primary_compute_unit: NPU precision: fp16 layer_info: @@ -88,7 +86,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 336 - job_id: jwgoynex5 + job_id: jp14z3j2p job_status: Passed reference_device_info: name: Samsung Galaxy S23 @@ -97,13 +95,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 2 - timestamp: '2024-09-25T11:15:20Z' + timestamp: '2024-10-14T23:02:24Z' - torchscript_onnx_tflite: - inference_time: 4846.0 - throughput: 206.35575732562938 + inference_time: 4861.0 + throughput: 205.71898786257972 estimated_peak_memory_range: - min: 2994176 - max: 107446800 + min: 3215360 + max: 117297872 primary_compute_unit: NPU precision: fp16 layer_info: @@ -111,14 +109,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 338 - job_id: jqp4qdjvg + job_id: jgn6vxkr5 job_status: Passed torchscript_onnx_qnn: - inference_time: 4782.0 - throughput: 209.11752404851526 + inference_time: 4775.0 + throughput: 209.4240837696335 estimated_peak_memory_range: min: 4931584 - max: 55423776 + max: 65463024 primary_compute_unit: NPU precision: fp16 layer_info: @@ -126,14 +124,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 333 - job_id: j2p0y2e6g + job_id: j56y4vw0p job_status: Passed torchscript_onnx: - inference_time: 5593.0 - throughput: 178.79492222420882 + inference_time: 5228.0 + throughput: 191.27773527161438 estimated_peak_memory_range: - min: 434176 - max: 113566400 + min: 18432000 + max: 139158352 primary_compute_unit: NPU precision: fp16 layer_info: @@ -141,7 +139,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 336 - job_id: j1pv3rzj5 + job_id: jgdx103ep job_status: Passed reference_device_info: name: Samsung Galaxy S24 @@ -150,13 +148,13 @@ models: os_name: Android manufacturer: Samsung chipset: Snapdragon® 8 Gen 3 - timestamp: '2024-09-25T11:15:21Z' + timestamp: '2024-10-14T23:02:25Z' - torchscript_onnx_tflite: - inference_time: 6501.0 - throughput: 153.82248884786955 + inference_time: 6414.0 + throughput: 155.90894917368257 estimated_peak_memory_range: - min: 4575232 - max: 6463320 + min: 12288 + max: 20095544 primary_compute_unit: NPU precision: fp16 layer_info: @@ -164,14 +162,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 338 - job_id: j0pxv6e1g + job_id: jprv39w9g job_status: Passed torchscript_onnx_qnn: - inference_time: 6096.0 - throughput: 164.04199475065616 + inference_time: 6283.0 + throughput: 159.15963711602737 estimated_peak_memory_range: - min: 4980736 - max: 11781136 + min: 4956160 + max: 6289704 primary_compute_unit: NPU precision: fp16 layer_info: @@ -179,7 +177,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 333 - job_id: jogkzqr2g + job_id: jgo26m8xp job_status: Passed reference_device_info: name: QCS8550 (Proxy) @@ -187,14 +185,14 @@ models: form_factor: Iot os_name: Android manufacturer: Qualcomm - chipset: Qcs8550 Proxy - timestamp: 
'2024-09-25T11:15:15Z' + chipset: QCS8550 Proxy + timestamp: '2024-10-14T23:02:17Z' - torchscript_onnx_tflite: - inference_time: 9680.0 - throughput: 103.30578512396694 + inference_time: 6446.0 + throughput: 155.13496742165685 estimated_peak_memory_range: - min: 4567040 - max: 101403952 + min: 0 + max: 209652176 primary_compute_unit: NPU precision: fp16 layer_info: @@ -202,14 +200,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 338 - job_id: jo5mr6vwg + job_id: jp8qy81xp job_status: Passed torchscript_onnx_qnn: - inference_time: 9137.0 - throughput: 109.44511327569224 + inference_time: 6278.0 + throughput: 159.28639694170118 estimated_peak_memory_range: - min: 4939776 - max: 42898608 + min: 4947968 + max: 10993640 primary_compute_unit: NPU precision: fp16 layer_info: @@ -217,22 +215,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 333 - job_id: j1p3k1ql5 + job_id: jpedm2y15 job_status: Passed reference_device_info: - name: QCS8450 (Proxy) + name: SA8255 (Proxy) os: '13' - form_factor: Xr + form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Qcs8450 Proxy - timestamp: '2024-09-25T11:15:19Z' + chipset: SA8255P Proxy + timestamp: '2024-10-14T23:02:20Z' - torchscript_onnx_tflite: - inference_time: 6566.0 - throughput: 152.29972586049345 + inference_time: 6490.0 + throughput: 154.08320493066256 estimated_peak_memory_range: - min: 4567040 - max: 7074160 + min: 4579328 + max: 7531776 primary_compute_unit: NPU precision: fp16 layer_info: @@ -240,14 +238,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 338 - job_id: jegn2mrrg + job_id: jp0z0k665 job_status: Passed torchscript_onnx_qnn: - inference_time: 6089.0 - throughput: 164.23057973394646 + inference_time: 6277.0 + throughput: 159.31177314003506 estimated_peak_memory_range: - min: 5005312 - max: 6444544 + min: 4943872 + max: 12461256 primary_compute_unit: NPU precision: fp16 layer_info: @@ -255,22 +253,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 333 - job_id: jn5q8r945 + job_id: jgjvn1qxg job_status: Passed reference_device_info: - name: SA8650 (Proxy) + name: SA8775 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8650p Proxy - timestamp: '2024-09-25T11:15:16Z' + chipset: SA8775P Proxy + timestamp: '2024-10-14T23:02:19Z' - torchscript_onnx_tflite: - inference_time: 6616.0 - throughput: 151.14873035066506 + inference_time: 6533.0 + throughput: 153.0690341343946 estimated_peak_memory_range: - min: 4587520 - max: 7057528 + min: 4595712 + max: 14287264 primary_compute_unit: NPU precision: fp16 layer_info: @@ -278,14 +276,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 338 - job_id: joprk2195 + job_id: jpy13nm7p job_status: Passed torchscript_onnx_qnn: - inference_time: 6022.0 - throughput: 166.05778811026238 + inference_time: 6380.0 + throughput: 156.73981191222572 estimated_peak_memory_range: - min: 5009408 - max: 6336528 + min: 4960256 + max: 6184600 primary_compute_unit: NPU precision: fp16 layer_info: @@ -293,22 +291,22 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 333 - job_id: j1gln2e8p + job_id: jpv6k47j5 job_status: Passed reference_device_info: - name: SA8775 (Proxy) + name: SA8650 (Proxy) os: '13' form_factor: Auto os_name: Android manufacturer: Qualcomm - chipset: Sa8775p Proxy - timestamp: '2024-09-25T11:15:17Z' + chipset: SA8650P Proxy + timestamp: '2024-10-14T23:02:18Z' - torchscript_onnx_tflite: - inference_time: 6544.0 - throughput: 152.8117359413203 + inference_time: 9610.0 + throughput: 
104.0582726326743 estimated_peak_memory_range: - min: 4227072 - max: 13454240 + min: 4763648 + max: 107743616 primary_compute_unit: NPU precision: fp16 layer_info: @@ -316,14 +314,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 338 - job_id: jep28934p + job_id: jp2kyje4p job_status: Passed torchscript_onnx_qnn: - inference_time: 6234.0 - throughput: 160.41065126724413 + inference_time: 9155.0 + throughput: 109.22992900054615 estimated_peak_memory_range: - min: 4993024 - max: 6883504 + min: 4931584 + max: 46516128 primary_compute_unit: NPU precision: fp16 layer_info: @@ -331,19 +329,72 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 333 - job_id: jw566zq05 + job_id: j5we6x765 job_status: Passed reference_device_info: - name: SA8255 (Proxy) + name: QCS8450 (Proxy) os: '13' - form_factor: Auto + form_factor: Xr + os_name: Android + manufacturer: Qualcomm + chipset: QCS8450 Proxy + timestamp: '2024-10-14T23:02:22Z' + - torchscript_onnx_tflite: + inference_time: 4508.0 + throughput: 221.82786157941436 + estimated_peak_memory_range: + min: 4075520 + max: 78297040 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 338 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 338 + job_id: j5q6qwv4p + job_status: Passed + torchscript_onnx_qnn: + inference_time: 3685.0 + throughput: 271.37042062415196 + estimated_peak_memory_range: + min: 4927488 + max: 59329488 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 333 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 333 + job_id: jg9ln8mlg + job_status: Passed + torchscript_onnx: + inference_time: 4821.0 + throughput: 207.42584526031945 + estimated_peak_memory_range: + min: 0 + max: 74454688 + primary_compute_unit: NPU + precision: fp16 + layer_info: + layers_on_npu: 336 + layers_on_gpu: 0 + layers_on_cpu: 0 + total_layers: 336 + job_id: jpxkom415 + job_status: Passed + reference_device_info: + name: Snapdragon 8 Elite QRD + os: '15' + form_factor: Phone os_name: Android manufacturer: Qualcomm - chipset: Sa8255p Proxy - timestamp: '2024-09-25T11:15:18Z' + chipset: Snapdragon® 8 Elite + timestamp: '2024-10-14T23:02:28Z' - torchscript_onnx_qnn: - inference_time: 6424.0 - throughput: 155.6662515566625 + inference_time: 7215.0 + throughput: 138.6001386001386 estimated_peak_memory_range: min: 4923392 max: 4923392 @@ -354,14 +405,14 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 333 - job_id: j1p8omwxg + job_id: jp3j086lg job_status: Passed torchscript_onnx: - inference_time: 7730.0 - throughput: 129.36610608020698 + inference_time: 7647.0 + throughput: 130.7702366941284 estimated_peak_memory_range: - min: 17395712 - max: 17395712 + min: 17469440 + max: 17469440 primary_compute_unit: NPU precision: fp16 layer_info: @@ -369,7 +420,7 @@ models: layers_on_gpu: 0 layers_on_cpu: 0 total_layers: 336 - job_id: j7gjx2kxp + job_id: j57yr64l5 job_status: Passed reference_device_info: name: Snapdragon X Elite CRD @@ -378,4 +429,4 @@ models: os_name: Windows manufacturer: Qualcomm chipset: Snapdragon® X Elite - timestamp: '2024-09-25T11:15:22Z' + timestamp: '2024-10-14T23:02:26Z' diff --git a/qai_hub_models/requirements.txt b/qai_hub_models/requirements.txt index 2eee49ed..4da7dd45 100644 --- a/qai_hub_models/requirements.txt +++ b/qai_hub_models/requirements.txt @@ -3,11 +3,11 @@ deprecation==2.1.0 fsspec==2023.6.0 gdown==4.7.1 gitpython==3.1.42 -huggingface_hub==0.23.1 +huggingface_hub>=0.23.1,<0.24 ipython==8.12.3 matplotlib==3.7.5 numpy==1.23.1 -onnx==1.14.1 +onnx # We don't 
use a specific version of ONNX so we can defer to AIMET. AIMET-torch and AIMET-ONNX use different ONNX versions. opencv-python==4.8.1.78 packaging==23.2 pandas==1.5.3 @@ -24,4 +24,4 @@ torchvision==0.16.2 typing-extensions>=4.12.2 tqdm==4.66.2 urllib3==1.26.18 -qai_hub>=0.15.0 +qai_hub>=0.18.1 diff --git a/qai_hub_models/test/test_utils/test_perf_summary.py b/qai_hub_models/test/test_utils/test_perf_summary.py index 7d28c515..209db75f 100644 --- a/qai_hub_models/test/test_utils/test_perf_summary.py +++ b/qai_hub_models/test/test_utils/test_perf_summary.py @@ -99,7 +99,7 @@ def test_model_inference_run_toggle(): perf_summary.update_summary(MODEL_ID, prev_perf_metrics, new_perf_metrics) assert perf_summary.progressions["inf"] == [ - (MODEL_ID, "torchscript_onnx_tflite", "inf", 10.0, "null", CHIPSET, OS) + (MODEL_ID, "torchscript_onnx_tflite", "inf", 10.0, "null", "null", CHIPSET, OS) ] @@ -118,7 +118,7 @@ def test_perf_progression_basic(): perf_summary.update_summary(MODEL_ID, prev_perf_metrics, new_perf_metrics) expected_inf_bucket = [ - (MODEL_ID, "torchscript_onnx_tflite", 20.0, 0.5, 10.0, CHIPSET, OS), + (MODEL_ID, "torchscript_onnx_tflite", 20.0, 0.5, 10.0, "null", CHIPSET, OS), ] assert perf_summary.progressions[10] == expected_inf_bucket @@ -140,7 +140,7 @@ def test_perf_regression_basic(): perf_summary.update_summary(MODEL_ID, prev_perf_metrics, new_perf_metrics) expected_inf_bucket = [ - (MODEL_ID, "torchscript_onnx_tflite", 2, 20.0, 10.0, CHIPSET, OS), + (MODEL_ID, "torchscript_onnx_tflite", 2, 20.0, 10.0, "null", CHIPSET, OS), ] assert perf_summary.regressions[2] == expected_inf_bucket diff --git a/qai_hub_models/utils/aimet/aimet_dummy_model.py b/qai_hub_models/utils/aimet/aimet_dummy_model.py index 59d90738..3a530603 100644 --- a/qai_hub_models/utils/aimet/aimet_dummy_model.py +++ b/qai_hub_models/utils/aimet/aimet_dummy_model.py @@ -6,6 +6,7 @@ import os import shutil +from contextlib import ExitStack from pathlib import Path from typing import List, Optional from zipfile import ZIP_DEFLATED, ZipFile @@ -13,6 +14,7 @@ import torch from onnx import load_model as load_onnx_model from onnx import save_model as save_onnx_model +from packaging.version import Version from qai_hub_models.evaluators.base_evaluators import _DataLoader from qai_hub_models.models.protocols import ( @@ -68,10 +70,10 @@ class AimetEncodingLoaderMixin(PretrainedHubModelProtocol, QuantizableModelProto - Export Torch model to ONNX and load pre-computed encodings """ - def __init__(self, model, aimet_encoding_path: str): + def __init__(self, model, aimet_encodings: str): super().__init__() self.model = model - self.encodings_path = aimet_encoding_path + self.aimet_encodings = aimet_encodings def quantize( self, @@ -91,6 +93,7 @@ def convert_to_onnx_and_aimet_encodings( input_spec: InputSpec | None = None, model_name: str | None = None, external_weights: bool = False, + bundle_external_weights: bool = False, output_names: Optional[List[str]] = None, ) -> str: """ @@ -103,29 +106,50 @@ def convert_to_onnx_and_aimet_encodings( input_spec = self.get_input_spec() os.makedirs(output_dir, exist_ok=True) - zip_path = os.path.join(output_dir, f"{model_name}.aimet.zip") zip_base_dir = Path(f"{model_name}.aimet") + zip = self._use_zip_file() + + with ExitStack() as stack: + if zip: + # Use temporary directory for preparation + tmpdir = stack.enter_context(qaihm_temp_dir()) + else: + tmpdir = output_dir - with qaihm_temp_dir() as tmpdir: base_path = Path(tmpdir) / zip_base_dir - if base_path.exists(): - 
shutil.rmtree(base_path) - os.makedirs(base_path) + os.makedirs(base_path, exist_ok=True) onnx_file_path = str(base_path / f"{model_name}.onnx") encoding_file_path = str(base_path / f"{model_name}.encodings") + torch_inputs = tuple(make_torch_inputs(input_spec)) + + if Version(torch.__version__) < Version("2.4.0"): + print() + print( + f"WARNING: You are using PyTorch {torch.__version__}, which pre-dates significant ONNX export optimizations" + ) + print( + " introduced in 2.4.0. We recommend upgrading PyTorch version to speed up this step:" + ) + print() + print(" pip install torch==2.4.0") + print() + torch.onnx.export( - self.model, - tuple(make_torch_inputs(input_spec)), + self, + torch_inputs, onnx_file_path, input_names=[name for name in input_spec], output_names=output_names, + opset_version=17, ) - shutil.copyfile(self.encodings_path, encoding_file_path) - external_weights_file_path = "" + self._adapt_aimet_encodings( + self.aimet_encodings, encoding_file_path, onnx_file_path + ) - if external_weights: + external_weights_file_path = "" + if external_weights and zip: external_weights_file_name = f"{model_name}.data" external_weights_file_path = str(base_path / external_weights_file_name) # Torch exports to onnx with external weights scattered in a directory. @@ -139,11 +163,30 @@ def convert_to_onnx_and_aimet_encodings( location=external_weights_file_name, ) - zip_aimet_model( - zip_path, - zip_base_dir, - onnx_file_path, - encoding_file_path, - external_weights_file_path, - ) - return zip_path + if zip: + zip_path = os.path.join(output_dir, f"{model_name}.aimet.zip") + zip_aimet_model( + zip_path, + zip_base_dir, + onnx_file_path, + encoding_file_path, + external_weights_file_path, + ) + return zip_path + else: + # This path is persistent + return base_path.as_posix() + + return "" # mypy requires this for some reason + + def _use_zip_file(self) -> bool: + """ + Whether the return of convert_to_hub_source_model should be zipped. + """ + return True + + def _adapt_aimet_encodings(self, src_encodings, dst_encodings, onnx_model_path): + """ + Overridable method that adapts the AIMET encodings. + """ + shutil.copyfile(src=src_encodings, dst=dst_encodings) diff --git a/qai_hub_models/utils/aimet/encodings.py b/qai_hub_models/utils/aimet/encodings.py new file mode 100644 index 00000000..a28708a9 --- /dev/null +++ b/qai_hub_models/utils/aimet/encodings.py @@ -0,0 +1,142 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause +# --------------------------------------------------------------------- +import re +from copy import deepcopy + + +def find_name_mapping(pattern_pairs, src_names, dst_names, dst_input_names=None): + patterns = [re.compile(x) for x, y in pattern_pairs] + mapping = {} + rev_mapping = {} + known_unused = set() + for src_name in src_names: + for i in range(len(pattern_pairs)): + m = patterns[i].match(src_name) + if m: + dst_patterns = pattern_pairs[i][1] + if not isinstance(dst_patterns, list): + dst_patterns = [dst_patterns] + + used = False + for dst_pattern in dst_patterns: + if isinstance(dst_pattern, tuple): + assert dst_input_names is not None + # This contains a (node, index) pair, where the index + # refers to the input index of that node + dst_pattern, index = dst_pattern + dst_name = dst_pattern.format(*m.groups()) + if dst_name in dst_input_names: + real_dst_name = dst_input_names[dst_name][index] + mapping[src_name] = real_dst_name + rev_mapping[real_dst_name] = src_name + used = True + + elif not dst_pattern: + known_unused.add(src_name) + used = True + else: + # This dst_name refers to the edge name + dst_name = dst_pattern.format(*m.groups()) + if dst_name in dst_names: + mapping[src_name] = dst_name + rev_mapping[dst_name] = src_name + used = True + if used: + break + + return mapping, rev_mapping, known_unused + + +def map_encodings( + pattern_pairs, + src_names, + dst_names, + dst_input_names=None, + src_encodings=[], + dst_encodings=[], +): + patterns = [re.compile(x) for x, y in pattern_pairs] + mapping = {} + rev_mapping = {} + known_unused = set() + + def default_callback( + src_encodings, + dst_encodings, + src_name, + dst_name, + pattern_index, + num_patterns, + groups, + ): + if src_name in src_encodings: + src_entry = src_encodings[src_name] + dst_entry = deepcopy(src_entry) + if isinstance(dst_entry, dict): + dst_entry["name"] = dst_name + dst_encodings[dst_name] = dst_entry + + for src_name in src_names: + for i in range(len(pattern_pairs)): + m = patterns[i].match(src_name) + if m: + dst_patterns = pattern_pairs[i][1] + callback = default_callback + + if isinstance(dst_patterns, tuple) and callable(dst_patterns[1]): + dst_patterns, callback = dst_patterns + + if not isinstance(dst_patterns, list): + dst_patterns = [dst_patterns] + + used = False + + for dst_pattern_index, dst_pattern in enumerate(dst_patterns): + if isinstance(dst_pattern, tuple): + assert dst_input_names is not None + # This contains a (node, index) pair, where the index + # refers to the input index of that node + dst_pattern, index = dst_pattern + dst_name = dst_pattern.format(*m.groups()) + if dst_name in dst_input_names: + real_dst_name = dst_input_names[dst_name][index] + mapping[src_name] = real_dst_name + rev_mapping[real_dst_name] = src_name + used = True + + callback( + src_encodings, + dst_encodings, + src_name, + real_dst_name, + dst_pattern_index, + len(dst_patterns), + m.groups(), + ) + + elif not dst_pattern: + known_unused.add(src_name) + used = True + else: + # This dst_name refers to the edge name + dst_name = dst_pattern.format(*m.groups()) + if dst_name in dst_names: + mapping[src_name] = dst_name + rev_mapping[dst_name] = src_name + used = True + + callback( + src_encodings, + dst_encodings, + src_name, + dst_name, + dst_pattern_index, + len(dst_patterns), + m.groups(), + ) + if used: + break + + return mapping, rev_mapping, known_unused diff --git a/qai_hub_models/utils/args.py b/qai_hub_models/utils/args.py index c2259889..e82796c1 
100644 --- a/qai_hub_models/utils/args.py +++ b/qai_hub_models/utils/args.py @@ -439,12 +439,13 @@ def get_qcom_chipsets() -> Set[str]: def _evaluate_export_common_parser( model_cls: Type[FromPretrainedTypeVar] | Type[FromPrecompiledTypeVar], - supports_tflite=True, - supports_qnn=True, - supports_onnx=True, - supports_precompiled_qnn_onnx=True, - default_runtime=TargetRuntime.TFLITE, - exporting_compiled_model=False, + supports_tflite: bool = True, + supports_qnn: bool = True, + supports_onnx: bool = True, + supports_precompiled_qnn_onnx: bool = True, + default_runtime: TargetRuntime = TargetRuntime.TFLITE, + exporting_compiled_model: bool = False, + is_hub_quantized: bool = False, ) -> argparse.ArgumentParser: """ Common arguments between export and evaluate scripts. @@ -452,7 +453,13 @@ def _evaluate_export_common_parser( # Set handler to resolve, to allow from_pretrained and get_input_spec # to have the same argument names. parser = get_parser(allow_dupe_args=True) - + if is_hub_quantized: + parser.add_argument( + "--num-calibration-samples", + type=int, + default=100, + help="The number of calibration data samples to use for quantization.", + ) if not exporting_compiled_model: # Default runtime for compiled model is fixed for given model available_runtimes = [] @@ -508,6 +515,7 @@ def export_parser( default_runtime: TargetRuntime = TargetRuntime.TFLITE, exporting_compiled_model: bool = False, default_export_device: str = DEFAULT_EXPORT_DEVICE, + is_hub_quantized: bool = False, ) -> argparse.ArgumentParser: """ Arg parser to be used in export scripts. @@ -532,11 +540,11 @@ def export_parser( True when exporting compiled model. If set, removing skip_profiling flag from export arguments. Default = False. - default_export_device: - Default device to set for export. + default_export_device: Default device to set for export. + is_hub_quantized: Whether the model is quantized via the hub quantize job. Returns: - Arg parser object. + argparse ArgumentParser object. """ parser = _evaluate_export_common_parser( model_cls=model_cls, @@ -546,6 +554,7 @@ def export_parser( supports_precompiled_qnn_onnx=supports_precompiled_qnn_onnx, default_runtime=default_runtime, exporting_compiled_model=exporting_compiled_model, + is_hub_quantized=is_hub_quantized, ) parser.add_argument( "--device", @@ -561,6 +570,12 @@ def export_parser( help="If set, will choose a random device with this chipset. " "Overrides whatever is set in --device.", ) + if is_hub_quantized: + parser.add_argument( + "--skip-compiling", + action="store_true", + help="If set, skips compiling to asset that can run on device.", + ) parser.add_argument( "--skip-profiling", action="store_true", @@ -609,6 +624,7 @@ def evaluate_parser( supports_qnn=True, supports_onnx=True, default_runtime=TargetRuntime.TFLITE, + is_hub_quantized: bool = False, ) -> argparse.ArgumentParser: """ Arg parser to be used in evaluate scripts. @@ -630,6 +646,7 @@ def evaluate_parser( If set, removing skip_profiling flag from export arguments. Default = False. default_runtime: Which runtime to use as default if not specified in cli args. + is_hub_quantized: Whether the model is quantized via the hub quantize job. Returns: Arg parser object. 
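# A minimal usage sketch (not part of this patch) of the new is_hub_quantized
# parser flags; `MyQuantizedModel` is a hypothetical FromPretrained model class.
from qai_hub_models.utils.args import export_parser

parser = export_parser(model_cls=MyQuantizedModel, is_hub_quantized=True)
args = parser.parse_args()
# argparse maps the dashed flags to underscored attributes:
num_samples = args.num_calibration_samples  # --num-calibration-samples, default 100
skip_compiling = args.skip_compiling  # --skip-compiling stops after the quantize job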
@@ -640,6 +657,7 @@ def evaluate_parser( supports_qnn=supports_qnn, supports_onnx=supports_onnx, default_runtime=default_runtime, + is_hub_quantized=is_hub_quantized, ) parser.add_argument( "--chipset", diff --git a/qai_hub_models/utils/asset_loaders.py b/qai_hub_models/utils/asset_loaders.py index 9cf6e429..cb79f853 100644 --- a/qai_hub_models/utils/asset_loaders.py +++ b/qai_hub_models/utils/asset_loaders.py @@ -15,12 +15,13 @@ import tempfile import threading import time +import zipfile from contextlib import contextmanager from enum import Enum from functools import partial from pathlib import Path from types import ModuleType -from typing import Any, Callable, Dict, List, Optional, Union +from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple, Union from zipfile import ZipFile import gdown @@ -542,6 +543,7 @@ def from_cfg( "models_website_url": str, "models_website_relative_path": str, "email_template": str, + "genie_url": str, } ) ) @@ -743,7 +745,7 @@ def extract(self, force=True) -> Path: _, ext = os.path.splitext(self.local_cache_path) if ext == ".zip": - # Update local cache path to pont to the extracted zip folder. + # Update local cache path to point to the extracted zip folder. extract_zip_file(str(self.path())) os.remove(self.path()) # Deletes zip file self.is_extracted = True # Updates path() to return extracted path @@ -1045,6 +1047,37 @@ def extract_zip_file(filepath_str: str) -> Path: return out_path +# TODO (#12708): Remove this and rely on client +def zip_model(output_dir_path: str, model_path: str) -> str: + model_path = os.path.realpath(model_path) + package_name = os.path.basename(model_path) + compresslevel = 1 + + output_path = os.path.join(output_dir_path, package_name + ".zip") + os.makedirs(os.path.dirname(output_path), exist_ok=True) + with zipfile.ZipFile( + output_path, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=compresslevel + ) as f: + walk: Iterable[Tuple[str, List[str], List[str]]] + if os.path.isfile(model_path): + root_path = os.path.dirname(model_path) + walk = [(root_path, [], [model_path])] + else: + root_path = os.path.join(model_path, "..") + walk = os.walk(model_path) + for root, _, files in walk: + # Create directory entry (can use f.mkdir from Python 3.11) + rel_root = os.path.relpath(root, root_path) + if rel_root != ".": + f.writestr(rel_root + "/", "") + for file in files: + f.write( + os.path.join(root, file), + os.path.relpath(os.path.join(root, file), root_path), + ) + return output_path + + def callback_with_retry( num_retries: int, callback: Callable, diff --git a/qai_hub_models/utils/config_loaders.py b/qai_hub_models/utils/config_loaders.py index da646f6b..42f382b4 100644 --- a/qai_hub_models/utils/config_loaders.py +++ b/qai_hub_models/utils/config_loaders.py @@ -30,14 +30,16 @@ from schema import Schema, SchemaError from qai_hub_models.utils.asset_loaders import ASSET_CONFIG, QAIHM_WEB_ASSET, load_yaml -from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.path_helpers import ( MODELS_PACKAGE_NAME, QAIHM_PACKAGE_NAME, get_qaihm_models_root, get_qaihm_package_root, ) -from qai_hub_models.utils.scorecard.common import get_supported_devices +from qai_hub_models.utils.scorecard.common import ( + ScorecardProfilePath, + get_supported_devices, +) QAIHM_PACKAGE_ROOT = get_qaihm_package_root() QAIHM_MODELS_ROOT = get_qaihm_models_root() @@ -122,10 +124,88 @@ def get_all_supported_devices(): return get_supported_devices( - ["qualcomm-snapdragon-x-elite", "qualcomm-snapdragon-8gen3"] + [ 
+ "qualcomm-snapdragon-8-elite", + "qualcomm-snapdragon-x-elite", + "qualcomm-snapdragon-8gen3", + ] ) +def _get_origin(input_type: Type) -> Type: + """ + For nested types like List[str] or Union[str, int], this function will + return the "parent" type like List or Union. + + If the input type is not a nested type, the function returns the input_type. + """ + return getattr(input_type, "__origin__", input_type) + + +def _extract_optional_type(input_type: Type) -> Type: + """ + Given an optional type as input, returns the inner type that is wrapped. + + For example, if input type is Optional[int], the function returns int. + """ + assert ( + _get_origin(input_type) == Union + ), "Input type must be an instance of `Optional`." + union_args = get_args(input_type) + assert len(union_args) == 2 and issubclass( + union_args[1], type(None) + ), "Input type must be an instance of `Optional`." + return union_args[0] + + +def _constructor_from_type(input_type: Type) -> Union[Type, Callable]: + """ + Given a type, return the appropriate constructor for that type. + + For primitive types like str and int, the type and constructor are the same object. + + For types like List, the constructor is list. + """ + input_type = _get_origin(input_type) + if input_type == List: + return list + if input_type == Dict: + return dict + return input_type + + +@dataclass +class BaseDataClass: + @classmethod + def get_schema(cls) -> Schema: + """Derive the Schema from the fields set on the dataclass.""" + schema_dict = {} + field_datatypes = get_type_hints(cls) + for field in fields(cls): + field_type = field_datatypes[field.name] + if _get_origin(field_type) == Union: + field_type = _extract_optional_type(field_type) + assert ( + field.default != dataclasses.MISSING + ), "Optional fields must have a default set." 
+ if field.default != dataclasses.MISSING: + field_key = OptionalSchema(field.name, default=field.default) + else: + field_key = field.name + schema_dict[field_key] = _constructor_from_type(field_type) + return Schema(And(schema_dict)) + + @classmethod + def from_dict( + cls: Type[BaseDataClassTypeVar], val_dict: Dict[str, Any] + ) -> BaseDataClassTypeVar: + kwargs = {field.name: val_dict[field.name] for field in fields(cls)} + return cls(**kwargs) + + +BaseDataClassTypeVar = TypeVar("BaseDataClassTypeVar", bound="BaseDataClass") + + @unique class FORM_FACTOR(Enum): PHONE = 0 @@ -257,323 +337,178 @@ def bytes_to_mb(num_bytes: int) -> int: return round(num_bytes / (1 << 20)) -@dataclass -class ModelRuntimePerformanceDetails: - model_name: str - device_name: str - device_os: str - runtime: TargetRuntime - inference_time_ms: int - peak_memory_bytes: Tuple[int, int] # min, max - compute_unit_counts: Dict[str, int] - - class QAIHMModelPerf: """Class to read the perf.yaml and parse it for displaying it on HuggingFace.""" - def __init__(self, perf_yaml_path, model_name): - self.model_name = model_name - self.perf_yaml_path = perf_yaml_path - self.skip_overall = False - self.skip_tflite = False - self.skip_qnn = False - self.tflite_row = ( - "| Samsung Galaxy S23 Ultra (Android 13) | Snapdragon® 8 Gen 2 |" - ) - self.qnn_row = "| Samsung Galaxy S23 Ultra (Android 13) | Snapdragon® 8 Gen 2 |" - - if os.path.exists(self.perf_yaml_path): - self.perf_details = load_yaml(self.perf_yaml_path) - num_models = len(self.perf_details["models"]) - - # Get TFLite summary from perf.yaml - try: - self.tflite_summary = [] - for model in self.perf_details["models"]: - self.tflite_summary.append( - model["performance_metrics"][0][TFLITE_PATH] - ) - except Exception: - self.skip_tflite = True - - if not self.skip_overall and not self.skip_tflite: - for num in range(num_models): - if isinstance(self.tflite_summary[num]["inference_time"], str): - self.skip_tflite = True - - # Get QNN summary from perf.yaml - try: - self.qnn_summary = [] - for model in self.perf_details["models"]: - self.qnn_summary.append(model["performance_metrics"][0][QNN_PATH]) - except Exception: - self.skip_qnn = True - if not self.skip_overall and not self.skip_qnn: - for num in range(num_models): - if isinstance(self.qnn_summary[num]["inference_time"], str): - self.skip_qnn = True - else: - self.skip_overall = True - - def _get_runtime_type(self, model_type): - if model_type == "tflite": - return "TFLite" - if model_type == "so": - return "QNN Model Library" - if model_type == "bin": - return "QNN Binary" - raise RuntimeError(f"Unsupported model_type specified {model_type}.") - - def get_row(self, skip, summary_list, initial_row, model_type, has_assets=True): - # Creating a row for performance table. 
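# Usage sketch for the relocated BaseDataClass (the example class is hypothetical,
# assuming BaseDataClass is importable from qai_hub_models.utils.config_loaders):
# get_schema() derives a Schema from the dataclass fields, Optional fields become
# OptionalSchema keys backed by their defaults, and from_dict() rebuilds the
# dataclass from a validated dict.
from dataclasses import dataclass
from typing import Optional

from qai_hub_models.utils.config_loaders import BaseDataClass

@dataclass
class ExampleEntry(BaseDataClass):
    name: str
    count: int
    note: Optional[str] = None  # Optional fields must carry a default

validated = ExampleEntry.get_schema().validate({"name": "demo", "count": 3})
entry = ExampleEntry.from_dict(validated)  # "note" is filled with its default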
- row = "" - if not skip: - names = self.get_submodel_names() - for summary, name in zip(summary_list, names): - inf_time = summary["inference_time"] - inference_time = f"{inf_time / 1000} ms" - mem_min = bytes_to_mb(summary["estimated_peak_memory_range"]["min"]) - mem_max = bytes_to_mb(summary["estimated_peak_memory_range"]["max"]) - peak_memory_range = f"{mem_min} - {mem_max} MB" - if model_type == "tflite": - self.tflite_inference_time = inference_time - self.tflite_peak_memory_range = peak_memory_range - elif model_type == "so" or model_type == "bin": - self.qnn_inference_time = inference_time - self.qnn_peak_memory_range = peak_memory_range - primary_compute_unit = summary["primary_compute_unit"] - precision = summary["precision"].upper() - base_url = ASSET_CONFIG.get_hugging_face_url(self.model_name) - # For model cards with no assets, only show model name with no link; - # as there is not target model to download - if has_assets: - target_model = f" [{name}.{model_type}]({base_url}/blob/main/{name}.{model_type})" - else: - target_model = name - - runtime_type = self._get_runtime_type(model_type) - row += ( - initial_row - + f" {runtime_type} | {inference_time} | {peak_memory_range} | {precision} | {primary_compute_unit} | {target_model} \n" - ) - return row - return "" - - def get_tflite_row(self): - # Get TFLite row for a submodel on a device. - return self.get_row( - self.skip_tflite, self.tflite_summary, self.tflite_row, "tflite" - ) + ### + # Helper Struct Classes + ### + + @dataclass + class PerformanceDetails: + job_id: str + inference_time_microsecs: float + peak_memory_bytes: Tuple[int, int] # min, max + compute_unit_counts: Dict[str, int] + primary_compute_unit: str + precision: str + + @staticmethod + def from_dict(device_perf_details: Dict) -> QAIHMModelPerf.PerformanceDetails: + peak_memory = device_perf_details["estimated_peak_memory_range"] + layer_info = device_perf_details["layer_info"] + compute_unit_counts = {} + for layer_name, count in layer_info.items(): + if "layers_on" in layer_name: + if count > 0: + compute_unit_counts[layer_name[-3:].upper()] = count + + return QAIHMModelPerf.PerformanceDetails( + job_id=device_perf_details["job_id"], + inference_time_microsecs=float(device_perf_details["inference_time"]), + peak_memory_bytes=(peak_memory["min"], peak_memory["max"]), + compute_unit_counts=compute_unit_counts, + primary_compute_unit=device_perf_details["primary_compute_unit"], + precision=device_perf_details["precision"], + ) - def get_qnn_row(self, is_precompiled: bool = False, has_assets=True): - # Get QNN row for a submodel on a device. - return self.get_row( - self.skip_qnn, - self.qnn_summary, - self.qnn_row, - "bin" if is_precompiled else "so", - has_assets, - ) + @dataclass + class LLMPerformanceDetails: + time_to_first_token_range_secs: Tuple[str, str] # min, max + tokens_per_second: float + + @staticmethod + def from_dict( + device_perf_details: Dict, + ) -> QAIHMModelPerf.LLMPerformanceDetails: + ttftr = device_perf_details["time_to_first_token_range"] + return QAIHMModelPerf.LLMPerformanceDetails( + time_to_first_token_range_secs=( + # Original data is in microseconds + str(float(ttftr["min"]) * 1e-6), + str(float(ttftr["max"]) * 1e-6), + ), + tokens_per_second=device_perf_details["tokens_per_second"], + ) - def body_perf(self, is_precompiled: bool = False, has_assets: bool = True): - # Combine all the rows to make the body of performance table. 
- if self.skip_tflite: - return self.get_qnn_row(is_precompiled, has_assets) - elif self.skip_qnn: - return self.get_tflite_row() - else: - return self.get_tflite_row() + self.get_qnn_row(is_precompiled, has_assets) - - def compute_unit_summary(self, runtime_path=TFLITE_PATH): - # Get compute unit summary for export script's output. - npu, gpu, cpu = 0, 0, 0 - cu_summary = "" - for model in self.perf_details["models"]: - layer_info = model["performance_metrics"][0][runtime_path]["layer_info"] - npu += layer_info["layers_on_npu"] - gpu += layer_info["layers_on_gpu"] - cpu += layer_info["layers_on_cpu"] - if npu > 0: - cu_summary += f"NPU ({npu})" - if gpu > 0: - cu_summary += f"GPU ({gpu})" - if cpu > 0: - cu_summary += f"CPU ({cpu})" - return cu_summary - - def get_submodel_names_and_ids(self): - # Get the names, TFLite job ids and QNN job ids. - names = self.get_submodel_names() - tflite_job_ids, qnn_job_ids = [], [] - for model in self.perf_details["models"]: - if TFLITE_PATH in model["performance_metrics"][0]: - tflite_job_ids.append( - model["performance_metrics"][0][TFLITE_PATH]["job_id"] + @dataclass + class EvaluationDetails(BaseDataClass): + name: str + value: float + unit: str + + @dataclass + class DeviceDetails(BaseDataClass): + name: str + os: str + form_factor: str + os_name: str + manufacturer: str + chipset: str + + @dataclass + class ProfilePerfDetails: + path: ScorecardProfilePath + perf_details: QAIHMModelPerf.PerformanceDetails | QAIHMModelPerf.LLMPerformanceDetails + eval_details: Optional[QAIHMModelPerf.EvaluationDetails] = None + + @staticmethod + def from_dict( + path: ScorecardProfilePath, perf_details_dict: Dict + ) -> QAIHMModelPerf.ProfilePerfDetails: + perf_details: QAIHMModelPerf.LLMPerformanceDetails | QAIHMModelPerf.PerformanceDetails + if llm_metrics := perf_details_dict.get("llm_metrics", None): + perf_details = QAIHMModelPerf.LLMPerformanceDetails.from_dict( + llm_metrics ) - if QNN_PATH in model["performance_metrics"][0]: - qnn_job_ids.append(model["performance_metrics"][0][QNN_PATH]["job_id"]) - return names, tflite_job_ids, qnn_job_ids - - def get_submodel_names(self): - # Get names of all the submodels. - names = [] - for model in self.perf_details["models"]: - names.append(model["name"]) - return names - - def get_perf_details( - self, - runtime: TargetRuntime, - device: str | None = None, - device_os: str | None = None, - ) -> Dict[str, ModelRuntimePerformanceDetails | None]: - """ - Get model performance details for the selected device and runtime. - - If device is None, picks the first device specified in the perf results. - - Returns a dictionary of - { model_component_name : performance details object } - - If there is only one component, model_component_name == model_name. - - The performance details object will be null if the requested - perf details do not exist, or if the perf job failed. - """ - if runtime == TargetRuntime.TFLITE: - rt_name = "torchscript_onnx_tflite" - elif runtime == TargetRuntime.QNN: - rt_name = "torchscript_onnx_qnn" - else: - raise NotImplementedError() - - # Model -> Performance Details - # None == Test did not run. 
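# Illustrative per-device perf.yaml fragment (values invented) in the shape that
# the new QAIHMModelPerf.PerformanceDetails.from_dict consumes:
from qai_hub_models.utils.config_loaders import QAIHMModelPerf

device_perf_details = {
    "job_id": "j1234abcd",
    "inference_time": 1234.0,  # microseconds
    "estimated_peak_memory_range": {"min": 16384, "max": 33554432},
    "layer_info": {"layers_on_npu": 42, "layers_on_gpu": 0, "layers_on_cpu": 1},
    "primary_compute_unit": "NPU",
    "precision": "fp16",
}
details = QAIHMModelPerf.PerformanceDetails.from_dict(device_perf_details)
# Only "layers_on_*" entries with a positive count survive, keyed by the last
# three characters uppercased: compute_unit_counts == {"NPU": 42, "CPU": 1}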
- perf_details: Dict[str, ModelRuntimePerformanceDetails | None] = {} - - for model in self.perf_details["models"]: - name = model["name"] - metrics = model["performance_metrics"] - for device_metrics in metrics: - device_name = device_metrics["reference_device_info"]["name"] - metric_device_os = device_metrics["reference_device_info"]["os"] - - # Verify Device Matches Requested Device - if device and device_name != device: - continue - if device_os and metric_device_os != device_os: - continue - - perf_rt = device_metrics.get(rt_name, None) - - # Inference Time - inf_time = perf_rt["inference_time"] if perf_rt else "null" - if inf_time == "null": - # Compilation or inference failed. - perf_details[name] = None - continue - inf_time /= 1000 - - # Memory - peak_mem = perf_rt["estimated_peak_memory_range"] - peak_mem_bytes: Tuple[int, int] = tuple([peak_mem["min"], peak_mem["max"]]) # type: ignore - - # Layer Info - layer_info = perf_rt["layer_info"] - compute_unit_counts = {} - for layer_name, count in layer_info.items(): - if "layers_on" in layer_name: - if count > 0: - compute_unit_counts[layer_name[-3:].upper()] = count - - perf_details[name] = ModelRuntimePerformanceDetails( - model_name=model, - device_name=device_name, - device_os=metric_device_os, - runtime=runtime, - inference_time_ms=inf_time, - peak_memory_bytes=peak_mem_bytes, - compute_unit_counts=compute_unit_counts, + else: + perf_details = QAIHMModelPerf.PerformanceDetails.from_dict( + perf_details_dict ) - if name not in perf_details.keys(): - perf_details[name] = None - - return perf_details - - -def _get_origin(input_type: Type) -> Type: - """ - For nested types like List[str] or Union[str, int], this function will - return the "parent" type like List or Union. - - If the input type is not a nested type, the function returns the input_type. - """ - return getattr(input_type, "__origin__", input_type) - - -def _extract_optional_type(input_type: Type) -> Type: - """ - Given an optional type as input, returns the inner type that is wrapped. - - For example, if input type is Optional[int], the function returns int. - """ - assert ( - _get_origin(input_type) == Union - ), "Input type must be an instance of `Optional`." - union_args = get_args(input_type) - assert len(union_args) == 2 and issubclass( - union_args[1], type(None) - ), "Input type must be an instance of `Optional`." - return union_args[0] - - -def _constructor_from_type(input_type: Type) -> Union[Type, Callable]: - """ - Given a type, return the appropriate constructor for that type. - - For primitive types like str and int, the type and constructor are the same object. + if eval_metrics := perf_details_dict.get("evaluation_metrics", None): + eval_details_data = ( + QAIHMModelPerf.EvaluationDetails.get_schema().validate(eval_metrics) + ) + eval_details = QAIHMModelPerf.EvaluationDetails.from_dict( + eval_details_data + ) + else: + eval_details = None - For types like List, the constructor is list. 
- """ - input_type = _get_origin(input_type) - if input_type == List: - return list - if input_type == Dict: - return dict - return input_type + return QAIHMModelPerf.ProfilePerfDetails( + path=path, perf_details=perf_details, eval_details=eval_details + ) + @dataclass + class DevicePerfDetails: + device: QAIHMModelPerf.DeviceDetails + details_per_path: Dict[ScorecardProfilePath, QAIHMModelPerf.ProfilePerfDetails] + + @staticmethod + def from_dict( + device: QAIHMModelPerf.DeviceDetails, device_runtime_details: Dict + ) -> QAIHMModelPerf.DevicePerfDetails: + details_per_path = {} + for profile_path in ScorecardProfilePath: + if profile_path.long_name in device_runtime_details: + perf_details_dict = device_runtime_details[profile_path.long_name] + details_per_path[ + profile_path + ] = QAIHMModelPerf.ProfilePerfDetails.from_dict( + profile_path, perf_details_dict + ) + return QAIHMModelPerf.DevicePerfDetails( + device=device, details_per_path=details_per_path + ) -@dataclass -class BaseDataClass: - @classmethod - def get_schema(cls) -> Schema: - """Derive the Schema from the fields set on the dataclass.""" - schema_dict = {} - field_datatypes = get_type_hints(cls) - for field in fields(cls): - field_type = field_datatypes[field.name] - if _get_origin(field_type) == Union: - field_type = _extract_optional_type(field_type) - assert ( - field.default != dataclasses.MISSING - ), "Optional fields must have a default set." - if field.default != dataclasses.MISSING: - field_key = OptionalSchema(field.name, default=field.default) - else: - field_key = field.name - schema_dict[field_key] = _constructor_from_type(field_type) - return Schema(And(schema_dict)) + @dataclass + class ModelPerfDetails: + model: str + details_per_device: Dict[str, QAIHMModelPerf.DevicePerfDetails] + + @staticmethod + def from_dict( + model: str, model_performance_metrics: List[Dict] + ) -> QAIHMModelPerf.ModelPerfDetails: + details_per_device = {} + for device_perf_details in model_performance_metrics: + device_details_data = ( + QAIHMModelPerf.DeviceDetails.get_schema().validate( + device_perf_details["reference_device_info"] + ) + ) + device_details = QAIHMModelPerf.DeviceDetails.from_dict( + device_details_data + ) + details_per_device[ + device_details.name + ] = QAIHMModelPerf.DevicePerfDetails.from_dict( + device_details, device_perf_details + ) - @classmethod - def from_dict( - cls: Type[BaseDataClassTypeVar], val_dict: Dict[str, Any] - ) -> BaseDataClassTypeVar: - kwargs = {field.name: val_dict[field.name] for field in fields(cls)} - return cls(**kwargs) + return QAIHMModelPerf.ModelPerfDetails( + model=model, details_per_device=details_per_device + ) + def __init__(self, perf_yaml_path, model_name): + self.model_name = model_name + self.perf_yaml_path = perf_yaml_path + self.per_model_details: Dict[str, QAIHMModelPerf.ModelPerfDetails] = {} -BaseDataClassTypeVar = TypeVar("BaseDataClassTypeVar", bound="BaseDataClass") + if os.path.exists(self.perf_yaml_path): + self.perf_details = load_yaml(self.perf_yaml_path) + all_models_and_perf = self.perf_details["models"] + if not isinstance(all_models_and_perf, list): + all_models_and_perf = [all_models_and_perf] + + for model_perf in all_models_and_perf: + model_name = model_perf["name"] + self.per_model_details[ + model_name + ] = QAIHMModelPerf.ModelPerfDetails.from_dict( + model_name, model_perf["performance_metrics"] + ) @dataclass @@ -659,6 +594,11 @@ class QAIHMModelCodeGen(BaseDataClass): # on a full dataset. 
Datasets specified here must be chosen from `qai_hub_models/datasets`. eval_datasets: Optional[List[str]] = None + # If set, quantizes the model using an AI Hub quantize job. This also requires setting + # the `eval_datasets` field. Calibration data will be pulled from the first item + # in `eval_datasets`. + use_hub_quantization: bool = False + # By default inference tests are done using 8gen1 chipset to avoid overloading # newer devices. Some models don't work on 8gen1, so use 8gen3 for those. inference_on_8gen3: bool = False @@ -688,6 +628,12 @@ def load_code_gen_yaml(path: str | Path | None = None) -> Dict[str, Any]: data = QAIHMModelCodeGen.get_schema().validate(data) except SchemaError as e: assert 0, f"{e.code} in {path}" + if data["is_aimet"] and data["use_hub_quantization"]: + raise ValueError( + "Flags is_aimet and use_hub_quantization cannot both be set." + ) + if data["use_hub_quantization"] and len(data["eval_datasets"]) == 0: + raise ValueError("Must set eval_datasets if use_hub_quantization is set.") return data @@ -720,22 +666,6 @@ class QAIHMModelInfo(BaseDataClass): # A list of applicable tags to add to the model tags: List[MODEL_TAG] - # Link to the research paper where the model was first published. Usually an arxiv link. - research_paper: str - - # The title of the research paper. - research_paper_title: str - - # A link to the model's license. Most commonly found in the github repo it was cloned from. - license: str - - # A link to the AIHub license, unless the license is more restrictive like GPL. - # In that case, this should point to the same as the model license. - deploy_license: str - - # A link to the original github repo with the model's code. - source_repo: str - # A list of real-world applications for which this model could be used. # This is free-form and almost anything reasonable here is fine. applicable_scenarios: List[str] @@ -758,13 +688,6 @@ class QAIHMModelInfo(BaseDataClass): # CodeGen options from code-gen.yaml in the model's folder. code_gen_config: QAIHMModelCodeGen - # The license type of the original model repo. - license_type: str - - # Should be set to `AI Model Hub License`, unless the license is more restrictive like GPL. - # In that case, this should be the same as the model license. - deploy_license_type: str - # A list of datasets for which the model has pre-trained checkpoints # available as options in `model.py`. Typically only has one entry. dataset: List[str] @@ -778,6 +701,32 @@ class QAIHMModelInfo(BaseDataClass): # Number of output classes: The number of classes the model can classify or annotate. technical_details: Dict[str, str] + # The license type of the original model repo. + license_type: str + + # Some models are made by a company other than Qualcomm. If set, this identifies the maker. + model_maker_id: Optional[str] = None + + # Link to the research paper where the model was first published. Usually an arxiv link. + research_paper: Optional[str] = None + + # The title of the research paper. + research_paper_title: Optional[str] = None + + # A link to the original github repo with the model's code. + source_repo: Optional[str] = None + + # A link to the model's license. Most commonly found in the github repo it was cloned from. + license: Optional[str] = None + + # A link to the AIHub license, unless the license is more restrictive like GPL. + # In that case, this should point to the same as the model license. + deploy_license: Optional[str] = None + + # Should be set to `AI Model Hub License`, unless the license is more restrictive like GPL.
+ # In that case, this should be the same as the model license. + deploy_license_type: Optional[str] = None + # If set, model assets shouldn't be distributed. restrict_model_sharing: bool = False @@ -829,7 +778,9 @@ def validate(self) -> Tuple[bool, Optional[str]]: return False, f"Model {r_model} cannot be related to itself." # If paper is arxiv, it should be an abs link - if self.research_paper.startswith("https://arxiv.org/"): + if self.research_paper is not None and self.research_paper.startswith( + "https://arxiv.org/" + ): if "/abs/" not in self.research_paper: return ( False, @@ -840,9 +791,15 @@ def validate(self) -> Tuple[bool, Optional[str]]: if self.license_type not in HF_AVAILABLE_LICENSES: return False, f"license can be one of these: {HF_AVAILABLE_LICENSES}" - if not self.deploy_license: + purchase_required = False + if self.model_type_llm and self.llm_details is not None: + purchase_required = ( + self.llm_details.get("call_to_action", "") == "contact_for_purchase" + ) + + if not self.deploy_license and not purchase_required: return False, "deploy_license cannot be empty" - if not self.deploy_license_type: + if not self.deploy_license_type and not purchase_required: return False, "deploy_license_type cannot be empty" # Status Reason @@ -896,31 +853,64 @@ def validate(self) -> Tuple[bool, Optional[str]]: if expected_example_use != ASSET_CONFIG.get_example_use(self.id): return False, "Example-usage field not pointing to expected relative path" + # Check that the model_type_llm and llm_details fields are consistent if self.model_type_llm: - assert self.llm_details is not None + assert ( + self.llm_details is not None + ), "All LLMs must have an 'llm_details' section." + assert ( + "call_to_action" in self.llm_details + ), "All LLMs must have 'call_to_action' in 'llm_details'." + assert self.llm_details["call_to_action"] in { + "contact_for_purchase", + "download", + "view_readme", + "contact_for_download", + }, "The LLM 'call_to_action' field only allows these values: download, view_readme, contact_for_purchase or contact_for_download." for dev in self.llm_details: - # Check the device is one of the supported devices. - assert dev in get_all_supported_devices() - - if "purchase_required" in self.llm_details[dev]: - assert self.llm_details[dev]["purchase_required"] - if "model_download_url" in self.llm_details[dev]: - assert self.llm_details[dev]["model_download_url"] is not None - model_download_url = ASSET_CONFIG.get_web_asset_url( - self.id, self.llm_details[dev]["model_download_url"] + if dev not in {"call_to_action", "genie_compatible"}: + assert ( + list(self.llm_details[dev].keys())[0] == "torchscript_onnx_qnn" ) - # Check if the url exists + # Check the device is one of the supported devices.
+ assert dev in get_all_supported_devices() + if ( - session.head(model_download_url).status_code - != requests.codes.ok + "model_download_url" + in self.llm_details[dev]["torchscript_onnx_qnn"] ): - return False, f"Download URL is missing at {model_download_url}" - if "genie_url" in self.llm_details[dev]: - assert self.llm_details[dev]["genie_url"] is not None - genie_url = self.llm_details[dev]["genie_url"] - # Check if the url exists - if session.head(genie_url).status_code != requests.codes.ok: - return False, f"Genie App URL is missing at {genie_url}" + assert ( + self.llm_details[dev]["torchscript_onnx_qnn"][ + "model_download_url" + ] + is not None + ) + version, relative_path = int( + self.llm_details[dev]["torchscript_onnx_qnn"][ + "model_download_url" + ].split("/")[0][1:] + ), "/".join( + self.llm_details[dev]["torchscript_onnx_qnn"][ + "model_download_url" + ].split("/")[1:] + ) + model_download_url = ASSET_CONFIG.get_model_asset_url( + self.id, version, relative_path + ) + # Check if the url exists + if ( + session.head(model_download_url).status_code + != requests.codes.ok + ): + return ( + False, + f"Download URL is missing at {model_download_url}", + ) + + if self.llm_details["call_to_action"] == "contact_for_purchase": + assert not self.llm_details.get("genie_compatible", False) + else: + assert self.llm_details is None return True, None diff --git a/qai_hub_models/utils/evaluate.py b/qai_hub_models/utils/evaluate.py index dfad39b1..42e9cac6 100644 --- a/qai_hub_models/utils/evaluate.py +++ b/qai_hub_models/utils/evaluate.py @@ -120,11 +120,13 @@ def _populate_data_cache_impl( else: output_names = ["output_0"] input_entries = make_hub_dataset_entries( + (model_inputs.split(1, dim=0),), input_names, channel_last_input, - model_inputs.split(1, dim=0), ) - gt_entries = make_hub_dataset_entries(output_names, None, ground_truth_values) + gt_entries = make_hub_dataset_entries( + (ground_truth_values,), output_names, None + ) # print(input_entries) input_dataset = hub.upload_dataset(input_entries) gt_dataset = hub.upload_dataset(gt_entries) @@ -209,7 +211,7 @@ def _populate_data_cache( shutil.move(str(tmp_cache_path), str(cache_path)) -def sample_dataset(dataset: Dataset, num_samples: int, seed: int) -> Dataset: +def sample_dataset(dataset: Dataset, num_samples: int, seed: int = 42) -> Dataset: """ Create a dataset that is a subsample of `dataset` with `num_samples`. diff --git a/qai_hub_models/utils/huggingface.py b/qai_hub_models/utils/huggingface.py index 4ddd9bef..34361c7e 100644 --- a/qai_hub_models/utils/huggingface.py +++ b/qai_hub_models/utils/huggingface.py @@ -29,12 +29,14 @@ def fetch_huggingface_target_model( file_types = ["tflite"] elif runtime_path == TargetRuntime.QNN: file_types = ["so", "bin"] + elif runtime_path == TargetRuntime.ONNX: + file_types = ["onnx"] else: raise NotImplementedError() files = [] for file_type in file_types: - files += fs.glob(os.path.join(hf_path, f"**/*.{file_type}")) + files += fs.glob(os.path.join(hf_path, f"*.{file_type}")) if not files: raise FileNotFoundError( f"No compiled assets are available on Huggingface for {model_name} with runtime {runtime_path.name}." @@ -49,10 +51,13 @@ def fetch_huggingface_target_model( return paths -def has_model_access(repo_name: str, repo_url: str): +def has_model_access(repo_name: str, repo_url: str | None = None): # Huggingface returns GatedRepoError if model is not accessible to current User. 
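# Recap of the updated Hugging Face asset lookup (annotation, not part of the
# patch): TFLITE -> "*.tflite", QNN -> "*.so"/"*.bin", and (new) ONNX -> "*.onnx";
# the glob is now non-recursive, so only top-level repo files match.
from qai_hub_models.utils.huggingface import has_model_access

# repo_url is now optional and defaults to https://huggingface.co/<repo_name>;
# the repo name below is illustrative.
has_model_access("qualcomm/Example-Model")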
# ref: https://github.com/huggingface/huggingface_hub/blob/5ff2d150d121d04799b78bc08f2343c21b8f07a9/src/huggingface_hub/utils/_errors.py#L135 + if not repo_url: + repo_url = "https://huggingface.co/" + repo_name + try: hf_api = HfApi() hf_api.model_info(repo_name) diff --git a/qai_hub_models/utils/inference.py b/qai_hub_models/utils/inference.py index 8a33a407..c2181f0c 100644 --- a/qai_hub_models/utils/inference.py +++ b/qai_hub_models/utils/inference.py @@ -253,7 +253,6 @@ def compile_model_from_args( """ export_file = f"qai_hub_models.models.{model_id}.export" export_module = import_module(export_file) - compile_job: hub.CompileJob if cli_args.chipset: device_cli = f"--chipset {cli_args.chipset}" else: @@ -279,10 +278,7 @@ def compile_model_from_args( **component_kwargs, ) - if component is None: - no_hub_access = len(export_output) == 0 or isinstance(export_output[0], str) - else: - no_hub_access = export_output[component][0] is None + no_hub_access = isinstance(export_output, list) if no_hub_access: # The export returned local file paths, which mean Hub credentials were not found. @@ -291,29 +287,36 @@ def compile_model_from_args( ) export_output = export_output if component is None else export_output[component] - compile_job, _, _ = export_output - target_model = compile_job.get_target_model() + target_model = export_output.compile_job.get_target_model() assert target_model is not None return target_model def make_hub_dataset_entries( + tensors_tuple: Tuple[ + torch.Tensor + | np.ndarray + | List[torch.Tensor | np.ndarray] + | Tuple[torch.Tensor | np.ndarray], + ..., + ], input_names: List[str], - channel_last_input: Optional[List[str]], - *args: torch.Tensor | np.ndarray | List[torch.Tensor | np.ndarray], + channel_last_input: Optional[List[str]] = None, ) -> DatasetEntries: """ Given input tensor(s) in either numpy or torch format, convert to hub DatasetEntries format. Parameters: + tensors: Tensor data in numpy or torch.Tensor format. input_names: List of input names. channel_last_input: Comma-separated list of input names to transpose channel. - target_runtime: Runtime of model being used to inference this dataset. - args: Tensor data in numpy or torch.Tensor format. """ dataset = {} - for name, inputs in zip(input_names, args): + assert len(tensors_tuple) == len( + input_names + ), "Number of elements in tensors_tuple must match number of inputs" + for name, inputs in zip(input_names, tensors_tuple): if not isinstance(inputs, (list, tuple)): inputs = [inputs] # type: ignore @@ -442,7 +445,8 @@ def __call__( assert len(args) == 1, "Only 1 dataset can be provided for inference." dataset = args[0] else: - dataset_entries = make_hub_dataset_entries(self.input_names, self.channel_last_input, *args) # type: ignore + tensors = tuple(args) + dataset_entries = make_hub_dataset_entries(tensors, self.input_names, self.channel_last_input) # type: ignore dataset = hub.upload_dataset(dataset_entries) inference_job = hub.submit_inference_job( diff --git a/qai_hub_models/utils/path_helpers.py b/qai_hub_models/utils/path_helpers.py index 2dc4a50f..dfb6a82d 100644 --- a/qai_hub_models/utils/path_helpers.py +++ b/qai_hub_models/utils/path_helpers.py @@ -18,7 +18,7 @@ def get_all_models(public_only: bool = False): if not subdir.is_dir(): continue # Heuristic to see if this is a model we should generate export.py for. 
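# Sketch of the reordered make_hub_dataset_entries call (shapes illustrative):
# tensors now come first, as a tuple with one entry per input name.
import numpy as np

from qai_hub_models.utils.inference import make_hub_dataset_entries

image = np.zeros((1, 3, 224, 224), dtype=np.float32)
entries = make_hub_dataset_entries((image,), ["image"], channel_last_input=None)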
- if (subdir / "model.py").exists() and (subdir / "test.py").exists(): + if (subdir / "model.py").exists(): if public_only: if not (subdir / "info.yaml").exists(): continue diff --git a/qai_hub_models/utils/printing.py b/qai_hub_models/utils/printing.py index 0884f1d0..5ea8ae7a 100644 --- a/qai_hub_models/utils/printing.py +++ b/qai_hub_models/utils/printing.py @@ -2,6 +2,8 @@ # Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. # SPDX-License-Identifier: BSD-3-Clause # --------------------------------------------------------------------- +from __future__ import annotations + from collections import Counter from pathlib import Path from typing import Any, Dict, List, Optional, Union @@ -14,10 +16,7 @@ from qai_hub_models.utils.base_model import TargetRuntime from qai_hub_models.utils.compare import METRICS_FUNCTIONS, generate_comparison_metrics -from qai_hub_models.utils.config_loaders import ( - ModelRuntimePerformanceDetails, - bytes_to_mb, -) +from qai_hub_models.utils.config_loaders import QAIHMModelPerf, bytes_to_mb from qai_hub_models.utils.qnn_helpers import is_qnn_hub_model _INFO_DASH = "-" * 60 @@ -41,7 +40,7 @@ def print_with_box(data: List[str]) -> None: def print_inference_metrics( - inference_job: hub.InferenceJob, + inference_job: Optional[hub.InferenceJob], inference_result: DatasetEntries, torch_out: List[np.ndarray], output_names: Optional[List[str]] = None, @@ -68,7 +67,8 @@ def custom_float_format(x): formatted_df = df_eval.applymap(custom_float_format) print( - f"\nComparing on-device vs. local-cpu inference for {inference_job.name.title()}." + "\nComparing on-device vs. local-cpu inference" + + (f" for {inference_job.name.title()}." if inference_job is not None else "") ) print(tabulate(formatted_df, headers="keys", tablefmt="grid")) # type: ignore print() @@ -77,9 +77,10 @@ def custom_float_format(x): for m in df_eval.columns.drop("shape"): # type: ignore print(f"- {m}:", METRICS_FUNCTIONS[m][1]) - last_line = f"More details: {inference_job.url}" - print() - print(last_line) + if inference_job is not None: + last_line = f"More details: {inference_job.url}" + print() + print(last_line) def print_profile_metrics_from_job( @@ -90,7 +91,6 @@ def print_profile_metrics_from_job( [op.get("compute_unit", "UNK") for op in profile_data["execution_detail"]] ) execution_summary = profile_data["execution_summary"] - inference_time_ms = execution_summary["estimated_inference_time"] / 1000 peak_memory_bytes = execution_summary["inference_memory_peak_range"] print(f"\n{_INFO_DASH}") print(f"Performance results on-device for {profile_job.name.title()}.") @@ -105,48 +105,87 @@ def print_profile_metrics_from_job( else: raise NotImplementedError() - print_profile_metrics( - ModelRuntimePerformanceDetails( - profile_job.model.name, - profile_job.device.name, - profile_job.device.os, - runtime, - inference_time_ms, - peak_memory_bytes, - compute_unit_counts, - ) + perf_details = QAIHMModelPerf.PerformanceDetails( + job_id=profile_job.job_id, + inference_time_microsecs=execution_summary["estimated_inference_time"], + peak_memory_bytes=peak_memory_bytes, + compute_unit_counts=compute_unit_counts, + # Unused + primary_compute_unit="", + precision="", + ) + + device_details = QAIHMModelPerf.DeviceDetails( + name=profile_job.device.name, + os=profile_job.device.os, + # unused + form_factor="", + os_name="", + manufacturer="", + chipset="", ) + + print_profile_metrics(device_details, runtime, perf_details) print(_INFO_DASH) last_line = f"More details: {profile_job.url}\n" 
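# Hypothetical call (values invented) showing the new print_profile_metrics
# signature, which takes the device, runtime, and perf details separately; the
# empty strings mirror the "unused" placeholders above.
from qai_hub_models.utils.base_model import TargetRuntime
from qai_hub_models.utils.config_loaders import QAIHMModelPerf
from qai_hub_models.utils.printing import print_profile_metrics

device = QAIHMModelPerf.DeviceDetails(
    name="Samsung Galaxy S23", os="13",
    form_factor="", os_name="", manufacturer="", chipset="",
)
perf = QAIHMModelPerf.PerformanceDetails(
    job_id="j1234abcd", inference_time_microsecs=950.0,
    peak_memory_bytes=(0, 2097152), compute_unit_counts={"NPU": 80},
    primary_compute_unit="", precision="",
)
print_profile_metrics(device, TargetRuntime.TFLITE, perf)
# Prints a left-aligned table of device, runtime, latency, memory, and compute units.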
print(last_line) -def print_profile_metrics( - details: ModelRuntimePerformanceDetails, -): - inf_time = details.inference_time_ms - peak_memory_mb = f"[{bytes_to_mb(details.peak_memory_bytes[0])}, {bytes_to_mb(details.peak_memory_bytes[1])}]" - num_ops = sum(details.compute_unit_counts.values()) - compute_units = [ - f"{unit} ({num_ops} ops)" - for unit, num_ops in details.compute_unit_counts.items() - ] - +def get_profile_metrics( + device: QAIHMModelPerf.DeviceDetails, + runtime: TargetRuntime, + perf_details: QAIHMModelPerf.PerformanceDetails + | QAIHMModelPerf.LLMPerformanceDetails, +) -> str: rows = [ - ["Device", f"{details.device_name} ({details.device_os})"], - ["Runtime", f"{details.runtime.name}"], - [ - "Estimated inference time (ms)", - "<0.1" if inf_time < 0.1 else f"{inf_time:.1f}", - ], - ["Estimated peak memory usage (MB)", f"{peak_memory_mb}"], - ["Total # Ops", f"{num_ops}"], - ["Compute Unit(s)", " ".join(compute_units)], + ["Device", f"{device.name} ({device.os})"], + ["Runtime", runtime.name], ] + + if isinstance(perf_details, QAIHMModelPerf.LLMPerformanceDetails): + rows.extend( + [ + ["Response Rate (Tokens/Second)", str(perf_details.tokens_per_second)], + [ + "Time to First Token (Seconds)", + str(perf_details.time_to_first_token_range_secs), + ], + ] + ) + else: + inf_time_ms = perf_details.inference_time_microsecs / 1000 + mem_min = bytes_to_mb(perf_details.peak_memory_bytes[0]) + mem_max = bytes_to_mb(perf_details.peak_memory_bytes[1]) + compute_units = [ + f"{unit} ({num_ops} ops)" + for unit, num_ops in perf_details.compute_unit_counts.items() + ] + + rows.extend( + [ + [ + "Estimated inference time (ms)", + "<0.1" if inf_time_ms < 0.1 else f"{inf_time_ms:.1f}", + ], + ["Estimated peak memory usage (MB)", f"[{mem_min}, {mem_max}]"], + ["Total # Ops", str(sum(perf_details.compute_unit_counts.values()))], + ["Compute Unit(s)", " ".join(compute_units)], + ] + ) + table = PrettyTable(align="l", header=False, border=False, padding_width=0) for row in rows: table.add_row([row[0], f": {row[1]}"]) - print(table.get_string()) + return table.get_string() + + +def print_profile_metrics( + device: QAIHMModelPerf.DeviceDetails, + runtime: TargetRuntime, + perf_details: QAIHMModelPerf.PerformanceDetails + | QAIHMModelPerf.LLMPerformanceDetails, +): + print(get_profile_metrics(device, runtime, perf_details)) def print_on_target_demo_cmd( diff --git a/qai_hub_models/utils/qai_hub_helpers.py b/qai_hub_models/utils/qai_hub_helpers.py index 335e65d3..74dcbe4b 100644 --- a/qai_hub_models/utils/qai_hub_helpers.py +++ b/qai_hub_models/utils/qai_hub_helpers.py @@ -70,25 +70,44 @@ def export_without_hub_access( print("") missing_perf = True - # Components in perf.yaml don't yet have the same name as their code generated names. - if not components: - perf_yaml_path = os.path.join( - os.path.dirname(os.path.dirname(__file__)), - "models", - model_id, - "perf.yaml", - ) - if os.path.exists(perf_yaml_path): - parsed_perf = QAIHMModelPerf(perf_yaml_path, model_id).get_perf_details( - target_runtime, device_name + perf_yaml_path = os.path.join( + os.path.dirname(os.path.dirname(__file__)), + "models", + model_id, + "perf.yaml", + ) + if os.path.exists(perf_yaml_path): + parsed_perf = QAIHMModelPerf(perf_yaml_path, model_id) + + if not components: + components = [model_display_name] + + print(f"Profiling Results\n{_INFO_DASH}") + for component in components: + print(f"{component}") + model_perf = parsed_perf.per_model_details[component] + + # Device families aren't stored in perf yamls. 
Replace with the original device name. + device_search_name = device_name.replace(" (Family)", "") + device_perf = model_perf.details_per_device.get( + device_search_name, None ) - missing_perf = None in parsed_perf.values() + if not device_perf: + break + + runtime_perf = None + for path, path_runtime_perf in device_perf.details_per_path.items(): + if path.get_runtime() == target_runtime: + runtime_perf = path_runtime_perf + break - if not missing_perf: - print(f"Profiling Results for {model_display_name}\n{_INFO_DASH}") - for model_name, perf in parsed_perf.items(): - assert perf is not None # for mypy - print_profile_metrics(perf) + if not runtime_perf: + break + + missing_perf = False + print_profile_metrics( + device_perf.device, target_runtime, runtime_perf.perf_details + ) if missing_perf: print( diff --git a/qai_hub_models/utils/quantization.py b/qai_hub_models/utils/quantization.py index 0220849d..ce339f9a 100644 --- a/qai_hub_models/utils/quantization.py +++ b/qai_hub_models/utils/quantization.py @@ -7,9 +7,16 @@ from typing import Optional import torch +from qai_hub.client import DatasetEntries, Device, QuantizeDtype from torch.utils.data import DataLoader +from qai_hub_models.datasets import get_dataset_from_name +from qai_hub_models.models.common import TargetRuntime +from qai_hub_models.models.protocols import HubModelProtocol from qai_hub_models.utils.asset_loaders import CachedWebDatasetAsset, load_torch +from qai_hub_models.utils.evaluate import sample_dataset +from qai_hub_models.utils.inference import make_hub_dataset_entries +from qai_hub_models.utils.input_spec import InputSpec DATA_ID = "image_quantziation_samples" DATA_VERSION = 1 @@ -66,3 +73,65 @@ def get_image_quantization_samples( """ path = IMAGE_QUANTIZATION_SAMPLES.fetch(extract=False) return load_torch(quantization_samples_path or path) + + +def get_calibration_data( + input_spec: InputSpec, dataset_name: str, num_samples: int +) -> DatasetEntries: + """ + Produces a numpy dataset to be used for calibration data of a quantize job. + + Parameters: + input_spec: The input spec of the model. Used to ensure the returned dataset's names + match the input names of the model. + dataset_name: Name of the dataset to sample from. + num_samples: Number of data samples to use. + + Returns: + Dataset compatible with the format expected by AI Hub. + """ + torch_dataset = sample_dataset(get_dataset_from_name(dataset_name), num_samples) + torch_samples = tuple( + [torch_dataset[i][j].unsqueeze(0).numpy() for i in range(len(torch_dataset))] + for j in range(len(input_spec)) + ) + return make_hub_dataset_entries(torch_samples, list(input_spec.keys())) + + +class HubQuantizableMixin(HubModelProtocol): + """ + Mixin to attach to model classes that will be quantized using AI Hub quantize job. + """ + + def get_hub_compile_options( + self, + target_runtime: TargetRuntime, + other_compile_options: str = "", + device: Optional[Device] = None, + ) -> str: + quantization_flags = " --quantize_io" + if target_runtime == TargetRuntime.TFLITE: + # uint8 is the easiest I/O type for integration purposes, + # especially for image applications. Images are always + # uint8 RGB when coming from disk or a camera. + # + # Uint8 has not been thoroughly tested with other paths, + # so it is enabled only for TF Lite today. 
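# Illustrative use of the new get_calibration_data helper; the dataset name and
# InputSpec-shaped dict are assumptions, and the dataset must exist in
# qai_hub_models/datasets.
from qai_hub_models.utils.quantization import get_calibration_data

input_spec = {"image": ((1, 3, 224, 224), "float32")}
calibration_data = get_calibration_data(input_spec, "imagenette", 100)
# Keys match the model's input names, so the result can be passed directly to
# an AI Hub quantize job as its calibration dataset.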
+ quantization_flags += " --quantize_io_type uint8" + return ( + super().get_hub_compile_options( # type: ignore + target_runtime, other_compile_options, device + ) + + quantization_flags + ) + + def get_quantize_options(self) -> str: + return "" + + @staticmethod + def get_weights_dtype() -> QuantizeDtype: + return QuantizeDtype.INT8 + + @staticmethod + def get_activations_dtype() -> QuantizeDtype: + return QuantizeDtype.INT8 diff --git a/qai_hub_models/utils/scorecard/common.py b/qai_hub_models/utils/scorecard/common.py index 13028351..06eb02de 100644 --- a/qai_hub_models/utils/scorecard/common.py +++ b/qai_hub_models/utils/scorecard/common.py @@ -3,6 +3,7 @@ # SPDX-License-Identifier: BSD-3-Clause # --------------------------------------------------------------------- import os +import re from enum import Enum from functools import cached_property from typing import Dict, List, Optional, Tuple @@ -25,9 +26,7 @@ def _get_cached_device(device_name: str) -> hub.Device: def scorecard_unit_test_idfn(val): """Name of unit test parameters used in tests created in test_generated.py""" - if val == ScorecardDevices.any: - return "device_agnostic" - elif isinstance(val, ScorecardDevice): + if isinstance(val, ScorecardDevice): return val.name @@ -35,6 +34,12 @@ class ScorecardDevice: # -- DEVICE REGISTRY -- _registry: Dict[str, "ScorecardDevice"] = {} + @classmethod + def all_devices(cls, only_enabled: bool = False) -> List["ScorecardDevice"]: + if only_enabled: + return cls.all_enabled() + return list(cls._registry.values()) + @classmethod def all_enabled(cls) -> List["ScorecardDevice"]: return [x for x in cls._registry.values() if x.enabled] @@ -46,12 +51,21 @@ def register( reference_device_name: Optional[str], execution_device_name: Optional[str] = None, disabled_models: List[str] = [], + duplicate_of: Optional["ScorecardDevice"] = None, + compile_paths: Optional[List["ScorecardCompilePath"]] = None, + profile_paths: Optional[List["ScorecardProfilePath"]] = None, ) -> "ScorecardDevice": if name in cls._registry: raise ValueError("Device " + name + "already registered.") device = ScorecardDevice( - name, reference_device_name, execution_device_name, disabled_models + name, + reference_device_name, + execution_device_name, + disabled_models, + duplicate_of, + compile_paths, + profile_paths, ) cls._registry[name] = device return device @@ -67,6 +81,9 @@ def __init__( reference_device_name: Optional[str], execution_device_name: Optional[str] = None, disabled_models: List[str] = [], + duplicate_of: Optional["ScorecardDevice"] = None, + compile_paths: Optional[List["ScorecardCompilePath"]] = None, + profile_paths: Optional[List["ScorecardProfilePath"]] = None, ): """ Parameters @@ -80,11 +97,27 @@ def __init__( disabled_models: AI Hub Model IDs that are not supported by this device. These models will be ignored by the scorecard in combination with this device. + + duplicate_of: If set, this device will act as a duplicate of the given scorecard device. In effect this means: + * Jobs will not be submitted targeting this chipset. + * Jobs for the "given" scorecard device will be used to create performance metrics for this device. + + NOTE: Just because this chip is marked as having duplicate AI/ML performance compared to another chip, + does not mean this chip is indistinguishable from that other chip. The chips will + differ by other important features, but these are not relevant for this AI/ML scorecard. + + compile_paths: The set of compile paths valid for this device. 
If unset, will use the default set of paths for this device type. + + profile_paths: The set of profile paths valid for this device. If unset, will use the default set of paths for this device type. + """ self.name = name self.disabled_models = disabled_models self.reference_device_name = reference_device_name self.execution_device_name = execution_device_name + self.duplicate_of = duplicate_of + self._compile_paths = compile_paths + self._profile_paths = profile_paths def __str__(self): return self.name.lower() @@ -117,7 +150,7 @@ def enabled(self) -> bool: valid_test_devices = os.environ.get("WHITELISTED_PROFILE_TEST_DEVICES", "ALL") return ( valid_test_devices == "ALL" - or self.name == "all" + or self.name == "any" or self.name in valid_test_devices.split(",") ) @@ -168,48 +201,94 @@ def os(self) -> str: for attr in self.reference_device.attributes: if attr.startswith("os:"): return attr[3:] - raise ValueError(f"OS Not found for device: {self.name}") + raise ValueError(f"OS not found for device: {self.name}") + @cached_property + def form_factor(self) -> str: + """ + The chipset form_factor (eg. Auto, IoT, Mobile, ...) + """ + for attr in self.reference_device.attributes: + if attr.startswith("format:"): + return attr[7:] + raise ValueError(f"Format not found for device: {self.name}") -class ScorecardDevices: - any = ScorecardDevice.register( - "any", "Samsung Galaxy S23" - ) # no specific device (usable only during compilation) - cs_8_gen_2 = ScorecardDevice.register("cs_8_gen_2", "Samsung Galaxy S23") - cs_8_gen_3 = ScorecardDevice.register( - "cs_8_gen_3", "Samsung Galaxy S24", "Samsung Galaxy S24 (Family)" - ) - cs_6490 = ScorecardDevice.register( - "cs_6490", - "RB3 Gen 2 (Proxy)", - None, - [ - "ConvNext-Tiny-w8a8-Quantized", - "ConvNext-Tiny-w8a16-Quantized", - "ResNet50Quantized", - "RegNetQuantized", - "HRNetPoseQuantized", - "SESR-M5-Quantized", - "Midas-V2-Quantized", - "Posenet-Mobilenet-Quantized", - ], - ) - cs_8250 = ScorecardDevice.register("cs_8250", "RB5 (Proxy)") - cs_8550 = ScorecardDevice.register("cs_8550", "QCS8550 (Proxy)") - cs_x_elite = ScorecardDevice.register("cs_x_elite", "Snapdragon X Elite CRD") - cs_auto_lemans_8255 = ScorecardDevice.register( - "cs_auto_lemans_8255", "SA8255 (Proxy)" - ) - cs_auto_lemans_8775 = ScorecardDevice.register( - "cs_auto_lemans_8775", "SA8775 (Proxy)" - ) - cs_auto_lemans_8650 = ScorecardDevice.register( - "cs_auto_lemans_8650", "SA8650 (Proxy)" - ) - cs_xr_8450 = ScorecardDevice.register("cs_xr_8450", "QCS8450 (Proxy)") - cs_auto_makena_8295 = ScorecardDevice.register( - "cs_auto_makena_8295", "Snapdragon Cockpit Gen 4 QAM" - ) + @cached_property + def hexagon_version(self) -> int: + """ + The chipset hexagon version number + """ + for attr in self.reference_device.attributes: + if attr.startswith("hexagon:v"): + return int(attr[9:]) + raise ValueError(f"Hexagon version not found for device: {self.name}") + + @property + def supports_fp16_inference(self) -> bool: + return self.hexagon_version >= 69 + + @cached_property + def supported_runtimes(self) -> List[TargetRuntime]: + runtimes = [] + for attr in self.reference_device.attributes: + if attr.startswith("framework:"): + rt_name = attr[10:].upper() + try: + runtimes.append(TargetRuntime[rt_name.upper()]) + except KeyError: + print( + f"WARNING: Unable to determine supported runtime associated with framework {rt_name}" + ) + return runtimes + + @cached_property + def profile_paths(self) -> List["ScorecardProfilePath"]: + if self._profile_paths: + return self._profile_paths + 
if self.duplicate_of: + return self.duplicate_of.profile_paths + + if self.form_factor == "phone": + paths = [ + ScorecardProfilePath.ONNX, + ScorecardProfilePath.QNN, + ScorecardProfilePath.TFLITE, + ] + elif self.form_factor == "auto": + paths = [ + ScorecardProfilePath.QNN, + ScorecardProfilePath.TFLITE, + ] + elif self.form_factor == "xr": + paths = [ScorecardProfilePath.QNN, ScorecardProfilePath.TFLITE] + elif self.form_factor == "compute": + paths = [ + ScorecardProfilePath.ONNX, + ScorecardProfilePath.ONNX_DML_GPU, + ScorecardProfilePath.QNN, + ] + elif self.form_factor == "iot": + paths = [ScorecardProfilePath.TFLITE, ScorecardProfilePath.QNN] + else: + raise NotImplementedError( + f"Unsupported device form_factor: {self.form_factor}" + ) + + return [path for path in paths if path.get_runtime() in self.supported_runtimes] + + @cached_property + def compile_paths(self) -> List["ScorecardCompilePath"]: + if self._compile_paths: + return self._compile_paths + if self.duplicate_of: + return self.duplicate_of.compile_paths + + if ScorecardProfilePath.QNN in self.profile_paths: + paths = [ScorecardCompilePath.QNN] + else: + paths = [] + + return [path for path in paths if path.get_runtime() in self.supported_runtimes] def get_job_cache_name( @@ -253,13 +332,15 @@ def all_enabled() -> List["ScorecardCompilePath"]: @staticmethod def get_parameterized_test_config( - aimet_model=False, + is_quantized=False, only_enabled_paths=True, only_enabled_devices=True, ) -> List[Tuple["ScorecardCompilePath", ScorecardDevice]]: path_list: List[ScorecardCompilePath] = ScorecardCompilePath.all_enabled() if only_enabled_paths else ScorecardCompilePath # type: ignore path_devices_dict = { - sc_path: sc_path.get_test_devices(aimet_model, only_enabled_devices) + sc_path: sc_path.get_test_devices( + is_quantized, only_enabled_devices, include_any=True + ) for sc_path in path_list } return [ @@ -276,51 +357,40 @@ def get_runtime(self) -> TargetRuntime: raise NotImplementedError() def get_test_devices( - self, aimet_model: bool = False, only_enabled: bool = True + self, + is_quantized: bool = False, + only_enabled: bool = True, + include_duplicate_devices: bool = False, + include_any: bool = False, ) -> List[ScorecardDevice]: - if self == ScorecardCompilePath.QNN: - devices = [ - ScorecardDevices.any, - ScorecardDevices.cs_x_elite, - ScorecardDevices.cs_8550, - ScorecardDevices.cs_auto_lemans_8255, - ScorecardDevices.cs_auto_lemans_8775, - ScorecardDevices.cs_auto_makena_8295, - ] - if aimet_model: - devices.append(ScorecardDevices.cs_6490) - else: - devices = [ScorecardDevices.any] - - try: - from qai_hub_models.utils.scorecard._common_private import ( - get_private_compile_path_test_devices, + return [ + device + for device in ScorecardDevice.all_devices(only_enabled) + if ( + (is_quantized or device.supports_fp16_inference) + and (include_duplicate_devices or not device.duplicate_of) + and (include_any or device != ScorecardDevices.any) + and self in device.compile_paths ) + ] - devices.extend(get_private_compile_path_test_devices(self, aimet_model)) # type: ignore - except ImportError: - pass - - return [x for x in devices if x.enabled] if only_enabled else devices - - def get_compile_options(self, aimet_model=False) -> str: - if self == ScorecardCompilePath.ONNX_FP16 and not aimet_model: + def get_compile_options(self, is_quantized=False) -> str: + if self == ScorecardCompilePath.ONNX_FP16 and not is_quantized: return "--quantize_full_type float16 --quantize_io" return "" def get_job_cache_name( self, 
model: str, - device: ScorecardDevice = ScorecardDevices.any, - aimet_model: bool = False, + device: Optional[ScorecardDevice] = None, + is_quantized: bool = False, component: Optional[str] = None, ): - # These two auto chips are the same, re-use the same compiled asset. - if device == ScorecardDevices.cs_auto_lemans_8650: - device = ScorecardDevices.cs_auto_lemans_8775 - if device not in self.get_test_devices(aimet_model=aimet_model): + if not device or self not in device.compile_paths: device = ScorecardDevices.any # default to the "generic" compilation path - return get_job_cache_name(self.name, model, device, component) + return get_job_cache_name( + self.name, model, device.duplicate_of or device, component + ) class ScorecardProfilePath(Enum): @@ -358,13 +428,15 @@ def include_in_perf_yaml(self) -> bool: @staticmethod def get_parameterized_test_config( - aimet_model=False, + is_quantized=False, only_enabled_paths=True, only_enabled_devices=True, ) -> List[Tuple["ScorecardProfilePath", ScorecardDevice]]: path_list: List[ScorecardProfilePath] = ScorecardProfilePath.all_enabled() if only_enabled_paths else ScorecardProfilePath # type: ignore path_devices_dict = { - sc_path: sc_path.get_test_devices(aimet_model, only_enabled_devices) + sc_path: sc_path.get_test_devices( + is_quantized, only_enabled_devices, include_any=True + ) for sc_path in path_list } return [ @@ -397,56 +469,22 @@ def get_profile_options(self) -> str: return "" def get_test_devices( - self, aimet_model: bool = False, only_enabled: bool = True + self, + is_quantized: bool = False, + only_enabled: bool = True, + include_duplicate_devices: bool = False, + include_any: bool = False, ) -> List[ScorecardDevice]: - if self == ScorecardProfilePath.TFLITE: - devices = [ - ScorecardDevices.cs_8_gen_2, - ScorecardDevices.cs_8_gen_3, - ScorecardDevices.cs_8550, - ScorecardDevices.cs_xr_8450, - ScorecardDevices.cs_auto_lemans_8650, - ScorecardDevices.cs_auto_lemans_8775, - ScorecardDevices.cs_auto_lemans_8255, - ScorecardDevices.cs_auto_makena_8295, - ] + ( - [ScorecardDevices.cs_6490, ScorecardDevices.cs_8250] - if aimet_model - else [] - ) - elif self == ScorecardProfilePath.ONNX: - devices = [ - ScorecardDevices.cs_8_gen_2, - ScorecardDevices.cs_8_gen_3, - ScorecardDevices.cs_x_elite, - ] - elif self == ScorecardProfilePath.QNN: - devices = [ - ScorecardDevices.cs_8_gen_2, - ScorecardDevices.cs_8_gen_3, - ScorecardDevices.cs_x_elite, - ScorecardDevices.cs_8550, - ScorecardDevices.cs_auto_lemans_8650, - ScorecardDevices.cs_auto_lemans_8775, - ScorecardDevices.cs_auto_lemans_8255, - ScorecardDevices.cs_auto_makena_8295, - ScorecardDevices.cs_xr_8450, - ] + ([ScorecardDevices.cs_6490] if aimet_model else []) - elif self == ScorecardProfilePath.ONNX_DML_GPU: - devices = [ScorecardDevices.cs_x_elite] - else: - raise NotImplementedError() - - try: - from qai_hub_models.utils.scorecard._common_private import ( - get_private_profile_path_test_devices, + return [ + device + for device in ScorecardDevice.all_devices(only_enabled) + if ( + (is_quantized or device.supports_fp16_inference) + and (include_duplicate_devices or not device.duplicate_of) + and self in device.profile_paths + and (include_any or device != ScorecardDevices.any) ) - - devices.extend(get_private_profile_path_test_devices(self, aimet_model)) # type: ignore - except ImportError: - pass - - return [x for x in devices if x.enabled] if only_enabled else devices + ] def get_job_cache_name( self, @@ -454,7 +492,9 @@ def get_job_cache_name( device: ScorecardDevice, 
component: Optional[str] = None, ): - return get_job_cache_name(self.name, model, device, component) + return get_job_cache_name( + self.name, model, device.duplicate_of or device, component + ) def supported_chipsets(chips: List[str]) -> List[str]: @@ -466,7 +506,18 @@ def supported_chipsets(chips: List[str]) -> List[str]: """ chipset_set = set(chips) chipset_list = [] - if "qualcomm-snapdragon-8gen3" in chipset_set: + + if "qualcomm-snapdragon-8-elite" in chipset_set: + chipset_list.extend( + [ + "qualcomm-snapdragon-8-elite", + "qualcomm-snapdragon-8gen3", + "qualcomm-snapdragon-8gen2", + "qualcomm-snapdragon-8gen1", + "qualcomm-snapdragon-888", + ] + ) + elif "qualcomm-snapdragon-8gen3" in chipset_set: chipset_list.extend( [ "qualcomm-snapdragon-8gen3", @@ -485,10 +536,10 @@ def supported_chipsets(chips: List[str]) -> List[str]: ) if "qualcomm-snapdragon-x-elite" in chipset_set: + chipset_list.extend(["qualcomm-snapdragon-x-elite"]) chipset_list.extend(["qualcomm-snapdragon-x-plus-8-core"]) chipset_order = [ - "qualcomm-snapdragon-x-elite", "qualcomm-qcs6490", "qualcomm-qcs8250", "qualcomm-qcs8550", @@ -502,32 +553,34 @@ def supported_chipsets(chips: List[str]) -> List[str]: chipset_list.append(chipset) # Add any remaining chipsets not covered - for chipset in chipset_set: + for chipset in sorted(chipset_set): if chipset not in chipset_list: chipset_list.append(chipset) return chipset_list def chipset_marketing_name(chipset) -> str: - """Sanitize chip name to match marketting.""" - chip = [word.capitalize() for word in chipset.split("-")] - details_to_remove = [] - for i in range(len(chip)): - if chip[i] == "8gen3": - chip[i] = "8 Gen 3" - if chip[i] == "8gen2": - chip[i] = "8 Gen 2" - elif chip[i] == "8gen1": - chip[i] = "8 Gen 1" - elif chip[i] == "Snapdragon": - # Marketing name for Qualcomm Snapdragon is Snapdragon® - chip[i] = "Snapdragon®" - elif chip[i] == "Qualcomm": - details_to_remove.append(chip[i]) - - for detail in details_to_remove: - chip.remove(detail) - return " ".join(chip) + """Sanitize chip name to match marketing.""" + chip = " ".join([word.capitalize() for word in chipset.split("-")]) + chip = chip.replace("Qualcomm ", "") + chip = chip.replace( + "Snapdragon", "Snapdragon®" + ) # Marketing name for Qualcomm Snapdragon is Snapdragon® + + # 8cxgen2 -> 8cx Gen 2 + # 8gen2 -> 8 Gen 2 + chip = re.sub(r"(\w+)gen(\d+)", r"\g<1> Gen \g<2>", chip) + + # 8 Core -> 8-Core + chip = re.sub(r"(\d+) Core", r"\g<1>-Core", chip) + + # qcs6490 -> QCS6490 + # sa8775p -> SA8775P + chip = re.sub( + r"(Qcs|Sa)(\w+)", lambda m: f"{m.group(1).upper()}{m.group(2).upper()}", chip + ) + + return chip def supported_chipsets_santized(chips) -> List[str]: @@ -556,16 +609,6 @@ def get_supported_devices(chips) -> List[str]: supported_devices_for_chip = sorted(set(supported_devices_for_chip)) __CHIP_SUPPORTED_DEVICES_CACHE[chip] = supported_devices_for_chip supported_devices.extend(supported_devices_for_chip) - supported_devices.extend( - [ - "Google Pixel 5a 5G", - "Google Pixel 4", - "Google Pixel 4a", - "Google Pixel 3", - "Google Pixel 3a", - "Google Pixel 3a XL", - ] - ) return supported_devices @@ -574,6 +617,82 @@ def supported_oses() -> List[str]: return ["Android"] +class ScorecardDevices: + any = ScorecardDevice.register( + name="any", + reference_device_name="Samsung Galaxy S23", + compile_paths=[path for path in ScorecardCompilePath], + profile_paths=[], + ) # no specific device (usable only during compilation) + + ### + # cs == chipset + ### + cs_8_gen_2 = ScorecardDevice.register( 
+ name="cs_8_gen_2", + reference_device_name="Samsung Galaxy S23", + compile_paths=[], # Uses "any" in all cases + ) + + cs_8_gen_3 = ScorecardDevice.register( + name="cs_8_gen_3", + reference_device_name="Samsung Galaxy S24", + execution_device_name="Samsung Galaxy S24 (Family)", + compile_paths=[], # Uses "any" in all cases + ) + + cs_6490 = ScorecardDevice.register( + name="cs_6490", + reference_device_name="RB3 Gen 2 (Proxy)", + disabled_models=[ + "ConvNext-Tiny-w8a8-Quantized", + "ConvNext-Tiny-w8a16-Quantized", + "ResNet50Quantized", + "RegNetQuantized", + "HRNetPoseQuantized", + "SESR-M5-Quantized", + "Midas-V2-Quantized", + "Posenet-Mobilenet-Quantized", + ], + ) + + cs_8250 = ScorecardDevice.register( + name="cs_8250", + reference_device_name="RB5 (Proxy)", + ) + + cs_8550 = ScorecardDevice.register( + name="cs_8550", reference_device_name="QCS8550 (Proxy)" + ) + + cs_x_elite = ScorecardDevice.register( + name="cs_x_elite", reference_device_name="Snapdragon X Elite CRD" + ) + + cs_auto_lemans_8255 = ScorecardDevice.register( + name="cs_auto_lemans_8255", + reference_device_name="SA8255 (Proxy)", + ) + + cs_auto_lemans_8775 = ScorecardDevice.register( + name="cs_auto_lemans_8775", + reference_device_name="SA8775 (Proxy)", + ) + + cs_auto_lemans_8650 = ScorecardDevice.register( + name="cs_auto_lemans_8650", + reference_device_name="SA8650 (Proxy)", + ) + + cs_xr_8450 = ScorecardDevice.register( + name="cs_xr_8450", reference_device_name="QCS8450 (Proxy)" + ) + + cs_8_elite = ScorecardDevice.register( + name="cs_8_elite", reference_device_name="Snapdragon 8 Elite QRD" + ) + + try: # Register private devices # This must live at the end of this file to avoid circular import problems. diff --git a/qai_hub_models/utils/scorecard/job_summary.py b/qai_hub_models/utils/scorecard/job_summary.py index b4a2f305..5e275c91 100644 --- a/qai_hub_models/utils/scorecard/job_summary.py +++ b/qai_hub_models/utils/scorecard/job_summary.py @@ -149,12 +149,12 @@ def from_model_id( path: ScorecardCompilePath for path in ScorecardCompilePath.all_enabled(): for component in components: - path_devices_enabled = [ - x - for x in path.get_test_devices(model_code_gen.is_aimet) - if x.enabled - ] - for device in path_devices_enabled: + for device in path.get_test_devices( + model_code_gen.is_aimet or model_code_gen.use_hub_quantization, + only_enabled=True, + include_duplicate_devices=True, + include_any=True, + ): model_runs.append( cls( model_id=component or model_info.name, @@ -222,12 +222,11 @@ def from_model_id( path: ScorecardProfilePath for path in ScorecardProfilePath.all_enabled(): for component in components: - path_devices_enabled = [ - x - for x in path.get_test_devices(model_code_gen.is_aimet) - if x.enabled - ] - for device in path_devices_enabled: + for device in path.get_test_devices( + model_code_gen.is_aimet or model_code_gen.use_hub_quantization, + only_enabled=True, + include_duplicate_devices=True, + ): model_runs.append( cls( model_id=component or model_info.name, @@ -251,7 +250,7 @@ def __post_init__(self): super().__post_init__() if not self.skipped: assert isinstance(self.job, hub.ProfileJob) - if self._job_status.success: + if self._job_status.success: # type: ignore assert self.profile_results @cached_property @@ -386,7 +385,7 @@ def evaluation_metrics(self) -> Union[Dict[str, Any], str]: @cached_property def performance_metrics(self) -> Dict[str, Any]: - return dict( + metrics = dict( inference_time=self.inference_time, throughput=self.throughput, 
estimated_peak_memory_range=self.peak_memory_range, @@ -398,8 +397,11 @@ def performance_metrics(self) -> Dict[str, Any]: layers_on_cpu=self.cpu, total_layers=self.total, ), - llm_metrics=self.llm_metrics, - evaluation_metrics=self.evaluation_metrics, job_id=self.job_id, job_status=self.job_status, ) + if self.llm_metrics != "null": + metrics["llm_metrics"] = self.llm_metrics + if self.evaluation_metrics != "null": + metrics["evaluation_metrics"] = self.evaluation_metrics + return metrics diff --git a/qai_hub_models/utils/scorecard/model_card.py b/qai_hub_models/utils/scorecard/model_card.py index b89b7cfc..00209a75 100644 --- a/qai_hub_models/utils/scorecard/model_card.py +++ b/qai_hub_models/utils/scorecard/model_card.py @@ -220,7 +220,7 @@ def from_runs(model_runs: List[ProfileJobSummary]): } ) - def get_chipsets(self) -> Set[str]: + def get_chipsets(self, include_internal_devices: bool = False) -> Set[str]: chips: Set[str] = set() for model_id, model_summary in self.runs_per_model.items(): for device, device_summary in model_summary.runs_per_device.items(): @@ -237,6 +237,10 @@ def get_chipsets(self) -> Set[str]: if model_id in device.disabled_models: continue + # Don't include private devices + if not include_internal_devices and not device.public: + continue + chips.add(device.chipset) return chips @@ -248,7 +252,7 @@ def get_perf_card( ) -> Dict[str, str | List[Any] | Dict[str, Any]]: perf_card: Dict[str, str | List[Any] | Dict[str, Any]] = {} - chips = self.get_chipsets() + chips = self.get_chipsets(include_internal_devices) perf_card["aggregated"] = dict( supported_oses=supported_oses(), supported_devices=get_supported_devices(chips), diff --git a/qai_hub_models/utils/scorecard/perf_summary.py b/qai_hub_models/utils/scorecard/perf_summary.py index abcf4348..975536f7 100644 --- a/qai_hub_models/utils/scorecard/perf_summary.py +++ b/qai_hub_models/utils/scorecard/perf_summary.py @@ -116,10 +116,9 @@ def update_summary(self, model_id: str, previous_report, new_report): prev_inference_time = prev_inference_time.get( "inference_time", "null" ) - new_inference_time = new_perf_metrics[device].get(runtime_type, {}) - new_inference_time = new_inference_time.get( - "inference_time", "null" - ) + run_stats = new_perf_metrics[device].get(runtime_type, {}) + job_id = run_stats.get("job_id", "null") + new_inference_time = run_stats.get("inference_time", "null") if new_inference_time == prev_inference_time: continue @@ -131,6 +130,7 @@ def update_summary(self, model_id: str, previous_report, new_report): "inf", self._format_speedup(new_inference_time), self._format_speedup(prev_inference_time), + job_id, device_info["chipset"], device_info["os"], ) @@ -161,6 +161,7 @@ def update_summary(self, model_id: str, previous_report, new_report): self._format_speedup(speedup), self._format_speedup(new_inference_time), self._format_speedup(prev_inference_time), + job_id, device_info["chipset"], device_info["os"], ) @@ -183,6 +184,7 @@ def _get_summary_table(self, bucket_id, get_progressions=True): "Kx faster" if get_progressions else "Kx slower", "New Inference time", "Prev Inference time", + "Job ID", "Chipset", "OS", ] diff --git a/qai_hub_models/utils/testing.py b/qai_hub_models/utils/testing.py index 998d9594..cd72a77c 100644 --- a/qai_hub_models/utils/testing.py +++ b/qai_hub_models/utils/testing.py @@ -4,6 +4,9 @@ # --------------------------------------------------------------------- from __future__ import annotations +from typing import Callable +from unittest import mock + import numpy as np 
import pytest @@ -95,3 +98,20 @@ def assert_most_close( assert ( np.mean(not_close_values) <= diff_tol ), f"More than {diff_tol * 100}% of values were not close." + + +def mock_first_n(fn: Callable, n: int): + """ + Return a function that returns a Mock object for the first N calls + and calls the given fn on all subsequent calls. + """ + call_count = 0 + + def mock_fn(*args, **kwargs): + nonlocal call_count + call_count += 1 + if call_count <= n: + return mock.Mock() + return fn(*args, **kwargs) + + return mock_fn diff --git a/scripts/build_and_test.py b/scripts/build_and_test.py index 9fc4beb5..2d8eaece 100755 --- a/scripts/build_and_test.py +++ b/scripts/build_and_test.py @@ -14,12 +14,7 @@ from tasks.changes import ( REPRESENTATIVE_EXPORT_MODELS, get_all_models, - get_changed_models, - get_code_gen_changed_models, - get_models_to_run_general_tests, - get_models_to_test_export, - get_models_with_changed_definitions, - get_models_with_export_file_changes, + get_models_to_test, ) from tasks.constants import VENV_PATH from tasks.plan import ( @@ -258,37 +253,55 @@ def test_scripts(self, plan: Plan, step_id: str = "test_scripts") -> str: PyTestScriptsTask(self.venv_path), ) - @public_task( - "Run most tests for only added/modified models in Model Zoo. Includes most tests, uses shared global cache, and uses the same environment for each model." - ) + def _get_quantize_models_task(self, models) -> PyTestModelsTask: + return PyTestModelsTask( + self.python_executable, + models, + models, + self.venv_path, + venv_for_each_model=False, + use_shared_cache=True, + test_trace=False, + run_export_quantize=True, + run_export_compile=False, + skip_standard_unit_test=True, + ) + + @public_task("Quantize changed models in preparation for testing all of them.") @depends(["install_deps"]) - def test_changed_models( - self, plan: Plan, step_id: str = "test_changed_models" + def quantize_changed_models( + self, plan: Plan, step_id: str = "quantize_changed_models" ) -> str: - # model.py changed - model_changed_models = get_models_with_changed_definitions() - - # export.py or test_generated.py changed - export_changed_models = get_models_with_export_file_changes() - - # code-gen.yaml changed - code_gen_changed_models = get_code_gen_changed_models() - - # If model or code-gen changed, then test export. - models_to_test_export = model_changed_models | code_gen_changed_models - - # For all other models where export.py or test_generated.py changed, - # only test if they're part of REPRESENTATIVE_EXPORT_MODELS - models_to_test_export.update( - export_changed_models & set(REPRESENTATIVE_EXPORT_MODELS) + _, models_to_test_export = get_models_to_test() + return plan.add_step( + step_id, self._get_quantize_models_task(models_to_test_export) ) - # Set of models where model.py, demo.py, or test.py changed. 
-        models_to_run_tests = get_models_to_run_general_tests()
+    @public_task("Quantize representative models in preparation for testing them.")
+    @depends(["install_deps"])
+    def quantize_representative_models(
+        self, plan: Plan, step_id: str = "quantize_representative_models"
+    ) -> str:
+        return plan.add_step(
+            step_id, self._get_quantize_models_task(REPRESENTATIVE_EXPORT_MODELS)
+        )
 
-        # export tests can only run alongside general model tests
-        models_to_run_tests = models_to_run_tests | models_to_test_export
+    @public_task("Quantize all models in preparation for testing them.")
+    @depends(["install_deps"])
+    def quantize_all_models(
+        self, plan: Plan, step_id: str = "quantize_all_models"
+    ) -> str:
+        all_models = get_all_models()
+        return plan.add_step(step_id, self._get_quantize_models_task(all_models))
 
+    @public_task(
+        "Run most tests for only added/modified models in Model Zoo. Includes most tests, uses shared global cache, and uses the same environment for each model."
+    )
+    @depends(["install_deps", "quantize_changed_models"])
+    def test_changed_models(
+        self, plan: Plan, step_id: str = "test_changed_models"
+    ) -> str:
+        models_to_run_tests, models_to_test_export = get_models_to_test()
         return plan.add_step(
             step_id,
             PyTestModelsTask(
@@ -305,17 +318,17 @@ def test_changed_models(
     @public_task(
         "Run all tests for only added/modified models in Model Zoo. Includes all tests, and creates a fresh environment for each model."
     )
-    @depends(["install_deps"])
+    @depends(["install_deps", "quantize_changed_models"])
     def test_changed_models_long(
         self, plan: Plan, step_id: str = "test_changed_models_long"
    ) -> str:
-        default_test_models = REPRESENTATIVE_EXPORT_MODELS
+        models_to_run_tests, models_to_test_export = get_models_to_test()
         return plan.add_step(
             step_id,
             PyTestModelsTask(
                 self.python_executable,
-                get_changed_models() or default_test_models,
-                get_models_to_test_export() or default_test_models,
+                models_to_run_tests,
+                models_to_test_export,
                 self.venv_path,
                 venv_for_each_model=True,
                 use_shared_cache=False,
@@ -323,7 +336,7 @@ def test_changed_models_long(
         )
 
     @public_task("Run tests for all models in Model Zoo.")
-    @depends(["install_deps"])
+    @depends(["install_deps", "quantize_representative_models"])
     def test_all_models(self, plan: Plan, step_id: str = "test_all_models") -> str:
         # Excludes export tests, and uses the same environment for each model.
         all_models = get_all_models()
@@ -353,53 +366,46 @@ def create_perfs(self, plan: Plan, step_id: str = "generate_perfs") -> str:
             ),
         )
 
+    def _make_hub_scorecard_task(
+        self, compile: bool = False, profile: bool = False, quantize: bool = False
+    ) -> PyTestModelsTask:
+        all_models = get_all_models()
+        return PyTestModelsTask(
+            self.python_executable,
+            all_models,
+            all_models,
+            self.venv_path,
+            venv_for_each_model=False,
+            use_shared_cache=True,
+            run_export_compile=compile,
+            run_export_profile=profile,
+            run_export_quantize=quantize,
+            # If one model fails, we should still try the others.
+ exit_after_single_model_failure=False, + skip_standard_unit_test=True, + test_trace=False, + ) + @public_task("Run Compile jobs for all models in Model Zoo.") @depends(["install_deps"]) def test_compile_all_models( self, plan: Plan, step_id: str = "test_compile_all_models" ) -> str: - all_models = get_all_models() - return plan.add_step( - step_id, - PyTestModelsTask( - self.python_executable, - all_models, - all_models, - self.venv_path, - venv_for_each_model=False, - use_shared_cache=True, - run_export_compile=True, - run_export_profile=False, - # If one model fails to export, we should still try the others. - exit_after_single_model_failure=False, - skip_standard_unit_test=True, - test_trace=False, - ), - ) + return plan.add_step(step_id, self._make_hub_scorecard_task(compile=True)) @public_task("Run profile jobs for all models in Model Zoo.") @depends(["install_deps"]) def test_profile_all_models( self, plan: Plan, step_id: str = "test_profile_all_models" ) -> str: - all_models = get_all_models() - return plan.add_step( - step_id, - PyTestModelsTask( - self.python_executable, - all_models, - all_models, - self.venv_path, - venv_for_each_model=False, - use_shared_cache=True, - run_export_compile=False, - run_export_profile=True, - skip_standard_unit_test=True, - # "Profile" tests fail only if there is something fundamentally wrong with the code, not if a single profile job fails. - exit_after_single_model_failure=False, - test_trace=False, - ), - ) + return plan.add_step(step_id, self._make_hub_scorecard_task(profile=True)) + + @public_task("Run quantize jobs for all models in Model Zoo.") + @depends(["install_deps"]) + def test_quantize_all_models( + self, plan: Plan, step_id: str = "test_quantize_all_models" + ) -> str: + return plan.add_step(step_id, self._make_hub_scorecard_task(quantize=True)) @public_task("Verify all export scripts work e2e.") @depends(["install_deps"]) @@ -427,7 +433,7 @@ def test_all_export_scripts( ) @public_task("Run tests for all models in Model Zoo.") - @depends(["install_deps"]) + @depends(["install_deps", "quantize_all_models"]) def test_all_models_long( self, plan: Plan, step_id: str = "test_all_models_long" ) -> str: diff --git a/scripts/tasks/changes.py b/scripts/tasks/changes.py index a27a856f..4b438cf7 100644 --- a/scripts/tasks/changes.py +++ b/scripts/tasks/changes.py @@ -2,9 +2,10 @@ # Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. # SPDX-License-Identifier: BSD-3-Clause # --------------------------------------------------------------------- +import functools import os from pathlib import Path -from typing import Iterable, Optional, Set +from typing import Iterable, Optional, Set, Tuple from .constants import ( PY_PACKAGE_MODELS_ROOT, @@ -89,6 +90,32 @@ def _get_file_edges(filename) -> Set[str]: return dependent_files +@functools.lru_cache(maxsize=1) +def get_affected_files(changed_files: Iterable[str]) -> Set[str]: + """ + Given a list of changed python files, performs a Depth-First Search (DFS) + over the qai_hub_models directory to figure out which files were affected. + + Cached so that the graph traversal is done once, and `resolve_affected_models` + can be run with different args using the same base set of files. 
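+
+    Note: because of the lru_cache, `changed_files` must be hashable;
+    resolve_affected_models converts its list to a tuple before calling this.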
+ """ + changed_files = list(changed_files) + seen = set(changed_files) + while len(changed_files) > 0: + # Pop off stack + curr_file = changed_files.pop() + if curr_file in MANUAL_EDGES: + dependent_files = set(MANUAL_EDGES[curr_file]) + else: + dependent_files = _get_file_edges(curr_file) + # Add new nodes to stack + for dependent_file in dependent_files: + if dependent_file not in seen: + seen.add(dependent_file) + changed_files.append(dependent_file) + return seen + + def resolve_affected_models( changed_files: Iterable[str], include_model: bool = True, @@ -111,22 +138,10 @@ def resolve_affected_models( changed_files: List of filepaths to files that changed. Paths are relative to the root of this repository. """ - changed_files = list(changed_files) - seen = set(changed_files) - while len(changed_files) > 0: - # Pop off stack - curr_file = changed_files.pop() - if curr_file in MANUAL_EDGES: - dependent_files = set(MANUAL_EDGES[curr_file]) - else: - dependent_files = _get_file_edges(curr_file) - # Add new nodes to stack - for dependent_file in dependent_files: - if dependent_file not in seen: - seen.add(dependent_file) - changed_files.append(dependent_file) + # Convert to tuple so it can be used as a cache key + affected_files = get_affected_files(tuple(changed_files)) changed_models = set() - for f in seen: + for f in affected_files: file_path = Path(f) # Only consider directories directly in the top-level `models/` folder # (i.e. ignore `models/_shared`, `models/_internal`) @@ -167,6 +182,7 @@ def get_code_gen_changed_models() -> Set[str]: return set(changed_models) +@functools.lru_cache(maxsize=2) # Size 2 for `.py` and `code-gen.yaml` def get_changed_files_in_package(suffix: Optional[str] = None) -> Iterable[str]: """ Returns the list of changed files in zoo based on git tracking. @@ -277,7 +293,7 @@ def get_all_models() -> Set[str]: """ Resolve model IDs (folder names) of all models in QAIHM. """ - model_names = set() + model_names: Set[str] = set() for model_name in os.listdir(PY_PACKAGE_MODELS_ROOT): if os.path.exists(os.path.join(PY_PACKAGE_MODELS_ROOT, model_name, "model.py")): model_names.add(model_name) @@ -289,6 +305,40 @@ def get_all_models() -> Set[str]: for model in allowed_models: if model not in model_names: raise ValueError(f"Unknown model selected: {model}") - model_names = allowed_models + model_names = set(allowed_models) return model_names + + +def get_models_to_test() -> Tuple[Set[str], Set[str]]: + """ + This is the master function that is called directly in CI to determine + which models to test. + + Returns: + Tuple[list of models to run unit tests, list of models to run compile tests] + """ + # model.py changed + model_changed_models = get_models_with_changed_definitions() + + # export.py or test_generated.py changed + export_changed_models = get_models_with_export_file_changes() + + # code-gen.yaml changed + code_gen_changed_models = get_code_gen_changed_models() + + # If model or code-gen changed, then test export. + models_to_test_export = model_changed_models | code_gen_changed_models + + # For all other models where export.py or test_generated.py changed, + # only test if they're part of REPRESENTATIVE_EXPORT_MODELS + models_to_test_export.update( + export_changed_models & set(REPRESENTATIVE_EXPORT_MODELS) + ) + + # Set of models where model.py, demo.py, or test.py changed. 
+ models_to_run_tests = get_models_to_run_general_tests() + + # export tests can only run alongside general model tests + models_to_run_tests = models_to_run_tests | models_to_test_export + return models_to_run_tests, models_to_test_export diff --git a/scripts/tasks/constants.py b/scripts/tasks/constants.py index 62ac36aa..5e91c725 100644 --- a/scripts/tasks/constants.py +++ b/scripts/tasks/constants.py @@ -3,8 +3,29 @@ # SPDX-License-Identifier: BSD-3-Clause # --------------------------------------------------------------------- import os +import subprocess + + +def process_output(command): + return command.stdout.decode("utf-8").strip() + + +BASH_EXECUTABLE = process_output( + subprocess.run("which bash", stdout=subprocess.PIPE, shell=True, check=True) +) + + +def run_and_get_output(command, check=True): + return process_output( + subprocess.run( + command, + stdout=subprocess.PIPE, + shell=True, + check=check, + executable=BASH_EXECUTABLE, + ) + ) -from .util import run_and_get_output # Env Variable STORE_ROOT_ENV_VAR = "QAIHM_STORE_ROOT" diff --git a/scripts/tasks/test.py b/scripts/tasks/test.py index 477fb97c..1b4d6287 100644 --- a/scripts/tasks/test.py +++ b/scripts/tasks/test.py @@ -16,7 +16,12 @@ STORE_ROOT_ENV_VAR, ) from .task import CompositeTask, PyTestTask, RunCommandsTask -from .util import can_support_aimet, model_needs_aimet +from .util import ( + can_support_aimet, + check_code_gen_field, + get_is_hub_quantized, + model_needs_aimet, +) from .venv import ( CreateVenvTask, RunCommandsWithVenvTask, @@ -69,6 +74,7 @@ def __init__( run_general: bool = True, run_compile: bool = True, run_profile: bool = False, + run_quantize: bool = False, run_export: bool = False, run_trace: bool = True, install_deps: bool = True, @@ -114,12 +120,15 @@ def __init__( if run_profile: test_flags.append("profile") test_flags.append("inference") + if run_quantize: + test_flags.append("quantize") if run_trace: test_flags.append("trace") if run_export: test_flags.append("export") - if test_flags: - extras_args += ["-m", f'"{" or ".join(test_flags)}"'] + if not test_flags: + raise ValueError("Must specify which types of tests to run") + extras_args += ["-m", f'"{" or ".join(test_flags)}"'] # Create temporary directory for storing cloned & downloaded test artifacts. with TemporaryDirectory() as tmpdir: @@ -179,6 +188,7 @@ def __init__( skip_standard_unit_test: bool = False, test_trace: bool = True, run_export_compile: bool = True, + run_export_quantize: bool = False, run_export_profile: bool = False, run_full_export: bool = False, exit_after_single_model_failure=False, @@ -213,14 +223,13 @@ def __init__( ) print(f"Tests to be run for models: {models_for_testing}") - global_models = [] + global_models = set([]) if not venv_for_each_model: for model_name in models_for_testing: - yaml_path = Path(PY_PACKAGE_MODELS_ROOT) / model_name / "code-gen.yaml" - if yaml_path.exists(): - with open(yaml_path, "r") as f: - if "global_requirements_incompatible" not in f.read(): - global_models.append(model_name) + if not check_code_gen_field( + model_name, "global_requirements_incompatible" + ): + global_models.add(model_name) if len(global_models) > 0: globals_path = Path(PY_PACKAGE_SRC_ROOT) / "global_requirements.txt" @@ -238,7 +247,21 @@ def __init__( # Sort models for ease of tracking how far along the tests are. # Do reverse order because whisper is slow to compile, so trigger earlier. 
- for model_name in sorted(models_for_testing, reverse=True): + export_models = models_to_test_export + hub_quantized_models = [] + nonhub_quantized_models = [] + for model in sorted(models_for_testing, reverse=True): + if get_is_hub_quantized(model) and model in export_models: + hub_quantized_models.append(model) + else: + nonhub_quantized_models.append(model) + + if run_export_quantize: + models_to_run = hub_quantized_models + else: + # Run hub quantized models last to give quantize job time to complete + models_to_run = nonhub_quantized_models + hub_quantized_models + for model_name in models_to_run: # Run standard test suite for this model. is_global_model = model_name in global_models tasks.append( @@ -250,12 +273,12 @@ def __init__( install_deps=not is_global_model, run_trace=test_trace, run_general=not skip_standard_unit_test, - run_compile=run_export_compile - and model_name in models_to_test_export, - run_profile=run_export_profile - and model_name in models_to_test_export, - run_export=run_full_export and model_name in models_to_test_export, - raise_on_failure=False, # Do not raise on failure; let PyTestModelsTask::run_tasks handle this + run_compile=run_export_compile and model_name in export_models, + run_profile=run_export_profile and model_name in export_models, + run_quantize=run_export_quantize and model_name in export_models, + run_export=run_full_export and model_name in export_models, + # Do not raise on failure; let PyTestModelsTask::run_tasks handle this + raise_on_failure=False, ) ) diff --git a/scripts/tasks/util.py b/scripts/tasks/util.py index 9625ff3b..1f5f1f77 100644 --- a/scripts/tasks/util.py +++ b/scripts/tasks/util.py @@ -3,10 +3,19 @@ # SPDX-License-Identifier: BSD-3-Clause # --------------------------------------------------------------------- import contextlib +import functools import os import platform import subprocess import sys +from pathlib import Path + +from .constants import ( + BASH_EXECUTABLE, + PY_PACKAGE_MODELS_ROOT, + process_output, + run_and_get_output, +) class Colors: @@ -35,12 +44,30 @@ def new_cd(x): os.chdir(d) +@functools.lru_cache(maxsize=None) +def check_code_gen_field(model_name: str, field_name: str) -> bool: + """ + This process does not have the yaml package, so use this primitive way + to check if a code gen field is true and apply branching logic within CI/scorecard. 
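+
+    Note: this is a plain substring match, so the field must appear in
+    code-gen.yaml exactly as "<field_name>: true" to be detected.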
+ """ + yaml_path = Path(PY_PACKAGE_MODELS_ROOT) / model_name / "code-gen.yaml" + if yaml_path.exists(): + with open(yaml_path, "r") as f: + if f"{field_name}: true" in f.read(): + return True + return False + + def can_support_aimet(platform: str = sys.platform) -> bool: return platform == "linux" or platform == "linux2" +def get_is_hub_quantized(model_name) -> bool: + return check_code_gen_field(model_name.lower(), "use_hub_quantization") + + def model_needs_aimet(model_name: str) -> bool: - return "quantized" in model_name.lower() + return "quantized" in model_name.lower() and not get_is_hub_quantized(model_name) def default_parallelism() -> int: @@ -76,26 +103,10 @@ def on_mac(): return platform.uname().system == "Darwin" -def process_output(command): - return command.stdout.decode("utf-8").strip() - - def run(command): return subprocess.run(command, shell=True, check=True, executable=BASH_EXECUTABLE) -def run_and_get_output(command, check=True): - return process_output( - subprocess.run( - command, - stdout=subprocess.PIPE, - shell=True, - check=check, - executable=BASH_EXECUTABLE, - ) - ) - - def run_with_venv(venv, command, env=None): if venv is not None: subprocess.run( @@ -122,8 +133,3 @@ def run_with_venv_and_get_output(venv, command): ) else: return run_and_get_output(command) - - -BASH_EXECUTABLE = process_output( - subprocess.run("which bash", stdout=subprocess.PIPE, shell=True, check=True) -) diff --git a/scripts/util/extract_info_from_context_binary.py b/scripts/util/extract_info_from_context_binary.py new file mode 100644 index 00000000..0bf0aba3 --- /dev/null +++ b/scripts/util/extract_info_from_context_binary.py @@ -0,0 +1,79 @@ +# --------------------------------------------------------------------- +# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved. 
+# SPDX-License-Identifier: BSD-3-Clause
+# ---------------------------------------------------------------------
+import argparse
+import json
+import os
+import subprocess
+
+QNN_TYPE_TO_STR = {
+    "QNN_DATATYPE_UFIXED_POINT_16": "uint16",
+    "QNN_DATATYPE_UFIXED_POINT_8": "uint8",
+    "QNN_DATATYPE_INT_32": "int32",
+}
+
+
+def run_utility(qnn_sdk, model_path):
+    json_path = f"{os.path.splitext(os.path.basename(model_path))[0]}.json"
+    subprocess.run(
+        [
+            f"{qnn_sdk}/qnn_sdk/default/bin/x86_64-linux-clang/qnn-context-binary-utility",
+            "--context_binary",
+            model_path,
+            "--json_file",
+            json_path,
+        ],
+        check=True,
+    )
+    return json_path
+
+
+def print_details_from_json(json_path):
+    with open(json_path, "r") as f:
+        data = json.load(f)
+
+    for graph in data["info"]["graphs"]:
+        print(f"Graph Name: {graph['info']['graphName']}")
+        input_spec = dict()
+        # "input" would shadow the builtin, so name the loop variable explicitly.
+        for graph_input in graph["info"]["graphInputs"]:
+            input_spec[graph_input["info"]["name"]] = (
+                tuple(graph_input["info"]["dimensions"]),
+                QNN_TYPE_TO_STR[graph_input["info"]["dataType"]],
+            )
+        print(f"Graph Input: {input_spec}")
+        out = []
+        for output in graph["info"]["graphOutputs"]:
+            out.append(output["info"]["name"])
+        print(f"Graph Output Names: {out}")
+        print()
+
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--model",
+        "-m",
+        type=str,
+        default=None,
+        help="Folder of context binaries whose graph names and input/output details are needed to create model.py",
+    )
+    parser.add_argument(
+        "--qnn",
+        type=str,
+        default=None,
+        help="QNN SDK path",
+    )
+    args = parser.parse_args()
+    assert args.qnn and args.model, "Must specify --model and --qnn"
+
+    for model_path in os.listdir(args.model):
+        if os.path.splitext(model_path)[-1] == ".bin":
+            print(f"Model {model_path}")
+            print("===================")
+            json_path = run_utility(args.qnn, os.path.join(args.model, model_path))
+            print_details_from_json(json_path)
+            print()
+            print()
+
+
+if __name__ == "__main__":
+    main()
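+
+# Example invocation (paths are illustrative, not part of the repo):
+#   python scripts/util/extract_info_from_context_binary.py \
+#       --model /path/to/context_binaries --qnn /path/to/qnn_sdk_root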