v0.20.0
See https://github.com/quic/ai-hub-models/releases/v0.20.0 for changelog.

Signed-off-by: QAIHM Team <[email protected]>
qaihm-bot committed Dec 12, 2024
1 parent 826c3ea commit 9857d3f
Showing 328 changed files with 30,803 additions and 19,280 deletions.
11 changes: 8 additions & 3 deletions README.md
@@ -235,6 +235,7 @@ and many more.
| [SINet](https://aihub.qualcomm.com/models/sinet) | [qai_hub_models.models.sinet](qai_hub_models/models/sinet/README.md) |
| [Segment-Anything-Model](https://aihub.qualcomm.com/models/sam) | [qai_hub_models.models.sam](qai_hub_models/models/sam/README.md) |
| [Unet-Segmentation](https://aihub.qualcomm.com/models/unet_segmentation) | [qai_hub_models.models.unet_segmentation](qai_hub_models/models/unet_segmentation/README.md) |
| [YOLOv11-Segmentation](https://aihub.qualcomm.com/models/yolov11_seg) | [qai_hub_models.models.yolov11_seg](qai_hub_models/models/yolov11_seg/README.md) |
| [YOLOv8-Segmentation](https://aihub.qualcomm.com/models/yolov8_seg) | [qai_hub_models.models.yolov8_seg](qai_hub_models/models/yolov8_seg/README.md) |
| | |
| **Object Detection**
@@ -243,8 +244,10 @@ and many more.
| [DETR-ResNet101-DC5](https://aihub.qualcomm.com/models/detr_resnet101_dc5) | [qai_hub_models.models.detr_resnet101_dc5](qai_hub_models/models/detr_resnet101_dc5/README.md) |
| [DETR-ResNet50](https://aihub.qualcomm.com/models/detr_resnet50) | [qai_hub_models.models.detr_resnet50](qai_hub_models/models/detr_resnet50/README.md) |
| [DETR-ResNet50-DC5](https://aihub.qualcomm.com/models/detr_resnet50_dc5) | [qai_hub_models.models.detr_resnet50_dc5](qai_hub_models/models/detr_resnet50_dc5/README.md) |
| [FaceAttribNet](https://aihub.qualcomm.com/models/face_attrib_net) | [qai_hub_models.models.face_attrib_net](qai_hub_models/models/face_attrib_net/README.md) |
| [Facial-Attribute-Detection](https://aihub.qualcomm.com/models/face_attrib_net) | [qai_hub_models.models.face_attrib_net](qai_hub_models/models/face_attrib_net/README.md) |
| [Facial-Attribute-Detection-Quantized](https://aihub.qualcomm.com/models/face_attrib_net_quantized) | [qai_hub_models.models.face_attrib_net_quantized](qai_hub_models/models/face_attrib_net_quantized/README.md) |
| [Lightweight-Face-Detection](https://aihub.qualcomm.com/models/face_det_lite) | [qai_hub_models.models.face_det_lite](qai_hub_models/models/face_det_lite/README.md) |
| [Lightweight-Face-Detection-Quantized](https://aihub.qualcomm.com/models/face_det_lite_quantized) | [qai_hub_models.models.face_det_lite_quantized](qai_hub_models/models/face_det_lite_quantized/README.md) |
| [MediaPipe-Face-Detection](https://aihub.qualcomm.com/models/mediapipe_face) | [qai_hub_models.models.mediapipe_face](qai_hub_models/models/mediapipe_face/README.md) |
| [MediaPipe-Face-Detection-Quantized](https://aihub.qualcomm.com/models/mediapipe_face_quantized) | [qai_hub_models.models.mediapipe_face_quantized](qai_hub_models/models/mediapipe_face_quantized/README.md) |
| [MediaPipe-Hand-Detection](https://aihub.qualcomm.com/models/mediapipe_hand) | [qai_hub_models.models.mediapipe_hand](qai_hub_models/models/mediapipe_hand/README.md) |
@@ -257,6 +260,7 @@ and many more.
| [YOLOv8-Detection-Quantized](https://aihub.qualcomm.com/models/yolov8_det_quantized) | [qai_hub_models.models.yolov8_det_quantized](qai_hub_models/models/yolov8_det_quantized/README.md) |
| [Yolo-NAS](https://aihub.qualcomm.com/models/yolonas) | [qai_hub_models.models.yolonas](qai_hub_models/models/yolonas/README.md) |
| [Yolo-NAS-Quantized](https://aihub.qualcomm.com/models/yolonas_quantized) | [qai_hub_models.models.yolonas_quantized](qai_hub_models/models/yolonas_quantized/README.md) |
| [Yolo-v3](https://aihub.qualcomm.com/models/yolov3) | [qai_hub_models.models.yolov3](qai_hub_models/models/yolov3/README.md) |
| [Yolo-v6](https://aihub.qualcomm.com/models/yolov6) | [qai_hub_models.models.yolov6](qai_hub_models/models/yolov6/README.md) |
| [Yolo-v7](https://aihub.qualcomm.com/models/yolov7) | [qai_hub_models.models.yolov7](qai_hub_models/models/yolov7/README.md) |
| [Yolo-v7-Quantized](https://aihub.qualcomm.com/models/yolov7_quantized) | [qai_hub_models.models.yolov7_quantized](qai_hub_models/models/yolov7_quantized/README.md) |
@@ -273,6 +277,8 @@ and many more.
| [Posenet-Mobilenet-Quantized](https://aihub.qualcomm.com/models/posenet_mobilenet_quantized) | [qai_hub_models.models.posenet_mobilenet_quantized](qai_hub_models/models/posenet_mobilenet_quantized/README.md) |
| | |
| **Depth Estimation**
| [Depth-Anything](https://aihub.qualcomm.com/models/depth_anything) | [qai_hub_models.models.depth_anything](qai_hub_models/models/depth_anything/README.md) |
| [Depth-Anything-V2](https://aihub.qualcomm.com/models/depth_anything_v2) | [qai_hub_models.models.depth_anything_v2](qai_hub_models/models/depth_anything_v2/README.md) |
| [Midas-V2](https://aihub.qualcomm.com/models/midas) | [qai_hub_models.models.midas](qai_hub_models/models/midas/README.md) |
| [Midas-V2-Quantized](https://aihub.qualcomm.com/models/midas_quantized) | [qai_hub_models.models.midas_quantized](qai_hub_models/models/midas_quantized/README.md) |

@@ -284,16 +290,15 @@ and many more.
| **Speech Recognition**
| [HuggingFace-WavLM-Base-Plus](https://aihub.qualcomm.com/models/huggingface_wavlm_base_plus) | [qai_hub_models.models.huggingface_wavlm_base_plus](qai_hub_models/models/huggingface_wavlm_base_plus/README.md) |
| [Whisper-Base-En](https://aihub.qualcomm.com/models/whisper_base_en) | [qai_hub_models.models.whisper_base_en](qai_hub_models/models/whisper_base_en/README.md) |
| [Whisper-Small-En](https://aihub.qualcomm.com/models/whisper_small_en) | [qai_hub_models.models.whisper_small_en](qai_hub_models/models/whisper_small_en/README.md) |
| [Whisper-Tiny-En](https://aihub.qualcomm.com/models/whisper_tiny_en) | [qai_hub_models.models.whisper_tiny_en](qai_hub_models/models/whisper_tiny_en/README.md) |

### Multimodal

| Model | README |
| -- | -- |
| | |
| [OpenAI-Clip](https://aihub.qualcomm.com/models/openai_clip) | [qai_hub_models.models.openai_clip](qai_hub_models/models/openai_clip/README.md) |
| [TrOCR](https://aihub.qualcomm.com/models/trocr) | [qai_hub_models.models.trocr](qai_hub_models/models/trocr/README.md) |

### Generative AI

2 changes: 1 addition & 1 deletion qai_hub_models/_version.py
@@ -2,4 +2,4 @@
# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# ---------------------------------------------------------------------
__version__ = "0.19.1"
__version__ = "0.20.0"
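To confirm a local install picked up this bump, the version module shown above can be imported directly; a minimal sketch, assuming qai_hub_models is installed in the current environment:

# Minimal check (sketch): verify the installed package matches this release.
from qai_hub_models._version import __version__

assert __version__ == "0.20.0", f"expected 0.20.0, got {__version__}"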
2 changes: 2 additions & 0 deletions qai_hub_models/labels/ppe_labels.txt
@@ -0,0 +1,2 @@
helmet
vest
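For reference, a small sketch of how post-processing code might map detector class ids to these labels; the file path and helper name are illustrative assumptions, since the repo may expose its own label-loading utility:

from pathlib import Path

# Load the PPE label set added above; index i corresponds to class id i.
ppe_labels = Path("qai_hub_models/labels/ppe_labels.txt").read_text().splitlines()


def ppe_class_name(class_id: int) -> str:
    # Hypothetical helper: translate a predicted class id into a label string.
    return ppe_labels[class_id]


print(ppe_class_name(0))  # -> "helmet"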
qai_hub_models/models/_shared/depth_estimation/app.py
@@ -15,7 +15,20 @@
from qai_hub_models.utils.image_processing import pil_resize_pad, undo_resize_pad


class MidasApp:
class DepthEstimationApp:
"""
This class is required to perform end to end inference for Depth Estimation
The app uses 2 models:
* Midas
* DepthAnything
For a given image input, the app will:
* pre-process the image (convert to range[0, 1])
* Run DepthAnything inference
* Convert the depth into visual representation(heatmap) and return as image
"""

    def __init__(
        self,
        model: Callable[[torch.Tensor], torch.Tensor],
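A minimal usage sketch of the renamed DepthEstimationApp, mirroring the constructor and estimate_depth calls made by the shared demo below; the stand-in model, input size, and image path are illustrative assumptions, and a real depth model from the repo would normally be passed in:

import torch

from qai_hub_models.models._shared.depth_estimation.app import DepthEstimationApp
from qai_hub_models.utils.asset_loaders import load_image


def dummy_depth_model(image: torch.Tensor) -> torch.Tensor:
    # Stand-in for a real depth model: any callable mapping an image tensor
    # to a single-channel depth tensor, matching the Callable type above.
    return image.mean(dim=1, keepdim=True)


image = load_image("input.jpg")  # illustrative local path or URL
app = DepthEstimationApp(dummy_depth_model, 256, 256)  # (model, height, width), as in the demo
heatmap = app.estimate_depth(image)  # depth rendered as a heatmap image, per the docstring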
49 changes: 49 additions & 0 deletions qai_hub_models/models/_shared/depth_estimation/demo.py
@@ -0,0 +1,49 @@
# ---------------------------------------------------------------------
# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# ---------------------------------------------------------------------

from qai_hub_models.models._shared.depth_estimation.app import DepthEstimationApp
from qai_hub_models.utils.args import (
    demo_model_from_cli_args,
    get_model_cli_parser,
    get_on_device_demo_parser,
    validate_on_device_demo_args,
)
from qai_hub_models.utils.asset_loaders import CachedWebModelAsset, load_image
from qai_hub_models.utils.base_model import BaseModel
from qai_hub_models.utils.display import display_or_save_image


# The demo will display a heatmap of the estimated depth at each point in the image.
def depth_estimation_demo(
    model_cls: type[BaseModel],
    model_id,
    default_image: CachedWebModelAsset,
    is_test: bool = False,
):
    parser = get_model_cli_parser(model_cls)
    parser = get_on_device_demo_parser(parser, add_output_dir=True)
    parser.add_argument(
        "--image",
        type=str,
        default=default_image,
        help="image file path or URL",
    )
    args = parser.parse_args([] if is_test else None)
    model = demo_model_from_cli_args(model_cls, model_id, args)
    validate_on_device_demo_args(args, model_id)

    # Load image
    (_, _, height, width) = model_cls.get_input_spec()["image"][0]
    image = load_image(args.image)
    print("Model Loaded")

    app = DepthEstimationApp(model, height, width)
    heatmap_image = app.estimate_depth(image)

    if not is_test:
        # Display or save the depth heatmap
        display_or_save_image(
            heatmap_image, args.output_dir, "out_heatmap.png", "heatmap"
        )
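For context, a sketch of how a concrete model's demo.py might reuse this shared helper; the Midas module layout and asset filename are assumptions that follow the MODEL_ID / MODEL_ASSET_VERSION pattern used elsewhere in this commit:

from qai_hub_models.models._shared.depth_estimation.demo import depth_estimation_demo
from qai_hub_models.models.midas.model import MODEL_ASSET_VERSION, MODEL_ID, Midas
from qai_hub_models.utils.asset_loaders import CachedWebModelAsset

# Default demo input hosted in the asset store (filename is illustrative).
INPUT_IMAGE_ADDRESS = CachedWebModelAsset.from_asset_store(
    MODEL_ID, MODEL_ASSET_VERSION, "test_input_image.jpg"
)


def main(is_test: bool = False):
    depth_estimation_demo(Midas, MODEL_ID, INPUT_IMAGE_ADDRESS, is_test)


if __name__ == "__main__":
    main()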
57 changes: 57 additions & 0 deletions qai_hub_models/models/_shared/face_attrib_net/demo.py
@@ -0,0 +1,57 @@
# ---------------------------------------------------------------------
# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# ---------------------------------------------------------------------
import json
from pathlib import Path

from qai_hub_models.models._shared.face_attrib_net.app import FaceAttribNetApp
from qai_hub_models.models.face_attrib_net.model import (
    MODEL_ASSET_VERSION,
    MODEL_ID,
    OUT_NAMES,
    FaceAttribNet,
)
from qai_hub_models.utils.args import (
    demo_model_from_cli_args,
    get_model_cli_parser,
    get_on_device_demo_parser,
    validate_on_device_demo_args,
)
from qai_hub_models.utils.asset_loaders import CachedWebModelAsset, load_image

INPUT_IMAGE_ADDRESS = CachedWebModelAsset.from_asset_store(
    MODEL_ID, MODEL_ASSET_VERSION, "img_sample.bmp"
)


# Run FaceAttribNet end-to-end on a sample image.
def face_attrib_net_demo(model_cls: type[FaceAttribNet], is_test: bool = False):
    # Demo parameters
    parser = get_model_cli_parser(model_cls)
    parser = get_on_device_demo_parser(parser, add_output_dir=True)
    parser.add_argument(
        "--image",
        type=str,
        default=INPUT_IMAGE_ADDRESS,
        help="image file path or URL",
    )
    args = parser.parse_args([] if is_test else None)
    model = demo_model_from_cli_args(model_cls, MODEL_ID, args)
    validate_on_device_demo_args(args, MODEL_ID)

    # Load image
    _, _, height, width = model_cls.get_input_spec()["image"][0]
    orig_image = load_image(args.image)
    print("Model loaded")

    app = FaceAttribNetApp(model)
    output = app.run_inference_on_image(orig_image)
    out_dict = {}
    for i in range(len(output)):
        out_dict[OUT_NAMES[i]] = list(output[i].astype(float))

    output_path = (args.output_dir or str(Path() / "build")) + "/output.json"
    with open(output_path, "w", encoding="utf-8") as wf:
        json.dump(out_dict, wf, ensure_ascii=False, indent=4)
    print(f"Model outputs are saved at: {output_path}")