Commit v0.19.0
See https://github.com/quic/ai-hub-models/releases/v0.19.0 for changelog.

Signed-off-by: QAIHM Team <[email protected]>
qaihm-bot committed Nov 27, 2024
1 parent 2fc5329 commit 9507106
Showing 381 changed files with 28,918 additions and 21,043 deletions.
9 changes: 8 additions & 1 deletion README.md
@@ -147,13 +147,16 @@ and many more.
| -- | -- |
| | |
| **Image Classification**
| [Beit](https://aihub.qualcomm.com/models/beit) | [qai_hub_models.models.beit](qai_hub_models/models/beit/README.md) |
| [ConvNext-Base](https://aihub.qualcomm.com/models/convnext_base) | [qai_hub_models.models.convnext_base](qai_hub_models/models/convnext_base/README.md) |
| [ConvNext-Tiny](https://aihub.qualcomm.com/models/convnext_tiny) | [qai_hub_models.models.convnext_tiny](qai_hub_models/models/convnext_tiny/README.md) |
| [ConvNext-Tiny-w8a16-Quantized](https://aihub.qualcomm.com/models/convnext_tiny_w8a16_quantized) | [qai_hub_models.models.convnext_tiny_w8a16_quantized](qai_hub_models/models/convnext_tiny_w8a16_quantized/README.md) |
| [ConvNext-Tiny-w8a8-Quantized](https://aihub.qualcomm.com/models/convnext_tiny_w8a8_quantized) | [qai_hub_models.models.convnext_tiny_w8a8_quantized](qai_hub_models/models/convnext_tiny_w8a8_quantized/README.md) |
| [DenseNet-121](https://aihub.qualcomm.com/models/densenet121) | [qai_hub_models.models.densenet121](qai_hub_models/models/densenet121/README.md) |
| [DenseNet-121-Quantized](https://aihub.qualcomm.com/models/densenet121_quantized) | [qai_hub_models.models.densenet121_quantized](qai_hub_models/models/densenet121_quantized/README.md) |
| [EfficientNet-B0](https://aihub.qualcomm.com/models/efficientnet_b0) | [qai_hub_models.models.efficientnet_b0](qai_hub_models/models/efficientnet_b0/README.md) |
| [EfficientNet-B4](https://aihub.qualcomm.com/models/efficientnet_b4) | [qai_hub_models.models.efficientnet_b4](qai_hub_models/models/efficientnet_b4/README.md) |
| [EfficientNet-V2-s](https://aihub.qualcomm.com/models/efficientnet_v2_s) | [qai_hub_models.models.efficientnet_v2_s](qai_hub_models/models/efficientnet_v2_s/README.md) |
| [EfficientViT-b2-cls](https://aihub.qualcomm.com/models/efficientvit_b2_cls) | [qai_hub_models.models.efficientvit_b2_cls](qai_hub_models/models/efficientvit_b2_cls/README.md) |
| [EfficientViT-l2-cls](https://aihub.qualcomm.com/models/efficientvit_l2_cls) | [qai_hub_models.models.efficientvit_l2_cls](qai_hub_models/models/efficientvit_l2_cls/README.md) |
| [GoogLeNet](https://aihub.qualcomm.com/models/googlenet) | [qai_hub_models.models.googlenet](qai_hub_models/models/googlenet/README.md) |
@@ -166,6 +169,7 @@ and many more.
| [MobileNet-v3-Large](https://aihub.qualcomm.com/models/mobilenet_v3_large) | [qai_hub_models.models.mobilenet_v3_large](qai_hub_models/models/mobilenet_v3_large/README.md) |
| [MobileNet-v3-Large-Quantized](https://aihub.qualcomm.com/models/mobilenet_v3_large_quantized) | [qai_hub_models.models.mobilenet_v3_large_quantized](qai_hub_models/models/mobilenet_v3_large_quantized/README.md) |
| [MobileNet-v3-Small](https://aihub.qualcomm.com/models/mobilenet_v3_small) | [qai_hub_models.models.mobilenet_v3_small](qai_hub_models/models/mobilenet_v3_small/README.md) |
| [Mobile_Vit](https://aihub.qualcomm.com/models/mobile_vit) | [qai_hub_models.models.mobile_vit](qai_hub_models/models/mobile_vit/README.md) |
| [RegNet](https://aihub.qualcomm.com/models/regnet) | [qai_hub_models.models.regnet](qai_hub_models/models/regnet/README.md) |
| [RegNetQuantized](https://aihub.qualcomm.com/models/regnet_quantized) | [qai_hub_models.models.regnet_quantized](qai_hub_models/models/regnet_quantized/README.md) |
| [ResNeXt101](https://aihub.qualcomm.com/models/resnext101) | [qai_hub_models.models.resnext101](qai_hub_models/models/resnext101/README.md) |
@@ -214,6 +218,7 @@ and many more.
| [DeepLabV3-Plus-MobileNet](https://aihub.qualcomm.com/models/deeplabv3_plus_mobilenet) | [qai_hub_models.models.deeplabv3_plus_mobilenet](qai_hub_models/models/deeplabv3_plus_mobilenet/README.md) |
| [DeepLabV3-Plus-MobileNet-Quantized](https://aihub.qualcomm.com/models/deeplabv3_plus_mobilenet_quantized) | [qai_hub_models.models.deeplabv3_plus_mobilenet_quantized](qai_hub_models/models/deeplabv3_plus_mobilenet_quantized/README.md) |
| [DeepLabV3-ResNet50](https://aihub.qualcomm.com/models/deeplabv3_resnet50) | [qai_hub_models.models.deeplabv3_resnet50](qai_hub_models/models/deeplabv3_resnet50/README.md) |
| [EfficientViT-l2-seg](https://aihub.qualcomm.com/models/efficientvit_l2_seg) | [qai_hub_models.models.efficientvit_l2_seg](qai_hub_models/models/efficientvit_l2_seg/README.md) |
| [FCN-ResNet50](https://aihub.qualcomm.com/models/fcn_resnet50) | [qai_hub_models.models.fcn_resnet50](qai_hub_models/models/fcn_resnet50/README.md) |
| [FCN-ResNet50-Quantized](https://aihub.qualcomm.com/models/fcn_resnet50_quantized) | [qai_hub_models.models.fcn_resnet50_quantized](qai_hub_models/models/fcn_resnet50_quantized/README.md) |
| [FFNet-122NS-LowRes](https://aihub.qualcomm.com/models/ffnet_122ns_lowres) | [qai_hub_models.models.ffnet_122ns_lowres](qai_hub_models/models/ffnet_122ns_lowres/README.md) |
@@ -233,6 +238,7 @@ and many more.
| [YOLOv8-Segmentation](https://aihub.qualcomm.com/models/yolov8_seg) | [qai_hub_models.models.yolov8_seg](qai_hub_models/models/yolov8_seg/README.md) |
| | |
| **Object Detection**
| [Conditional-DETR-ResNet50](https://aihub.qualcomm.com/models/conditional_detr_resnet50) | [qai_hub_models.models.conditional_detr_resnet50](qai_hub_models/models/conditional_detr_resnet50/README.md) |
| [DETR-ResNet101](https://aihub.qualcomm.com/models/detr_resnet101) | [qai_hub_models.models.detr_resnet101](qai_hub_models/models/detr_resnet101/README.md) |
| [DETR-ResNet101-DC5](https://aihub.qualcomm.com/models/detr_resnet101_dc5) | [qai_hub_models.models.detr_resnet101_dc5](qai_hub_models/models/detr_resnet101_dc5/README.md) |
| [DETR-ResNet50](https://aihub.qualcomm.com/models/detr_resnet50) | [qai_hub_models.models.detr_resnet50](qai_hub_models/models/detr_resnet50/README.md) |
@@ -257,6 +263,7 @@ and many more.
| | |
| **Pose Estimation**
| [Facial-Landmark-Detection](https://aihub.qualcomm.com/models/facemap_3dmm) | [qai_hub_models.models.facemap_3dmm](qai_hub_models/models/facemap_3dmm/README.md) |
| [Facial-Landmark-Detection-Quantized](https://aihub.qualcomm.com/models/facemap_3dmm_quantized) | [qai_hub_models.models.facemap_3dmm_quantized](qai_hub_models/models/facemap_3dmm_quantized/README.md) |
| [HRNetPose](https://aihub.qualcomm.com/models/hrnet_pose) | [qai_hub_models.models.hrnet_pose](qai_hub_models/models/hrnet_pose/README.md) |
| [HRNetPoseQuantized](https://aihub.qualcomm.com/models/hrnet_pose_quantized) | [qai_hub_models.models.hrnet_pose_quantized](qai_hub_models/models/hrnet_pose_quantized/README.md) |
| [LiteHRNet](https://aihub.qualcomm.com/models/litehrnet) | [qai_hub_models.models.litehrnet](qai_hub_models/models/litehrnet/README.md) |
@@ -285,8 +292,8 @@ and many more.
| Model | README |
| -- | -- |
| | |
| [OpenAI-Clip](https://aihub.qualcomm.com/models/openai_clip) | [qai_hub_models.models.openai_clip](qai_hub_models/models/openai_clip/README.md) |
| [TrOCR](https://aihub.qualcomm.com/models/trocr) | [qai_hub_models.models.trocr](qai_hub_models/models/trocr/README.md) |
| [OpenAI-Clip](https://aihub.qualcomm.com/models/openai_clip) | [qai_hub_models.models.openai_clip](qai_hub_models/models/openai_clip/README.md) |

### Generative Ai

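For orientation, every entry added to the model tables above is a self-contained package under qai_hub_models.models. A minimal sketch of loading one of the newly listed packages follows; the module-level `Model` alias and the no-argument `from_pretrained()` are assumptions based on the repository's usual per-model layout, not something this diff shows.

# Hedged sketch: load a newly listed model package and inspect its input spec.
# Assumes the package exposes a `Model` alias with a no-argument `from_pretrained()`.
from qai_hub_models.models.beit import Model

model = Model.from_pretrained()
print(model.get_input_spec())
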
2 changes: 1 addition & 1 deletion qai_hub_models/_version.py
@@ -2,4 +2,4 @@
# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# ---------------------------------------------------------------------
__version__ = "0.18.0"
__version__ = "0.19.0"
4 changes: 4 additions & 0 deletions qai_hub_models/models/_shared/facemap_3dmm/__init__.py
@@ -0,0 +1,4 @@
# ---------------------------------------------------------------------
# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# ---------------------------------------------------------------------
@@ -10,7 +10,10 @@
import numpy as np
import torch

from qai_hub_models.models.facemap_3dmm.model import MODEL_ASSET_VERSION, MODEL_ID
from qai_hub_models.models._shared.facemap_3dmm.model import (
MODEL_ASSET_VERSION,
MODEL_ID,
)
from qai_hub_models.utils.asset_loaders import CachedWebModelAsset, load_numpy


81 changes: 81 additions & 0 deletions qai_hub_models/models/_shared/facemap_3dmm/demo.py
@@ -0,0 +1,81 @@
# ---------------------------------------------------------------------
# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# ---------------------------------------------------------------------

import cv2
import numpy as np
from PIL import Image
from skimage import io

from qai_hub_models.models._shared.facemap_3dmm.app import FaceMap_3DMMApp
from qai_hub_models.models._shared.facemap_3dmm.model import (
MODEL_ASSET_VERSION,
MODEL_ID,
FaceMap_3DMM,
)
from qai_hub_models.utils.args import (
demo_model_from_cli_args,
get_model_cli_parser,
get_on_device_demo_parser,
validate_on_device_demo_args,
)
from qai_hub_models.utils.asset_loaders import CachedWebModelAsset
from qai_hub_models.utils.display import display_or_save_image

INPUT_IMAGE_PATH = str(
CachedWebModelAsset.from_asset_store(MODEL_ID, MODEL_ASSET_VERSION, "face_img.jpg")
)


# Run FaceMap_3DMM end-to-end on a sample image.
# The demo will display an image with the predicted landmarks overlaid.
def main(
model_cls: type[FaceMap_3DMM] = FaceMap_3DMM,
model_id: str = MODEL_ID,
is_test: bool = False,
):
# Demo parameters
parser = get_model_cli_parser(model_cls)
parser = get_on_device_demo_parser(parser, add_output_dir=True)
parser.add_argument(
"--image",
type=str,
default=INPUT_IMAGE_PATH,
help="image file path or URL",
)
args = parser.parse_args([] if is_test else None)
model = demo_model_from_cli_args(model_cls, model_id, args)
validate_on_device_demo_args(args, model_id)

# Load image
(_, _, height, width) = FaceMap_3DMM.get_input_spec()["image"][0]
image = io.imread(args.image)

print("Model Loaded")

app = FaceMap_3DMMApp(model)

# Get face bounding box info (from file or face detector)
fbox = np.loadtxt(INPUT_IMAGE_PATH.replace(".jpg", "_fbox.txt"))
x0, x1, y0, y1 = int(fbox[0]), int(fbox[1]), int(fbox[2]), int(fbox[3])

lmk, output = app.landmark_prediction(image, x0, x1, y0, y1)

if not is_test:
# Annotated lmk
np.savetxt(
"qai_hub_models/models/facemap_3dmm/demo_output_lmk.txt",
lmk.detach().numpy(),
)

# Annotated image
display_or_save_image(
Image.fromarray(cv2.cvtColor(output, cv2.COLOR_BGR2RGB)),
"qai_hub_models/models/facemap_3dmm",
"demo_output_img.png",
)


if __name__ == "__main__":
main()
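
The demo above derives the face bounding box from a sidecar text file next to the sample image (face_img_fbox.txt, read with np.loadtxt). To drive the shared app on your own image, a rough sketch with an explicit box is shown below; the file path and coordinates are placeholders, and the x0, x1, y0, y1 ordering follows the demo code.

# Hedged sketch: run the shared FaceMap_3DMM app on a user-supplied image and box.
from skimage import io

from qai_hub_models.models._shared.facemap_3dmm.app import FaceMap_3DMMApp
from qai_hub_models.models._shared.facemap_3dmm.model import FaceMap_3DMM

app = FaceMap_3DMMApp(FaceMap_3DMM.from_pretrained())
image = io.imread("my_face.jpg")    # placeholder image path
x0, x1, y0, y1 = 40, 210, 60, 230   # placeholder face box, pixel coordinates
lmk, annotated = app.landmark_prediction(image, x0, x1, y0, y1)
print(lmk.shape)                    # predicted landmark tensor
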
64 changes: 64 additions & 0 deletions qai_hub_models/models/_shared/facemap_3dmm/model.py
@@ -0,0 +1,64 @@
# ---------------------------------------------------------------------
# Copyright (c) 2024 Qualcomm Innovation Center, Inc. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
# ---------------------------------------------------------------------
from __future__ import annotations

import torch
import torch.nn as nn

from qai_hub_models.models._shared.facemap_3dmm.resnet_score_rgb import resnet18_wd2
from qai_hub_models.utils.asset_loaders import CachedWebModelAsset, load_torch
from qai_hub_models.utils.base_model import BaseModel
from qai_hub_models.utils.input_spec import InputSpec

MODEL_ID = __name__.split(".")[-2]
DEFAULT_WEIGHTS = "resnet_wd2_weak_score_1202_3ch.pth.tar"
MODEL_ASSET_VERSION = 1


class FaceMap_3DMM(BaseModel):
"""Exportable FaceMap_3DMM, end-to-end."""

def __init__(self, model: nn.Module) -> None:
super().__init__()
self.model = model

@classmethod
def from_pretrained(cls):

resnet_model = resnet18_wd2(pretrained=False)

checkpoint_path = CachedWebModelAsset.from_asset_store(
MODEL_ID, MODEL_ASSET_VERSION, DEFAULT_WEIGHTS
)
pretrained_dict = load_torch(checkpoint_path)["state_dict"]
resnet_model.load_state_dict(pretrained_dict)
resnet_model.to(torch.device("cpu")).eval()

return cls(resnet_model)

def forward(self, image):
"""
Run ResNet18_0.5 3Ch on `image`, and produce 265 outputs
Parameters:
image: Pixel values pre-processed for encoder consumption.
Range: float[0, 255]
3-channel Color Space: RGB
Returns:
3DMM model parameters for facial landmark reconstruction: Shape [batch, 265]
"""
return self.model(image)

@staticmethod
def get_input_spec() -> InputSpec:
"""
Returns the input specification (name -> (shape, type)).
"""
return {"image": ((1, 3, 128, 128), "float32")}

@staticmethod
def get_output_names() -> list[str]:
return ["parameters_3dmm"]
@@ -523,7 +523,6 @@ def forward(self, x):
x = self.features(x)
feature = x.view(x.size(0), -1)
out = self.output(feature)
out[:, 264] = self.sigmoid(out[:, 264])

return out

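This hunk removes the in-graph sigmoid on output index 264, so that element now comes back as a raw logit. The commit does not show where, or whether, the activation moved; if downstream code still expects the previous [0, 1] score, one hedged option is to apply it after the forward pass, as sketched below with a stand-in tensor.

# Hedged sketch (assumption: a caller still wants the former [0, 1] score).
import torch

out = torch.randn(1, 265)           # stand-in for the raw network output shown above
score = torch.sigmoid(out[:, 264])  # re-apply the sigmoid the forward pass no longer performs
print(score)
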
14 changes: 6 additions & 8 deletions qai_hub_models/models/_shared/llama3/app.py
@@ -88,18 +88,16 @@ def generate_output_prompt(
output_token = None
hub_tokens = None

model = self.model_cls.from_pretrained(sequence_length=128)
llm_config = model.llm_config
model = self.model_cls.from_pretrained(
sequence_length=prompt_sequence_length,
context_length=context_length,
)
is_prompt = True

# Process input prompt
input_specs = self.model_cls.get_input_spec(
input_seq_length=prompt_sequence_length,
num_hidden_layers=llm_config.num_hidden_layers,
context_length=model.context_length,
hidden_size=llm_config.hidden_size,
num_attention_heads=llm_config.num_attention_heads,
num_key_value_heads=llm_config.num_key_value_heads,
sequence_length=prompt_sequence_length,
context_length=context_length,
)

# Initialization of KV cache
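The app.py hunk above simplifies graph sizing: the prompt-processor model and its input spec are now both derived from the same sequence_length and context_length pair, instead of from individual HF config fields. A sketch of the resulting call pattern follows, using a stand-in class so the snippet runs on its own; a real Llama 3 model class from qai_hub_models would take its place, and the shape details are illustrative only.

# Hedged sketch of the new sizing pattern; _StubLlama is a hypothetical stand-in.
class _StubLlama:
    @classmethod
    def from_pretrained(cls, sequence_length: int, context_length: int) -> "_StubLlama":
        print(f"building model for sequence_length={sequence_length}, context_length={context_length}")
        return cls()

    @staticmethod
    def get_input_spec(sequence_length: int, context_length: int) -> dict:
        return {"input_ids": ((1, sequence_length), "int32")}  # illustrative shape only

prompt_sequence_length = 128  # default used by the old code path above
context_length = 4096
model = _StubLlama.from_pretrained(
    sequence_length=prompt_sequence_length, context_length=context_length
)
input_specs = _StubLlama.get_input_spec(
    sequence_length=prompt_sequence_length, context_length=context_length
)
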
46 changes: 14 additions & 32 deletions qai_hub_models/models/_shared/llama3/demo.py
@@ -4,6 +4,7 @@
# ---------------------------------------------------------------------
from __future__ import annotations

import textwrap
from collections.abc import Callable
from typing import Any

@@ -14,9 +15,7 @@

# Max output tokens to generate
# You can override this with cli argument.
# Keeping this short as on-device demo takes time to converge.
MAX_OUTPUT_TOKENS = 20
DEFAULT_DEVICE = "Samsung Galaxy S24 (Family)"


def llama_chat_demo(
@@ -47,6 +46,7 @@ def llama_chat_demo(
default_prompt: Default prompt to set,
is_test: If test, no options required,
available_target_runtimes: Default available runtimes in options,
bundled_kvcache: KV-cache for each head is concatenated.
"""
# Demo parameters
parser = get_model_cli_parser(model_cls)
Expand All @@ -56,12 +56,6 @@ def llama_chat_demo(
default=default_prompt,
help="input prompt.",
)
parser.add_argument(
"--prompt-processor-input-seq-len",
type=int,
default=128,
help="input sequence length for prompt-processor. This must be less than `context_length` set for model.",
)
parser.add_argument(
"--max-output-tokens",
type=int,
@@ -79,44 +73,32 @@
print("Please pass `--max-output-tokens <int>` to generate longer responses.")
print()
print(
"""NOTE: Each token generation takes around 15 mins on-device:
1. Model is divided into multiple parts to fit into device constraints
2. Each model requires separate execution on-device via AI Hub
3. Due to autoregressive nature, we cannot run step 2 in parallel
4. Device procurement is subject to device availability and might take longer to run demo on-device
Alternative:
1. Run demo on host (with PyTorch) to verify e2e result for longer responses
2. Run demo on-device for shorter responses (--max-output-tokens 10 or 20)
3. [Optional] Can run demo on-device to generate long sentence (takes longer)
We are actively working on to improve UX and reduce turn-around time for these models.
"""
textwrap.dedent(
"""
NOTE: This demo runs an unquantized version of Llama, so it may
not be representative of on-device results. The demo is intended as
reference code for how Llama can be executed on device using both a
prompt processor and a token generator. We recommend using Genie
SDK for on-device deployment of LLMs.""".lstrip(
"\n"
)
)
)
print(f"{'-' * 85}\n")

has_model_access(hf_repo_name, hf_repo_url)

"""
llama_ar128 = model_cls.from_pretrained(
sequence_length=args.prompt_processor_input_seq_len
)
llama_ar1 = model_cls.from_pretrained(sequence_length=1)
context_length = llama_ar128.context_length
"""

app = App(
model_cls,
get_input_prompt_with_tags=get_input_prompt_with_tags,
prepare_combined_attention_mask=prepare_combined_attention_mask,
tokenizer=tokenizer,
end_tokens=end_tokens,
)
context_length = 4096
app.generate_output_prompt(
args.prompt,
prompt_sequence_length=args.prompt_processor_input_seq_len,
context_length=context_length,
prompt_sequence_length=args.sequence_length,
context_length=args.context_length,
max_output_tokens=args.max_output_tokens,
bundled_kvcache=bundled_kvcache,
)