docs: Benchmarks (#92)
## Description
Add model benchmarks (memory usage, inference time, model size)

### Type of change
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [x] Documentation update (improves or adds clarity to existing
documentation)

### Checklist
- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [x] My changes generate no new warnings
jakmro authored Feb 3, 2025
1 parent 2a98ffa commit c2eee13
Showing 11 changed files with 274 additions and 21 deletions.
7 changes: 7 additions & 0 deletions docs/docs/benchmarks/_category_.json
@@ -0,0 +1,7 @@
{
"label": "Benchmarks",
"position": 5,
"link": {
"type": "generated-index"
}
}
42 changes: 42 additions & 0 deletions docs/docs/benchmarks/inference-time.md
@@ -0,0 +1,42 @@
---
title: Inference Time
sidebar_position: 3
---

:::warning warning
Times presented in the tables are measured over consecutive runs of the model. The initial run may take up to 2x longer due to model loading and initialization.
:::
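
If you want to reproduce these numbers for your own setup, the general pattern is to discard the first call and average the following runs. Below is a minimal, library-agnostic sketch of that pattern; `runInference` is a placeholder for one forward pass of whichever model you are measuring, not a function exported by this library.

```typescript
// Minimal benchmarking sketch (illustrative only).
// `runInference` is a placeholder for one forward pass of the model under test.
export async function measureAverageLatencyMs(
  runInference: () => Promise<void>,
  runs: number = 10
): Promise<number> {
  // Warm-up call: discarded, because the first run includes model loading
  // and initialization and can be up to 2x slower than subsequent runs.
  await runInference();

  let totalMs = 0;
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await runInference();
    totalMs += performance.now() - start;
  }

  // Average over consecutive runs, matching how the tables below are reported.
  return totalMs / runs;
}
```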

## Classification

| Model | iPhone 16 Pro (Core ML) [ms] | iPhone 13 Pro (Core ML) [ms] | iPhone SE 3 (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| ----------------- | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- |
| EFFICIENTNET_V2_S | 100 | 120 | 130 | 180 | 170 |

## Object Detection

| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 13 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| ------------------------------ | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- |
| SSDLITE_320_MOBILENET_V3_LARGE | 190 | 260 | 280 | 100 | 90 |

## Style Transfer

| Model | iPhone 16 Pro (Core ML) [ms] | iPhone 13 Pro (Core ML) [ms] | iPhone SE 3 (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| ---------------------------- | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- |
| STYLE_TRANSFER_CANDY | 450 | 600 | 750 | 1650 | 1800 |
| STYLE_TRANSFER_MOSAIC | 450 | 600 | 750 | 1650 | 1800 |
| STYLE_TRANSFER_UDNIE | 450 | 600 | 750 | 1650 | 1800 |
| STYLE_TRANSFER_RAIN_PRINCESS | 450 | 600 | 750 | 1650 | 1800 |

## LLMs

| Model | iPhone 16 Pro (XNNPACK) [tokens/s] | iPhone 13 Pro (XNNPACK) [tokens/s] | iPhone SE 3 (XNNPACK) [tokens/s] | Samsung Galaxy S24 (XNNPACK) [tokens/s] | OnePlus 12 (XNNPACK) [tokens/s] |
| --------------------- | ---------------------------------- | ---------------------------------- | -------------------------------- | --------------------------------------- | ------------------------------- |
| LLAMA3_2_1B           | 16.1                               | 11.4                               | ❌                               | 15.6                                    | 19.3                            |
| LLAMA3_2_1B_SPINQUANT | 40.6                               | 16.7                               | 16.5                             | 40.3                                    | 48.2                            |
| LLAMA3_2_1B_QLORA     | 31.8                               | 11.4                               | 11.2                             | 37.3                                    | 44.4                            |
| LLAMA3_2_3B           | ❌                                 | ❌                                 | ❌                               | ❌                                      | 7.1                             |
| LLAMA3_2_3B_SPINQUANT | 17.2                               | 8.2                                | ❌                               | 16.2                                    | 19.4                            |
| LLAMA3_2_3B_QLORA     | 14.5                               | ❌                                 | ❌                               | 14.8                                    | 18.1                            |

❌ - Insufficient RAM.
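
Tokens per second is simply the number of generated tokens divided by the wall-clock generation time. A small sketch of that calculation is shown below; the token count and timestamps come from whichever LLM runner you are measuring, so nothing here depends on a specific API.

```typescript
// Tokens-per-second from a single generation pass (illustrative only).
export function tokensPerSecond(
  generatedTokens: number,
  startTimeMs: number,
  endTimeMs: number
): number {
  const elapsedSeconds = (endTimeMs - startTimeMs) / 1000;
  return generatedTokens / elapsedSeconds;
}

// Example: 161 tokens generated over 10 seconds gives 16.1 tokens/s,
// which is how a value like the LLAMA3_2_1B / iPhone 16 Pro entry is read.
console.log(tokensPerSecond(161, 0, 10_000)); // 16.1
```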
36 changes: 36 additions & 0 deletions docs/docs/benchmarks/memory-usage.md
@@ -0,0 +1,36 @@
---
title: Memory Usage
sidebar_position: 2
---

## Classification

| Model | Android (XNNPACK) [MB] | iOS (Core ML) [MB] |
| ----------------- | ---------------------- | ------------------ |
| EFFICIENTNET_V2_S | 130 | 85 |

## Object Detection

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| ------------------------------ | ---------------------- | ------------------ |
| SSDLITE_320_MOBILENET_V3_LARGE | 90 | 90 |

## Style Transfer

| Model | Android (XNNPACK) [MB] | iOS (Core ML) [MB] |
| ---------------------------- | ---------------------- | ------------------ |
| STYLE_TRANSFER_CANDY | 950 | 350 |
| STYLE_TRANSFER_MOSAIC | 950 | 350 |
| STYLE_TRANSFER_UDNIE | 950 | 350 |
| STYLE_TRANSFER_RAIN_PRINCESS | 950 | 350 |

## LLMs

| Model | Android (XNNPACK) [GB] | iOS (XNNPACK) [GB] |
| --------------------- | ---------------------- | ------------------ |
| LLAMA3_2_1B | 3.2 | 3.1 |
| LLAMA3_2_1B_SPINQUANT | 1.9                    | 2.0                |
| LLAMA3_2_1B_QLORA | 2.2 | 2.5 |
| LLAMA3_2_3B | 7.1 | 7.3 |
| LLAMA3_2_3B_SPINQUANT | 3.7 | 3.8 |
| LLAMA3_2_3B_QLORA     | 4.0                    | 4.1                |
36 changes: 36 additions & 0 deletions docs/docs/benchmarks/model-size.md
@@ -0,0 +1,36 @@
---
title: Model Size
sidebar_position: 1
---

## Classification

| Model | XNNPACK [MB] | Core ML [MB] |
| ----------------- | ------------ | ------------ |
| EFFICIENTNET_V2_S | 85.6 | 43.9 |

## Object Detection

| Model | XNNPACK [MB] |
| ------------------------------ | ------------ |
| SSDLITE_320_MOBILENET_V3_LARGE | 13.9 |

## Style Transfer

| Model | XNNPACK [MB] | Core ML [MB] |
| ---------------------------- | ------------ | ------------ |
| STYLE_TRANSFER_CANDY | 6.78 | 5.22 |
| STYLE_TRANSFER_MOSAIC | 6.78 | 5.22 |
| STYLE_TRANSFER_UDNIE | 6.78 | 5.22 |
| STYLE_TRANSFER_RAIN_PRINCESS | 6.78 | 5.22 |

## LLMs

| Model | XNNPACK [GB] |
| --------------------- | ------------ |
| LLAMA3_2_1B | 2.47 |
| LLAMA3_2_1B_SPINQUANT | 1.14 |
| LLAMA3_2_1B_QLORA | 1.18 |
| LLAMA3_2_3B | 6.43 |
| LLAMA3_2_3B_SPINQUANT | 2.55 |
| LLAMA3_2_3B_QLORA | 2.65 |
@@ -86,3 +86,27 @@ function App() {
| Model | Number of classes | Class list |
| --------------------------------------------------------------------------------------------------------------- | ----------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [efficientnet_v2_s](https://pytorch.org/vision/0.20/models/generated/torchvision.models.efficientnet_v2_s.html) | 1000 | [ImageNet1k_v1](https://github.com/software-mansion/react-native-executorch/blob/main/android/src/main/java/com/swmansion/rnexecutorch/models/classification/Constants.kt) |

## Benchmarks

### Model size

| Model | XNNPACK [MB] | Core ML [MB] |
| ----------------- | ------------ | ------------ |
| EFFICIENTNET_V2_S | 85.6 | 43.9 |

### Memory usage

| Model | Android (XNNPACK) [MB] | iOS (Core ML) [MB] |
| ----------------- | ---------------------- | ------------------ |
| EFFICIENTNET_V2_S | 130 | 85 |

### Inference time

:::warning warning
Times presented in the tables are measured over consecutive runs of the model. The initial run may take up to 2x longer due to model loading and initialization.
:::

| Model | iPhone 16 Pro (Core ML) [ms] | iPhone 13 Pro (Core ML) [ms] | iPhone SE 3 (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| ----------------- | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- |
| EFFICIENTNET_V2_S | 100 | 120 | 130 | 180 | 170 |
@@ -124,3 +124,27 @@ function App() {
| Model | Number of classes | Class list |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| [SSDLite320 MobileNetV3 Large](https://pytorch.org/vision/main/models/generated/torchvision.models.detection.ssdlite320_mobilenet_v3_large.html#torchvision.models.detection.SSDLite320_MobileNet_V3_Large_Weights) | 91 | [COCO](https://github.com/software-mansion/react-native-executorch/blob/69802ee1ca161d9df00def1dabe014d36341cfa9/src/types/object_detection.ts#L14) |

## Benchmarks

### Model size

| Model | XNNPACK [MB] |
| ------------------------------ | ------------ |
| SSDLITE_320_MOBILENET_V3_LARGE | 13.9 |

### Memory usage

| Model | Android (XNNPACK) [MB] | iOS (XNNPACK) [MB] |
| ------------------------------ | ---------------------- | ------------------ |
| SSDLITE_320_MOBILENET_V3_LARGE | 90 | 90 |

### Inference time

:::warning warning
Times presented in the tables are measured over consecutive runs of the model. The initial run may take up to 2x longer due to model loading and initialization.
:::

| Model | iPhone 16 Pro (XNNPACK) [ms] | iPhone 13 Pro (XNNPACK) [ms] | iPhone SE 3 (XNNPACK) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| ------------------------------ | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- |
| SSDLITE_320_MOBILENET_V3_LARGE | 190 | 260 | 280 | 100 | 90 |
@@ -78,3 +78,36 @@ function App(){
- [Mosaic](https://github.com/pytorch/examples/tree/main/fast_neural_style)
- [Udnie](https://github.com/pytorch/examples/tree/main/fast_neural_style)
- [Rain princess](https://github.com/pytorch/examples/tree/main/fast_neural_style)

## Benchmarks

### Model size

| Model | XNNPACK [MB] | Core ML [MB] |
| ---------------------------- | ------------ | ------------ |
| STYLE_TRANSFER_CANDY | 6.78 | 5.22 |
| STYLE_TRANSFER_MOSAIC | 6.78 | 5.22 |
| STYLE_TRANSFER_UDNIE | 6.78 | 5.22 |
| STYLE_TRANSFER_RAIN_PRINCESS | 6.78 | 5.22 |

### Memory usage

| Model | Android (XNNPACK) [MB] | iOS (Core ML) [MB] |
| ---------------------------- | ---------------------- | ------------------ |
| STYLE_TRANSFER_CANDY | 950 | 350 |
| STYLE_TRANSFER_MOSAIC | 950 | 350 |
| STYLE_TRANSFER_UDNIE | 950 | 350 |
| STYLE_TRANSFER_RAIN_PRINCESS | 950 | 350 |

### Inference time

:::warning warning
Times presented in the tables are measured over consecutive runs of the model. The initial run may take up to 2x longer due to model loading and initialization.
:::

| Model | iPhone 16 Pro (Core ML) [ms] | iPhone 13 Pro (Core ML) [ms] | iPhone SE 3 (Core ML) [ms] | Samsung Galaxy S24 (XNNPACK) [ms] | OnePlus 12 (XNNPACK) [ms] |
| ---------------------------- | ---------------------------- | ---------------------------- | -------------------------- | --------------------------------- | ------------------------- |
| STYLE_TRANSFER_CANDY | 450 | 600 | 750 | 1650 | 1800 |
| STYLE_TRANSFER_MOSAIC | 450 | 600 | 750 | 1650 | 1800 |
| STYLE_TRANSFER_UDNIE | 450 | 600 | 750 | 1650 | 1800 |
| STYLE_TRANSFER_RAIN_PRINCESS | 450 | 600 | 750 | 1650 | 1800 |
@@ -7,12 +7,15 @@ import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## What is ExecuTorch?

ExecuTorch is a novel AI framework developed by Meta, designed to streamline deploying PyTorch models on a variety of devices, including mobile phones and microcontrollers. This framework enables exporting models into standalone binaries, allowing them to run locally without requiring API calls. ExecuTorch achieves state-of-the-art performance through optimizations and delegates such as Core ML and XNNPACK. It provides a seamless export process with robust debugging options, making it easier to resolve issues if they arise.

## React Native ExecuTorch

React Native ExecuTorch is our way of bringing ExecuTorch into the React Native world. Our API is built to be simple, declarative, and efficient. Plus, we’ll provide a set of pre-exported models for common use cases, so you won’t have to worry about handling exports yourself. With just a few lines of JavaScript, you’ll be able to run AI models (even LLMs 👀) right on your device—keeping user data private and saving on cloud costs.
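
For a feel of what that looks like, here is a rough sketch of a hook-based call. The identifiers used here (`useLLM`, `modelSource`, `tokenizerSource`, and the bundled model constants) are assumptions for illustration; see the model pages in these docs for the exact, current API.

```tsx
// Rough usage sketch; identifiers below are assumptions, not a guaranteed API.
import React from 'react';
import { Text } from 'react-native';
import { useLLM, LLAMA3_2_1B, LLAMA3_2_1B_TOKENIZER } from 'react-native-executorch';

export default function App() {
  // Declaratively load a pre-exported Llama 3.2 1B model and its tokenizer,
  // then render the streamed response produced on-device.
  const llama = useLLM({
    modelSource: LLAMA3_2_1B,
    tokenizerSource: LLAMA3_2_1B_TOKENIZER,
  });

  return <Text>{llama.response}</Text>;
}
```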

## Installation

Installation is pretty straightforward: just use your favorite package manager.

<Tabs>
@@ -54,12 +57,15 @@ Because we are using ExecuTorch under the hood, you won't be able to build iOS a
:::

Running the app with the library:

```bash
yarn run expo:<ios | android> -d
```

## Good reads

If you want to dive deeper into ExecuTorch or our previous work with the framework, we highly encourage you to check out the following resources:

- [ExecuTorch docs](https://pytorch.org/executorch/stable/index.html)
- [Native code for iOS](https://medium.com/swmansion/bringing-native-ai-to-your-mobile-apps-with-executorch-part-i-ios-f1562a4556e8?source=user_profile_page---------0-------------250189c98ccf---------------)
- [Native code for Android](https://medium.com/swmansion/bringing-native-ai-to-your-mobile-apps-with-executorch-part-ii-android-29431b6b9f7f?source=user_profile_page---------2-------------b8e3a5cb1c63---------------)
@@ -3,32 +3,41 @@ title: Exporting Llama
sidebar_position: 2
---

To make the export process as simple as possible for you, we created a script that runs a Docker container and exports the model.

## Steps to export Llama

### 1. Create an account

Get a [HuggingFace](https://huggingface.co/) account. This will allow you to download the needed files. You can also use the [official Llama website](https://www.llama.com/llama-downloads/).

### 2. Select a model

Pick the model that suits your needs. Before you download it, you'll need to accept a license. For best performance, we recommend using Spin-Quant or QLoRA versions of the model:

- [Llama 3.2 3B](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct/tree/main/original)
- [Llama 3.2 1B](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct/tree/main/original)
- [Llama 3.2 3B Spin-Quant](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8/tree/main)
- [Llama 3.2 1B Spin-Quant](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8/tree/main)
- [Llama 3.2 3B QLoRA](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct-QLORA_INT4_EO8/tree/main)
- [Llama 3.2 1B QLoRA](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8/tree/main)

### 3. Download files

Download the `consolidated.00.pth`, `params.json` and `tokenizer.model` files. If you can't see them, make sure to check the `original` directory.

### 4. Rename the tokenizer file

Rename the `tokenizer.model` file to `tokenizer.bin` as required by the library:

```bash
mv tokenizer.model tokenizer.bin
```

### 5. Run the export script

Navigate to the `llama_export` directory and run the following command:

```bash
./build_llama_binary.sh --model-path /path/to/consolidated.00.pth --params-path /path/to/params.json
```