meta-llama/Llama-3.3-70B-Instruct added as validated model (#196)
* Adding more validated models in docs

Signed-off-by: Abukhoyer Shaik <[email protected]>

* Sorted models

Signed-off-by: Abukhoyer Shaik <[email protected]>

* Llama-3.3-70B-Instruct added to the validated models

Signed-off-by: Abukhoyer Shaik <[email protected]>

* README and docstring changed

Signed-off-by: Abukhoyer Shaik <[email protected]>

* README modified

Signed-off-by: Abukhoyer Shaik <[email protected]>

---------

Signed-off-by: Abukhoyer Shaik <[email protected]>
abukhoy authored Dec 11, 2024
1 parent 1b1b71d commit 56cb807
Showing 3 changed files with 12 additions and 5 deletions.
2 changes: 1 addition & 1 deletion QEfficient/transformers/models/modeling_auto.py
@@ -271,7 +271,7 @@ def compile(
     :onnx_path (str, optional): Path to pre-exported onnx model.
     :compile_dir (str, optional): Path for saving the qpc generated.
     :num_cores (int): Number of cores used to compile the model.
-    :num_devices (List[int]): Number of devices for tensor-slicing is invoked, defaults to None, and automatically chooses suitable device.
+    :num_devices (int): Number of devices the model needs to be compiled for. Defaults to 1.
     :batch_size (int, optional): Batch size. ``Defaults to 1``.
     :prefill_seq_len (int, optional): The length of the Prefill prompt should be less than ``prefill_seq_len``. ``Defaults to 32``.
     :ctx_len (int, optional): Maximum ``ctx`` that the compiled model can remember. ``Defaults to 128``.
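For context, a minimal usage sketch of the ``compile`` interface documented above, assuming the ``QEFFAutoModelForCausalLM`` entry point from this repository; the checkpoint name and all settings below are illustrative, not prescriptive:

```python
from QEfficient import QEFFAutoModelForCausalLM

# Hypothetical checkpoint and compile settings, for illustration only.
model = QEFFAutoModelForCausalLM.from_pretrained("gpt2")

qpc_path = model.compile(
    num_cores=16,        # number of cores used to compile the model
    num_devices=1,       # devices the model is compiled for; defaults to 1
    batch_size=1,        # defaults to 1
    prefill_seq_len=32,  # prefill prompt must be shorter than this; defaults to 32
    ctx_len=128,         # maximum context the compiled model can remember; defaults to 128
)
print(qpc_path)  # per the docstring above, the generated qpc is saved under compile_dir
```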
14 changes: 10 additions & 4 deletions README.md
@@ -7,10 +7,16 @@

*Latest news* :fire: <br>

- [coming soon] Support for more popular [models](https://quic.github.io/efficient-transformers/source/validate.html#models-coming-soon) and the speculative decoding inference optimization technique <br>
- [09/2024] [AWQ](https://arxiv.org/abs/2306.00978)/[GPTQ](https://arxiv.org/abs/2210.17323) 4-bit quantized models are supported
- [11/2024] [finite adapters support](https://github.com/quic/efficient-transformers/pull/153) allows mixed adapter usage for peft models.
- [11/2024] [Speculative decoding TLM](https://github.com/quic/efficient-transformers/pull/119): a QEFFAutoModelForCausalLM model can be compiled to return more than one logit per decode step when serving as the target LM (TLM).
- [11/2024] Added support for [Meta-Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct), [Meta-Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) and [Meta-Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B)
- [09/2024] [AWQ](https://arxiv.org/abs/2306.00978)/[GPTQ](https://arxiv.org/abs/2210.17323) 4-bit quantized models are supported <br>
- [09/2024] Now we support [PEFT](https://huggingface.co/docs/peft/index) models
- [09/2024] Added support for [Gemma-2-Family](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315)

<details>
<summary>More</summary>

- [09/2024] Added support for [Gemma-2-Family](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315)<br>
- [09/2024] Added support for [CodeGemma-Family](https://huggingface.co/collections/google/codegemma-release-66152ac7b683e2667abdee11)
- [09/2024] Added support for [Gemma-Family](https://huggingface.co/collections/google/gemma-release-65d5efbccdbb8c4202ec078b)
- [09/2024] Added support for [Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B)
@@ -31,7 +37,7 @@
- [06/2024] Added support for [Vicuna-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5)
- [05/2024] Added support for [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) & [Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1).
- [04/2024] Initial release of [efficient transformers](https://github.com/quic/efficient-transformers) for seamless inference on pre-trained LLMs.

</details>

# Overview

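The speculative-decoding item in the news list above says a QEFFAutoModelForCausalLM can be compiled to return more than one logit per decode step when acting as the target model (TLM). A hedged sketch of that flow; the ``num_speculative_tokens`` argument is labeled here as an assumption about the knob PR #119 introduces, so treat the PR itself as authoritative:

```python
from QEfficient import QEFFAutoModelForCausalLM

# Hypothetical target-model checkpoint; num_speculative_tokens is assumed to be
# the compile-time knob from PR #119 that makes the TLM emit logits for several
# positions per decode call, so draft tokens can be verified in one pass.
tlm = QEFFAutoModelForCausalLM.from_pretrained("gpt2")
tlm.compile(
    num_cores=16,
    num_speculative_tokens=4,
)
```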
1 change: 1 addition & 0 deletions docs/source/validate.md
@@ -29,6 +29,7 @@
| [Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) |✔️ |
| [Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) |✔️ |
| [Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) |✔️ |
| [Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) |✔️ |
| [MPT-7b](https://huggingface.co/mosaicml/mpt-7b) |✔️ |
| [Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) |✔️ |
| [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) |✔️ |
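To close the loop on the table addition, a minimal inference sketch with the newly validated checkpoint, assuming the high-level API shown earlier; the prompt, core count, and device count are illustrative:

```python
from QEfficient import QEFFAutoModelForCausalLM

# The newly validated model; access to the gated Hugging Face repo is assumed.
model = QEFFAutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")
model.compile(num_cores=16, num_devices=4)  # illustrative settings for a 70B model
model.generate(prompts=["Explain tensor slicing in one sentence."])
```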
