meta-llama/Llama-3.3-70B-Instruct added as validated model (#196)
* Adding more validated models in docs

Signed-off-by: Abukhoyer Shaik <[email protected]>

* Sorted models

Signed-off-by: Abukhoyer Shaik <[email protected]>

* Llama-3.3-70B-Instruct added to the validated models

Signed-off-by: Abukhoyer Shaik <[email protected]>

* README and docstring changed

Signed-off-by: Abukhoyer Shaik <[email protected]>

* README modified

Signed-off-by: Abukhoyer Shaik <[email protected]>

---------

Signed-off-by: Abukhoyer Shaik <[email protected]>
abukhoy authored Dec 11, 2024
1 parent 1b1b71d commit 56cb807
Showing 3 changed files with 12 additions and 5 deletions.
2 changes: 1 addition & 1 deletion QEfficient/transformers/models/modeling_auto.py
@@ -271,7 +271,7 @@ def compile(
     :onnx_path (str, optional): Path to pre-exported onnx model.
     :compile_dir (str, optional): Path for saving the qpc generated.
     :num_cores (int): Number of cores used to compile the model.
-    :num_devices (List[int]): Number of devices for tensor-slicing is invoked, defaults to None, and automatically chooses suitable device.
+    :num_devices (int): Number of devices the model needs to be compiled for. Defaults to 1.
     :batch_size (int, optional): Batch size. ``Defaults to 1``.
     :prefill_seq_len (int, optional): The length of the Prefill prompt should be less than ``prefill_seq_len``. ``Defaults to 32``.
     :ctx_len (int, optional): Maximum ``ctx`` that the compiled model can remember. ``Defaults to 128``.
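For context, a minimal usage sketch of the ``compile`` interface documented above, assuming the ``QEFFAutoModelForCausalLM`` entry point from this repository; the checkpoint name and all settings below are illustrative, not prescriptive:

```python
from QEfficient import QEFFAutoModelForCausalLM

# Hypothetical checkpoint and compile settings, for illustration only.
model = QEFFAutoModelForCausalLM.from_pretrained("gpt2")

qpc_path = model.compile(
    num_cores=16,        # number of cores used to compile the model
    num_devices=1,       # devices the model is compiled for; defaults to 1
    batch_size=1,        # defaults to 1
    prefill_seq_len=32,  # prefill prompt must be shorter than this; defaults to 32
    ctx_len=128,         # maximum context the compiled model can remember; defaults to 128
)
print(qpc_path)  # per the docstring above, the generated qpc is saved under compile_dir
```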
14 changes: 10 additions & 4 deletions README.md
@@ -7,10 +7,16 @@

*Latest news* :fire: <br>

- [coming soon] Support for more popular [models](https://quic.github.io/efficient-transformers/source/validate.html#models-coming-soon) and the speculative decoding inference optimization technique <br>
- [09/2024] [AWQ](https://arxiv.org/abs/2306.00978)/[GPTQ](https://arxiv.org/abs/2210.17323) 4-bit quantized models are supported
- [11/2024] [finite adapters support](https://github.com/quic/efficient-transformers/pull/153) allows mixed adapter usage for peft models.
- [11/2024] [Speculative decoding TLM](https://github.com/quic/efficient-transformers/pull/119): a QEFFAutoModelForCausalLM model can be compiled to return more than one logit per decode step when serving as the target LM (TLM).
- [11/2024] Added support for [Meta-Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct), [Meta-Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) and [Meta-Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B)
- [09/2024] [AWQ](https://arxiv.org/abs/2306.00978)/[GPTQ](https://arxiv.org/abs/2210.17323) 4-bit quantized models are supported <br>
- [09/2024] Now we support [PEFT](https://huggingface.co/docs/peft/index) models
- [09/2024] Added support for [Gemma-2-Family](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315)

<details>
<summary>More</summary>

- [09/2024] Added support for [Gemma-2-Family](https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315)<br>
- [09/2024] Added support for [CodeGemma-Family](https://huggingface.co/collections/google/codegemma-release-66152ac7b683e2667abdee11)
- [09/2024] Added support for [Gemma-Family](https://huggingface.co/collections/google/gemma-release-65d5efbccdbb8c4202ec078b)
- [09/2024] Added support for [Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B)
@@ -31,7 +37,7 @@
- [06/2024] Added support for [Vicuna-v1.5](https://huggingface.co/lmsys/vicuna-13b-v1.5)
- [05/2024] Added support for [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) & [Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1).
- [04/2024] Initial release of [efficient transformers](https://github.com/quic/efficient-transformers) for seamless inference on pre-trained LLMs.

</details>

# Overview

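The speculative-decoding item in the news list above says a QEFFAutoModelForCausalLM can be compiled to return more than one logit per decode step when acting as the target model (TLM). A hedged sketch of that flow; the ``num_speculative_tokens`` argument is labeled here as an assumption about the knob PR #119 introduces, so treat the PR itself as authoritative:

```python
from QEfficient import QEFFAutoModelForCausalLM

# Hypothetical target-model checkpoint; num_speculative_tokens is assumed to be
# the compile-time knob from PR #119 that makes the TLM emit logits for several
# positions per decode call, so draft tokens can be verified in one pass.
tlm = QEFFAutoModelForCausalLM.from_pretrained("gpt2")
tlm.compile(
    num_cores=16,
    num_speculative_tokens=4,
)
```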
1 change: 1 addition & 0 deletions docs/source/validate.md
@@ -29,6 +29,7 @@
| [Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) |✔️ |
| [Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) |✔️ |
| [Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) |✔️ |
| [Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) |✔️ |
| [MPT-7b](https://huggingface.co/mosaicml/mpt-7b) |✔️ |
| [Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) |✔️ |
| [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) |✔️ |
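To close the loop on the table addition, a minimal inference sketch with the newly validated checkpoint, assuming the high-level API shown earlier; the prompt, core count, and device count are illustrative:

```python
from QEfficient import QEFFAutoModelForCausalLM

# The newly validated model; access to the gated Hugging Face repo is assumed.
model = QEFFAutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")
model.compile(num_cores=16, num_devices=4)  # illustrative settings for a 70B model
model.generate(prompts=["Explain tensor slicing in one sentence."])
```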
