pull from main and add new phi3 function calling benchmark
tybalex committed Jul 9, 2024
2 parents c187cfe + eb30d06 commit 5ea9b0e
Showing 4 changed files with 35 additions and 6 deletions.
21 changes: 19 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -29,10 +29,27 @@ Try out the models immediately without downloading anything in Our [Huggingface

## Run Rubra Models Locally

Check out our [documentation](https://docs.rubra.ai/category/serving--inferencing) to learn how to run Rubra models locally.
We extend the following inferencing tools to run Rubra models in an OpenAI-compatible tool-calling format for local use:

- [llama.cpp](https://github.com/ggerganov/llama.cpp)
- [vllm](https://github.com/vllm-project/vllm)
- [llama.cpp](https://github.com/rubra-ai/tools.cpp)
- [vLLM](https://github.com/rubra-ai/vllm)

**Note**: Llama3 models, including the 8B and 70B variants, are known to suffer increased perplexity and degraded function-calling performance when quantized. We recommend serving them with vLLM, or running them at fp16 precision.
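Both forks expose an OpenAI-compatible HTTP API, so a standard chat-completions request with a `tools` array works against either server. Below is a minimal sketch of building such a request payload; the endpoint URL, model name, and `get_current_weather` tool are illustrative placeholders, not part of the Rubra API — substitute whatever you actually serve:

```python
import json

# Hypothetical values -- substitute the server address and model you run locally.
ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL = "rubra-ai/Meta-Llama-3-8B-Instruct"

def build_tool_call_request(user_message):
    """Build an OpenAI-compatible chat-completions payload with one example tool."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "description": "Get the current weather for a city",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "city": {"type": "string", "description": "City name"}
                        },
                        "required": ["city"],
                    },
                },
            }
        ],
    }

payload = build_tool_call_request("What's the weather in Paris?")
print(json.dumps(payload, indent=2))
```

Posting this payload to the server (e.g. `requests.post(ENDPOINT, json=payload)`) should return a response whose `choices[0].message.tool_calls` carries the function name and JSON-encoded arguments.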

## Benchmark

View full benchmark results for Rubra models and other models at [docs.rubra.ai/benchmark](https://docs.rubra.ai/benchmark).

| Model | Function Calling | MMLU (5-shot) | GPQA (0-shot) | GSM-8K (8-shot, CoT) | MATH (4-shot, CoT) | MT-bench |
|-----------------------------------------------------------|------------------|---------------|---------------|----------------------|--------------------|----------|
| [**Rubra Llama-3 70B Instruct**](https://huggingface.co/rubra-ai/Meta-Llama-3-70B-Instruct) | 97.85% | 75.90 | 33.93 | 82.26 | 34.24 | 8.36 |
| [**Rubra Llama-3 8B Instruct**](https://huggingface.co/rubra-ai/Meta-Llama-3-8B-Instruct) | 89.28% | 64.39 | 31.70 | 68.99 | 23.76 | 8.03 |
| [**Rubra Qwen2 7B Instruct**](https://huggingface.co/rubra-ai/Qwen2-7B-Instruct) | 85.71% | 68.88 | 30.36 | 75.82 | 28.72 | 8.08 |
| [**Rubra Mistral 7B Instruct v0.3**](https://huggingface.co/rubra-ai/Mistral-7B-Instruct-v0.3) | 73.57% | 59.12 | 29.91 | 43.29 | 11.14 | 7.69 |
| [**Rubra Phi-3 Mini 128k Instruct**](https://huggingface.co/rubra-ai/Phi-3-mini-128k-instruct) | 70.00% | 66.66 | 29.24 | 74.09 | 26.84 | 7.45 |
| [**Rubra Mistral 7B Instruct v0.2**](https://huggingface.co/rubra-ai/Mistral-7B-Instruct-v0.2) | 69.28% | 58.90 | 29.91 | 34.12 | 8.36 | 7.36 |
| [**Rubra Gemma-1.1 2B Instruct**](https://huggingface.co/rubra-ai/gemma-1.1-2b-it) | 45.00% | 38.85 | 24.55 | 6.14 | 2.38 | 5.75 |

## Contributing

6 changes: 4 additions & 2 deletions docs/docs/README.md
@@ -38,8 +38,10 @@ Try out the models immediately without downloading anything in [Huggingface Spac

We extend the following inferencing tools to run Rubra models in an OpenAI-compatible tool-calling format for local use:

- [llama.cpp](https://github.com/ggerganov/llama.cpp)
- [vllm](https://github.com/vllm-project/vllm)
- [llama.cpp](https://github.com/rubra-ai/tools.cpp)
- [vLLM](https://github.com/rubra-ai/vllm)

**Note**: Llama3 models, including the 8B and 70B variants, are known to suffer increased perplexity and degraded function-calling performance when quantized. We recommend serving them with vLLM, or running them at fp16 precision.
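When a tool call succeeds, the OpenAI-compatible response carries it in `choices[0].message.tool_calls`, with the arguments JSON-encoded as a string. A sketch of extracting and decoding them, using a mock response shaped like the OpenAI schema (field names follow that schema; the `get_current_weather` call and its argument values are illustrative):

```python
import json

# Mock of an OpenAI-compatible chat-completions response; a real one would
# come from the llama.cpp or vLLM server. Field names follow the OpenAI schema.
mock_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "id": "call_0",
                        "type": "function",
                        "function": {
                            "name": "get_current_weather",
                            "arguments": '{"city": "Paris"}',
                        },
                    }
                ],
            }
        }
    ]
}

def extract_tool_calls(response):
    """Return (name, parsed_arguments) pairs from a chat-completions response."""
    calls = response["choices"][0]["message"].get("tool_calls") or []
    return [
        (c["function"]["name"], json.loads(c["function"]["arguments"]))
        for c in calls
    ]

calls = extract_tool_calls(mock_response)
print(calls)  # [('get_current_weather', {'city': 'Paris'})]
```

The `or []` guard matters in practice: when the model answers in plain text instead of calling a tool, `tool_calls` is absent or `None`.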

## Contributing

3 changes: 2 additions & 1 deletion docs/docs/models/llama.md
@@ -86,6 +86,7 @@ We recommend this model for complex tool calling scenarios, but users should be

- [rubra-ai/Meta-Llama-3-70B-Instruct](https://huggingface.co/rubra-ai/Meta-Llama-3-70B-Instruct)
- [rubra-ai/Meta-Llama-3-70B-Instruct-GGUF](https://huggingface.co/rubra-ai/Meta-Llama-3-70B-Instruct-GGUF)
- [rubra-ai/Meta-Llama-3-70B-Instruct-AWQ](https://huggingface.co/rubra-ai/Meta-Llama-3-70B-Instruct-AWQ)

<table>
<thead>
@@ -140,4 +141,4 @@ We recommend this model for complex tool calling scenarios, but users should be
<td>0.40625</td>
</tr>
</tbody>
</table>
</table>
11 changes: 10 additions & 1 deletion docs/src/components/BenchmarkTable.js
@@ -126,7 +126,7 @@ const data = [
{
model: 'Rubra Phi-3 Mini 128k Instruct',
params: 4.27,
functionCalling: '65.71%',
functionCalling: '70.00%',
mmlu: '66.66',
gpqa: '29.24',
gsm8k: '74.09',
@@ -201,6 +201,15 @@ const data = [
math: '-',
mtBench:'-',
},
{
model: 'Mistral Large (closed-source)',
functionCalling: '48.60%',
mmlu: '-',
gpqa: '-',
gsm8k: '91.21',
math: '45.0',
mtBench:'-',
},
{
model: 'meetkai/functionary-medium-v3.0',
params: 70.6,
