Benchmarking january results. (#2189)
* Benchmarking january results.

* Update to add MFE job definition files.

* Fix phi-2 paths.

* Update phi-2 model directory.

* Fix boolq phi-2 results path.

---------

Co-authored-by: Alex Kalita <[email protected]>
arun-rajora and Alex Kalita authored Jan 30, 2024
1 parent 3913d15 commit e8440dd
Showing 363 changed files with 33,027 additions and 141 deletions.
@@ -1,6 +1,6 @@
type: evaluationresult
name: boolq_gpt_35_turbo_0301_question_answering
-version: 1.0.1
+version: 1.0.2
display_name: boolq_gpt_35_turbo_0301_question_answering
description: gpt-35-turbo-0301 run for boolq dataset
dataset_family: boolq
@@ -1,6 +1,6 @@
type: evaluationresult
name: boolq_gpt_35_turbo_0613_question_answering
-version: 1.0.1
+version: 1.0.2
display_name: boolq_gpt_35_turbo_0613_question_answering
description: gpt-35-turbo-0613 run for boolq dataset
dataset_family: boolq
@@ -1,6 +1,6 @@
type: evaluationresult
name: boolq_gpt_4_0314_question_answering
-version: 1.0.1
+version: 1.0.2
display_name: boolq_gpt_4_0314_question_answering
description: gpt-4-0314 run for boolq dataset
dataset_family: boolq
@@ -1,6 +1,6 @@
type: evaluationresult
name: boolq_gpt_4_0613_question_answering
-version: 1.0.1
+version: 1.0.2
display_name: boolq_gpt_4_0613_question_answering
description: gpt-4-0613 run for boolq dataset
dataset_family: boolq
@@ -1,6 +1,6 @@
type: evaluationresult
name: boolq_gpt_4_32k_0314_question_answering
-version: 1.0.1
+version: 1.0.2
display_name: boolq_gpt_4_32k_0314_question_answering
description: gpt-4-32k-0314 run for boolq dataset
dataset_family: boolq
@@ -1,6 +1,6 @@
type: evaluationresult
name: boolq_gpt_4_32k_0613_question_answering
-version: 1.0.1
+version: 1.0.2
display_name: boolq_gpt_4_32k_0613_question_answering
description: gpt-4-32k-0613 run for boolq dataset
dataset_family: boolq
@@ -1,6 +1,6 @@
type: evaluationresult
name: boolq_llama_2_13b_chat_question_answering
-version: 1.0.1
+version: 1.0.2
display_name: boolq_llama_2_13b_chat_question_answering
description: llama-2-13b-chat run for boolq dataset
dataset_family: boolq
@@ -1,6 +1,6 @@
type: evaluationresult
name: boolq_llama_2_13b_question_answering
-version: 1.0.1
+version: 1.0.2
display_name: boolq_llama_2_13b_question_answering
description: llama-2-13b run for boolq dataset
dataset_family: boolq
@@ -1,6 +1,6 @@
type: evaluationresult
name: boolq_llama_2_70b_chat_question_answering
-version: 1.0.1
+version: 1.0.2
display_name: boolq_llama_2_70b_chat_question_answering
description: llama-2-70b-chat run for boolq dataset
dataset_family: boolq
@@ -1,6 +1,6 @@
type: evaluationresult
name: boolq_llama_2_70b_question_answering
-version: 1.0.1
+version: 1.0.2
display_name: boolq_llama_2_70b_question_answering
description: llama-2-70b run for boolq dataset
dataset_family: boolq
@@ -1,6 +1,6 @@
type: evaluationresult
name: boolq_llama_2_7b_chat_question_answering
-version: 1.0.1
+version: 1.0.2
display_name: boolq_llama_2_7b_chat_question_answering
description: llama-2-7b-chat run for boolq dataset
dataset_family: boolq
@@ -1,6 +1,6 @@
type: evaluationresult
name: boolq_llama_2_7b_question_answering
-version: 1.0.1
+version: 1.0.2
display_name: boolq_llama_2_7b_question_answering
description: llama-2-7b run for boolq dataset
dataset_family: boolq
@@ -0,0 +1,3 @@
+type: evaluationresult
+spec: spec.yaml
+categories: ["EvaluationResult"]