Commit

Merge pull request #158 from stanford-crfm/jonathan/0216-weekly-assets

update changes
rishibommasani authored Mar 29, 2024
2 parents e978cd2 + d436d3f commit a9aca13
Showing 58 changed files with 453 additions and 212 deletions.
6 changes: 4 additions & 2 deletions assets/01ai.yaml
@@ -19,7 +19,8 @@
     diversity of responses.
   access: open
   license:
-    explanation: Model license can be found at https://huggingface.co/01-ai/Yi-34B/blob/main/LICENSE. Code license is under Apache 2.0
+    explanation: Model license can be found at https://huggingface.co/01-ai/Yi-34B/blob/main/LICENSE.
+      Code license is under Apache 2.0
     value: custom
   intended_uses: ''
   prohibited_uses: none
@@ -46,7 +47,8 @@
   quality_control: unknown
   access: open
   license:
-    explanation: Model license can be found at https://huggingface.co/01-ai/Yi-VL-34B/blob/main/LICENSE. Code license is under Apache 2.0
+    explanation: Model license can be found at https://huggingface.co/01-ai/Yi-VL-34B/blob/main/LICENSE.
+      Code license is under Apache 2.0
     value: custom
   intended_uses: ''
   prohibited_uses: ''
6 changes: 4 additions & 2 deletions assets/ai21.yaml
@@ -298,12 +298,14 @@
 - type: model
   name: Jamba
   organization: AI21 Labs
-  description: Jamba is a state-of-the-art, hybrid SSM-Transformer LLM. Jamba is the world’s first production-grade Mamba based model.
+  description: Jamba is a state-of-the-art, hybrid SSM-Transformer LLM. Jamba is
+    the world’s first production-grade Mamba based model.
   created_date: 2024-03-28
   url: https://www.ai21.com/blog/announcing-jamba
   model_card: https://huggingface.co/ai21labs/Jamba-v0.1
   modality: text; text
-  analysis: Jamba outperforms or matches other state-of-the-art models in its size class on a wide range of benchmarks.
+  analysis: Jamba outperforms or matches other state-of-the-art models in its size
+    class on a wide range of benchmarks.
   size: 52B parameters (sparse)
   dependencies: []
   training_emissions: unknown
10 changes: 7 additions & 3 deletions assets/alibaba.yaml
@@ -74,7 +74,8 @@
       \ repository]](https://huggingface.co/Qwen)\n"
     value: open
   license:
-    explanation: Model license can be found at https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20LICENSE%20AGREEMENT. Code license is under Apache 2.0
+    explanation: Model license can be found at https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20LICENSE%20AGREEMENT.
+      Code license is under Apache 2.0
     value: custom
   intended_uses: ''
   prohibited_uses: ''
@@ -87,12 +88,15 @@
 - type: model
   name: Qwen 1.5
   organization: Qwen AI
-  description: Qwen 1.5 is the next iteration in their Qwen series, consisting of Transformer-based large language models pretrained on a large volume of data, including web texts, books, codes, etc.
+  description: Qwen 1.5 is the next iteration in their Qwen series, consisting of
+    Transformer-based large language models pretrained on a large volume of data,
+    including web texts, books, codes, etc.
   created_date: 2024-02-04
   url: https://qwenlm.github.io/blog/qwen1.5/
   model_card: https://huggingface.co/Qwen/Qwen1.5-72B
   modality: text; text
-  analysis: Evaluated on MMLU, C-Eval, GSM8K, MATH, HumanEval, MBPP, BBH, CMMLU, all standard English and Chinese benchmarks.
+  analysis: Evaluated on MMLU, C-Eval, GSM8K, MATH, HumanEval, MBPP, BBH, CMMLU,
+    all standard English and Chinese benchmarks.
   size: 72B parameters (dense)
   dependencies: []
   training_emissions: unknown
18 changes: 12 additions & 6 deletions assets/anthropic.yaml
@@ -573,22 +573,28 @@
 - type: model
   name: Claude 3
   organization: Anthropic
-  description: The Claude 3 model family is a collection of models which sets new industry benchmarks across a wide range of cognitive tasks.
+  description: The Claude 3 model family is a collection of models which sets new
+    industry benchmarks across a wide range of cognitive tasks.
   created_date: 2024-03-04
   url: https://www.anthropic.com/news/claude-3-family
   model_card: https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf
   modality: image, text; text
-  analysis: Evaluated on reasoning, math, coding, reading comprehension, and question answering, outperforming GPT-4 on standard benchmarks.
+  analysis: Evaluated on reasoning, math, coding, reading comprehension, and question
+    answering, outperforming GPT-4 on standard benchmarks.
   size: unknown
   dependencies: []
   training_emissions: unknown
   training_time: unknown
   training_hardware: unknown
-  quality_control: Pre-trained on diverse dataset and aligned with Constitutional AI technique.
+  quality_control: Pre-trained on diverse dataset and aligned with Constitutional
+    AI technique.
   access: limited
   license: unknown
-  intended_uses: Claude models excel at open-ended conversation and collaboration on ideas, and also perform exceptionally well in coding tasks and when working with text - whether searching, writing, editing, outlining, or summarizing.
-  prohibited_uses: Prohibited uses include, but are not limited to, political campaigning or lobbying, surveillance, social scoring, criminal justice decisions, law enforcement, and decisions related to financing, employment, and housing.
+  intended_uses: Claude models excel at open-ended conversation and collaboration
+    on ideas, and also perform exceptionally well in coding tasks and when working
+    with text - whether searching, writing, editing, outlining, or summarizing.
+  prohibited_uses: Prohibited uses include, but are not limited to, political campaigning
+    or lobbying, surveillance, social scoring, criminal justice decisions, law enforcement,
+    and decisions related to financing, employment, and housing.
   monitoring: ''
   feedback: none
-
6 changes: 4 additions & 2 deletions assets/apple.yaml
@@ -2,12 +2,14 @@
 - type: model
   name: MM1
   organization: Apple
-  description: MM1 is a family of multimodal models, including both dense variants up to 30B and mixture-of-experts (MoE) variants up to 64B.
+  description: MM1 is a family of multimodal models, including both dense variants
+    up to 30B and mixture-of-experts (MoE) variants up to 64B.
   created_date: 2024-03-16
   url: https://arxiv.org/pdf/2403.09611.pdf
   model_card: none
   modality: image, text; text
-  analysis: Evaluated on image captioning and visual question answering across many benchmarks.
+  analysis: Evaluated on image captioning and visual question answering across many
+    benchmarks.
   size: 30B parameters (dense)
   dependencies: []
   training_emissions: unknown
15 changes: 10 additions & 5 deletions assets/avignon.yaml
@@ -2,12 +2,15 @@
 - type: model
   name: BioMistral
   organization: Avignon University, Nantes University
-  description: BioMistral is an open-source Large Language Model tailored for the biomedical domain, utilizing Mistral as its foundation model and further pre-trained on PubMed Central.
+  description: BioMistral is an open-source Large Language Model tailored for the
+    biomedical domain, utilizing Mistral as its foundation model and further pre-trained
+    on PubMed Central.
   created_date: 2024-02-15
   url: https://arxiv.org/pdf/2402.10373.pdf
   model_card: https://huggingface.co/BioMistral/BioMistral-7B
   modality: text; text
-  analysis: BioMistral was evaluated on a benchmark comprising 10 established medical question-answering (QA) tasks in English and seven other languages.
+  analysis: BioMistral was evaluated on a benchmark comprising 10 established medical
+    question-answering (QA) tasks in English and seven other languages.
   size: 7B parameters (dense)
   dependencies: [Mistral, PubMed Central]
   training_emissions: unknown
@@ -16,7 +19,9 @@
   quality_control: ''
   access: open
   license: Apache 2.0
-  intended_uses: Research in the biomedical domain, especially for medical question-answering tasks.
-  prohibited_uses: Prohibited from deploying in production environments for natural language generation or any professional health and medical purposes.
+  intended_uses: Research in the biomedical domain, especially for medical question-answering
+    tasks.
+  prohibited_uses: Prohibited from deploying in production environments for natural
+    language generation or any professional health and medical purposes.
   monitoring: ''
-  feedback: https://huggingface.co/BioMistral/BioMistral-7B/discussions
+  feedback: https://huggingface.co/BioMistral/BioMistral-7B/discussions
6 changes: 4 additions & 2 deletions assets/baai.yaml
@@ -147,12 +147,14 @@
 - type: model
   name: EVA-CLIP
   organization: Beijing Academy of Artificial Intelligence, Tsinghua University
-  description: As of release, EVA-CLIP is the largest and most powerful open-source CLIP model to date, with 18 billion parameters.
+  description: As of release, EVA-CLIP is the largest and most powerful open-source
+    CLIP model to date, with 18 billion parameters.
   created_date: 2024-02-06
   url: https://arxiv.org/pdf/2402.04252.pdf
   model_card: https://huggingface.co/BAAI/EVA-CLIP-8B-448
   modality: image, text; text
-  analysis: Evaluated on zero-shot classification performance across multiple image classification benchmarks.
+  analysis: Evaluated on zero-shot classification performance across multiple image
+    classification benchmarks.
   size: 18B parameters (dense)
   dependencies: [CLIP]
   training_emissions: unknown
1 change: 0 additions & 1 deletion assets/beitech.yaml
@@ -43,4 +43,3 @@
   prohibited_uses: ''
   monitoring: unknokwn
   feedback: https://huggingface.co/GeneZC/MiniMA-3B/discussions
-
14 changes: 10 additions & 4 deletions assets/bigcode.yaml
@@ -74,7 +74,9 @@
 - type: model
   name: StarCoder2
   organization: BigCode
-  description: A 15 billion parameter language model trained on 600+ programming languages from The Stack v2. The training was carried out using the Fill-in-the-Middle objective on 4+ trillion tokens.
+  description: A 15 billion parameter language model trained on 600+ programming
+    languages from The Stack v2. The training was carried out using the Fill-in-the-Middle
+    objective on 4+ trillion tokens.
   created_date: 2024-02-28
   url: https://www.servicenow.com/company/media/press-room/huggingface-nvidia-launch-starcoder2.html
   model_card: https://huggingface.co/bigcode/starcoder2-15b
@@ -85,10 +87,14 @@
   training_emissions: unknown
   training_time: unknown
   training_hardware: 1024 x H100 GPUs
-  quality_control: The model was filtered for permissive licenses and code with no license only. A search index is provided to identify where generated code came from to apply the proper attribution.
+  quality_control: The model was filtered for permissive licenses and code with
+    no license only. A search index is provided to identify where generated code
+    came from to apply the proper attribution.
   access: open
   license: BigCode OpenRail-M
-  intended_uses: Intended to generate code snippets from given context, but not for writing actual functional code directly.
-  prohibited_uses: Should not be used as a way to write fully functioning code without modification or verification.
+  intended_uses: Intended to generate code snippets from given context, but not
+    for writing actual functional code directly.
+  prohibited_uses: Should not be used as a way to write fully functioning code without
+    modification or verification.
   monitoring: unknown
   feedback: https://huggingface.co/bigcode/starcoder2-15b/discussions
12 changes: 9 additions & 3 deletions assets/bytedance.yaml
@@ -25,12 +25,16 @@
 - type: model
   name: SDXL-Lightning
   organization: ByteDance
-  description: SDXL-Lightning is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images in a few steps. The models are distilled from stabilityai/stable-diffusion-xl-base-1.0. This repository contains checkpoints for 1-step, 2-step, 4-step, and 8-step distilled models.
+  description: SDXL-Lightning is a lightning-fast text-to-image generation model.
+    It can generate high-quality 1024px images in a few steps. The models are distilled
+    from stabilityai/stable-diffusion-xl-base-1.0. This repository contains checkpoints
+    for 1-step, 2-step, 4-step, and 8-step distilled models.
   created_date: 2024-02-21
   url: https://arxiv.org/pdf/2402.13929.pdf
   model_card: https://huggingface.co/ByteDance/SDXL-Lightning
   modality: text; image
-  analysis: Evaluated via qualitative comparison relative to other SoTA image generation models.
+  analysis: Evaluated via qualitative comparison relative to other SoTA image generation
+    models.
   size: unknown
   dependencies: [Stable Diffusion XL]
   training_emissions: unknown
@@ -39,7 +43,9 @@
   quality_control: unknown
   access: open
   license: OpenRail++
-  intended_uses: The model can be used for fast, high-quality text-to-image generation. It supports 1-step, 2-step, 4-step, and 8-step distilled models which provide varying generation quality.
+  intended_uses: The model can be used for fast, high-quality text-to-image generation.
+    It supports 1-step, 2-step, 4-step, and 8-step distilled models which provide
+    varying generation quality.
   prohibited_uses: unknown
   monitoring: unknown
   feedback: https://huggingface.co/ByteDance/SDXL-Lightning/discussions
15 changes: 11 additions & 4 deletions assets/cagliostro.yaml
@@ -2,7 +2,10 @@
 - type: model
   name: Animagine XL 3.1
   organization: Cagliostro Research Lab
-  description: An open-source, anime-themed text-to-image model enhanced to generate higher quality anime-style images with a broader range of characters from well-known anime series, an optimized dataset, and new aesthetic tags for better image creation.
+  description: An open-source, anime-themed text-to-image model enhanced to generate
+    higher quality anime-style images with a broader range of characters from well-known
+    anime series, an optimized dataset, and new aesthetic tags for better image
+    creation.
   created_date: 2024-03-18
   url: https://cagliostrolab.net/posts/animagine-xl-v31-release
   model_card: https://huggingface.co/cagliostrolab/animagine-xl-3.1
@@ -13,10 +16,14 @@
   training_emissions: unknown
   training_time: Approximately 15 days, totaling over 350 GPU hours.
   training_hardware: 2x A100 80GB GPUs
-  quality_control: The model undergoes pretraining, first stage finetuning, and second stage finetuning for refining and improving aspects such as hand and anatomy rendering.
+  quality_control: The model undergoes pretraining, first stage finetuning, and
+    second stage finetuning for refining and improving aspects such as hand and
+    anatomy rendering.
   access: open
   license: Fair AI Public License 1.0-SD
-  intended_uses: Generating high-quality anime images from textual prompts. Useful for anime fans, artists, and content creators.
-  prohibited_uses: Not suitable for creating realistic photos or for users who expect high-quality results from short or simple prompts.
+  intended_uses: Generating high-quality anime images from textual prompts. Useful
+    for anime fans, artists, and content creators.
+  prohibited_uses: Not suitable for creating realistic photos or for users who expect
+    high-quality results from short or simple prompts.
   monitoring: unknown
   feedback: https://huggingface.co/cagliostrolab/animagine-xl-3.1/discussions
8 changes: 5 additions & 3 deletions assets/causallm.yaml
@@ -2,7 +2,8 @@
 - type: model
   name: CausalLM
   organization: CausalLM
-  description: CausalLM is an LLM based on the model weights of Qwen and trained on a model architecture identical to LLaMA 2.
+  description: CausalLM is an LLM based on the model weights of Qwen and trained
+    on a model architecture identical to LLaMA 2.
   created_date: 2023-10-21
   url: https://huggingface.co/CausalLM/14B
   model_card: https://huggingface.co/CausalLM/14B
@@ -15,8 +16,9 @@
   training_hardware: unknown
   quality_control: ''
   access: open
-  license:
-    explanation: can be found at https://github.com/rpherrera/WTFPL (HuggingFace lists this to be the license)
+  license:
+    explanation: can be found at https://github.com/rpherrera/WTFPL (HuggingFace
+      lists this to be the license)
     value: WTFPL
   intended_uses: ''
   prohibited_uses: ''
9 changes: 6 additions & 3 deletions assets/cerebras.yaml
@@ -117,12 +117,14 @@
 - type: model
   name: Bittensor Language Model
   organization: Cerebras
-  description: Bittensor Language Model is a 3 billion parameter language model with an 8k context length trained on 627B tokens of SlimPajama.
+  description: Bittensor Language Model is a 3 billion parameter language model
+    with an 8k context length trained on 627B tokens of SlimPajama.
   created_date: 2023-07-24
   url: https://www.cerebras.net/blog/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/
   model_card: https://huggingface.co/cerebras/btlm-3b-8k-base
   modality: text; text
-  analysis: Evaluated on standard LLM benchmarks in comparison to similar-sized models.
+  analysis: Evaluated on standard LLM benchmarks in comparison to similar-sized
+    models.
   size: 3B parameters (dense)
   dependencies: [SlimPajama]
   training_emissions: unknown
@@ -138,7 +140,8 @@
 - type: dataset
   name: SlimPajama
   organization: Cerebras
-  description: As of release, SlimPajama is the largest extensively deduplicated, multi-corpora, open-source dataset for training large language models.
+  description: As of release, SlimPajama is the largest extensively deduplicated,
+    multi-corpora, open-source dataset for training large language models.
   created_date: 2023-06-09
   url: https://huggingface.co/datasets/cerebras/SlimPajama-627B
   datasheet: https://huggingface.co/datasets/cerebras/SlimPajama-627B
3 changes: 2 additions & 1 deletion assets/cmu.yaml
@@ -34,7 +34,8 @@
 - type: model
   name: Moment
   organization: Carnegie Mellon University, University of Pennsylvania
-  description: Moment is a family of open-source foundation models for general-purpose time-series analysis.
+  description: Moment is a family of open-source foundation models for general-purpose
+    time-series analysis.
   created_date: 2024-02-06
   url: https://arxiv.org/pdf/2402.03885.pdf
   model_card: none
3 changes: 2 additions & 1 deletion assets/cognition.yaml
@@ -7,7 +7,8 @@
   url: https://www.cognition-labs.com/introducing-devin
   model_card: none
   modality: text; code
-  analysis: Evaluated on SWE-Bench, a challenging software engineering benchmark, where Devin outperforms major state of the art models unassisted.
+  analysis: Evaluated on SWE-Bench, a challenging software engineering benchmark,
+    where Devin outperforms major state of the art models unassisted.
   size: unknown
   dependencies: []
   training_emissions: unknown
7 changes: 4 additions & 3 deletions assets/cognitive.yaml
@@ -25,9 +25,11 @@
 - type: model
   name: WizardLM Uncensored
   organization: Cognitive Computations
-  description: WizardLM Uncensored is WizardLM trained with a subset of the dataset - responses that contained alignment / moralizing were removed.
+  description: WizardLM Uncensored is WizardLM trained with a subset of the dataset
+    - responses that contained alignment / moralizing were removed.
   created_date:
-    explanation: release date is not published; estimated to be sometime in either May or June 2023.
+    explanation: release date is not published; estimated to be sometime in either
+      May or June 2023.
     value: 2023-06-01
   url: https://huggingface.co/cognitivecomputations/WizardLM-30B-Uncensored
   model_card: https://huggingface.co/cognitivecomputations/WizardLM-30B-Uncensored
@@ -45,4 +47,3 @@
   prohibited_uses: ''
   monitoring: unknown
   feedback: https://huggingface.co/cognitivecomputations/WizardLM-30B-Uncensored/discussions
-