Commit

Merge pull request #224 from stanford-crfm/revert-222-revert-217-jona…
…than/090524-monthly-assets

Revert "Revert "add notable summer assets""
jxue16 authored Sep 28, 2024
2 parents e62030f + d4d087a commit 177127f
Showing 24 changed files with 1,022 additions and 32 deletions.
8 changes: 5 additions & 3 deletions assets/360.yaml
Original file line number Diff line number Diff line change
@@ -2,12 +2,14 @@
- type: model
name: 360 Zhinao
organization: 360 Security
description: 360 Zhinao is a multilingual LLM in Chinese and English with chat capabilities.
description: 360 Zhinao is a multilingual LLM in Chinese and English with chat
capabilities.
created_date: 2024-05-23
url: https://arxiv.org/pdf/2405.13386
model_card: none
modality: text; text
analysis: Achieved competitive performance on relevant benchmarks against other 7B models in Chinese, English, and coding tasks.
analysis: Achieved competitive performance on relevant benchmarks against other
7B models in Chinese, English, and coding tasks.
size: 7B parameters
dependencies: []
training_emissions: unknown
@@ -19,4 +21,4 @@
intended_uses: ''
prohibited_uses: ''
monitoring: ''
feedback: none
feedback: none
33 changes: 33 additions & 0 deletions assets/ai21.yaml
@@ -318,3 +318,36 @@
prohibited_uses: ''
monitoring: ''
feedback: https://huggingface.co/ai21labs/Jamba-v0.1/discussions
- type: model
name: Jamba 1.5
organization: AI21
description: A family of models that demonstrate superior long context handling,
speed, and quality. Built on a novel SSM-Transformer architecture, they surpass
other models in their size class. These models are useful for enterprise applications,
such as lengthy document summarization and analysis. The Jamba 1.5 family also
includes the longest context window, at 256K, among open models. They are fast,
quality-focused, and handle long contexts efficiently.
created_date: 2024-08-22
url: https://www.ai21.com/blog/announcing-jamba-model-family
model_card: unknown
modality: text; text
analysis: The models were evaluated based on their ability to handle long contexts,
speed, and quality. They outperformed competitors in their size class, scoring
high on the Arena Hard benchmark.
size: 94B parameters
dependencies: []
training_emissions: Unknown
training_time: Unknown
training_hardware: For speed comparisons, Jamba 1.5 Mini used 2xA100 80GB GPUs,
and Jamba 1.5 Large used 8xA100 80GB GPUs.
quality_control: The models were evaluated on the Arena Hard benchmark. For maintaining
long context performance, they were tested on the RULER benchmark.
access: open
license: Jamba Open Model License
intended_uses: The models are built for enterprise scale AI applications. They
are purpose-built for efficiency, speed, and ability to solve critical tasks
that businesses care about, such as lengthy document summarization and analysis.
They can also be used for RAG and agentic workflows.
prohibited_uses: Unknown
monitoring: Unknown
feedback: Unknown
36 changes: 36 additions & 0 deletions assets/aleph_alpha.yaml
@@ -99,3 +99,39 @@
prohibited_uses: ''
monitoring: ''
feedback: ''
- type: model
name: Pharia-1-LLM-7B
organization: Aleph Alpha
description: Pharia-1-LLM-7B is a model that falls within the Pharia-1-LLM model
family. It is designed to deliver short, controlled responses that match the
performance of leading open-source models around 7-8 billion parameters. The
model is culturally and linguistically tuned for German, French, and Spanish
languages. It is trained on carefully curated data in line with relevant EU
and national regulations. The model shows improved token efficiency and is particularly
effective in domain-specific applications, especially in the automotive and
engineering industries. It can also be aligned to user preferences, making it
appropriate for critical applications without the risk of shut-down behaviour.
created_date: 2024-09-08
url: https://aleph-alpha.com/introducing-pharia-1-llm-transparent-and-compliant/#:~:text=Pharia%2D1%2DLLM%2D7B
model_card: unknown
modality: text; text
analysis: Extensive evaluations were done with ablation experiments performed
on pre-training benchmarks such as lambada, triviaqa, hellaswag, winogrande,
webqs, arc, and boolq. Direct comparisons were also performed with applications
like GPT and Llama 2.
size: 7B parameters
dependencies: []
training_emissions: Unknown
training_time: Unknown
training_hardware: Unknown
quality_control: The model comes with additional safety guardrails via alignment
methods to ensure safe usage. Training data is carefully curated to ensure compliance
with EU and national regulations.
access: open
license: Aleph Open
intended_uses: The model is intended for use in domain-specific applications,
particularly in the automotive and engineering industries. It can also be tailored
to user preferences.
prohibited_uses: Unknown
monitoring: Unknown
feedback: Feedback can be sent to [email protected].
39 changes: 39 additions & 0 deletions assets/anthropic.yaml
@@ -598,3 +598,42 @@
and decisions related to financing, employment, and housing.
monitoring: ''
feedback: none
- type: model
name: Claude 3.5 Sonnet
organization: Anthropic
description: Claude 3.5 Sonnet is an AI model with advanced understanding and
generation abilities in text, vision, and code. It sets new industry benchmarks
for graduate-level reasoning (GPQA), undergrad-level knowledge (MMLU), coding
proficiency (HumanEval), and visual reasoning. The model operates at twice the
speed of its predecessor, Claude 3 Opus, and is designed to tackle tasks like
context-sensitive customer support, orchestrating multi-step workflows, interpreting
charts and graphs, and transcribing text from images.
created_date: 2024-06-21
url: https://www.anthropic.com/news/claude-3-5-sonnet
model_card: unknown
modality: text; image, text
analysis: The model has been evaluated on a range of tests including graduate-level
reasoning (GPQA), undergraduate-level knowledge (MMLU), coding proficiency (HumanEval),
and standard vision benchmarks. In an internal agentic coding evaluation, Claude
3.5 Sonnet solved 64% of problems, outperforming the previous version, Claude
3 Opus, which solved 38%.
size: Unknown
dependencies: []
training_emissions: Unknown
training_time: Unknown
training_hardware: Unknown
quality_control: The model underwent a red-teaming assessment, and has been tested
and refined by external experts. It was also provided to the UK's AI Safety
Institute (UK AISI) for a pre-deployment safety evaluation.
access: open
license: unknown
intended_uses: The model is intended for complex tasks such as context-sensitive
customer support, orchestrating multi-step workflows, interpreting charts and
graphs, transcribing text from images, as well as writing, editing, and executing
code.
prohibited_uses: Misuse of the model is discouraged though specific use cases
are not mentioned.
monitoring: Unknown of misuse, and policy feedback from external experts has been
integrated to ensure robustness of evaluations.
feedback: Feedback on Claude 3.5 Sonnet can be submitted directly in-product to
inform the development roadmap and improve user experience.
34 changes: 34 additions & 0 deletions assets/aspia_space,_institu.yaml
@@ -0,0 +1,34 @@
---
- type: model
name: AstroPT
organization: Aspia Space, Instituto de Astrofísica de Canarias (IAC), UniverseTBD,
Astrophysics Research Institute, Liverpool John Moores University, Departamento
Astrofísica, Universidad de la Laguna, Observatoire de Paris, LERMA, PSL University,
and Université Paris-Cité.
description: AstroPT is an autoregressive pretrained transformer developed with
astronomical use-cases in mind. The models have been pretrained on 8.6 million
512x512 pixel grz-band galaxy postage stamp observations from the DESI Legacy
Survey DR8. They have created a range of models with varying complexity, ranging
from 1 million to 2.1 billion parameters.
created_date: 2024-09-08
url: https://arxiv.org/pdf/2405.14930v1
model_card: unknown
modality: image; image
analysis: The models’ performance on downstream tasks was evaluated by linear
probing. The models follow a similar saturating log-log scaling law to textual
models, their performance improves with the increase in model size up to the
saturation point of parameters.
size: 2.1B parameters
dependencies: [DESI Legacy Survey DR8]
training_emissions: Unknown
training_time: Unknown
training_hardware: Unknown
quality_control: The models’ performances were evaluated on downstream tasks as
measured by linear probing.
access: open
license: MIT
intended_uses: The models are intended for astronomical use-cases, particularly
in handling and interpreting large observation data from astronomical sources.
prohibited_uses: Unknown
monitoring: Unknown
feedback: Any problem with the model can be reported to Michael J. Smith at [email protected].
14 changes: 11 additions & 3 deletions assets/cartesia.yaml
@@ -2,12 +2,18 @@
- type: model
name: Sonic
organization: Cartesia
description: Sonic is a low-latency voice model that generates lifelike speech. Developed by Cartesia, it was designed to be an efficient real-time AI capable of processing any-sized contexts and running on any device.
description: Sonic is a low-latency voice model that generates lifelike speech.
Developed by Cartesia, it was designed to be an efficient real-time AI capable
of processing any-sized contexts and running on any device.
created_date: 2024-05-29
url: https://cartesia.ai/blog/sonic
model_card: none
modality: text; audio
analysis: Extensive testing on Multilingual Librispeech dataset resulted in 20% lower validation perplexity. In downstream evaluations, this leads to a 2x lower word error rate and a 1 point higher quality score. Sonic also displays impressive performance metrics at inference, achieving lower latency (1.5x lower time-to-first-audio), faster inference speed (2x lower real-time factor), and higher throughput (4x).
analysis: Extensive testing on Multilingual Librispeech dataset resulted in 20%
lower validation perplexity. In downstream evaluations, this leads to a 2x lower
word error rate and a 1 point higher quality score. Sonic also displays impressive
performance metrics at inference, achieving lower latency (1.5x lower time-to-first-audio),
faster inference speed (2x lower real-time factor), and higher throughput (4x).
size: 2024-05-29
dependencies: [Multilingual Librispeech dataset]
training_emissions: unknown
@@ -16,7 +22,9 @@
quality_control: ''
access: limited
license: unknown
intended_uses: Sonic has potential applications across customer support, entertainment, and content creation and is a part of Cartesias broader mission to bring real-time multimodal intelligence to every device.
intended_uses: Sonic has potential applications across customer support, entertainment,
and content creation and is a part of Cartesias broader mission to bring real-time
multimodal intelligence to every device.
prohibited_uses: unknown
monitoring: unknown
feedback: Contact through the provided form or via email at [email protected].
64 changes: 51 additions & 13 deletions assets/deepmind.yaml
@@ -687,28 +687,50 @@
- type: model
name: Imagen 3
organization: Google DeepMind
description: Imagen 3 is a high-quality text-to-image model, capable of generating images with better detail, richer lighting, and fewer distracting artifacts compared to previous models. Improved understanding of prompts allows for a wide range of visual styles and captures small details from longer prompts. It also understands prompts written in natural, everyday language, making it easier to use. Imagen 3 is available in multiple versions, optimized for different types of tasks, from generating quick sketches to high-resolution images.
description: Imagen 3 is a high-quality text-to-image model, capable of generating
images with better detail, richer lighting, and fewer distracting artifacts
compared to previous models. Improved understanding of prompts allows for a
wide range of visual styles and captures small details from longer prompts.
It also understands prompts written in natural, everyday language, making it
easier to use. Imagen 3 is available in multiple versions, optimized for different
types of tasks, from generating quick sketches to high-resolution images.
created_date: 2024-05-14
url: https://deepmind.google/technologies/imagen-3/
model_card: none
modality: text; image
analysis: The model was tested and evaluated on various prompts to assess its understanding of natural language, its ability to generate high-quality images in various formats and styles and generate fine details and complex textures. Red teaming and evaluations were conducted on topics including fairness, bias, and content safety.
analysis: The model was tested and evaluated on various prompts to assess its
understanding of natural language, its ability to generate high-quality images
in various formats and styles and generate fine details and complex textures.
Red teaming and evaluations were conducted on topics including fairness, bias,
and content safety.
size: unknown
dependencies: []
training_emissions: unknown
training_time: unknown
training_hardware: unknown
quality_control: Extensive filtering and data labeling were used to minimize harmful content in datasets and reduce the likelihood of harmful outputs. Privacy, safety, and security technologies were leveraged in deploying the model, including watermarking tool SynthID.
quality_control: Extensive filtering and data labeling were used to minimize harmful
content in datasets and reduce the likelihood of harmful outputs. Privacy, safety,
and security technologies were leveraged in deploying the model, including watermarking
tool SynthID.
access: limited
license: unknown
intended_uses: Generate high-quality images for various purposes, from photorealistic landscapes to textured oil paintings or whimsical claymation scenes. It is useful in situations where detailed visual representation is required based on the textual description.
intended_uses: Generate high-quality images for various purposes, from photorealistic
landscapes to textured oil paintings or whimsical claymation scenes. It is useful
in situations where detailed visual representation is required based on the
textual description.
prohibited_uses: unknown
monitoring: Through digital watermarking tool SynthID embedded in pixels for detection and identification.
monitoring: Through digital watermarking tool SynthID embedded in pixels for detection
and identification.
feedback: unknown
- type: model
name: Veo
organization: Google DeepMind
description: Veo is Google DeepMind's most capable video generation model to date. It generates high-quality, 1080p resolution videos that can go beyond a minute, in a wide range of cinematic and visual styles. It accurately captures the nuance and tone of a prompt, and provides an unprecedented level of creative control. The model is also capable of maintaining visual consistency in video frames, and supports masked editing.
description: Veo is Google DeepMind's most capable video generation model to date.
It generates high-quality, 1080p resolution videos that can go beyond a minute,
in a wide range of cinematic and visual styles. It accurately captures the nuance
and tone of a prompt, and provides an unprecedented level of creative control.
The model is also capable of maintaining visual consistency in video frames,
and supports masked editing.
created_date: 2024-05-14
url: https://deepmind.google/technologies/veo/
model_card: none
@@ -719,31 +741,47 @@
training_emissions: unknown
training_time: unknown
training_hardware: unknown
quality_control: Videos created by Veo are watermarked using SynthID, DeepMinds tool for watermarking and identifying AI-generated content, and passed through safety filters and memorization checking processes to mitigate privacy, copyright and bias risks.
quality_control: Videos created by Veo are watermarked using SynthID, DeepMinds
tool for watermarking and identifying AI-generated content, and passed through
safety filters and memorization checking processes to mitigate privacy, copyright
and bias risks.
access: closed
license: unknown
intended_uses: Veo is intended to help create tools that make video production accessible to everyone. It can be used by filmmakers, creators, or educators for storytelling, education and more. Some of its features will be also brought to products like YouTube Shorts.
intended_uses: Veo is intended to help create tools that make video production
accessible to everyone. It can be used by filmmakers, creators, or educators
for storytelling, education and more. Some of its features will be also brought
to products like YouTube Shorts.
prohibited_uses: unknown
monitoring: unknown
feedback: Feedback from leading creators and filmmakers is incorporated to improve Veo's generative video technologies.
feedback: Feedback from leading creators and filmmakers is incorporated to improve
Veo's generative video technologies.
- type: model
name: Gemini 1.5 Flash
organization: Google DeepMind
description: Gemini Flash is a lightweight model, optimized for speed and efficiency. It features multimodal reasoning and a breakthrough long context window of up to one million tokens. It's designed to serve at scale and is efficient on cost, providing quality results at a fraction of the cost of larger models.
description: Gemini Flash is a lightweight model, optimized for speed and efficiency.
It features multimodal reasoning and a breakthrough long context window of up
to one million tokens. It's designed to serve at scale and is efficient on cost,
providing quality results at a fraction of the cost of larger models.
created_date: 2024-05-30
url: https://deepmind.google/technologies/gemini/flash/
model_card: none
modality: audio, image, text, video; text
analysis: The model was evaluated on various benchmarks like General MMLU, Code Natural2Code, MATH, GPQA, Big-Bench, WMT23, MMMU, and MathVista providing performance across various domains like multilingual translation, image processing, and code generation.
analysis: The model was evaluated on various benchmarks like General MMLU, Code
Natural2Code, MATH, GPQA, Big-Bench, WMT23, MMMU, and MathVista providing performance
across various domains like multilingual translation, image processing, and
code generation.
size: unknown
dependencies: []
training_emissions: unknown
training_time: unknown
training_hardware: unknown
quality_control: The research team is continually exploring new ideas at the frontier of AI and building innovative products for consistent progress.
quality_control: The research team is continually exploring new ideas at the frontier
of AI and building innovative products for consistent progress.
access: limited
license: Googles Terms and Conditions
intended_uses: The model is intended for developer and enterprise use cases. It can process hours of video and audio, and hundreds of thousands of words or lines of code, making it beneficial for a wide range of tasks.
intended_uses: The model is intended for developer and enterprise use cases. It
can process hours of video and audio, and hundreds of thousands of words or
lines of code, making it beneficial for a wide range of tasks.
prohibited_uses: ''
monitoring: unknown
feedback: none
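
Reviewer note: the entries in this diff share one set of fields, and at least one value looks misplaced (the Sonic entry in assets/cartesia.yaml has `size: 2024-05-29`, which reads like a copy of `created_date`). A minimal sketch of a sanity check for a single entry follows. It assumes the entry has already been loaded into a Python dict, that the field list below (taken from the entries shown here) is the full schema, and that `open`, `limited`, and `closed` are the only access values; the function name `check_entry` is hypothetical and not part of the repository.

```python
import re

# Field names as they appear in the entries of this diff.
# Assumption: this list is the complete expected schema.
EXPECTED_FIELDS = [
    "type", "name", "organization", "description", "created_date", "url",
    "model_card", "modality", "analysis", "size", "dependencies",
    "training_emissions", "training_time", "training_hardware",
    "quality_control", "access", "license", "intended_uses",
    "prohibited_uses", "monitoring", "feedback",
]

# Access values observed in this diff; treating them as exhaustive is an assumption.
OBSERVED_ACCESS = {"open", "limited", "closed"}

# Matches values shaped like YYYY-MM-DD.
DATE_LIKE = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_entry(entry: dict) -> list[str]:
    """Return a list of human-readable problems found in one asset entry."""
    problems = []

    # Every expected field should be present, even if its value is "unknown".
    for field in EXPECTED_FIELDS:
        if field not in entry:
            problems.append(f"missing field: {field}")

    # str() also covers the case where a loader parsed the value as a date object.
    size = str(entry.get("size", ""))
    if DATE_LIKE.fullmatch(size):
        problems.append(f"size looks like a date: {size}")

    access = entry.get("access")
    if access is not None and access not in OBSERVED_ACCESS:
        problems.append(f"unexpected access value: {access}")

    return problems


if __name__ == "__main__":
    # Placeholder entry mimicking the Sonic record's suspicious size value.
    sonic = dict.fromkeys(EXPECTED_FIELDS, "unknown")
    sonic.update({"name": "Sonic", "access": "limited", "size": "2024-05-29"})
    for problem in check_entry(sonic):
        print(problem)
```

Returning a list of messages rather than raising on the first problem lets a reviewer see every issue in an entry at once.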