Merge pull request #224 from stanford-crfm/revert-222-revert-217-jonathan/090524-monthly-assets
Revert "Revert "add notable summer assets""
Showing 24 changed files with 1,022 additions and 32 deletions.
@@ -99,3 +99,39 @@
  prohibited_uses: ''
  monitoring: ''
  feedback: ''
- type: model
  name: Pharia-1-LLM-7B
  organization: Aleph Alpha
  description: Pharia-1-LLM-7B is a model that falls within the Pharia-1-LLM model
    family. It is designed to deliver short, controlled responses that match the
    performance of leading open-source models of around 7-8 billion parameters.
    The model is culturally and linguistically tuned for German, French, and Spanish.
    It is trained on carefully curated data in line with relevant EU and national
    regulations. The model shows improved token efficiency and is particularly
    effective in domain-specific applications, especially in the automotive and
    engineering industries. It can also be aligned to user preferences, making it
    appropriate for critical applications without the risk of shut-down behaviour.
  created_date: 2024-09-08
  url: https://aleph-alpha.com/introducing-pharia-1-llm-transparent-and-compliant/#:~:text=Pharia%2D1%2DLLM%2D7B
  model_card: unknown
  modality: text; text
  analysis: Extensive evaluations were done with ablation experiments performed
    on pre-training benchmarks such as lambada, triviaqa, hellaswag, winogrande,
    webqs, arc, and boolq. Direct comparisons were also made with models such as
    GPT and Llama 2.
  size: 7B parameters
  dependencies: []
  training_emissions: Unknown
  training_time: Unknown
  training_hardware: Unknown
  quality_control: The model comes with additional safety guardrails via alignment
    methods to ensure safe usage. Training data is carefully curated to ensure
    compliance with EU and national regulations.
  access: open
  license: Aleph Open
  intended_uses: The model is intended for use in domain-specific applications,
    particularly in the automotive and engineering industries. It can also be
    tailored to user preferences.
  prohibited_uses: Unknown
  monitoring: Unknown
  feedback: Feedback can be sent to [email protected].
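Entries like the one above follow a flat key-value schema. As a minimal sketch (this is not the repository's actual validation tooling, and the field list is inferred from the entries shown in this diff), one could parse a flat entry and flag missing schema fields like this:

```python
# Hypothetical helper: parse a flat "key: value" asset entry and report
# which schema fields (as seen in the entries above) are absent.
REQUIRED_FIELDS = [
    "type", "name", "organization", "description", "created_date", "url",
    "model_card", "modality", "analysis", "size", "dependencies",
    "training_emissions", "training_time", "training_hardware",
    "quality_control", "access", "license", "intended_uses",
    "prohibited_uses", "monitoring", "feedback",
]

sample = """\
- type: model
  name: Pharia-1-LLM-7B
  organization: Aleph Alpha
  created_date: 2024-09-08
  size: 7B parameters
  access: open
  license: Aleph Open
"""

def parse_flat_entry(text: str) -> dict:
    """Parse flat 'key: value' lines into a dict (no nested YAML handled)."""
    entry = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        # Drop the leading "- " list marker on the first key.
        entry[key.strip().lstrip("- ")] = value.strip()
    return entry

def missing_fields(entry: dict) -> list:
    """Return schema fields absent from a parsed entry."""
    return [f for f in REQUIRED_FIELDS if f not in entry]

entry = parse_flat_entry(sample)
print(missing_fields(entry))
```

A real loader would use a YAML parser (e.g. PyYAML's `safe_load`) to handle the folded multi-line scalars in the full entries; the hand-rolled parser here only covers the flat subset shown in the sample.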
@@ -0,0 +1,34 @@
---
- type: model
  name: AstroPT
  organization: Aspia Space, Instituto de Astrofísica de Canarias (IAC), UniverseTBD,
    Astrophysics Research Institute, Liverpool John Moores University, Departamento
    de Astrofísica, Universidad de La Laguna, Observatoire de Paris, LERMA, PSL
    University, and Université Paris-Cité
  description: AstroPT is an autoregressive pretrained transformer developed with
    astronomical use cases in mind. The models were pretrained on 8.6 million
    512x512 pixel grz-band galaxy postage stamp observations from the DESI Legacy
    Survey DR8. A range of models of varying complexity was created, ranging from
    1 million to 2.1 billion parameters.
  created_date: 2024-09-08
  url: https://arxiv.org/pdf/2405.14930v1
  model_card: unknown
  modality: image; image
  analysis: The models' performance on downstream tasks was evaluated by linear
    probing. The models follow a saturating log-log scaling law similar to that
    of textual models; performance improves with model size up to a saturation
    point in the number of parameters.
  size: 2.1B parameters
  dependencies: [DESI Legacy Survey DR8]
  training_emissions: Unknown
  training_time: Unknown
  training_hardware: Unknown
  quality_control: The models' performance was evaluated on downstream tasks as
    measured by linear probing.
  access: open
  license: MIT
  intended_uses: The models are intended for astronomical use cases, particularly
    handling and interpreting large observational data from astronomical sources.
  prohibited_uses: Unknown
  monitoring: Unknown
  feedback: Any problem with the model can be reported to Michael J. Smith at [email protected].
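The analysis field above refers to evaluation by linear probing: the pretrained model is frozen, embeddings are extracted, and only a linear classifier is trained on top. A minimal sketch of that idea on synthetic data (assumption: AstroPT itself is not loaded here, and the "embeddings" are random stand-ins, not real galaxy features):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for frozen-model embeddings of two classes. In a real probe,
# these would be hidden states extracted from the pretrained transformer.
n, d = 200, 16
X = np.vstack([rng.normal(-1.0, 1.0, (n, d)), rng.normal(1.0, 1.0, (n, d))])
y = np.concatenate([np.zeros(n), np.ones(n)])

def train_linear_probe(X, y, lr=0.1, steps=500):
    """Logistic-regression probe trained with plain gradient descent;
    the backbone producing X is never updated."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

w, b = train_linear_probe(X, y)
acc = np.mean(((X @ w + b) > 0) == (y == 1))
print(f"probe accuracy: {acc:.2f}")
```

Because only the linear layer is trained, probe accuracy is a direct measure of how linearly separable the frozen embeddings are, which is why it is a common proxy for pretrained-representation quality.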
@@ -2,12 +2,18 @@
- type: model
  name: Sonic
  organization: Cartesia
  description: Sonic is a low-latency voice model that generates lifelike speech.
    Developed by Cartesia, it was designed to be an efficient real-time AI capable
    of processing contexts of any size and running on any device.
  created_date: 2024-05-29
  url: https://cartesia.ai/blog/sonic
  model_card: none
  modality: text; audio
  analysis: Extensive testing on the Multilingual LibriSpeech dataset resulted
    in 20% lower validation perplexity. In downstream evaluations, this leads to
    a 2x lower word error rate and a 1-point higher quality score. Sonic also
    displays impressive performance metrics at inference, achieving lower latency
    (1.5x lower time-to-first-audio), faster inference speed (2x lower real-time
    factor), and higher throughput (4x).
  size: unknown
  dependencies: [Multilingual LibriSpeech dataset]
  training_emissions: unknown
@@ -16,7 +22,9 @@
  quality_control: ''
  access: limited
  license: unknown
  intended_uses: Sonic has potential applications across customer support, entertainment,
    and content creation, and is part of Cartesia's broader mission to bring real-time
    multimodal intelligence to every device.
  prohibited_uses: unknown
  monitoring: unknown
  feedback: Contact through the provided form or via email at [email protected].
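The Sonic analysis above cites a "2x lower word error rate". For context, WER is the standard edit-distance metric for speech systems; a minimal sketch of its textbook definition follows (this is an illustration, not Cartesia's evaluation code):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed as word-level Levenshtein distance via dynamic programming."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why it is reported as a rate rather than an accuracy.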