improved schemas, added data validation, initial page to render the ecosystem graphs as a table
percyliang committed Mar 13, 2022
1 parent 0ed5866 commit dd3fb0c
Showing 15 changed files with 776 additions and 176 deletions.
20 changes: 15 additions & 5 deletions README.md
@@ -1,7 +1,17 @@
-# Ecosystem reports
+# Ecosystem graphs
 
-foundation models
+This repository contains the information that powers ecosystem graphs for
+foundation models (e.g., GPT-3).
+Briefly, an ecosystem graph is a graph where nodes are **assets**
+(e.g., datasets, models, and applications)
+and directed edges represent dependencies between assets
+(e.g., model trained on a dataset, application powered by a model).
 
-- Dataset
-- Model
-- Application
+We welcome community contributions to this repository.
+To contribute, please submit a PR.
+
+To visualize and explore the ecosystem graphs, start a local server:
+
+    python server.py
+
+and navigate to [http://localhost:8000](http://localhost:8000).
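The README above describes the data model: nodes are assets, directed edges are dependencies. As an illustrative sketch only (not code from this commit), the graph could be assembled from the per-organization files under assets/ shown below, using the name and dependencies fields they define:

    import glob

    import yaml  # PyYAML

    # Gather every asset across the per-organization files
    # (e.g., assets/deepmind.yaml, assets/eleutherai.yaml).
    assets = []
    for path in glob.glob("assets/*.yaml"):
        with open(path) as f:
            assets.extend(yaml.safe_load(f))

    # Nodes are assets, keyed by name; a directed edge (dep, name) records
    # that the asset depends on dep (e.g., a model trained on a dataset).
    nodes = {asset["name"]: asset for asset in assets}
    edges = [
        (dep, asset["name"])
        for asset in assets
        for dep in asset.get("dependencies", [])
    ]

    print(edges)  # e.g., [("The Pile", "GPT-NeoX-20B"), ...]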
88 changes: 0 additions & 88 deletions assets.yaml

This file was deleted.

23 changes: 23 additions & 0 deletions assets/deepmind.yaml
@@ -0,0 +1,23 @@
- type: model
  name: Gopher
  # General
  organization: DeepMind
  release_date: TODO
  url: https://arxiv.org/pdf/2112.11446.pdf
  model_card: TODO
  modality: text
  size: TODO
  analysis: TODO
  # Construction
  dependencies: []
  training_emissions: TODO
  training_time: TODO
  training_hardware: TODO
  harm_mitigation: TODO
  # Downstream
  access: none
  license: none
  allowed_uses: none
  prohibited_uses: none
  monitoring: none
  feedback: none
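The commit title mentions added data validation. The sketch below shows what a required-field check over entries like the one above might look like; the field names are taken from the schema above, but the constant and function are hypothetical, not this commit's actual validator:

    # Hypothetical required-field check; the commit's real validator may differ.
    REQUIRED_MODEL_FIELDS = [
        "name", "organization", "release_date", "url", "model_card",
        "modality", "size", "analysis", "dependencies", "training_emissions",
        "training_time", "training_hardware", "harm_mitigation",
        "access", "license", "allowed_uses", "prohibited_uses",
        "monitoring", "feedback",
    ]

    def validate_model(asset: dict) -> list:
        """Return one error message per missing field in a model entry."""
        return [
            f"{asset.get('name', '?')}: missing field '{field}'"
            for field in REQUIRED_MODEL_FIELDS
            if field not in asset
        ]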
52 changes: 52 additions & 0 deletions assets/eleutherai.yaml
@@ -0,0 +1,52 @@
- type: dataset
  name: The Pile
  # General
  organization: EleutherAI
  release_date: 2021-01-01
  url: https://arxiv.org/pdf/2101.00027.pdf
  datasheet: https://arxiv.org/pdf/2201.07311.pdf
  modality: text (English, code)
  size: 825GB
  examples:
    - ...pot trending topics and the coverage around them. First up, there’s a bit of a visual redesign. Previously, clicking on a trending topic would highlight a story from one publication, and you’d have to scroll down past a live video section to view related stories. Facebook is replacing that system with a simple carousel, which does a better job of showing you different coverage options. To be clear, the change doesn’t affect how stories are sourced, according to Facebook. It’s still the same algorithm pickin...
    - Total knee arthroplasty (TKA) is a promising treatment for endstage osteoarthritis (OA) of the knee for alleviating pain and restoring the function of the knee. Some of the cases with bilateral TKA are symptomatic, necessitating revision arthroplasty in both the knees. A bilateral revision TKA can be done ei
    - On the converse, the set-valued map $\Phi:[0,3]\rightrightarrows [0,3]$ $$\Phi(x):=\left\{\begin{array}{ll} \{1\} & \mbox{ if } 0\leq x<1\\ {}[1,2] & \mbox{ if } 1\leq x\leq 2\\ \{2\} &
    - This Court thus uses the same interpretation of V.R.C.P. 52(a) as it did *487 under the previous statutory requirement found in 12 V.S.A. § 2385. In essense, the defendants urge that this Court should reconsider the case of Green Mountain Marble Co. v. Highway Board, supra, and follow the Federal practice of looking to the evide
  analysis: See the paper.
  # Construction
  dependencies: []
  license: TODO
  included: 22 diverse sources (Pile-CC, PubMed Central, PubMed Abstracts, Books3, BookCorpus2, OpenWebText2, ArXiv, Github, FreeLaw, Stack Exchange, USPTO, PG-19, OpenSubtitles, Wikipedia, DM Math, Ubuntu IRC, EuroParl, HackerNews, YTSubtitles, PhilPapers, NIH, Enron Emails)
  excluded: US congressional record, fanfiction, literotica
  harm_mitigation: TODO
  # Downstream
  access: Can be downloaded for free from [The Eye](https://mystic.the-eye.eu/public/AI/pile/)
  allowed_uses: Training large-scale language models
  prohibited_uses: none
  monitoring: none
  feedback: Email the authors

- type: model
  name: GPT-NeoX-20B
  # General
  organization: EleutherAI
  release_date: 2022-02-02
  url: http://eaidata.bmk.sh/data/GPT_NeoX_20B.pdf
  model_card: https://mystic.the-eye.eu/public/AI/models/GPT-NeoX-20B/20B_model_card.md
  modality: text (English, code)
  size: Autoregressive Transformer with 20B parameters
  analysis: Evaluated on LAMBADA, ANLI, HellaSwag, MMLU, etc.
  # Construction
  dependencies:
    - The Pile
  training_emissions: 31.73 tCO2 eq [Section 6.4]
  training_time: 1830 hours [Section 6.4]
  training_hardware: 12 x 8 A100s [Section 2.3]
  harm_mitigation: TODO
  # Downstream
  access: Can be downloaded for free from [The Eye](https://mystic.the-eye.eu/public/AI/models/GPT-NeoX-20B/)
  license: Apache 2.0
  allowed_uses: Research towards the safe use of AI
  prohibited_uses: none
  monitoring: none
  feedback: Email the authors
98 changes: 98 additions & 0 deletions assets/google.yaml
@@ -0,0 +1,98 @@
- type: dataset
  name: Internal Google BERT dataset
  # General
  organization: Google
  release_date: none
  url: none
  datasheet: none
  modality: text
  size: unknown
  examples: []
  analysis: unknown
  # Construction
  dependencies: []
  license: none
  included: Web pages
  excluded: unknown
  harm_mitigation: unknown
  # Downstream
  access: none
  allowed_uses: none
  prohibited_uses: none
  monitoring: none
  feedback: none

- type: model
  name: Internal Google BERT
  # General
  organization: Google
  release_date: TODO
  url: TODO
  model_card: TODO
  modality: text
  size: TODO
  analysis: TODO
  # Construction
  dependencies:
    - Internal Google BERT dataset
  training_emissions: TODO
  training_time: TODO
  training_hardware: TODO
  harm_mitigation: unknown
  # Downstream
  access: none
  license: none
  allowed_uses: none
  prohibited_uses: none
  monitoring: none
  feedback: none


- type: application
  name: Google search
  # General
  organization: Google
  release_date: 2019
  url: https://searchengineland.com/google-bert-used-on-almost-every-english-query-342193
  # Construction
  dependencies:
    - Internal Google BERT
  adaptation: none?
  output_space: web page ranking
  harm_mitigation: TODO
  # Downstream
  access: TODO
  license: TODO
  terms_of_service: TODO
  allowed_uses: TODO
  prohibited_uses: TODO
  monitoring: TODO
  feedback: TODO
  # Deployment
  monthly_active_users: TODO
  user_distribution: TODO
  failures: TODO

- type: model
  name: LaMDA
  # General
  organization: Google
  release_date: TODO
  url: https://arxiv.org/pdf/2201.08239.pdf
  model_card: TODO
  modality: text
  size: TODO
  analysis: TODO
  # Construction
  dependencies: []
  training_emissions: TODO
  training_time: TODO
  training_hardware: TODO
  harm_mitigation: TODO
  # Downstream
  access: none
  license: none
  allowed_uses: none
  prohibited_uses: none
  monitoring: none
  feedback: none
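The commit title also mentions an initial page that renders the ecosystem graphs as a table. Below is a rough sketch of flattening asset entries into rows for such a table, reusing the assets list from the earlier sketch; the column choice is an assumption, and the actual rendering in server.py may differ:

    # Hypothetical table view: one row per asset, list values joined into a cell.
    COLUMNS = ["type", "name", "organization", "dependencies", "access"]

    def to_rows(assets: list) -> list:
        rows = []
        for asset in assets:
            row = []
            for column in COLUMNS:
                value = asset.get(column, "")
                if isinstance(value, list):
                    value = ", ".join(map(str, value))
                row.append(str(value))
            rows.append(row)
        return rows

    # e.g., to_rows(assets)[0] -> ["model", "Gopher", "DeepMind", "", "none"]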