Commit

getting there
xrotwang committed Feb 9, 2024
1 parent fb577f1 commit a34848d
Showing 10 changed files with 548 additions and 105 deletions.
5 changes: 4 additions & 1 deletion .zenodo.json
@@ -32,5 +32,8 @@
"name": "Robert Forkel",
"type": "Editor"
}
]
],
"license": {
"id": "CC-BY-4.0"
}
}
396 changes: 396 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

20 changes: 19 additions & 1 deletion NOTES.md
@@ -2,7 +2,7 @@

## Source

This dataset if derived from ECAI's Pacific Language Mapping project, described as follows:
This dataset is derived from ECAI's Pacific Language Mapping project, described as follows:

> Languages included in this Atlas cover about one third of the world's 6,000 languages. The regions of the Pacific, Southeast Asia (apart from Burma), and Madagascar are documented. By the end of this century most of these languages will be extinct, thus limiting our ability to comprehend the diversity of human experience.
@@ -36,3 +36,21 @@ These mappings were then used to create aggregations of the shapes on two levels
language, were ignored.
- Areas labeled as language (sub-)groups with no counterpart in Glottolog's classification (e.g. "Papuan") were
ignored.
- Languoids in this dataset are related to the original shapes through a list-valued foreign key, i.e. a many-to-many relation. Thus,
examining languoids together with the source shapes requires joining tables which can easily be done via
[CLDF SQL](https://github.com/cldf/cldf/blob/master/extensions/sql.md).
As expected, the big language families of the area have the biggest number of associated shapes:
```sql
SELECT l.cldf_name, count(c.cldf_id) AS c
FROM LanguageTable AS l
JOIN LanguageTable_ContributionTable AS cassoc ON cassoc.LanguageTable_cldf_id = l.cldf_id
JOIN ContributionTable AS c ON c.cldf_id = cassoc.ContributionTable_cldf_id
GROUP BY l.cldf_id
ORDER BY c DESC LIMIT 4;
```
family | shapes
--- | ---
Austronesian|1259
Nuclear Trans New Guinea|389
Austroasiatic|107
Pama-Nyungan|104
20 changes: 19 additions & 1 deletion README.md
@@ -19,7 +19,7 @@ This dataset is licensed under a CC-BY-4.0 license

## Source

This dataset if derived from ECAI's Pacific Language Mapping project, described as follows:
This dataset is derived from ECAI's Pacific Language Mapping project, described as follows:

> Languages included in this Atlas cover about one third of the world's 6,000 languages. The regions of the Pacific, Southeast Asia (apart from Burma), and Madagascar are documented. By the end of this century most of these languages will be extinct, thus limiting our ability to comprehend the diversity of human experience.
@@ -53,6 +53,24 @@ These mappings were then used to create aggregations of the shapes on two levels
language, were ignored.
- Areas labeled as language (sub-)groups with no counterpart in Glottolog's classification (e.g. "Papuan") were
ignored.
- Languoids in this dataset are related to the original shapes through a list-valued foreign key, i.e. a many-to-many relation. Thus,
examining languoids together with the source shapes requires joining tables which can easily be done via
[CLDF SQL](https://github.com/cldf/cldf/blob/master/extensions/sql.md).
As expected, the big language families of the area have the biggest number of associated shapes:
```sql
SELECT l.cldf_name, count(c.cldf_id) AS c
FROM LanguageTable AS l
JOIN LanguageTable_ContributionTable AS cassoc ON cassoc.LanguageTable_cldf_id = l.cldf_id
JOIN ContributionTable AS c ON c.cldf_id = cassoc.ContributionTable_cldf_id
GROUP BY l.cldf_id
ORDER BY c DESC LIMIT 4;
```
family | shapes
--- | ---
Austronesian|1259
Nuclear Trans New Guinea|389
Austroasiatic|107
Pama-Nyungan|104
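
The join above can be tried on toy data with Python's built-in `sqlite3` module. This is a minimal sketch, not the dataset's actual database: the table and column names are copied from the query, while the schema details and all rows (`Family A`, `Family B`, `s1` to `s3`) are made-up assumptions for illustration. A real database would come from the CLDF SQL tooling linked above.

```python
import sqlite3

# In-memory database with a hypothetical, simplified schema.
# Only the table and column names are taken from the query in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LanguageTable (cldf_id TEXT PRIMARY KEY, cldf_name TEXT);
CREATE TABLE ContributionTable (cldf_id TEXT PRIMARY KEY);
-- Association table that resolves the list-valued foreign key.
CREATE TABLE LanguageTable_ContributionTable (
    LanguageTable_cldf_id TEXT REFERENCES LanguageTable(cldf_id),
    ContributionTable_cldf_id TEXT REFERENCES ContributionTable(cldf_id)
);
-- Made-up rows: shape s3 is linked to both languoids (many-to-many).
INSERT INTO LanguageTable VALUES ('fam1', 'Family A'), ('fam2', 'Family B');
INSERT INTO ContributionTable VALUES ('s1'), ('s2'), ('s3');
INSERT INTO LanguageTable_ContributionTable VALUES
    ('fam1', 's1'), ('fam1', 's2'), ('fam1', 's3'), ('fam2', 's3');
""")

# Same join as in the text; the count is aliased as "shapes" here so that
# it does not share a name with the table alias "c".
rows = conn.execute("""
SELECT l.cldf_name, count(c.cldf_id) AS shapes
FROM LanguageTable AS l
JOIN LanguageTable_ContributionTable AS cassoc ON cassoc.LanguageTable_cldf_id = l.cldf_id
JOIN ContributionTable AS c ON c.cldf_id = cassoc.ContributionTable_cldf_id
GROUP BY l.cldf_id
ORDER BY shapes DESC
""").fetchall()

for name, shapes in rows:
    print(f"{name}|{shapes}")
```

Because shape `s3` is associated with both toy languoids, it is counted once for each of them, which is the behaviour to expect from a many-to-many relation.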


## CLDF Datasets
16 changes: 12 additions & 4 deletions RELEASING.md
@@ -1,9 +1,17 @@
# Releasing

FIXME: todo!

cldfbench makecldf cldfbench_languageatlasofthepacificarea.py
```shell
cldfbench makecldf cldfbench_languageatlasofthepacificarea.py --glottolog-version v4.8
```

```shell
cldfbench zenodo cldfbench_languageatlasofthepacificarea.py
```

```shell
cldfbench cldfreadme cldfbench_languageatlasofthepacificarea.py
cldfbench readme cldfbench_languageatlasofthepacificarea.py
```

```shell
cldfbench readme cldfbench_languageatlasofthepacificarea.py
```
4 changes: 2 additions & 2 deletions cldf/Generic-metadata.json
@@ -7,14 +7,14 @@
],
"dc:bibliographicCitation": "Language Atlas of the Pacific Geo-Registered - GIS Dataset",
"dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#Generic",
"dc:license": "CC-BY",
"dc:license": "https://creativecommons.org/licenses/by/4.0/",
"dc:title": "CLDF Geo data derived from the ECAI digitization of Wurm and Hattori \"Language Atlas of the Pacific Area\" from 1981 and 1983",
"dcat:accessURL": "https://github.com/cldf-datasets/languageatlasofthepacificarea",
"prov:wasDerivedFrom": [
{
"rdf:about": "https://github.com/cldf-datasets/languageatlasofthepacificarea",
"rdf:type": "prov:Entity",
"dc:created": "110583a",
"dc:created": "fb577f1",
"dc:title": "Repository"
},
{
6 changes: 3 additions & 3 deletions cldf/README.md
@@ -8,9 +8,9 @@ property | value
--- | ---
[dc:bibliographicCitation](http://purl.org/dc/terms/bibliographicCitation) | Language Atlas of the Pacific Geo-Registered - GIS Dataset
[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF Generic](http://cldf.clld.org/v1.0/terms.rdf#Generic)
[dc:license](http://purl.org/dc/terms/license) | CC-BY
[dc:license](http://purl.org/dc/terms/license) | https://creativecommons.org/licenses/by/4.0/
[dcat:accessURL](http://www.w3.org/ns/dcat#accessURL) | https://github.com/cldf-datasets/languageatlasofthepacificarea
[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/cldf-datasets/languageatlasofthepacificarea/tree/d0e7d7e">cldf-datasets/languageatlasofthepacificarea d0e7d7e</a></li><li><a href="https://github.com/glottolog/glottolog/tree/e27a9914d4">Glottolog v4.8-61-ge27a9914d4</a></li></ol>
[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/cldf-datasets/languageatlasofthepacificarea/tree/fb577f1">cldf-datasets/languageatlasofthepacificarea fb577f1</a></li><li><a href="https://github.com/glottolog/glottolog/tree/v4.8">Glottolog v4.8</a></li></ol>
[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>python</strong>: 3.10.12</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol>
[rdf:ID](http://www.w3.org/1999/02/22-rdf-syntax-ns#ID) | languageatlasofthepacificarea
[rdf:type](http://www.w3.org/1999/02/22-rdf-syntax-ns#type) | http://www.w3.org/ns/dcat#Distribution
@@ -64,7 +64,7 @@ Name/Property | Datatype | Description
property | value
--- | ---
[dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF LanguageTable](http://cldf.clld.org/v1.0/terms.rdf#LanguageTable)
[dc:extent](http://purl.org/dc/terms/extent) | 1758
[dc:extent](http://purl.org/dc/terms/extent) | 1757


### Columns