getting there

cldf-datasets · Feb 9, 2024 · a34848d
1 parent fb577f1
commit a34848d

Showing 10 changed files with 548 additions and 105 deletions.
diff --git a/.zenodo.json b/.zenodo.json
@@ -32,5 +32,8 @@
             "name": "Robert Forkel",
             "type": "Editor"
         }
-    ]
+    ],
+    "license": {
+        "id": "CC-BY-4.0"
+    }
 }
diff --git a/LICENSE b/LICENSE
diff --git a/NOTES.md b/NOTES.md
@@ -2,7 +2,7 @@
 
 ## Source
 
-This dataset if derived from ECAI's Pacific Language Mapping project, described as follows:
+This dataset is derived from ECAI's Pacific Language Mapping project, described as follows:
 
 > Languages included in this Atlas cover about one third of the world's 6,000 languages. The regions of the Pacific, Southeast Asia (apart from Burma), and Madagascar are documented. By the end of this century most of these languages will be extinct, thus limiting our ability to comprehend the diversity of human experience.
 
@@ -36,3 +36,21 @@ These mappings were then used to create aggregations of the shapes on two levels
   language, were ignored.
 - Areas labeled as language (sub-)groups with no counterpart in Glottolog's classification (e.g. "Papuan") were
   ignored.
+- Languoids in this dataset are related to the original shapes through a list-valued foreign key, i.e. a many-to-many relation. Thus,
+  examining languoids together with the source shapes requires joining tables which can easily be done via
+  [CLDF SQL](https://github.com/cldf/cldf/blob/master/extensions/sql.md).
+  As expected, the big language families of the area have the biggest number of associated shapes:
+  ```sql
+  SELECT l.cldf_name, count(c.cldf_id) AS c
+  FROM LanguageTable AS l 
+  JOIN LanguageTable_ContributionTable AS cassoc ON cassoc.LanguageTable_cldf_id = l.cldf_id
+  JOIN ContributionTable AS c ON c.cldf_id = cassoc.ContributionTable_cldf_id
+  GROUP BY l.cldf_id
+  ORDER BY c DESC LIMIT 4;
+  ```
+  family | shapes
+  --- | ---
+  Austronesian|1259
+  Nuclear Trans New Guinea|389
+  Austroasiatic|107
+  Pama-Nyungan|104
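
Review note on the NOTES.md hunk above: the join through the association table can be tried without the real dataset. The following is a minimal sketch, assuming a toy in-memory SQLite database whose table and column names mirror the ones used in the query. The rows ("Family A", "Shape 1", etc.) and the helper names `make_toy_db` and `shapes_per_languoid` are hypothetical and exist only for illustration. The actual schema produced by CLDF SQL may contain further columns.

```python
import sqlite3


def make_toy_db():
    """Build an in-memory database with made-up rows (NOT the real dataset)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE LanguageTable (cldf_id TEXT PRIMARY KEY, cldf_name TEXT);
        CREATE TABLE ContributionTable (cldf_id TEXT PRIMARY KEY, cldf_name TEXT);
        -- Association table that realizes the many-to-many relation.
        CREATE TABLE LanguageTable_ContributionTable (
            LanguageTable_cldf_id TEXT REFERENCES LanguageTable(cldf_id),
            ContributionTable_cldf_id TEXT REFERENCES ContributionTable(cldf_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO LanguageTable VALUES (?, ?)",
        [("fam-a", "Family A"), ("lang-b", "Language B")],
    )
    conn.executemany(
        "INSERT INTO ContributionTable VALUES (?, ?)",
        [("s1", "Shape 1"), ("s2", "Shape 2"), ("s3", "Shape 3")],
    )
    # Shape s2 belongs to both languoids, so the relation is many-to-many.
    conn.executemany(
        "INSERT INTO LanguageTable_ContributionTable VALUES (?, ?)",
        [("fam-a", "s1"), ("fam-a", "s2"), ("fam-a", "s3"), ("lang-b", "s2")],
    )
    return conn


def shapes_per_languoid(conn, limit=4):
    """Return (languoid name, number of associated shapes), largest first."""
    query = """
        SELECT l.cldf_name, count(c.cldf_id) AS shapes
        FROM LanguageTable AS l
        JOIN LanguageTable_ContributionTable AS cassoc
          ON cassoc.LanguageTable_cldf_id = l.cldf_id
        JOIN ContributionTable AS c
          ON c.cldf_id = cassoc.ContributionTable_cldf_id
        GROUP BY l.cldf_id
        ORDER BY shapes DESC, l.cldf_name
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    for name, shapes in shapes_per_languoid(make_toy_db()):
        print(f"{name}|{shapes}")
```

The sketch names the count column `shapes` rather than reusing `c`, which the query in the hunk uses for both the `ContributionTable` alias and the count. The secondary sort on `l.cldf_name` is an addition here so that ties come back in a stable order.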
diff --git a/README.md b/README.md
@@ -19,7 +19,7 @@ This dataset is licensed under a CC-BY-4.0 license
 
 ## Source
 
-This dataset if derived from ECAI's Pacific Language Mapping project, described as follows:
+This dataset is derived from ECAI's Pacific Language Mapping project, described as follows:
 
 > Languages included in this Atlas cover about one third of the world's 6,000 languages. The regions of the Pacific, Southeast Asia (apart from Burma), and Madagascar are documented. By the end of this century most of these languages will be extinct, thus limiting our ability to comprehend the diversity of human experience.
 
@@ -53,6 +53,24 @@ These mappings were then used to create aggregations of the shapes on two levels
   language, were ignored.
 - Areas labeled as language (sub-)groups with no counterpart in Glottolog's classification (e.g. "Papuan") were
   ignored.
+- Languoids in this dataset are related to the original shapes through a list-valued foreign key, i.e. a many-to-many relation. Thus,
+  examining languoids together with the source shapes requires joining tables which can easily be done via
+  [CLDF SQL](https://github.com/cldf/cldf/blob/master/extensions/sql.md).
+  As expected, the big language families of the area have the biggest number of associated shapes:
+  ```sql
+  SELECT l.cldf_name, count(c.cldf_id) AS c
+  FROM LanguageTable AS l 
+  JOIN LanguageTable_ContributionTable AS cassoc ON cassoc.LanguageTable_cldf_id = l.cldf_id
+  JOIN ContributionTable AS c ON c.cldf_id = cassoc.ContributionTable_cldf_id
+  GROUP BY l.cldf_id
+  ORDER BY c DESC LIMIT 4;
+  ```
+  family | shapes
+  --- | ---
+  Austronesian|1259
+  Nuclear Trans New Guinea|389
+  Austroasiatic|107
+  Pama-Nyungan|104
 
 
 ## CLDF Datasets

diff --git a/RELEASING.md b/RELEASING.md
@@ -1,9 +1,17 @@
 # Releasing
 
-FIXME: todo!
-
-cldfbench makecldf cldfbench_languageatlasofthepacificarea.py 
+```shell
+cldfbench makecldf cldfbench_languageatlasofthepacificarea.py --glottolog-version v4.8
+```
 
+```shell
 cldfbench zenodo cldfbench_languageatlasofthepacificarea.py
+```
+
+```shell
 cldfbench cldfreadme cldfbench_languageatlasofthepacificarea.py 
-cldfbench readme cldfbench_languageatlasofthepacificarea.py 
+```
+
+```shell
+cldfbench readme cldfbench_languageatlasofthepacificarea.py 
+```
diff --git a/cldf/Generic-metadata.json b/cldf/Generic-metadata.json
@@ -7,14 +7,14 @@
     ],
     "dc:bibliographicCitation": "Language Atlas of the Pacific Geo-Registered - GIS Dataset",
     "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#Generic",
-    "dc:license": "CC-BY",
+    "dc:license": "https://creativecommons.org/licenses/by/4.0/",
     "dc:title": "CLDF Geo data derived from the ECAI digitization of Wurm and Hattori \"Language Atlas of the Pacific Area\" from 1981 and 1983",
     "dcat:accessURL": "https://github.com/cldf-datasets/languageatlasofthepacificarea",
     "prov:wasDerivedFrom": [
         {
             "rdf:about": "https://github.com/cldf-datasets/languageatlasofthepacificarea",
             "rdf:type": "prov:Entity",
-            "dc:created": "110583a",
+            "dc:created": "fb577f1",
             "dc:title": "Repository"
         },
         {

diff --git a/cldf/README.md b/cldf/README.md
@@ -8,9 +8,9 @@ property | value
  --- | ---
 [dc:bibliographicCitation](http://purl.org/dc/terms/bibliographicCitation) | Language Atlas of the Pacific Geo-Registered - GIS Dataset
 [dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF Generic](http://cldf.clld.org/v1.0/terms.rdf#Generic)
-[dc:license](http://purl.org/dc/terms/license) | CC-BY
+[dc:license](http://purl.org/dc/terms/license) | https://creativecommons.org/licenses/by/4.0/
 [dcat:accessURL](http://www.w3.org/ns/dcat#accessURL) | https://github.com/cldf-datasets/languageatlasofthepacificarea
-[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/cldf-datasets/languageatlasofthepacificarea/tree/d0e7d7e">cldf-datasets/languageatlasofthepacificarea d0e7d7e</a></li><li><a href="https://github.com/glottolog/glottolog/tree/e27a9914d4">Glottolog v4.8-61-ge27a9914d4</a></li></ol>
+[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/cldf-datasets/languageatlasofthepacificarea/tree/fb577f1">cldf-datasets/languageatlasofthepacificarea fb577f1</a></li><li><a href="https://github.com/glottolog/glottolog/tree/v4.8">Glottolog v4.8</a></li></ol>
 [prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>python</strong>: 3.10.12</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol>
 [rdf:ID](http://www.w3.org/1999/02/22-rdf-syntax-ns#ID) | languageatlasofthepacificarea
 [rdf:type](http://www.w3.org/1999/02/22-rdf-syntax-ns#type) | http://www.w3.org/ns/dcat#Distribution
@@ -64,7 +64,7 @@ Name/Property | Datatype | Description
 property | value
  --- | ---
 [dc:conformsTo](http://purl.org/dc/terms/conformsTo) | [CLDF LanguageTable](http://cldf.clld.org/v1.0/terms.rdf#LanguageTable)
-[dc:extent](http://purl.org/dc/terms/extent) | 1758
+[dc:extent](http://purl.org/dc/terms/extent) | 1757
 
 
 ### Columns