Commit

work on next version
xrotwang committed May 16, 2024
1 parent 0e776e3 commit 8f60060
Showing 22 changed files with 2,377,126 additions and 1,781,923 deletions.
4 changes: 2 additions & 2 deletions .zenodo.json
@@ -34,10 +34,10 @@
},
{
"name": "Harald Hammarstr\u00f6m",
-"type": "Other"
+"type": "Editor"
}
],
"license": {
-"id": "CC-BY-4.0"
+"id": "CC-BY-NC-4.0"
}
}
2 changes: 1 addition & 1 deletion CONTRIBUTORS.md
@@ -8,4 +8,4 @@ David Blundell | editor |
Lawrence Crissman | editor |
Jeanette Zerneke | editor |
Robert Forkel | editor | @xrotwang
-Harald Hammarström | other |
+Harald Hammarström | editor |
46 changes: 29 additions & 17 deletions LICENSE
@@ -1,4 +1,4 @@
-Attribution 4.0 International
+Attribution-NonCommercial 4.0 International

=======================================================================

@@ -54,16 +54,18 @@ exhaustive, and do not form part of our licenses.

=======================================================================

-Creative Commons Attribution 4.0 International Public License
+Creative Commons Attribution-NonCommercial 4.0 International Public
+License

By exercising the Licensed Rights (defined below), You accept and agree
to be bound by the terms and conditions of this Creative Commons
-Attribution 4.0 International Public License ("Public License"). To the
-extent this Public License may be interpreted as a contract, You are
-granted the Licensed Rights in consideration of Your acceptance of
-these terms and conditions, and the Licensor grants You such rights in
-consideration of benefits the Licensor receives from making the
-Licensed Material available under these terms and conditions.
+Attribution-NonCommercial 4.0 International Public License ("Public
+License"). To the extent this Public License may be interpreted as a
+contract, You are granted the Licensed Rights in consideration of Your
+acceptance of these terms and conditions, and the Licensor grants You
+such rights in consideration of benefits the Licensor receives from
+making the Licensed Material available under these terms and
+conditions.


Section 1 -- Definitions.
@@ -89,7 +91,6 @@ Section 1 -- Definitions.
categorized. For purposes of this Public License, the rights
specified in Section 2(b)(1)-(2) are not Copyright and Similar
Rights.

d. Effective Technological Measures means those measures that, in the
absence of proper authority, may not be circumvented under laws
fulfilling obligations under Article 11 of the WIPO Copyright
@@ -112,21 +113,29 @@ Section 1 -- Definitions.
h. Licensor means the individual(s) or entity(ies) granting rights
under this Public License.

-i. Share means to provide material to the public by any means or
+i. NonCommercial means not primarily intended for or directed towards
+commercial advantage or monetary compensation. For purposes of
+this Public License, the exchange of the Licensed Material for
+other material subject to Copyright and Similar Rights by digital
+file-sharing or similar means is NonCommercial provided there is
+no payment of monetary compensation in connection with the
+exchange.
+
+j. Share means to provide material to the public by any means or
process that requires permission under the Licensed Rights, such
as reproduction, public display, public performance, distribution,
dissemination, communication, or importation, and to make material
available to the public including in ways that members of the
public may access the material from a place and at a time
individually chosen by them.

-j. Sui Generis Database Rights means rights other than copyright
+k. Sui Generis Database Rights means rights other than copyright
resulting from Directive 96/9/EC of the European Parliament and of
the Council of 11 March 1996 on the legal protection of databases,
as amended and/or succeeded, as well as other essentially
equivalent rights anywhere in the world.

-k. You means the individual or entity exercising the Licensed Rights
+l. You means the individual or entity exercising the Licensed Rights
under this Public License. Your has a corresponding meaning.


@@ -140,9 +149,10 @@ Section 2 -- Scope.
exercise the Licensed Rights in the Licensed Material to:

a. reproduce and Share the Licensed Material, in whole or
-in part; and
+in part, for NonCommercial purposes only; and

-b. produce, reproduce, and Share Adapted Material.
+b. produce, reproduce, and Share Adapted Material for
+NonCommercial purposes only.

2. Exceptions and Limitations. For the avoidance of doubt, where
Exceptions and Limitations apply to Your use, this Public
@@ -204,7 +214,9 @@ Section 2 -- Scope.
Rights, whether directly or through a collecting society
under any voluntary or waivable statutory or compulsory
licensing scheme. In all other cases the Licensor expressly
-reserves any right to collect such royalties.
+reserves any right to collect such royalties, including when
+the Licensed Material is used other than for NonCommercial
+purposes.


Section 3 -- License Conditions.
@@ -266,7 +278,8 @@ apply to Your use of the Licensed Material:

a. for the avoidance of doubt, Section 2(a)(1) grants You the right
to extract, reuse, reproduce, and Share all or a substantial
-portion of the contents of the database;
+portion of the contents of the database for NonCommercial purposes
+only;

b. if You include all or a substantial portion of the database
contents in a database in which You have Sui Generis Database
@@ -372,7 +385,6 @@ Section 8 -- Interpretation.
that apply to the Licensor or You, including from the legal
processes of any jurisdiction or authority.


=======================================================================

Creative Commons is not a party to its public
29 changes: 27 additions & 2 deletions NOTES.md
@@ -2,7 +2,7 @@

## Source

-This dataset is derived from ECAI's Pacific Language Mapping project, described as follows:
+This dataset is derived from [data released by ECAI's Pacific Language Mapping project](https://ecaidata.org/organization/ecai-pacific-language-mapping), described as follows:

> Languages included in this Atlas cover about one third of the world's 6,000 languages. The regions of the Pacific, Southeast Asia (apart from Burma), and Madagascar are documented. By the end of this century most of these languages will be extinct, thus limiting our ability to comprehend the diversity of human experience.
@@ -15,15 +15,26 @@ This dataset is derived from ECAI's Pacific Language Mapping project, described
> The ECAI Digital Pacific Language Map spans the extent of Pacific languages including Austronesian languages, Papuan languages, Australian aboriginal languages, and the Austro-Asiatic languages of Southeast Asia. Others such as trade, pidgin, and continental languages are represented as related to Pacific language regions.

+### Copyright and license
+
+The scanned atlas leaves and the Geo-Registered GIS dataset are made available under a CC-BY license (CC-BY-NC for the leaves of Taiwan).
+
+> Registered scans of the maps from the Language Atlas of the Pacific Area (excluding the maps of Japan) are made available through the Electronic Cultural Atlas Initiative (ECAI) Metadata Clearinghouse as a result of cooperation between Academia Sinica and the ECAI Austronesian Atlas Team led by David Blundell and Lawrence Crissman. The Australian Academy of the Humanities, which owns the copyright to the available maps, has graciously permitted their reproduction and distribution in this digital format, and we are grateful for their support. Any public use of the maps should acknowledge their source and copyright ownership.
+Thus, this derived dataset is licensed in its entirety under a CC-BY-NC license.


## Processing

+### Shapefile

The data published in ECAI's shapefile was preprocessed as follows:

- Minimal fixes of the actual GIS data have been necessary, e.g. making sure polygons do not self-intersect, etc.
- Obvious errors or inconsistencies of the metadata have been corrected, e.g. normalizing the field names.
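
As an illustration of the self-intersection check mentioned in the first bullet, here is a minimal sketch in plain Python. It is not the repair code used for this dataset (the commit does not show it); the function names are hypothetical, and the check only detects edges that properly cross, not rings that merely touch themselves or overlap along a collinear stretch.

```python
def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, 0 or -1."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _segments_cross(p1, p2, q1, q2):
    """True if segment p1-p2 and segment q1-q2 cross at an interior point."""
    return (_orient(p1, p2, q1) * _orient(p1, p2, q2) < 0
            and _orient(q1, q2, p1) * _orient(q1, q2, p2) < 0)


def ring_self_intersects(ring):
    """Brute-force O(n^2) test for a ring given as a list of (x, y) vertices.

    The ring is treated as implicitly closed (last vertex connects to first).
    """
    n = len(ring)
    edges = [(ring[i], ring[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Neighbouring edges share a vertex and cannot properly cross.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if _segments_cross(*edges[i], *edges[j]):
                return True
    return False


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
bowtie = [(0, 0), (1, 1), (1, 0), (0, 1)]
print(ring_self_intersects(square), ring_self_intersects(bowtie))
```

For real shapefile geometries a GIS library's validity check would be the more robust choice; the sketch only shows what "self-intersecting" means for a single ring.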

The cleaned up metadata was then used to map languages as specified in the ECAI data to Glottolog languoids,
-see [etc/languages.csv](etc/languages.csv).
+see [etc/languages_with_comment.csv](etc/languages_with_comment.csv).

These mappings were then used to create aggregations of the shapes on two levels:

@@ -38,6 +49,20 @@ These mappings were then used to create aggregations of the shapes on two levels
ignored.


+### Scanned Atlas leaves
+
+While the Atlas leaves for New Guinea were [available in geo-referenced form from ECAI](https://ecaidata.org/dataset/language_atlas_of_the_pacific_scanned_atlas_leaves_-_new_guinea)
+we opted to recreate new geo-referenced versions of all Atlas leaves (except for the maps of Japan which were excluded from release).
+We did this to
+- provide geo-referenced images for all available maps
+- in GeoTIFF format (rather than the ESRI World File variant of geo-referencing provided for the New Guinea maps).
+
+This allowed us to also create "better" geo-referenced images in the sense that more control points could be used
+thereby allowing for more complex transformations which increased the overall fit of the geo-referencing.
+We could also derive images reprojected to EPSG:3857 ([Web Mercator projection](https://en.wikipedia.org/wiki/Web_Mercator_projection))
+suitable for overlaying on web maps (which created a good setup for quality control of the language matches).
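
The reprojection to EPSG:3857 mentioned here can be illustrated with the spherical Web Mercator formulas. This is a minimal sketch, not code from this repository: the function names are hypothetical, and reprojecting an actual GeoTIFF would be done with a GIS toolchain rather than point by point.

```python
import math

# EPSG:3857 treats the Earth as a sphere with the WGS 84 semi-major axis as radius.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project WGS 84 longitude/latitude (degrees) to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def web_mercator_to_lonlat(x, y):
    """Inverse of lonlat_to_web_mercator."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


# Round trip for an arbitrary point in the New Guinea region.
x, y = lonlat_to_web_mercator(141.0, -6.0)
print(web_mercator_to_lonlat(x, y))
```

Because the projection stretches distances increasingly towards the poles, it suits visual overlay on web map tiles but not area or distance measurements.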


## Usage

Languoids in this dataset are related to the original shapes through a list-valued foreign key, i.e. a many-to-many relation. Thus,
31 changes: 28 additions & 3 deletions README.md
@@ -12,14 +12,14 @@ If you use these data please cite
## Description


-This dataset is licensed under a CC-BY-4.0 license
+This dataset is licensed under a CC-BY-NC-4.0 license


![](etc/img.png)

## Source

-This dataset is derived from ECAI's Pacific Language Mapping project, described as follows:
+This dataset is derived from [data released by ECAI's Pacific Language Mapping project](https://ecaidata.org/organization/ecai-pacific-language-mapping), described as follows:

> Languages included in this Atlas cover about one third of the world's 6,000 languages. The regions of the Pacific, Southeast Asia (apart from Burma), and Madagascar are documented. By the end of this century most of these languages will be extinct, thus limiting our ability to comprehend the diversity of human experience.
@@ -32,15 +32,26 @@ This dataset is derived from ECAI's Pacific Language Mapping project, described
> The ECAI Digital Pacific Language Map spans the extent of Pacific languages including Austronesian languages, Papuan languages, Australian aboriginal languages, and the Austro-Asiatic languages of Southeast Asia. Others such as trade, pidgin, and continental languages are represented as related to Pacific language regions.

+### Copyright and license
+
+The scanned atlas leaves and the Geo-Registered GIS dataset are made available under a CC-BY license (CC-BY-NC for the leaves of Taiwan).
+
+> Registered scans of the maps from the Language Atlas of the Pacific Area (excluding the maps of Japan) are made available through the Electronic Cultural Atlas Initiative (ECAI) Metadata Clearinghouse as a result of cooperation between Academia Sinica and the ECAI Austronesian Atlas Team led by David Blundell and Lawrence Crissman. The Australian Academy of the Humanities, which owns the copyright to the available maps, has graciously permitted their reproduction and distribution in this digital format, and we are grateful for their support. Any public use of the maps should acknowledge their source and copyright ownership.
+Thus, this derived dataset is licensed in its entirety under a CC-BY-NC license.


## Processing

+### Shapefile

The data published in ECAI's shapefile was preprocessed as follows:

- Minimal fixes of the actual GIS data have been necessary, e.g. making sure polygons do not self-intersect, etc.
- Obvious errors or inconsistencies of the metadata have been corrected, e.g. normalizing the field names.

The cleaned up metadata was then used to map languages as specified in the ECAI data to Glottolog languoids,
-see [etc/languages.csv](etc/languages.csv).
+see [etc/languages_with_comment.csv](etc/languages_with_comment.csv).

These mappings were then used to create aggregations of the shapes on two levels:

@@ -55,6 +66,20 @@ These mappings were then used to create aggregations of the shapes on two levels
ignored.


+### Scanned Atlas leaves
+
+While the Atlas leaves for New Guinea were [available in geo-referenced form from ECAI](https://ecaidata.org/dataset/language_atlas_of_the_pacific_scanned_atlas_leaves_-_new_guinea)
+we opted to recreate new geo-referenced versions of all Atlas leaves (except for the maps of Japan which were excluded from release).
+We did this to
+- provide geo-referenced images for all available maps
+- in GeoTIFF format (rather than the ESRI World File variant of geo-referencing provided for the New Guinea maps).
+
+This allowed us to also create "better" geo-referenced images in the sense that more control points could be used
+thereby allowing for more complex transformations which increased the overall fit of the geo-referencing.
+We could also derive images reprojected to EPSG:3857 ([Web Mercator projection](https://en.wikipedia.org/wiki/Web_Mercator_projection))
+suitable for overlaying on web maps (which created a good setup for quality control of the language matches).


## Usage

Languoids in this dataset are related to the original shapes through a list-valued foreign key, i.e. a many-to-many relation. Thus,
48 changes: 31 additions & 17 deletions cldf/Generic-metadata.json
@@ -7,15 +7,15 @@
],
"dc:bibliographicCitation": "Language Atlas of the Pacific Geo-Registered - GIS Dataset",
"dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#Generic",
-"dc:license": "https://creativecommons.org/licenses/by/4.0/",
+"dc:license": "https://creativecommons.org/licenses/by-nc/4.0/",
"dc:source": "sources.bib",
"dc:title": "CLDF Geo data derived from the ECAI digitization of Wurm and Hattori \"Language Atlas of the Pacific Area\" from 1981 and 1983",
"dcat:accessURL": "https://github.com/cldf-datasets/languageatlasofthepacificarea",
"prov:wasDerivedFrom": [
{
"rdf:about": "https://github.com/cldf-datasets/languageatlasofthepacificarea",
"rdf:type": "prov:Entity",
-"dc:created": "7442837",
+"dc:created": "0e776e3",
"dc:title": "Repository"
},
{
@@ -41,7 +41,7 @@
{
"dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#ContributionTable",
"dc:description": "We list the individual shapes from the source dataset as contributions in order to preserve the original metadata.",
-"dc:extent": 3021,
+"dc:extent": 3125,
"tableSchema": {
"columns": [
{
@@ -79,20 +79,34 @@
"name": "Citation"
},
{
-"name": "Country"
+"propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#source",
+"separator": ";",
+"name": "Source"
},
{
-"separator": "; ",
-"name": "Sovereigns"
+"propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#mediaReference",
+"separator": " ",
+"name": "Media_IDs"
},
{
-"separator": "; ",
-"name": "Islands"
-},
+"datatype": {
+"base": "string",
+"format": "leaf|shape"
+},
+"name": "Type"
+}
+],
+"foreignKeys": [
{
-"propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#source",
-"separator": ";",
-"name": "Source"
+"columnReference": [
+"Media_IDs"
+],
+"reference": {
+"resource": "media.csv",
+"columnReference": [
+"ID"
+]
+}
}
],
"primaryKey": [
@@ -103,7 +117,7 @@
},
{
"dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#MediaTable",
-"dc:extent": 2,
+"dc:extent": 307,
"tableSchema": {
"columns": [
{
@@ -157,7 +171,7 @@
},
{
"dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#LanguageTable",
-"dc:extent": 1820,
+"dc:extent": 1869,
"tableSchema": {
"columns": [
{
@@ -231,16 +245,16 @@
"name": "Family"
},
{
-"dc:description": "List of identifiers of shapes in the original shapefile that were aggregated to create the shape referenced by Speaker_Area.",
+"dc:description": "List of identifiers of shapes in the original shapefile that were aggregated to create the shape referenced by Speaker_Area and of Atlas leaves mapping a georeferenced area intersecting with this languoid's area.",
"propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#contributionReference",
"separator": " ",
-"name": "Source_Languoid_IDs"
+"name": "Contribution_IDs"
}
],
"foreignKeys": [
{
"columnReference": [
-"Source_Languoid_IDs"
+"Contribution_IDs"
],
"reference": {
"resource": "contributions.csv",
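
A minimal sketch of how the list-valued foreign keys declared in this metadata could be resolved, using only the Python standard library. The column names `Contribution_IDs`, `Media_IDs` and `Type`, the space separator, and the `leaf|shape` values are taken from the diff above; the `Name` column, the row values and the helper names are invented for illustration and are not part of the dataset.

```python
import csv
import io

# Hypothetical miniature tables. Only the column names Contribution_IDs,
# Media_IDs and Type (and the " " separator) come from the metadata.
LANGUAGES_CSV = """ID,Name,Contribution_IDs
lang1,Example A,c1 c2
lang2,Example B,c2
"""
CONTRIBUTIONS_CSV = """ID,Name,Type,Media_IDs
c1,Shape one,shape,
c2,Leaf one,leaf,m1 m2
"""


def read_rows(text, list_columns):
    """Parse CSV text and split each list-valued column on its separator."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, separator in list_columns.items():
            row[column] = row[column].split(separator) if row[column] else []
        rows.append(row)
    return rows


languages = read_rows(LANGUAGES_CSV, {"Contribution_IDs": " "})
contributions = {
    row["ID"]: row for row in read_rows(CONTRIBUTIONS_CSV, {"Media_IDs": " "})
}


def contributions_for(language, type_=None):
    """Follow Contribution_IDs, optionally keeping only 'leaf' or 'shape' rows."""
    found = [contributions[cid] for cid in language["Contribution_IDs"]]
    if type_ is not None:
        found = [c for c in found if c["Type"] == type_]
    return found


for language in languages:
    leaves = [c["ID"] for c in contributions_for(language, "leaf")]
    print(language["ID"], leaves)
```

Filtering on `Type` is what separates the Atlas leaves from the original shapes now that both kinds of contribution are referenced through the same column.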
