diff --git a/docs/dataset.rst b/docs/dataset.rst
index fcee84b..91cc959 100644
--- a/docs/dataset.rst
+++ b/docs/dataset.rst
@@ -1,24 +1,30 @@
`pycldf.dataset`
================
+.. py:currentmodule:: pycldf.dataset
+
The core object of the API, bundling most access to CLDF data, is
-the :class:`pycldf.Dataset` . In the following we'll describe its
+the :class:`.Dataset`. In the following we'll describe its
attributes and methods, bundled into thematic groups.
Dataset initialization
~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pycldf.dataset.Dataset
+.. autoclass:: Dataset
:members: __init__, in_dir, from_metadata, from_data
Accessing dataset metadata
~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. autoclass:: pycldf.Dataset
- :noindex:
- :members: directory, module, version, metadata_dict, properties, bibpath, bibname
+.. autoproperty:: Dataset.directory
+.. autoproperty:: Dataset.module
+.. autoproperty:: Dataset.version
+.. autoproperty:: Dataset.metadata_dict
+.. autoproperty:: Dataset.properties
+.. autoproperty:: Dataset.bibpath
+.. autoproperty:: Dataset.bibname
Accessing schema objects: components, tables, columns, etc.
@@ -26,18 +32,19 @@ Accessing schema objects: components, tables, columns, etc.
Similar to *capability checks* in programming languages that use
`duck typing `_, it is often necessary
-to access a datasets schema, i.e. its tables and columns to figure out whether
-the dataset fits a certain purpose. This is supported via a `dict`-like interface provided
-by :class:`pycldf.Dataset`, where the keys are table specifiers or pairs (table specifier, column specifier).
+to access a dataset's schema, i.e. its tables and columns, to figure out whether
+the dataset fits a certain purpose. This is supported via a
+`mapping `_-like interface provided
+by :class:`.Dataset`, where the keys are table specifiers or pairs (table specifier, column specifier).
A *table specifier* can be a table's component name or its `url`, a *column specifier* can be a column
name or its `propertyUrl`.
-* check existence with `in`:
+* check existence with ``in``:
.. code-block:: python
- if 'ValueTable' in dataset: pass
- if ('ValueTable', 'Language_ID') in dataset: pass
+ if 'ValueTable' in dataset: ...
+ if ('ValueTable', 'Language_ID') in dataset: ...
* retrieve a schema object with item access:
@@ -46,58 +53,69 @@ name or its `propertyUrl`.
table = dataset['ValueTable']
column = dataset['ValueTable', 'Language_ID']
-* retrieve a schema object or a default with `.get`:
+* retrieve a schema object or a default with :meth:`.Dataset.get`:
.. code-block:: python
table_or_none = dataset.get('ValueTableX')
column_or_none = dataset.get(('ValueTable', 'Language_ID'))
-* remove a schema object with `del`:
+* remove a schema object with ``del``:
.. code-block:: python
del dataset['ValueTable', 'Language_ID']
del dataset['ValueTable']
-Note: Adding schema objects is **not** supported via key assignment, but with a set of specialized
-methods described in :ref:`Editing metadata and schema`.
+.. note::
+ Adding schema objects is **not** supported via key assignment, but with a set of specialized
+ methods described in :ref:`Editing metadata and schema`.
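The key semantics listed above (a table specifier, or a pair of table and column specifier) can be imitated with a small stand-in class. This is only a sketch of the lookup rules and not how pycldf implements them; ``ToySchema`` and its sample data are hypothetical.

```python
class ToySchema:
    """Hypothetical stand-in mimicking the key semantics described above."""

    def __init__(self, tables):
        # tables: component name -> list of column names
        self._tables = {name: list(cols) for name, cols in tables.items()}

    def __getitem__(self, item):
        # A pair is (table specifier, column specifier); anything else names a table.
        if isinstance(item, tuple):
            table, column = item
            if column not in self._tables[table]:
                raise KeyError(item)
            return column
        return self._tables[item]

    def get(self, item, default=None):
        # Like dict.get: swallow the lookup error and return the default.
        try:
            return self[item]
        except KeyError:
            return default

    def __contains__(self, item):
        return self.get(item) is not None

    def __delitem__(self, item):
        if isinstance(item, tuple):
            table, column = item
            self[item]  # raises KeyError if the table or column is missing
            self._tables[table].remove(column)
        else:
            del self._tables[item]


schema = ToySchema({'ValueTable': ['ID', 'Language_ID', 'Value']})
```

As in the bullets above, ``('ValueTable', 'Language_ID') in schema`` checks a column, and ``del schema['ValueTable']`` removes a whole table.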
-.. autoclass:: pycldf.Dataset
- :noindex:
- :members: tables, components, __getitem__, __contains__, get, get_foreign_key_reference, column_names, readonly_column_names
+.. autoproperty:: Dataset.tables
+.. autoproperty:: Dataset.components
+.. automethod:: Dataset.__getitem__
+.. automethod:: Dataset.__delitem__
+.. automethod:: Dataset.__contains__
+.. automethod:: Dataset.get
+.. automethod:: Dataset.get_foreign_key_reference
+.. autoproperty:: Dataset.column_names
+.. autoproperty:: Dataset.readonly_column_names
Editing metadata and schema
~~~~~~~~~~~~~~~~~~~~~~~~~~~
In many cases, editing the metadata of a dataset is as simple as editing
-:meth:`~pycldf.dataset.Dataset.properties`, but for the somewhat complex
+:meth:`.Dataset.properties`, but for the somewhat complex
formatting of provenance data, we provide the shortcut
-:meth:`~pycldf.dataset.Dataset.add_provenance`.
+:meth:`.Dataset.add_provenance`.
-Likewise, `csvw.Table` and `csvw.Column` objects in the dataset's schema can
+Likewise, ``csvw.Table`` and ``csvw.Column`` objects in the dataset's schema can
be edited "in place", by setting their attributes or adding to/editing their
-`common_props` dictionary.
+``common_props`` dictionary.
Thus, the methods listed below are concerned with adding and removing tables
and columns.
-.. autoclass:: pycldf.Dataset
- :noindex:
- :members: add_table, remove_table, add_component, add_columns, remove_columns, rename_column, add_foreign_key, add_provenance,
+.. automethod:: Dataset.add_table
+.. automethod:: Dataset.remove_table
+.. automethod:: Dataset.add_component
+.. automethod:: Dataset.add_columns
+.. automethod:: Dataset.remove_columns
+.. automethod:: Dataset.rename_column
+.. automethod:: Dataset.add_foreign_key
+.. automethod:: Dataset.add_provenance
Adding data
~~~~~~~~~~~
-The main method to persist data as CLDF dataset is :meth:`~pycldf.Dataset.write`,
+The main method to persist data as a CLDF dataset is :meth:`.Dataset.write`,
which accepts data for all CLDF data files as input. This does not include
-sources, though. These must be added using :meth:`~pycldf.Dataset.add_sources`.
+sources, though. These must be added using :meth:`.Dataset.add_sources`.
+
+.. automethod:: Dataset.add_sources
-.. autoclass:: pycldf.Dataset
- :noindex:
- :members: add_sources
Reading data
@@ -105,30 +123,31 @@ Reading data
Reading rows from CLDF data files, honoring the datatypes specified in the schema,
is already implemented by `csvw`. Thus, the simplest way to read data is iterating
-over the `csvw.Table` objects. However, this will ignore the semantic layer provided
+over the ``csvw.Table`` objects. However, this will ignore the semantic layer provided
by CLDF. E.g. a CLDF languageReference linking a value to a language will be appear
-in the `dict` returned for a row under the local column name. Thus, we provide several
+in the ``dict`` returned for a row under the local column name. Thus, we provide several
more convenient methods to read data.
-.. autoclass:: pycldf.Dataset
- :noindex:
- :members: iter_rows, get_row, get_row_url, objects, get_object
+.. automethod:: Dataset.iter_rows
+.. automethod:: Dataset.get_row
+.. automethod:: Dataset.get_row_url
+.. automethod:: Dataset.objects
+.. automethod:: Dataset.get_object
Writing (meta)data
~~~~~~~~~~~~~~~~~~
-.. autoclass:: pycldf.Dataset
- :noindex:
- :members: write, write_metadata, write_sources
+.. automethod:: Dataset.write
+.. automethod:: Dataset.write_metadata
+.. automethod:: Dataset.write_sources
Reporting
~~~~~~~~~
-.. autoclass:: pycldf.Dataset
- :noindex:
- :members: validate, stats
+.. automethod:: Dataset.validate
+.. automethod:: Dataset.stats
Dataset discovery
@@ -147,7 +166,7 @@ Sources
~~~~~~~
When constructing sources for a CLDF dataset in Python code, you may pass
-:class:`pycldf.Source` instances into :meth:`pycldf.Dataset.add_sources`,
+:class:`pycldf.Source` instances into :meth:`Dataset.add_sources`,
or use :meth:`pycldf.Reference.__str__` to format a row's `source` value
properly.
@@ -169,8 +188,19 @@ in its `sources` attribute.
Subclasses supporting specific CLDF modules
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.. note::
+
+ Most functionality provided through properties and methods described below is implemented via
+ the :mod:`pycldf.orm` module, and thus subject to the limitations listed at `<./orm.html>`_.
+
.. autoclass:: pycldf.Generic
:members:
.. autoclass:: pycldf.Wordlist
:members:
+
+.. autoclass:: pycldf.StructureDataset
+ :members:
+
+.. autoclass:: pycldf.TextCorpus
+ :members:
diff --git a/setup.cfg b/setup.cfg
index 6df6680..79ed27c 100644
--- a/setup.cfg
+++ b/setup.cfg
@@ -40,6 +40,8 @@ install_requires =
clldutils>=3.9
uritemplate>=3.0
python-dateutil
+ # pybtex requires setuptools, but doesn't seem to declare this.
+ setuptools
pybtex
requests
newick
diff --git a/src/pycldf/components/ExampleTable-metadata.json b/src/pycldf/components/ExampleTable-metadata.json
index f62f56e..f3faec9 100644
--- a/src/pycldf/components/ExampleTable-metadata.json
+++ b/src/pycldf/components/ExampleTable-metadata.json
@@ -61,6 +61,14 @@
"dc:description": "References the language of the translated text",
"datatype": "string"
},
+ {
+ "name": "LGR_Conformance",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#lgrConformance",
+ "dc:extent": "singlevalued",
+ "dc:description": "The level of conformance of the example with the Leipzig Glossing Rules",
+ "datatype": {"base": "string", "format": "WORD_ALIGNED|MORPHEME_ALIGNED"}
+ },
{
"name": "Comment",
"required": false,
diff --git a/src/pycldf/components/ParameterNetwork-metadata.json b/src/pycldf/components/ParameterNetwork-metadata.json
new file mode 100644
index 0000000..f069e77
--- /dev/null
+++ b/src/pycldf/components/ParameterNetwork-metadata.json
@@ -0,0 +1,45 @@
+{
+ "url": "parameter_network.csv",
+ "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#ParameterNetwork",
+ "dc:description": "Rows in this table describe edges in a network of parameters.",
+ "tableSchema": {
+ "columns": [
+ {
+ "name": "ID",
+ "required": true,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#id",
+ "datatype": {
+ "base": "string",
+ "format": "[a-zA-Z0-9_\\-]+"
+ }
+ },
+ {
+ "name": "Description",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#description",
+ "datatype": "string"
+ },
+ {
+ "name": "Target_Parameter_ID",
+ "required": true,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#targetParameterReference",
+ "dc:description": "References the target node of the edge.",
+ "datatype": "string"
+ },
+ {
+ "name": "Source_Parameter_ID",
+ "required": true,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#sourceParameterReference",
+ "dc:description": "References the source node of the edge.",
+ "datatype": "string"
+ },
+ {
+ "name": "Edge_Is_Directed",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#edgeIsDirected",
+ "dc:description": "Flag signaling whether the edge is directed or undirected.",
+ "datatype": {"base": "boolean", "format": "Yes|No"}
+ }
+ ]
+ }
+}
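To illustrate the component defined above, the following sketch reads edges from a ``parameter_network.csv`` file using only the standard library. The sample rows are made up, and ``read_edges`` is a hypothetical helper; it assumes the CSVW convention that a boolean format of ``Yes|No`` lists the true value first.

```python
import csv
import io

# Made-up sample content; the column names follow the metadata above.
SAMPLE = """ID,Description,Target_Parameter_ID,Source_Parameter_ID,Edge_Is_Directed
e1,implies,p2,p1,Yes
e2,related,p3,p2,No
"""


def read_edges(text):
    """Return (source, target, directed) triples from the CSV text."""
    edges = []
    for row in csv.DictReader(io.StringIO(text)):
        # "Yes" maps to True, "No" to False; an empty cell yields None,
        # since Edge_Is_Directed is not a required column.
        directed = {'Yes': True, 'No': False}.get(row['Edge_Is_Directed'])
        edges.append((row['Source_Parameter_ID'], row['Target_Parameter_ID'], directed))
    return edges


edges = read_edges(SAMPLE)
```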
diff --git a/src/pycldf/dataset.py b/src/pycldf/dataset.py
index 2b73c6f..04bd453 100644
--- a/src/pycldf/dataset.py
+++ b/src/pycldf/dataset.py
@@ -30,13 +30,16 @@
__all__ = [
'Dataset', 'Generic', 'Wordlist', 'ParallelText', 'Dictionary', 'StructureDataset',
- 'iter_datasets', 'sniff', 'SchemaError', 'ComponentWithValidation']
+ 'TextCorpus', 'iter_datasets', 'sniff', 'SchemaError', 'ComponentWithValidation']
MD_SUFFIX = '-metadata.json'
ORM_CLASSES = {cls.component_name(): cls for cls in orm.Object.__subclasses__()}
TableType = typing.Union[str, Table]
ColType = typing.Union[str, Column]
PathType = typing.Union[str, pathlib.Path]
+TableSpecType = typing.Union[str, Link, Table]
+ColSPecType = typing.Union[str, Column]
+SchemaObjectType = typing.Union[TableSpecType, typing.Tuple[TableSpecType, ColSPecType]]
class SchemaError(KeyError):
@@ -135,8 +138,8 @@ class Dataset:
def __init__(self, tablegroup: csvw.TableGroup):
"""
- A :class:`~pycldf.dataset.Dataset` is initialized passing a `TableGroup`. For convenience \
- methods to get such a `TableGroup` instance, see the factory methods
+ A :class:`~pycldf.dataset.Dataset` is initialized passing a `TableGroup`. The following \
+ factory methods obviate the need to instantiate such a `TableGroup` instance yourself:
- :meth:`~pycldf.dataset.Dataset.in_dir`
- :meth:`~pycldf.dataset.Dataset.from_metadata`
@@ -320,7 +323,7 @@ def bibname(self) -> str:
# Accessing schema objects (components, tables, columns, foreign keys)
#
@property
- def tables(self) -> list:
+ def tables(self) -> typing.List[Table]:
"""
:return: All tables defined in the dataset.
"""
@@ -329,7 +332,7 @@ def tables(self) -> list:
@property
def components(self) -> typing.Dict[str, csvw.Table]:
"""
- :return: Mapping of component name to table obejcts as defined in the dataset.
+ :return: Mapping of component name to table objects as defined in the dataset.
"""
res = collections.OrderedDict()
for table in self.tables:
@@ -362,27 +365,29 @@ def primary_table(self) -> typing.Union[str, None]:
except ValueError:
return None
- def __getitem__(self, item) -> typing.Union[csvw.Table, csvw.Column]:
+ def __getitem__(self, item: SchemaObjectType) -> typing.Union[csvw.Table, csvw.Column]:
"""
Access to tables and columns.
- If a pair (table-spec, column-spec) is passed as `item`, a Column will be
- returned, otherwise `item` is assumed to be a table-spec.
+ If a pair (table-spec, column-spec) is passed as ``item``, a :class:`csvw.Column` will be
+ returned, otherwise ``item`` is assumed to be a table-spec, and a :class:`csvw.Table` is
+ returned.
A table-spec may be
- - a CLDF ontology URI matching the dc:conformsTo property of a table
+ - a CLDF ontology URI matching the `dc:conformsTo` property of a table
- the local name of a CLDF ontology URI, where the complete URI matches the \
- the dc:conformsTo property of a table
- - a filename matching the `url` property of a table
+ the `dc:conformsTo` property of a table
+ - a filename matching the `url` property of a table.
A column-spec may be
- - a CLDF ontology URI matching the propertyUrl of a column
+ - a CLDF ontology URI matching the `propertyUrl` of a column
- the local name of a CLDF ontology URI, where the complete URI matches the \
- propertyUrl of a column
- - the name of a column
+ `propertyUrl` of a column
+ - the name of a column.
+ :param item: A schema object spec.
:raises SchemaError: If no matching table or column is found.
"""
if isinstance(item, tuple):
@@ -424,14 +429,19 @@ def __getitem__(self, item) -> typing.Union[csvw.Table, csvw.Column]:
raise SchemaError('Dataset has no column "{}" in table "{}"'.format(column, t.url))
- def __delitem__(self, key):
- thing = self[key]
+ def __delitem__(self, item: SchemaObjectType):
+ """
+ Remove a table or column from the dataset's schema.
+
+ :param item: See :meth:`~pycldf.dataset.Dataset.__getitem__`
+ """
+ thing = self[item]
if isinstance(thing, Column):
- self.remove_columns(self[key[0]], thing)
+ self.remove_columns(self[item[0]], thing)
else:
self.remove_table(thing)
- def __contains__(self, item) -> bool:
+ def __contains__(self, item: SchemaObjectType) -> bool:
"""
Check whether a dataset specifies a table or column.
@@ -439,7 +449,9 @@ def __contains__(self, item) -> bool:
"""
return bool(self.get(item))
- def get(self, item, default=None) -> typing.Union[csvw.Table, csvw.Column, None]:
+ def get(self,
+ item: SchemaObjectType,
+ default=None) -> typing.Union[csvw.Table, csvw.Column, None]:
"""
Acts like `dict.get`.
@@ -1189,10 +1201,67 @@ def primary_table(self):
class StructureDataset(Dataset):
+ """
+ Parameters in StructureDataset are often called "features".
+
+ .. seealso:: ``_
+ """
@property
def primary_table(self):
return 'ValueTable'
+ @functools.cached_property
+ def features(self):
+ """
+ Just an alias for the parameters.
+ """
+ return self.objects('ParameterTable')
+
+
+class TextCorpus(Dataset):
+ """
+ In a `TextCorpus`, contributions and examples have specialized roles:
+
+ - Contributions are understood as individual texts of the corpus.
+ - Examples are interpreted as the sentences of the corpus.
+ - Alternative translations are provided by linking "light-weight" examples to "full", main
+ examples.
+ - The order of sentences may be defined using a `position` property.
+
+ .. seealso:: ``_
+
+ .. code-block:: python
+
+ >>> crp = TextCorpus.from_metadata('tests/data/textcorpus/metadata.json')
+ >>> crp.texts[0].sentences[0].cldf.primaryText
+ 'first line'
+ >>> crp.texts[0].sentences[0].alternative_translations
+ []
+ """
+ @property
+ def primary_table(self):
+ return 'ExampleTable'
+
+ @functools.cached_property
+ def texts(self) -> typing.Union[None, DictTuple]:
+ # Some syntactic sugar to access the ORM data in a concise and meaningful way.
+ if 'ContributionTable' in self:
+ return self.objects('ContributionTable')
+
+ def get_text(self, tid):
+ if 'ContributionTable' in self:
+ return self.get_object('ContributionTable', tid)
+
+ @property
+ def sentences(self) -> typing.List[orm.Example]:
+ res = list(self.objects('ExampleTable'))
+ if ('ExampleTable', 'exampleReference') in self:
+ # Filter out alternative translations!
+ res = [e for e in res if not e.cldf.exampleReference]
+ if ('ExampleTable', 'position') in self:
+ return sorted(res, key=lambda o: o.cldf.position)
+ return res # pragma: no cover
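The filtering and ordering done by ``sentences`` can be sketched with plain dictionaries standing in for ORM objects. ``main_sentences`` and the sample rows are hypothetical. Unlike the property above, which checks whether the table has a ``position`` column, this sketch sorts only when every remaining row carries a position.

```python
def main_sentences(examples):
    """examples: list of dicts with optional 'exampleReference' and 'position' keys."""
    # Drop rows that merely link an alternative translation to a main example.
    res = [e for e in examples if not e.get('exampleReference')]
    # Sort by position only when every remaining row defines one (an assumption).
    if res and all(e.get('position') is not None for e in res):
        return sorted(res, key=lambda e: e['position'])
    return res


rows = [
    {'id': 's2', 'position': 2},
    {'id': 's1', 'position': 1},
    {'id': 's1-alt', 'exampleReference': 's1', 'position': 1},
]
ordered = [e['id'] for e in main_sentences(rows)]
```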
+
class ComponentWithValidation:
def __init__(self, ds: Dataset):
diff --git a/src/pycldf/db.py b/src/pycldf/db.py
index 68b5582..0315324 100644
--- a/src/pycldf/db.py
+++ b/src/pycldf/db.py
@@ -3,10 +3,10 @@
To make the resulting SQLite database useful without access to the datasets metadata, we
use terms of the CLDF ontology for database objects as much as possible, i.e.
-- table names are component names (e.g. "ValueTable" for a table with propertyUrl \
- http://cldf.clld.org/v1.0/terms.rdf#ValueTable)
+- table names are component names (e.g. ``ValueTable`` for a table with `dc:conformsTo` \
+ ``http://cldf.clld.org/v1.0/terms.rdf#ValueTable``)
- column names are property names, prefixed with "cldf" + UNDERSCORE (e.g. a column with \
- propertyUrl http://cldf.clld.org/v1.0/terms.rdf#id will be "cldf_id" in the database)
+ `propertyUrl` ``http://cldf.clld.org/v1.0/terms.rdf#id`` will be ``cldf_id`` in the database)
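The two naming rules above amount to taking the local name (the URI fragment) of an ontology term and prefixing it with ``cldf_`` for columns. A minimal sketch, where ``db_name`` is a hypothetical helper and not part of this module:

```python
from urllib.parse import urlparse


def db_name(term_uri, is_column):
    """Derive a database object name from a CLDF ontology term URI."""
    # The local name of a term is the fragment after '#'.
    local_name = urlparse(term_uri).fragment
    return 'cldf_' + local_name if is_column else local_name


table_name = db_name('http://cldf.clld.org/v1.0/terms.rdf#ValueTable', is_column=False)
column_name = db_name('http://cldf.clld.org/v1.0/terms.rdf#id', is_column=True)
```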
This naming scheme also extends to automatically created association tables. I.e. when a
table specifies a list-valued foreign key, an association table is created to implement this
@@ -14,7 +14,7 @@
- the url properties of the tables in this relationship or of
- the component names of the tables in the relationship.
-E.g. a list-valued foreign key from the FormTable to the ParameterTable will result in an
+E.g. a list-valued foreign key from `FormTable` to `ParameterTable` will result in an
association table
.. code-block:: sql
@@ -50,6 +50,7 @@
from pycldf.terms import TERMS
from pycldf.sources import Reference, Sources, Source
+from pycldf import Dataset
__all__ = ['Database']
@@ -97,9 +98,9 @@ def translate(d: typing.Dict[str, TableTranslation], table: str, col=None) -> st
"""
Translate a db object name.
- :param d: `dict` mapping table urls to `TableTranslation` instances.
+ :param d: ``dict`` mapping table urls to `TableTranslation` instances.
:param table: The table name of the object to be translated.
- :param col: Column name to be translated or `None` - so `table` will be translated.
+ :param col: Column name to be translated or `None` - so ``table`` will be translated.
:return: Translated name.
"""
if col:
@@ -121,16 +122,16 @@ def clean_bibtex_key(s):
class Database(csvw.db.Database):
"""
- Extend the functionality provided by `csvw.db.Database` by
+ Extend the functionality provided by ``csvw.db.Database`` by
- providing consistent naming of schema objects according to CLDF semantics,
- integrating sources into the DB schema.
"""
source_table_name = 'SourceTable'
- def __init__(self, dataset, **kw):
+ def __init__(self, dataset: Dataset, **kw):
"""
- :param dataset: a `pycldf.Dataset` instance.
+ :param dataset: The :class:`Dataset` instance from which to derive the database schema.
"""
self.dataset = dataset
self._retranslate = collections.defaultdict(dict)
@@ -269,9 +270,9 @@ def retranslate(self, table, item):
@staticmethod
def round_geocoordinates(item, precision=4):
"""
- We round geo coordinates to `precision` decimal places.
+ We round geo coordinates to ``precision`` decimal places.
- See https://en.wikipedia.org/wiki/Decimal_degrees
+ .. seealso:: `<https://en.wikipedia.org/wiki/Decimal_degrees>`_
:param item:
:param precision:
diff --git a/src/pycldf/media.py b/src/pycldf/media.py
index f0e9310..9c85be7 100644
--- a/src/pycldf/media.py
+++ b/src/pycldf/media.py
@@ -1,7 +1,8 @@
"""
Accessing media associated with a CLDF dataset.
-You can iterate over the `File` objects associated with media using the `Media` wrapper:
+You can iterate over the :class:`.File` objects associated with media using the :class:`.Media`
+wrapper:
.. code-block:: python
@@ -11,7 +12,7 @@
if f.mimetype.type == 'audio':
f.save(directory)
-or instantiate a `File` from a `pycldf.orm.Object`:
+or instantiate a :class:`.File` from a :class:`pycldf.orm.Object`:
.. code-block:: python
@@ -21,6 +22,7 @@
"""
import io
+import json
import base64
import typing
import logging
@@ -28,6 +30,7 @@
import zipfile
import functools
import mimetypes
+import collections
import urllib.parse
import urllib.request
@@ -71,7 +74,7 @@ def __init__(self, media: 'MediaTable', row: dict):
if self.url:
self.url = anyURI.to_string(self.url)
self.parsed_url = urllib.parse.urlparse(self.url)
- self.scheme = self.parsed_url.scheme
+ self.scheme = self.parsed_url.scheme or 'file'
@classmethod
def from_dataset(
@@ -127,10 +130,14 @@ def local_path(self, d: pathlib.Path) -> pathlib.Path:
return d.joinpath('{}{}'.format(
self.id, '.zip' if self.path_in_zip else (self.mimetype.extension or '')))
+ def read_json(self, d=None):
+ assert self.mimetype.subtype.endswith('json')
+ return json.loads(self.read(d=d))
+
def read(self, d=None) -> typing.Union[None, str, bytes]:
"""
:param d: A local directory where the file has been saved before. If `None`, the content \
- will read from the file's URL.
+ will be read from the file's URL.
"""
if self.path_in_zip:
zipcontent = None
@@ -148,7 +155,7 @@ def read(self, d=None) -> typing.Union[None, str, bytes]:
return self.mimetype.read(self.local_path(d).read_bytes())
if self.url:
try:
- return self.url_reader[self.scheme or 'file'](self.parsed_url, self.mimetype)
+ return self.url_reader[self.scheme](self.parsed_url, self.mimetype)
except KeyError:
raise ValueError('Unsupported URL scheme: {}'.format(self.scheme))
@@ -206,13 +213,20 @@ def __iter__(self) -> typing.Generator[File, None, None]:
yield File(self, row)
def validate(self, success: bool = True, log: logging.Logger = None) -> bool:
+ speaker_area_files = collections.defaultdict(list)
+ if ('LanguageTable', 'speakerArea') in self.ds:
+ for lg in self.ds.iter_rows('LanguageTable', 'id', 'speakerArea'):
+ if lg['speakerArea']:
+ speaker_area_files[lg['speakerArea']].append(lg['id'])
+
for file in self:
+ content = None
if not file.url:
success = False
log_or_raise('File without URL: {}'.format(file.id), log=log)
elif file.scheme == 'file':
try:
- file.read()
+ content = file.read()
except FileNotFoundError:
success = False
log_or_raise('Non-existing local file referenced: {}'.format(file.id), log=log)
@@ -221,10 +235,22 @@ def validate(self, success: bool = True, log: logging.Logger = None) -> bool:
log_or_raise('Error reading {}: {}'.format(file.id, e), log=log)
elif file.scheme == 'data':
try:
- file.read()
+ content = file.read()
except Exception as e: # pragma: no cover
success = False
log_or_raise('Error reading {}: {}'.format(file.id, e), log=log)
+ if file.id in speaker_area_files and file.mimetype.subtype == 'geo+json' and content:
+ content = json.loads(content)
+ if content['type'] != 'Feature':
+ assert content['type'] == 'FeatureCollection'
+ for feature in content['features']:
+ lid = feature['properties'].get('cldf:languageReference')
+ if lid and lid in speaker_area_files[file.id]:
+ speaker_area_files[file.id].remove(lid)
+ if speaker_area_files[file.id]:
+ log_or_raise(
+ 'Error: Not all language IDs found in speakerArea GeoJSON: {}'.format(
+ speaker_area_files[file.id])) # pragma: no cover
return success
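The speaker-area check added to ``validate`` boils down to comparing the language IDs that reference a GeoJSON file with the ``cldf:languageReference`` values found in its features. A standalone sketch of that comparison follows; ``missing_language_ids`` and the sample data are hypothetical, and treating a single ``Feature`` as a one-member collection is an assumption of this sketch.

```python
import json


def missing_language_ids(geojson_text, expected_ids):
    """Return the IDs from expected_ids that no feature references."""
    content = json.loads(geojson_text)
    # A single Feature is handled like a collection with one member.
    features = content['features'] if content['type'] == 'FeatureCollection' else [content]
    # GeoJSON allows "properties" to be null, so fall back to an empty dict.
    found = {(f.get('properties') or {}).get('cldf:languageReference') for f in features}
    return [lid for lid in expected_ids if lid not in found]


sample = json.dumps({
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'geometry': None, 'properties': {'cldf:languageReference': 'lg1'}},
        {'type': 'Feature', 'geometry': None, 'properties': None},
    ],
})
missing = missing_language_ids(sample, ['lg1', 'lg2'])
```

A non-empty result corresponds to the "Not all language IDs found" error reported above.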
diff --git a/src/pycldf/modules/TextCorpus-metadata.json b/src/pycldf/modules/TextCorpus-metadata.json
new file mode 100644
index 0000000..9ab0342
--- /dev/null
+++ b/src/pycldf/modules/TextCorpus-metadata.json
@@ -0,0 +1,97 @@
+{
+ "@context": [
+ "http://www.w3.org/ns/csvw",
+ {
+ "@language": "en"
+ }
+ ],
+ "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#TextCorpus",
+ "dialect": {
+ "commentPrefix": null
+ },
+ "tables": [
+ {
+ "url": "examples.csv",
+ "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#ExampleTable",
+ "tableSchema": {
+ "columns": [
+ {
+ "name": "ID",
+ "required": true,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#id",
+ "datatype": {
+ "base": "string",
+ "format": "[a-zA-Z0-9_\\-]+"
+ }
+ },
+ {
+ "name": "Language_ID",
+ "required": true,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#languageReference",
+ "dc:extent": "singlevalued",
+ "datatype": "string"
+ },
+ {
+ "name": "Primary_Text",
+ "required": true,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#primaryText",
+ "dc:description": "The example text in the source language.",
+ "dc:extent": "singlevalued",
+ "datatype": "string"
+ },
+ {
+ "name": "Analyzed_Word",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#analyzedWord",
+ "dc:description": "The sequence of words of the primary text to be aligned with glosses",
+ "dc:extent": "multivalued",
+ "datatype": "string",
+ "separator": "\t"
+ },
+ {
+ "name": "Gloss",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#gloss",
+ "dc:description": "The sequence of glosses aligned with the words of the primary text",
+ "dc:extent": "multivalued",
+ "datatype": "string",
+ "separator": "\t"
+ },
+ {
+ "name": "Translated_Text",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#translatedText",
+ "dc:extent": "singlevalued",
+ "dc:description": "The translation of the example text in a meta language",
+ "datatype": "string"
+ },
+ {
+ "name": "Meta_Language_ID",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#metaLanguageReference",
+ "dc:extent": "singlevalued",
+ "dc:description": "References the language of the translated text",
+ "datatype": "string"
+ },
+ {
+ "name": "LGR_Conformance",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#lgrConformance",
+ "dc:extent": "singlevalued",
+ "dc:description": "The level of conformance of the example with the Leipzig Glossing Rules",
+ "datatype": {
+ "base": "string",
+ "format": "WORD_ALIGNED|MORPHEME_ALIGNED"
+ }
+ },
+ {
+ "name": "Comment",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#comment",
+ "datatype": "string"
+ }
+ ]
+ }
+ }
+ ]
+}
\ No newline at end of file
diff --git a/src/pycldf/orm.py b/src/pycldf/orm.py
index 110ecd4..be4cabd 100644
--- a/src/pycldf/orm.py
+++ b/src/pycldf/orm.py
@@ -1,10 +1,10 @@
"""
Object oriented (read-only) access to CLDF data
-To read ORM objects from a `pycldf.Dataset`, use two methods
+To read ORM objects from a `pycldf.Dataset`, use one of two generic methods:
-* `pycldf.Dataset.objects`
-* `pycldf.Dataset.get_object`
+* :meth:`pycldf.Dataset.objects`
+* :meth:`pycldf.Dataset.get_object`
Both will return default implementations of the objects, i.e. instances of the corresponding
class defined in this module. To customize these objects,
@@ -25,6 +25,11 @@ def custom_method(self):
2. pass the class into the `objects` or `get_object` method.
+In addition, module-specific subclasses of :class:`pycldf.Dataset` provide more meaningful
+properties and methods, as shortcuts to the methods above. See
+`<./dataset.html#subclasses-supporting-specific-cldf-modules>`_ for details.
+
+
Limitations:
------------
* We only support foreign key constraints for CLDF reference properties targeting either a \
@@ -37,8 +42,8 @@ def custom_method(self):
Reading ~400,000 rows from a ValueTable of a StructureDataset takes
* ~2secs with csvcut, i.e. only making sure it's valid CSV
- * ~15secs iterating over pycldf.Dataset['ValueTable']
- * ~35secs iterating over pycldf.Dataset.objects('ValueTable')
+ * ~15secs iterating over ``pycldf.Dataset['ValueTable']``
+ * ~35secs iterating over ``pycldf.Dataset.objects('ValueTable')``
"""
import types
import typing
@@ -52,6 +57,9 @@ def custom_method(self):
from pycldf.util import DictTuple
from pycldf.sources import Reference
+if typing.TYPE_CHECKING:
+ from pycldf import Dataset # pragma: no cover
+
class Object:
"""
@@ -71,7 +79,7 @@ class Object:
# specified here:
__component__ = None
- def __init__(self, dataset, row: dict):
+ def __init__(self, dataset: 'Dataset', row: dict):
# Get a mapping of column names to pairs (CLDF property name, list-valued) for columns
# present in the component specified by class name.
cldf_cols = {
@@ -113,7 +121,7 @@ def component(self) -> str:
return self.__class__.component_name()
@property
- def key(self):
+ def key(self) -> typing.Tuple[int, str, str]:
return id(self.dataset), self.__class__.__name__, self.id
def __hash__(self):
@@ -141,13 +149,13 @@ def aboutUrl(self, col='id') -> typing.Union[str, None]:
"""
return self._expand_uritemplate('aboutUrl', col)
- def valueUrl(self, col='id'):
+ def valueUrl(self, col='id') -> typing.Union[str, None]:
"""
The table's `valueUrl` property, expanded with the object's row as context.
"""
return self._expand_uritemplate('valueUrl', col)
- def propertyUrl(self, col='id'):
+ def propertyUrl(self, col='id') -> typing.Union[str, None]:
"""
The table's `propertyUrl` property, expanded with the object's row as context.
"""
@@ -168,7 +176,7 @@ def references(self) -> typing.Tuple[Reference]:
multi=True,
)
- def related(self, relation: str):
+ def related(self, relation: str) -> typing.Union[None, 'Object']:
"""
The CLDF ontology specifies several "reference properties". This method returns the first
related object specified by such a property.
@@ -253,6 +261,22 @@ def cognateset(self):
return self.related('cognatesetReference')
+class Contribution(Object):
+ @property
+ def sentences(self):
+ res = []
+ if self.dataset.module == 'TextCorpus':
+ # Return the list of lines, ordered by position.
+ for e in self.dataset.objects('ExampleTable'):
+ if e.cldf.contributionReference == self.id:
+ if not getattr(e.cldf, 'exampleReference', None):
+ # Not just an alternative translation line.
+ res.append(e)
+ if res and hasattr(res[0].cldf, 'position'):
+ return sorted(res, key=lambda e: e.cldf.position)
+ return res
+
+
class Entry(Object, _WithLanguageMixin):
@property
def senses(self):
@@ -272,6 +296,25 @@ def igt(self):
self.cldf.translatedText,
)
+ @property
+ def text(self):
+ """
+ Examples in a TextCorpus are interpreted as lines of text.
+ """
+ if self.dataset.module == 'TextCorpus' and hasattr(self.cldf, 'contributionReference'):
+ return self.related('contributionReference')
+
+ @property
+ def alternative_translations(self):
+ res = []
+ if hasattr(self.cldf, 'exampleReference'):
+ # There's a self-referential foreign key. We assume this to link together full examples
+ # and alternative translations.
+ for ex in self.dataset.objects('ExampleTable'):
+ if ex.cldf.exampleReference == self.id:
+ res.append(ex)
+ return res
+
class Form(Object, _WithLanguageMixin, _WithParameterMixin):
pass
@@ -288,6 +331,9 @@ def form(self): # pragma: no cover
class Language(Object):
+ """
+ FIXME: describe usage!
+ """
@property
def lonlat(self):
"""
@@ -305,6 +351,25 @@ def as_geojson_feature(self):
"properties": self.cldf,
}
+ @functools.cached_property
+ def speaker_area(self):
+ from pycldf.media import File
+
+ if getattr(self.cldf, 'speakerArea', None):
+ return File.from_dataset(self.dataset, self.related('speakerArea'))
+
+ @functools.cached_property
+ def speaker_area_as_geojson_feature(self):
+ if self.speaker_area and self.speaker_area.mimetype.subtype == 'geo+json':
+ res = self.speaker_area.read_json()
+ if res['type'] == 'FeatureCollection':
+ for feature in res['features']:
+ if feature['properties'].get('cldf:languageReference') == self.id:
+ return feature
+ else:
+ assert res['type'] == 'Feature'
+ return res
+
@property
def values(self):
return DictTuple(v for v in self.dataset.objects('ValueTable') if self in v.languages)
@@ -326,6 +391,18 @@ def glottolog_languoid(self, glottolog_api):
return glottolog_api.languoid(self.cldf.glottocode)
+class Media(Object):
+ @property
+ def downloadUrl(self):
+ if hasattr(self.cldf, 'downloadUrl'):
+ return self.cldf.downloadUrl
+ return self.valueUrl()
+
+
+class ParameterNetworkEdge(Object):
+ __component__ = 'ParameterNetwork'
+
+
class Parameter(Object):
@functools.cached_property
def columnSpec(self):
@@ -375,6 +452,10 @@ def entries(self):
return self.all_related('entryReference')
+class Tree(Object):
+ pass
+
+
class Value(Object, _WithLanguageMixin, _WithParameterMixin):
"""
Value objects correspond to rows in a dataset's ``ValueTable``.
@@ -420,15 +501,3 @@ def code(self):
@property
def examples(self):
return self.all_related('exampleReference')
-
-
-class Contribution(Object):
- pass
-
-
-class Media(Object):
- @property
- def downloadUrl(self):
- if hasattr(self.cldf, 'downloadUrl'):
- return self.cldf.downloadUrl
- return self.valueUrl()
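
Note on the position-based sorting added above: the following is a minimal, self-contained sketch of the same idea, not the pycldf implementation. The helper name `sort_by_position` and the `SimpleNamespace` stand-ins for row objects are hypothetical, and positions are assumed to be integers or lists of integers. List-valued positions are compared as tuples, which gives a multi-level ordering.

```python
from types import SimpleNamespace


def sort_by_position(items):
    """Return items ordered by `position` if the first item has one (hypothetical helper)."""
    def key(item):
        # A list-valued position is compared as a tuple (multi-level ordering).
        pos = item.position
        return tuple(pos) if isinstance(pos, list) else pos

    if items and hasattr(items[0], 'position'):
        return sorted(items, key=key)
    # Without a position, the original order is kept.
    return list(items)


# Stand-ins for example rows; ids and positions are made up for illustration.
lines = [
    SimpleNamespace(id='c', position=[2, 1]),
    SimpleNamespace(id='a', position=[1, 2]),
    SimpleNamespace(id='b', position=[1, 10]),
]
ordered_ids = [line.id for line in sort_by_position(lines)]
```

With integer components, `[1, 2]` sorts before `[1, 10]`. If the components were read as strings, `'10'` would sort before `'2'`, so the datatype of the column matters for the resulting order.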
diff --git a/src/pycldf/terms.rdf b/src/pycldf/terms.rdf
index d864c9f..0ad63ed 100644
--- a/src/pycldf/terms.rdf
+++ b/src/pycldf/terms.rdf
@@ -82,6 +82,25 @@
+
+
+
+ A position represents the placement of an item in a series or sequence of items.
+ Although an integer is the recommended datatype, any datatype that supports a total
+ ordering (where the order is transparent, such as alphabetic order for strings) is
+ acceptable. It is also possible to have a list-valued column for this property,
+ which can be useful for implementing multi-level orderings. In such cases, the
+ typical order for tuples is assumed.
+
+
+ "Position"
+ ";"
+ {"base": "string"}
+
+
+
+
+
@@ -92,7 +111,7 @@
An identifier referencing a language either
- - by providing a foreign key into the LanguageTable or
+ - by providing a foreign key to
LanguageTable
or
- by using a known encoding scheme.
@@ -110,7 +129,7 @@
an example - either
- - by providing a foreign key into the LanguageTable or
+ - by providing a foreign key to
LanguageTable
or
- by using a known encoding scheme.
@@ -127,7 +146,7 @@
An identifier referencing a parameter either
- - by providing a foreign key into the ParameterTable or
+ - by providing a foreign key to
ParameterTable
or
- by using a known encoding scheme.
@@ -142,7 +161,7 @@
An identifier referencing a code (aka category) description
- by providing a foreign key into the CodeTable.
+ by providing a foreign key to CodeTable
.
@@ -155,8 +174,8 @@
dc:type="reference-property">
- An identifier referencing an example by providing a foreign key into the
- ExampleTable.
+ An identifier referencing an example by providing a foreign key to
+ ExampleTable
.
@@ -170,8 +189,7 @@
dc:type="reference-property">
- An identifier referencing a dictionary entry
- by providing a foreign key into the EntryTable.
+ An identifier referencing a dictionary entry by providing a foreign key to EntryTable
.
@@ -186,7 +204,7 @@
An identifier referencing a form
- by providing a foreign key into the FormTable.
+ by providing a foreign key to FormTable
.
@@ -200,7 +218,7 @@
An identifier referencing the source form of a loanword
- by providing a foreign key into the FormTable.
+ by providing a foreign key to FormTable
.
@@ -214,7 +232,7 @@
An identifier referencing a loanword
- by providing a foreign key into the FormTable.
+ by providing a foreign key to FormTable
.
@@ -223,6 +241,34 @@
+
+
+
+ An identifier referencing the source parameter of a parameter network edge.
+
+
+
+ "Source_Parameter_ID"
+
+
+
+
+
+
+
+
+ An identifier referencing the target parameter of a parameter network edge.
+
+
+
+ "Target_Parameter_ID"
+
+
+
+
+
@@ -230,7 +276,7 @@
An identifier referencing a cognateset either
- - by providing a foreign key into the CognatesetTable or
+ - by providing a foreign key to
CognatesetTable
or
- by using a known encoding scheme.
@@ -240,12 +286,26 @@
+
+
+
+ An identifier referencing a language tree by providing a foreign key to TreeTable
.
+
+
+
+ "Tree_ID"
+
+
+
+
+
An identifier referencing a media resource
- by providing a foreign key into the MediaTable
+ by providing a foreign key to MediaTable
.
@@ -255,12 +315,41 @@
+
+
+
+ An identifier referencing a media resource
+ by providing a foreign key to MediaTable
.
+
+
+ This property can be used in LanguageTable
to point to a media resource describing
+ the speaker area of a language, i.e. the geographic area where the speakers of the
+ language live.
+
+
+ The linked media resource may be an image of a map, depicting the area, or some other
+ multimedia content for human consumption. But it may also be a GeoJSON
+ resource (i.e. a media resource with mediaType
application/geo+json
).
+ In the latter case, the GeoJSON object MUST contain a feature with a geometry of type
+ Polygon
or MultiPolygon
and a key cldf:languageReference
+ in its properties
object with the linking language's id
as
+ value.
+
+
+
+ "Media_ID"
+
+
+
+
+
An identifier referencing a contribution
- by providing a foreign key into the ContributionTable
+ by providing a foreign key to ContributionTable
.
@@ -276,7 +365,7 @@
A functional equivalent set is a group of strings from different languages that express similar function.
This is an identifier referencing a cognateset either
- - by providing a foreign key into the FunctionalEquivalentsetTable or
+ - by providing a foreign key to
FunctionalEquivalentsetTable
or
- by using a known encoding scheme.
@@ -289,11 +378,16 @@
-
- A concept set groups a number of concept labels which are used in
- different questionnaires and were judged to denote the same concept despite
- potential differences among the concrete concept labels (be it their spelling,
- or the language in which they were originally created).
+
+
+ An identifier of a Concepticon concept set.
+
+
+ A concept set groups a number of concept labels which are used in
+ different questionnaires and were judged to denote the same concept despite
+ potential differences among the concrete concept labels (be it their spelling,
+ or the language in which they were originally created).
+
"Concepticon_ID"
{"base": "string", "format": "[0-9]+"}
@@ -306,6 +400,14 @@
+
+ An identifier of a sound described in the CLTS dataset.
+
+
+ A sound identifier is the last path component of the sound's URL at
+ https://clts.clld.org/parameters , e.g. short_neutral_tone
for
+ https://clts.clld.org/parameters/short_neutral_tone
.
+
References a sound in the Cross-Linguistic Transcription Systems database. Suitable to
mark parameters as phonemes, and consequently values as elements of phoneme inventories.
@@ -328,8 +430,11 @@
dc:type="reference-property">
- References a taxonomic unit in GBIF's Backbone Taxonomy. Can be used in for example in a
- ParameterTable to mark a lexical concept as biological species. E.g.
+ A numeric identifier for a unit in GBIF's Backbone Taxonomy.
+
+
+ References a taxonomic unit in GBIF's Backbone Taxonomy. Can be used for example in
+ ParameterTable
to mark a lexical concept as biological species. E.g.
5219404.
@@ -356,18 +461,32 @@
- A Glottolog code denoting a languoid.
+ A Glottocode denoting a languoid described in Glottolog.
"Glottocode"
{"base": "string", "format": "[a-z0-9]{4}[1-9][0-9]{3}"}
"http://glottolog.org/resource/languoid/id/{Glottocode}"
-
+
+
+
+
+ A Glottocode denoting the language-level languoid that is
+ a parent languoid of the languoid described by the row in LanguageTable
.
+
+
+ "Parent_Language_Glottocode"
+ {"base": "string", "format": "[a-z0-9]{4}[1-9][0-9]{3}"}
+ "http://glottolog.org/resource/languoid/id/{Glottocode}"
+
+
+
+
- A macroarea as defined by Glottolog.
+ The name of a macroarea as defined by Glottolog.
"Macroarea"
@@ -408,7 +527,7 @@
CSVW column description.
This column specification may be used by CLDF consumers to read a parameter's value as typed data.
Note that a CSVW datatype description is not sufficient, because parsing a string value
- must also be informed by the column properties "null" and "separator".
+ must also be informed by the column properties null
and separator
.
"ColumnSpec"
{"base": "json"}
@@ -422,7 +541,7 @@
-->
- Contributor(s) to a citeable unit of a dataset.
+ Names of contributor(s) to a citeable unit of a dataset.
"Contributor"
@@ -443,15 +562,28 @@
+
+
+
+ Flag signaling whether an edge in a graph is directed or not.
+
+ "Edge_Is_Directed"
+
+ {"base": "boolean", "format": "Yes|No"}
+
+
+
- The type of a tree ("summary" or "sample") describes how the tree can be used.
- Summary (or consensus) trees can be analysed in isolation and should have type "summary".
+
The type of a tree (summary
or sample
) describes how the tree can be used.
+ Summary (or consensus) trees can be analysed in isolation and should have type summary
.
Trees resulting from a method that creates multiple trees, and thus should be analysed as a whole
- (or sampled appropriately) should have type "sample".
+ (or sampled appropriately) should have type sample
.
"Tree_Type"
{"base": "string", "format": "summary|sample"}
@@ -460,7 +592,7 @@
- Whether a tree is rooted or not.
+ Flag signaling whether a tree is rooted or not.
"Tree_Is_Rooted"
@@ -563,6 +695,56 @@
+
+
+
+ The level of conformance of the example with the Leipzig Glossing Rules.
+
+
+ The following levels are distinguished:
+
+
+ WORD_ALIGNED
: Analyzed text and glosses obey LGR rule 1, "word-by-word alignment".
+ MORPHEME_ALIGNED
: Analyzed text and glosses obey LGR rule 2, "morpheme-by-morpheme correspondence".
+
+
+ No information regarding LGR conformance should be signaled with an empty string, i.e.
+ null
value for the property.
+
+
+ While more information is needed to assess how to interpret IGT - e.g. whether rule 4a is
+ followed to group gloss elements for unsegmentable morphemes - the two levels considered here
+ are essential for decisions about automated re-use.
+
+
+
+ "LGR_Conformance"
+ {"base": "string", "format": "WORD_ALIGNED|MORPHEME_ALIGNED"}
+
+
+
+
+
+
+
+ A judgement about the (un)grammaticality of the example.
+
+
+ A non-null
value for this property flags an example as ungrammatical
+ or unacceptable. The actual string value is the typographical symbol(s) or text which is to be
+ used to mark the example when formatting it in text (e.g. *
).
+
+
+ Note: Ungrammatical examples should link (via languageReference
)
+ to special item(s) in LanguageTable
with an empty Glottocode
to
+ prevent data aggregators from inadvertently assigning such an example to a proper language
+ (if they fail to honour grammaticalityJudgement
).
+
+
+ "Grammaticality_Judgement"
+
+
+
@@ -579,13 +761,13 @@
- The part-of-speech of dictionary entry.
+ The part-of-speech of a dictionary entry.
"Part_Of_Speech"
-
+
@@ -597,8 +779,8 @@
For features with a limited, discrete set of valid values (a.k.a. categorical variables)
- it is recommended to relate items of a ValueTable to the respective code
- in the CodeTable.
+ it is recommended to relate items of ValueTable
to the respective code
+ in CodeTable
.
"Value"
@@ -734,7 +916,7 @@
A generic CLDF dataset; i.e. a set of cross-linguistic data which does
- not fit any of the established CLDF modules.
+ not fit any of the other CLDF modules.
@@ -780,13 +962,24 @@
+
+ "TextCorpus"
+
+ A dataset according to the
+ CLDF Text Corpus
+ specification
+
+
+
+
+
"ValueTable"
- The table of value assignments of a CLDF Structure Dataset
+ The table of value assignments of a Structure Dataset
"values.csv"
@@ -804,7 +997,7 @@
"ExampleTable"
- The table of examples provided with a CLDF dataset
+ The table of text examples provided with a CLDF dataset
"examples.csv"
@@ -931,6 +1124,16 @@
as tree structure with items of the LanguageTable as leaf nodes.
"trees.csv"
+
+
+
+
+ "ParameterNetwork"
+
+ A table listing edges of a parameter network, i.e. a graph with parameters as nodes.
+
+ "parameter_network.csv"
+
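
Note on the speaker area property described in terms.rdf above: a GeoJSON speaker area may be a single Feature or a FeatureCollection in which one feature carries `cldf:languageReference` in its `properties`. The sketch below shows that lookup on plain dictionaries. The function name `find_language_feature` and the sample data are assumptions for illustration, and the code does not use pycldf.

```python
def find_language_feature(geojson, language_id):
    """Return the feature describing `language_id`, or None (hypothetical helper)."""
    if geojson.get('type') == 'Feature':
        # A bare Feature is taken to describe the linking language.
        return geojson
    if geojson.get('type') == 'FeatureCollection':
        for feature in geojson.get('features', []):
            # `properties` may be null in GeoJSON, hence the fallback to {}.
            props = feature.get('properties') or {}
            if props.get('cldf:languageReference') == language_id:
                return feature
    return None


# Made-up sample data; coordinates are not real speaker areas.
collection = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': None, 'geometry': None},
        {
            'type': 'Feature',
            'properties': {'cldf:languageReference': 'myv'},
            'geometry': {
                'type': 'Polygon',
                'coordinates': [[[42.0, 54.0], [43.0, 54.0], [43.0, 55.0], [42.0, 54.0]]],
            },
        },
    ],
}
match = find_language_feature(collection, 'myv')
missing = find_language_feature(collection, 'xyz')
```

Unlike a direct `feature['properties']['cldf:languageReference']` lookup, this variant skips features that lack the key instead of raising `KeyError`.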
diff --git a/src/pycldf/validators.py b/src/pycldf/validators.py
index c85444b..80cffd8 100644
--- a/src/pycldf/validators.py
+++ b/src/pycldf/validators.py
@@ -28,6 +28,15 @@ def valid_igt(dataset, table, column, row):
raise ValueError('number of words and word glosses does not match')
+def valid_grammaticalityJudgement(dataset, table, column, row):
+ lid_name = dataset.readonly_column_names.ExampleTable.languageReference[0]
+ gc_name = dataset.readonly_column_names.LanguageTable.glottocode[0]
+ if row[column.name] is not None:
+ lg = dataset.get_row('LanguageTable', row[lid_name])
+ if lg[gc_name]:
+ raise ValueError('Glottolog language linked from ungrammatical example')
+
+
VALIDATORS = [
(
None,
@@ -44,5 +53,9 @@ def valid_igt(dataset, table, column, row):
(
None,
'http://cldf.clld.org/v1.0/terms.rdf#source',
- valid_references)
+ valid_references),
+ (
+ None,
+ 'http://cldf.clld.org/v1.0/terms.rdf#grammaticalityJudgement',
+ valid_grammaticalityJudgement),
]
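
Note on `valid_grammaticalityJudgement`: the rule it enforces can be sketched on plain dictionaries as follows. This is an illustration under stated assumptions, not the validator itself. The function name `check_grammaticality` is hypothetical, the column names `Language_ID`, `Grammaticality_Judgement` and `Glottocode` are assumed to be the ones used in the data, and the language ids are made up.

```python
def check_grammaticality(example, languages):
    """Raise ValueError if a flagged example links to a language with a Glottocode."""
    # A non-null judgement flags the example as ungrammatical or unacceptable.
    if example.get('Grammaticality_Judgement') is not None:
        language = languages[example['Language_ID']]
        if language.get('Glottocode'):
            raise ValueError('Glottolog language linked from ungrammatical example')


# Assumed sample rows: one proper language, one special item without a Glottocode.
languages = {
    'myv': {'ID': 'myv', 'Glottocode': 'erzy1239'},
    'myv-ungrammatical': {'ID': 'myv-ungrammatical', 'Glottocode': None},
}
ok = {'ID': 'ex1', 'Language_ID': 'myv-ungrammatical', 'Grammaticality_Judgement': '*'}
bad = {'ID': 'ex2', 'Language_ID': 'myv', 'Grammaticality_Judgement': '*'}

check_grammaticality(ok, languages)  # passes silently
try:
    check_grammaticality(bad, languages)
    rejected = False
except ValueError:
    rejected = True
```

Examples without a judgement are not checked at all, so grammatical examples can link to languages with a Glottocode as usual.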
diff --git a/tests/conftest.py b/tests/conftest.py
index 17811c1..2ee225b 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -57,11 +57,22 @@ def dictionary(data):
return Dataset.from_metadata(data / 'dictionary' / 'metadata.json')
+@pytest.fixture(scope='module')
+def textcorpus(data):
+ return Dataset.from_metadata(data / 'textcorpus' / 'metadata.json')
+
+
@pytest.fixture(scope='module')
def structuredataset_with_examples(data):
return Dataset.from_metadata(data / 'structuredataset_with_examples' / 'metadata.json')
+@pytest.fixture
+def dataset_with_media(data):
+ dsdir = data / 'dataset_with_media'
+ return Dataset.from_metadata(dsdir / 'metadata.json')
+
+
@pytest.fixture(scope='module')
def wordlist_with_borrowings(data):
return Dataset.from_metadata(data / 'wordlist_with_borrowings' / 'metadata.json')
diff --git a/tests/data/dataset_with_media/erzya.geojson b/tests/data/dataset_with_media/erzya.geojson
new file mode 100644
index 0000000..9a6b631
--- /dev/null
+++ b/tests/data/dataset_with_media/erzya.geojson
@@ -0,0 +1,33 @@
+{
+ "id": 72,
+ "type": "Feature",
+ "properties": {"Branch": "Mordvin", "Glottocode": "erzy1239", "ISO_639_3": "myv", "Sources": "Ermu\u0161kin 1984, Feoktistov 1990", "Timeperiod": "traditional", "Language": "Erzya", "Dialect": ""},
+ "geometry": {
+ "type": "MultiPolygon",
+ "coordinates": [
+ [[[42.62555948081459, 54.688448862774045], [42.62413340244894, 54.701635598037065], [42.58551044671131, 54.7207212976511], [42.521336920254946, 54.728956953718765], [42.4314939832161, 54.745423245895836], [42.3459292812743, 54.77422315505452], [42.32881634088591, 54.80300257682697], [42.31740771396039, 54.85885834065671], [42.3459292812743, 54.834225617964435], [42.380155162051004, 54.84654385911282], [42.41321303273674, 54.837247777224704], [42.448606923604416, 54.84490164419394], [42.477330225489844, 54.84012749479537], [42.525733995215866, 54.842301333877195], [42.57136850291816, 54.82751743914127], [42.62413340244894, 54.824231393848926], [42.64552457793444, 54.83655268487905], [42.69971555583086, 54.82998179762], [42.77529770921275, 54.83408872749744], [42.81237574672085, 54.833267374946516], [42.83376692220636, 54.81683681415517], [42.86086241115455, 54.79628921051109], [42.90079260539405, 54.78724495454134], [42.92360985924523, 54.77244089848784], [42.92931417270797, 54.76503683846403], [42.917905545782375, 54.72717270358116], [42.89794044866266, 54.68267831839815], [42.853732019326074, 54.62988100747758], [42.80952358998946, 54.59519548030617], [42.732515358241855, 54.5604803864136], [42.678324380345366, 54.54642063031613], [42.65693320485989, 54.55799960514113], [42.64124634283729, 54.58693266401557], [42.6298377159117, 54.62657889744592], [42.63126379427738, 54.65628823827317], [42.62555948081459, 54.688448862774045]]],
+ [[[43.144913453628526, 54.18825965194049], [43.18912188296511, 54.178245249577344], [43.2190695286447, 54.17407186605909], [43.23475639066741, 54.15904419830698], [43.27611266327259, 54.13064370922123], [43.318895014243495, 54.0796410531189], [43.29892991712371, 54.057046075378814], [43.2704083498098, 54.049511683704694], [43.227625998838896, 54.061231258150436], [43.20480874498775, 54.06039425533075], [43.172008942576724, 54.05286047091207], [43.12922659160582, 54.05034890581001], [43.09927894592623, 54.05369762554198], [43.06220090841813, 54.06457910076081], [43.072183456977996, 54.0946975417204], [43.080739927172196, 54.12228691669576], [43.090722475732036, 54.152363482720666], [43.10070502429191, 54.172402394737084], [43.1149658079489, 54.18241821198793], [43.13350482670294, 54.18742521104704], [43.144913453628526, 54.18825965194049]]],
+ [[[43.7088086073977, 55.202397503054556], [43.71451292086049, 55.219755494667844], [43.7487388016372, 55.221924711423654], [43.83430350357901, 55.219755494667844], [43.92176964334168, 55.215416706379045], [44.00733434528348, 55.187203050507556], [44.07198323119504, 55.152451102754746], [44.15944937095779, 55.111143808605966], [44.216492505585606, 55.07741409021822], [44.233605445973986, 55.039297939674185], [44.237408321615845, 55.02404131622162], [44.178463749167044, 55.01095958675255], [44.11761773889734, 55.01423041928617], [44.01113722092536, 55.007688487422826], [43.91036101641609, 55.00332660661457], [43.788668995876705, 55.0022360622963], [43.68599135354652, 54.995692173779155], [43.632751094560554, 55.00441712128593], [43.61753959199311, 55.01859111433407], [43.61753959199311, 55.063260899316276], [43.62324390545591, 55.076325560950515], [43.64986403494888, 55.111143808605966], [43.67838560226285, 55.14267165862767], [43.699301418293054, 55.152451102754746], [43.703104293934906, 55.169830863337914], [43.703104293934906, 55.19154491415376], [43.7088086073977, 55.202397503054556]]],
+ [[[44.15184361967411, 55.215416706379045], [44.18416806262984, 55.23168472347194], [44.25071838636238, 55.24035827965444], [44.37241040690183, 55.24361037552815], [44.45036935755986, 55.24144234116832], [44.52535731162274, 55.23678970173334], [44.61947848375877, 55.203437387543396], [44.72073004772321, 55.16191133888808], [44.80772082803069, 55.12523476066953], [44.896137686703824, 55.11626424717518], [45.005945720862485, 55.09994906026239], [45.06156277712462, 55.0754637876076], [45.09875965449657, 55.04365590809782], [45.08164671410818, 54.99460142192121], [45.05692802243609, 54.97059737882558], [44.975166196136136, 54.955314604831294], [44.790726727506076, 54.95749821421207], [44.684246209534074, 54.96186507700746], [44.57586425374115, 55.00441712128593], [44.530229746038856, 55.02513126768026], [44.4712851735901, 55.048013402414874], [44.37241040690183, 55.09700255260978], [44.296352894064675, 55.13941131124101], [44.22980257033214, 55.17634632094446], [44.17466087352523, 55.19697157826973], [44.149942181853135, 55.20565270306831], [44.15184361967411, 55.215416706379045]]],
+ [[[45.14306769162866, 53.246498041279835], [45.15637775637512, 53.299939830001044], [45.169687821121656, 53.36693178742148], [45.219125204465804, 53.36693178742148], [45.230533831391334, 53.336287508211356], [45.219125204465804, 53.29539419347687], [45.1791950102263, 53.26128649690317], [45.14306769162866, 53.246498041279835]]],
+ [[[45.297084155123834, 53.00689869316492], [45.32560572243778, 53.01948229944466], [45.36363447885637, 53.03206223667269], [45.45110061861908, 53.06292471176731], [45.48722793721673, 53.07549197765959], [45.544271071844584, 53.09033947334335], [45.57659551480035, 53.09262324891298], [45.57659551480035, 53.07777654110758], [45.57659551480035, 53.053782578533], [45.56898976351664, 53.02176983364996], [45.56898976351664, 52.995455866815], [45.59560989300968, 52.88659769647268], [45.61462427121897, 52.83723498210216], [45.69638609751888, 52.83838359002768], [45.808570928953685, 52.85446091152981], [45.86751550140249, 52.85446091152981], [45.87892412832807, 52.832640246620926], [45.9036428200001, 52.7763152832645], [45.92646007385129, 52.739492243476775], [45.918854322567576, 52.69572444220258], [45.87321981486525, 52.69341965710356], [45.85040256101413, 52.73373583119155], [45.791457988565334, 52.779765847301235], [45.7382177295793, 52.80161306743663], [45.69638609751888, 52.79126575182472], [45.616525709039855, 52.76135967784102], [45.530961007098114, 52.739492243476775], [45.46821355900743, 52.73258445751131], [45.37504310578193, 52.71531020309631], [45.27806977691457, 52.725675576839215], [45.23813958267503, 52.83034269660795], [45.213420891002976, 52.94965422790528], [45.297084155123834, 53.00689869316492]]],
+ [[[45.32669451328041, 54.12824478584223], [45.34761032931068, 54.113757768831995], [45.37423045880365, 54.10595496967772], [45.416062090864095, 54.09034496558658], [45.41225921522221, 54.068034767715105], [45.42556927996874, 54.04459615202419], [45.42556927996874, 54.01667579272126], [45.438879344715204, 53.986500743795226], [45.44268222035713, 53.978674011922834], [45.417963528685036, 53.958541376518085], [45.35141320495253, 53.94959041653298], [45.317187324175826, 53.945114215981555], [45.256341313906056, 53.94399509078139], [45.22401687095029, 53.94399509078139], [45.189790990173584, 53.957422611596584], [45.184086676710784, 53.97196421406534], [45.189790990173584, 53.98538272927886], [45.20119961709917, 54.00550239915864], [45.21450968184564, 54.02672928183132], [45.21450968184564, 54.05687517145453], [45.210706806203824, 54.08699920036323], [45.21450968184564, 54.122673455523724], [45.22972118441309, 54.13492939345846], [45.26965137865259, 54.14495428257994], [45.294370070324646, 54.14384052584381], [45.311483010713026, 54.13938519932222], [45.32669451328041, 54.12824478584223]]],
+ [[[45.414195540571036, 54.65921763140383], [45.38540140600159, 54.65280607904625], [45.35117552522485, 54.65060594309415], [45.30363957970165, 54.64400482056412], [45.24279356943193, 54.62749732370909], [45.18872143139929, 54.60286118600975], [45.1050581672784, 54.57972543128489], [45.01759202751569, 54.56760145748109], [44.99560665271119, 54.59556382516861], [45.010818155278635, 54.623094192737604], [45.03835086886171, 54.633978682469], [45.053719346113304, 54.65349359710488], [45.06702941085984, 54.67768682285603], [45.061325097397045, 54.711752824174226], [45.06702941085984, 54.761152593052095], [45.12787542112958, 54.78857088680158], [45.22758206686449, 54.79117465824982], [45.316949644448144, 54.77801704361703], [45.36258415215045, 54.771436630877695], [45.34166833612021, 54.79665573428454], [45.36448558997139, 54.81528583471998], [45.393007157285304, 54.82624070539282], [45.41996732492878, 54.81899243196323], [45.450050291913186, 54.810903054394736], [45.5108963021829, 54.78788565592008], [45.56984087463166, 54.77582369163173], [45.61167250669214, 54.76485514773986], [45.6249825714386, 54.769242922102535], [45.63258832272234, 54.77911367501328], [45.64589838746885, 54.79007835386647], [45.65730701439447, 54.79117465824982], [45.6763213926037, 54.803232044639145], [45.72766021376879, 54.809807285017634], [45.7428717163362, 54.80871148592138], [45.77899903493382, 54.82076364149486], [45.80371772660594, 54.830621821673994], [45.839845045203525, 54.82076364149486], [45.894986742010474, 54.810903054394736], [45.93111406060812, 54.81638145550756], [45.96343850356395, 54.842667440333095], [46.01097444908712, 54.86018192364473], [46.050904643326625, 54.85361488350587], [46.081327648461475, 54.84376231832494], [46.08322908628242, 54.81857260793296], [46.07562333499875, 54.79117465824982], [46.07562333499875, 54.769242922102535], [46.085130524103334, 54.75168896991988], [46.09083483756613, 54.73302955346184], [46.10414490231266, 54.68029772983467], 
[46.119356404880044, 54.62749732370909], [46.056608956789425, 54.59776693983449], [45.988157195236006, 54.58675017457203], [45.90639536893606, 54.59446222313957], [45.798013413143146, 54.607679480939446], [45.74667459197805, 54.62529581779683], [45.71244871120135, 54.64070385727536], [45.687730019529226, 54.648405688031964], [45.65920845221531, 54.64070385727536], [45.65350413875252, 54.62529581779683], [45.62878544708045, 54.609881940116956], [45.56413656116893, 54.59996993531363], [45.512797740003805, 54.59336059131257], [45.46145891883871, 54.60437556872369], [45.42343016242015, 54.62529581779683], [45.42343016242015, 54.64950583045201], [45.414195540571036, 54.65921763140383]]],
+ [[[45.43662138730284, 55.104898704176634], [45.42901563601917, 55.120124533273774], [45.434719949481966, 55.138605239850804], [45.434719949481966, 55.17771262210125], [45.45563576551218, 55.23955424827528], [45.51648177578185, 55.26014677979171], [45.57732778605159, 55.25906322830891], [45.71423130915845, 55.23304912695652], [45.858740583549015, 55.19616666979387], [45.92529090728155, 55.16250875215417], [45.9214880316397, 55.1505587862591], [45.93289665856529, 55.11359988847581], [45.9614182258792, 55.0831374526055], [45.97662972844658, 55.05809736763755], [45.95761535023735, 55.04175843428952], [45.90437509125131, 55.04066893502264], [45.8359233296979, 55.03631064162323], [45.77507731942819, 54.97087932776642], [45.71993562262125, 54.895501313613], [45.67239967709801, 54.87800217134784], [45.61535654247019, 54.86049542746377], [45.541200467453976, 54.861589821691744], [45.49366452193073, 54.86596710160848], [45.46514295461676, 54.871438033295604], [45.446128576407496, 54.88675269257847], [45.41570557127264, 54.946861098207414], [45.39098687960055, 54.983974129519574], [45.37387393921223, 54.989429036573775], [45.36626818792849, 55.01669245332666], [45.43662138730284, 55.104898704176634]]],
+ [[[45.45183288987029, 55.47626370270599], [45.49176308410979, 55.4913477099847], [45.55451053220048, 55.4978105169423], [45.619159418112034, 55.4978105169423], [45.77317588160729, 55.49673345609218], [45.91958659381876, 55.4999645502743], [46.0526872412838, 55.4999645502743], [46.11543468937441, 55.50319537935865], [46.15726632143486, 55.50857950542422], [46.17818213746509, 55.512886276139845], [46.20860514259995, 55.516116044947864], [46.22381664516736, 55.51719257565216], [46.23712670991386, 55.512886276139845], [46.25804252594407, 55.50211846578518], [46.27895834197431, 55.4999645502743], [46.290366968899896, 55.50319537935865], [46.31508566057196, 55.501041522757156], [46.32269141185566, 55.49457924602195], [46.324592849676606, 55.48165151105687], [46.34741010352772, 55.47626370270599], [46.36452304391611, 55.47303066412304], [46.387340297767224, 55.44931361064656], [46.387340297767224, 55.43205588426267], [46.37783310866257, 55.419107636589054], [46.35501585481146, 55.41479061047995], [46.32649428749748, 55.41479061047995], [46.29226840672077, 55.40831418664397], [46.26564827722777, 55.3996773033324], [46.23712670991386, 55.39319840208278], [46.24092958555575, 55.38455821522198], [46.231422396451066, 55.37483574819556], [46.21430945606268, 55.37159439478916], [46.21811233170457, 55.360787965501586], [46.235225272092954, 55.35322170918683], [46.22952095863016, 55.347816355013826], [46.21621089388362, 55.34457278833409], [46.20290082913716, 55.34241026296729], [46.197196515674364, 55.33375898078453], [46.206703704778974, 55.328350970020686], [46.14775913233021, 55.32726927930199], [46.01655992268615, 55.328350970020686], [45.881557837400194, 55.329432631216754], [45.78648594635378, 55.328350970020686], [45.69711836877013, 55.326187559060436], [45.602046477723654, 55.329432631216754], [45.518383213602796, 55.345654006743004], [45.46134007897497, 55.376996502943776], [45.43852282512379, 55.3996773033324], [45.41950844691453, 55.43205588426267], 
[45.43281851166099, 55.45578331258055], [45.442325700765636, 55.462251953497045], [45.45183288987029, 55.47626370270599]]],
+ [[[45.82378243152113, 53.48023730966899], [45.89793850653737, 53.454203775885986], [45.901741382179225, 53.437216781276504], [45.92671655422351, 53.422931700764714], [45.9492773277024, 53.41002345934542], [45.97779889501638, 53.39755404005196], [45.92284297579025, 53.38033348827569], [45.89793850653737, 53.351044911157295], [45.86561406358154, 53.3623932842564], [45.78195079946068, 53.388483075917584], [45.793359426386274, 53.434951335417594], [45.82378243152113, 53.48023730966899]]],
+ [[[45.82568386934208, 53.51756188725778], [45.84850112319319, 53.529996112155224], [45.88462844179084, 53.526605321881966], [45.87702269050713, 53.51303944595875], [45.852303998835076, 53.508516521970755], [45.82568386934208, 53.51756188725778]]],
+ [[[45.93977013859775, 53.50060024345099], [45.96258739244893, 53.5119087602148], [45.97779889501638, 53.49833817804499], [45.97779889501638, 53.47684253680264], [45.94357301423963, 53.47910574889259], [45.93977013859775, 53.50060024345099]]],
+ [[[45.9492965597708, 54.090368199123965], [45.92077499245682, 54.07782019541437], [45.89772005887806, 54.052712804386886], [45.904850450706576, 54.04294471921074], [45.90247365343037, 54.027590231586025], [45.8787056806688, 54.00943670702502], [45.8787056806688, 53.9884804875402], [45.84067692425021, 53.985685527840424], [45.82403934331712, 53.96891183041817], [45.78601058689852, 53.96331909714526], [45.75748901958461, 53.95213137911647], [45.721837060442155, 53.93114631539704], [45.726590654994496, 53.91155073082679], [45.70496179978145, 53.9186668578003], [45.659327292079155, 53.95448571084171], [45.59467840616759, 54.02603120137782], [45.54143814718156, 54.09410862898948], [45.59277696834665, 54.12866260541873], [45.55308445383476, 54.16279350943384], [45.51980929196851, 54.19895870469953], [45.47465014372147, 54.30033045396596], [45.44137498185518, 54.38623146017149], [45.42236060364593, 54.463665243954296], [45.42711419819826, 54.502327287291436], [45.47227334644526, 54.51750594814878], [45.58398281842488, 54.52716215993759], [45.74560503320382, 54.52302406305532], [45.82641614059328, 54.51750594814878], [45.85731450518333, 54.499566924857525], [45.866821694288014, 54.45952071404453], [45.8787056806688, 54.41252003660878], [45.89058966704958, 54.36131095006535], [45.885836072497305, 54.33498955569901], [45.885836072497305, 54.30033045396596], [45.89534326160192, 54.24620394033667], [45.91435763981122, 54.203129576728806], [45.91911123436356, 54.151659393115814], [45.94050240984896, 54.12938217370117], [45.95476319350595, 54.104305975962525], [45.9492965597708, 54.090368199123965]]],
+ [[[46.234192026367616, 54.16000940554767], [46.21993124271062, 54.16000940554767], [46.20471974014321, 54.16613230478749], [46.17905032956063, 54.16613230478749], [46.160986670261856, 54.16724546185651], [46.14007085423158, 54.16557571502185], [46.12485935166421, 54.16557571502185], [46.099189941081626, 54.165019117768836], [46.091584189797956, 54.16613230478749], [46.06211190357352, 54.179488213050135], [46.046900401006084, 54.201738478082284], [46.046900401006084, 54.222865138053436], [46.06116118466307, 54.24398099332199], [46.104894254544426, 54.257311757201066], [46.14007085423158, 54.2595331323122], [46.198064707769966, 54.25453487002356], [46.23894562091992, 54.23786961989473], [46.252255685666384, 54.21730653834845], [46.260812155860584, 54.1917273408981], [46.252255685666384, 54.16668888706564], [46.234192026367616, 54.16000940554767]]],
+ [[[46.33124788204985, 55.219764533316386], [46.420377779905856, 55.16888965365803], [46.41740678331065, 55.15361451594558], [46.438203759477055, 55.13833352672479], [46.45008774585791, 55.104354817242196], [46.459000735643485, 55.090755238045915], [46.4679137254291, 55.080552517121006], [46.46197173223866, 55.06354219737519], [46.47979771180988, 55.04312026670258], [46.5303046539283, 55.048226725900676], [46.545159636904316, 55.034608054483066], [46.557043623285104, 55.017578203502254], [46.5481306334995, 55.00565300207421], [46.57189860626113, 54.993724254126505], [46.574869602856296, 54.98520154824179], [46.55407262668994, 54.971561454113804], [46.568927609665955, 54.956210808870814], [46.58675358923715, 54.9425608661256], [46.62240554837954, 54.935734156343024], [46.63428953476032, 54.930613363369766], [46.6788544836884, 54.920369821318964], [46.72936142580678, 54.90841572548345], [46.76204238835399, 54.898166531431436], [46.77986836792518, 54.88449688062878], [46.776897371330016, 54.86911297804433], [46.73233242240201, 54.84688141723136], [46.68776747347394, 54.83319436502209], [46.64914451773635, 54.82634909811931], [46.64320252454594, 54.816079021488605], [46.65211551433152, 54.804093964334825], [46.66994149390275, 54.788679380626334], [46.723419432616396, 54.771545170671786], [46.785810361115594, 54.74925984515498], [46.871969262376425, 54.70636895680192], [46.91059221811409, 54.6926342839597], [46.91653421130443, 54.68233022758204], [46.89870823173324, 54.65655864025839], [46.79472335090121, 54.622171080113866], [46.84523029301962, 54.60840791161699], [46.88682424535245, 54.57914572108539], [46.967041153422876, 54.53779859245591], [46.98783812958924, 54.50158545820803], [46.975954143208455, 54.4169632164789], [46.94030218406606, 54.3079059605136], [46.90762122151885, 54.22288336181468], [46.901203868873175, 54.159053476328744], [46.79246539348885, 54.124594488403964], [46.58675358923715, 54.17595921498701], [46.52864089583497, 54.178880494360314], 
[46.48740434600494, 54.18565643548332], [46.37355487356549, 54.213294313639246], [46.32720732668032, 54.22788545958779], [46.28620757366652, 54.24767948293351], [46.17212130441081, 54.2768323191566], [46.05268724128373, 54.27891387584709], [45.99564410665591, 54.29452219943443], [46.06873062289782, 54.36312858184558], [46.191729881939175, 54.446134470144024], [46.230233997812995, 54.48260385631579], [46.227381841081595, 54.50620873152116], [46.1916110420754, 54.5205582476063], [46.17972705569455, 54.55158562675182], [46.22132100802738, 54.58431116485817], [46.331247882049894, 54.61701043740178], [46.33124788204985, 54.61701043740178], [46.274798946741, 54.63764908130225], [46.17378506250421, 54.79553103240727], [46.120307123790596, 54.88107874307137], [46.063858188481746, 54.981791959121566], [46.00146725998254, 55.129841559414444], [45.9925542701969, 55.17398006609327], [45.9925542701969, 55.204508892280316], [46.02523523274411, 55.2163748961572], [46.11733612719536, 55.22993171255057], [46.23914698759857, 55.22993171255057], [46.33124788204985, 55.219764533316386]]],
+ [[[46.84021978667119, 53.096690923925735], [46.89666872198005, 53.04492428696752], [47.000653602812044, 53.08955439749843], [47.057102538120866, 53.10739349441831], [47.09869649045369, 53.114527062106], [47.14029044278653, 53.11274378112609], [47.17594240192895, 53.08777008095032], [47.199710374690554, 53.06456723323225], [47.16108741895296, 53.05742537972639], [47.13731944619129, 53.05742537972639], [47.11355147342972, 53.050282342190485], [47.1046384836441, 53.04135188009843], [47.101667487048935, 53.02527238534767], [47.101667487048935, 53.00561152660287], [47.101667487048935, 52.99667181111702], [47.14029044278653, 52.9752089408365], [47.14920343257214, 52.962684008134445], [47.09869649045369, 52.9304603710004], [46.99768260621688, 52.882079899643955], [46.96500164366965, 52.87849401153252], [46.869929752623236, 52.882079899643955], [46.8194228105048, 52.90179698414577], [46.80753882412398, 52.92866946509722], [46.82833580029041, 52.9752089408365], [46.837248790076025, 53.0038237315919], [46.8194228105048, 53.02527238534767], [46.79565483774319, 53.04849639779178], [46.77188686498156, 53.07349288520233], [46.79565483774319, 53.08063207495259], [46.84021978667119, 53.096690923925735]]],
+ [[[47.07778067442362, 54.54598380728752], [47.07778067442362, 54.57699186198983], [47.14611359611323, 54.589044182570056], [47.21741751439804, 54.554599521164114], [47.22038851099328, 54.5046030458759], [47.12531661994684, 54.457999546254726], [47.08669366420924, 54.50115275585485], [47.06886768463798, 54.523574434122104], [47.07778067442362, 54.54598380728752]]],
+ [[[47.12056302539448, 53.671671482417985], [47.126505018584886, 53.69630501641493], [47.14730199475136, 53.7174080069329], [47.182953953893744, 53.73850041493003], [47.248315878988144, 53.743771863697454], [47.313677804082616, 53.73147078794884], [47.30773581089221, 53.70158175628954], [47.263170861964205, 53.65934931146511], [47.197808936869734, 53.639978619682935], [47.14730199475136, 53.624123252280036], [47.09679505263291, 53.624123252280036], [47.09679505263291, 53.64350122555498], [47.12056302539448, 53.671671482417985]]],
+ [[[47.12828761654207, 53.92610690943748], [47.140171602922855, 53.89985756443028], [47.12531661994684, 53.88585116667325], [47.095606653994786, 53.882348833735186], [47.02727373230521, 53.89110411593832], [47.000534762948405, 53.966323866111715], [47.05401270166202, 53.982050381897864], [47.11343263356604, 53.95933239815573], [47.12828761654207, 53.92610690943748]]],
+ [[[47.71488118429858, 53.547440261412255], [47.72438837340322, 53.57849705576769], [47.75053314344099, 53.60247987223423], [47.79569229168806, 53.62926809111617], [47.82421385900201, 53.63913315049595], [47.9074017636676, 53.630677526543494], [47.988212871057094, 53.60247987223423], [48.026241627475684, 53.57849705576769], [48.01673443837104, 53.54602804711638], [47.99058966833323, 53.51070737355606], [47.94780731736237, 53.47394261989025], [47.94543052008616, 53.39748318063712], [47.94780731736237, 53.31662668981753], [47.94543052008616, 53.2612154107886], [47.92403934460073, 53.23846178514429], [47.874126601801386, 53.22281164752865], [47.847981831763576, 53.22281164752865], [47.83847464265892, 53.228503268552934], [47.82659065627814, 53.24415132601518], [47.82896745355428, 53.264058763207785], [47.82896745355428, 53.29674372981299], [47.83134425083045, 53.32088611953456], [47.79569229168806, 53.35494625525029], [47.72914196795556, 53.4017345366877], [47.69111321153693, 53.44280908364267], [47.688736414260795, 53.471113241671794], [47.688736414260795, 53.49939853623131], [47.69586680608928, 53.51070737355606], [47.70062040064165, 53.526252098949975], [47.71488118429858, 53.547440261412255]]],
+ [[[48.35423965158571, 54.624010155021], [48.35423965158571, 54.65289635381298], [48.38276121889965, 54.69206623520562], [48.4112827862136, 54.71678564468656], [48.44336954944171, 54.71266678924705], [48.454065137184465, 54.696187182997086], [48.43623915761327, 54.66733175850209], [48.421978373956286, 54.65702127853178], [48.41484798212784, 54.63432901042865], [48.4112827862136, 54.621946069788], [48.38276121889965, 54.61162407272107], [48.35423965158571, 54.624010155021]]],
+ [[[49.388582212882945, 53.56544186354657], [49.360060645569, 53.57955540821719], [49.329162280978885, 53.59507486687517], [49.33629267280736, 53.61199868618614], [49.37907502377829, 53.61763845242989], [49.407596591092236, 53.62045805304135], [49.436118158406146, 53.6091785205424], [49.455132536615444, 53.59648544416233], [49.45988613116775, 53.58378855305856], [49.438494955682316, 53.56544186354657], [49.41235018564458, 53.55979512660715], [49.388582212882945, 53.56544186354657]]]
+ ]
+ }
+}
diff --git a/tests/data/dataset_with_media/erzya2.geojson b/tests/data/dataset_with_media/erzya2.geojson
new file mode 100644
index 0000000..da22f86
--- /dev/null
+++ b/tests/data/dataset_with_media/erzya2.geojson
@@ -0,0 +1,40 @@
+{
+"type": "FeatureCollection",
+"features": [
+{
+ "id": 72,
+ "type": "Feature",
+ "properties": {
+ "cldf:languageReference": "2",
+ "Branch": "Mordvin", "Glottocode": "erzy1239", "ISO_639_3": "myv", "Sources": "Ermu\u0161kin 1984, Feoktistov 1990", "Timeperiod": "traditional", "Language": "Erzya", "Dialect": ""},
+ "geometry": {
+ "type": "MultiPolygon",
+ "coordinates": [
+ [[[42.62555948081459, 54.688448862774045], [42.62413340244894, 54.701635598037065], [42.58551044671131, 54.7207212976511], [42.521336920254946, 54.728956953718765], [42.4314939832161, 54.745423245895836], [42.3459292812743, 54.77422315505452], [42.32881634088591, 54.80300257682697], [42.31740771396039, 54.85885834065671], [42.3459292812743, 54.834225617964435], [42.380155162051004, 54.84654385911282], [42.41321303273674, 54.837247777224704], [42.448606923604416, 54.84490164419394], [42.477330225489844, 54.84012749479537], [42.525733995215866, 54.842301333877195], [42.57136850291816, 54.82751743914127], [42.62413340244894, 54.824231393848926], [42.64552457793444, 54.83655268487905], [42.69971555583086, 54.82998179762], [42.77529770921275, 54.83408872749744], [42.81237574672085, 54.833267374946516], [42.83376692220636, 54.81683681415517], [42.86086241115455, 54.79628921051109], [42.90079260539405, 54.78724495454134], [42.92360985924523, 54.77244089848784], [42.92931417270797, 54.76503683846403], [42.917905545782375, 54.72717270358116], [42.89794044866266, 54.68267831839815], [42.853732019326074, 54.62988100747758], [42.80952358998946, 54.59519548030617], [42.732515358241855, 54.5604803864136], [42.678324380345366, 54.54642063031613], [42.65693320485989, 54.55799960514113], [42.64124634283729, 54.58693266401557], [42.6298377159117, 54.62657889744592], [42.63126379427738, 54.65628823827317], [42.62555948081459, 54.688448862774045]]],
+ [[[43.144913453628526, 54.18825965194049], [43.18912188296511, 54.178245249577344], [43.2190695286447, 54.17407186605909], [43.23475639066741, 54.15904419830698], [43.27611266327259, 54.13064370922123], [43.318895014243495, 54.0796410531189], [43.29892991712371, 54.057046075378814], [43.2704083498098, 54.049511683704694], [43.227625998838896, 54.061231258150436], [43.20480874498775, 54.06039425533075], [43.172008942576724, 54.05286047091207], [43.12922659160582, 54.05034890581001], [43.09927894592623, 54.05369762554198], [43.06220090841813, 54.06457910076081], [43.072183456977996, 54.0946975417204], [43.080739927172196, 54.12228691669576], [43.090722475732036, 54.152363482720666], [43.10070502429191, 54.172402394737084], [43.1149658079489, 54.18241821198793], [43.13350482670294, 54.18742521104704], [43.144913453628526, 54.18825965194049]]],
+ [[[43.7088086073977, 55.202397503054556], [43.71451292086049, 55.219755494667844], [43.7487388016372, 55.221924711423654], [43.83430350357901, 55.219755494667844], [43.92176964334168, 55.215416706379045], [44.00733434528348, 55.187203050507556], [44.07198323119504, 55.152451102754746], [44.15944937095779, 55.111143808605966], [44.216492505585606, 55.07741409021822], [44.233605445973986, 55.039297939674185], [44.237408321615845, 55.02404131622162], [44.178463749167044, 55.01095958675255], [44.11761773889734, 55.01423041928617], [44.01113722092536, 55.007688487422826], [43.91036101641609, 55.00332660661457], [43.788668995876705, 55.0022360622963], [43.68599135354652, 54.995692173779155], [43.632751094560554, 55.00441712128593], [43.61753959199311, 55.01859111433407], [43.61753959199311, 55.063260899316276], [43.62324390545591, 55.076325560950515], [43.64986403494888, 55.111143808605966], [43.67838560226285, 55.14267165862767], [43.699301418293054, 55.152451102754746], [43.703104293934906, 55.169830863337914], [43.703104293934906, 55.19154491415376], [43.7088086073977, 55.202397503054556]]],
+ [[[44.15184361967411, 55.215416706379045], [44.18416806262984, 55.23168472347194], [44.25071838636238, 55.24035827965444], [44.37241040690183, 55.24361037552815], [44.45036935755986, 55.24144234116832], [44.52535731162274, 55.23678970173334], [44.61947848375877, 55.203437387543396], [44.72073004772321, 55.16191133888808], [44.80772082803069, 55.12523476066953], [44.896137686703824, 55.11626424717518], [45.005945720862485, 55.09994906026239], [45.06156277712462, 55.0754637876076], [45.09875965449657, 55.04365590809782], [45.08164671410818, 54.99460142192121], [45.05692802243609, 54.97059737882558], [44.975166196136136, 54.955314604831294], [44.790726727506076, 54.95749821421207], [44.684246209534074, 54.96186507700746], [44.57586425374115, 55.00441712128593], [44.530229746038856, 55.02513126768026], [44.4712851735901, 55.048013402414874], [44.37241040690183, 55.09700255260978], [44.296352894064675, 55.13941131124101], [44.22980257033214, 55.17634632094446], [44.17466087352523, 55.19697157826973], [44.149942181853135, 55.20565270306831], [44.15184361967411, 55.215416706379045]]],
+ [[[45.14306769162866, 53.246498041279835], [45.15637775637512, 53.299939830001044], [45.169687821121656, 53.36693178742148], [45.219125204465804, 53.36693178742148], [45.230533831391334, 53.336287508211356], [45.219125204465804, 53.29539419347687], [45.1791950102263, 53.26128649690317], [45.14306769162866, 53.246498041279835]]],
+ [[[45.297084155123834, 53.00689869316492], [45.32560572243778, 53.01948229944466], [45.36363447885637, 53.03206223667269], [45.45110061861908, 53.06292471176731], [45.48722793721673, 53.07549197765959], [45.544271071844584, 53.09033947334335], [45.57659551480035, 53.09262324891298], [45.57659551480035, 53.07777654110758], [45.57659551480035, 53.053782578533], [45.56898976351664, 53.02176983364996], [45.56898976351664, 52.995455866815], [45.59560989300968, 52.88659769647268], [45.61462427121897, 52.83723498210216], [45.69638609751888, 52.83838359002768], [45.808570928953685, 52.85446091152981], [45.86751550140249, 52.85446091152981], [45.87892412832807, 52.832640246620926], [45.9036428200001, 52.7763152832645], [45.92646007385129, 52.739492243476775], [45.918854322567576, 52.69572444220258], [45.87321981486525, 52.69341965710356], [45.85040256101413, 52.73373583119155], [45.791457988565334, 52.779765847301235], [45.7382177295793, 52.80161306743663], [45.69638609751888, 52.79126575182472], [45.616525709039855, 52.76135967784102], [45.530961007098114, 52.739492243476775], [45.46821355900743, 52.73258445751131], [45.37504310578193, 52.71531020309631], [45.27806977691457, 52.725675576839215], [45.23813958267503, 52.83034269660795], [45.213420891002976, 52.94965422790528], [45.297084155123834, 53.00689869316492]]],
+ [[[45.32669451328041, 54.12824478584223], [45.34761032931068, 54.113757768831995], [45.37423045880365, 54.10595496967772], [45.416062090864095, 54.09034496558658], [45.41225921522221, 54.068034767715105], [45.42556927996874, 54.04459615202419], [45.42556927996874, 54.01667579272126], [45.438879344715204, 53.986500743795226], [45.44268222035713, 53.978674011922834], [45.417963528685036, 53.958541376518085], [45.35141320495253, 53.94959041653298], [45.317187324175826, 53.945114215981555], [45.256341313906056, 53.94399509078139], [45.22401687095029, 53.94399509078139], [45.189790990173584, 53.957422611596584], [45.184086676710784, 53.97196421406534], [45.189790990173584, 53.98538272927886], [45.20119961709917, 54.00550239915864], [45.21450968184564, 54.02672928183132], [45.21450968184564, 54.05687517145453], [45.210706806203824, 54.08699920036323], [45.21450968184564, 54.122673455523724], [45.22972118441309, 54.13492939345846], [45.26965137865259, 54.14495428257994], [45.294370070324646, 54.14384052584381], [45.311483010713026, 54.13938519932222], [45.32669451328041, 54.12824478584223]]],
+ [[[45.414195540571036, 54.65921763140383], [45.38540140600159, 54.65280607904625], [45.35117552522485, 54.65060594309415], [45.30363957970165, 54.64400482056412], [45.24279356943193, 54.62749732370909], [45.18872143139929, 54.60286118600975], [45.1050581672784, 54.57972543128489], [45.01759202751569, 54.56760145748109], [44.99560665271119, 54.59556382516861], [45.010818155278635, 54.623094192737604], [45.03835086886171, 54.633978682469], [45.053719346113304, 54.65349359710488], [45.06702941085984, 54.67768682285603], [45.061325097397045, 54.711752824174226], [45.06702941085984, 54.761152593052095], [45.12787542112958, 54.78857088680158], [45.22758206686449, 54.79117465824982], [45.316949644448144, 54.77801704361703], [45.36258415215045, 54.771436630877695], [45.34166833612021, 54.79665573428454], [45.36448558997139, 54.81528583471998], [45.393007157285304, 54.82624070539282], [45.41996732492878, 54.81899243196323], [45.450050291913186, 54.810903054394736], [45.5108963021829, 54.78788565592008], [45.56984087463166, 54.77582369163173], [45.61167250669214, 54.76485514773986], [45.6249825714386, 54.769242922102535], [45.63258832272234, 54.77911367501328], [45.64589838746885, 54.79007835386647], [45.65730701439447, 54.79117465824982], [45.6763213926037, 54.803232044639145], [45.72766021376879, 54.809807285017634], [45.7428717163362, 54.80871148592138], [45.77899903493382, 54.82076364149486], [45.80371772660594, 54.830621821673994], [45.839845045203525, 54.82076364149486], [45.894986742010474, 54.810903054394736], [45.93111406060812, 54.81638145550756], [45.96343850356395, 54.842667440333095], [46.01097444908712, 54.86018192364473], [46.050904643326625, 54.85361488350587], [46.081327648461475, 54.84376231832494], [46.08322908628242, 54.81857260793296], [46.07562333499875, 54.79117465824982], [46.07562333499875, 54.769242922102535], [46.085130524103334, 54.75168896991988], [46.09083483756613, 54.73302955346184], [46.10414490231266, 54.68029772983467], 
[46.119356404880044, 54.62749732370909], [46.056608956789425, 54.59776693983449], [45.988157195236006, 54.58675017457203], [45.90639536893606, 54.59446222313957], [45.798013413143146, 54.607679480939446], [45.74667459197805, 54.62529581779683], [45.71244871120135, 54.64070385727536], [45.687730019529226, 54.648405688031964], [45.65920845221531, 54.64070385727536], [45.65350413875252, 54.62529581779683], [45.62878544708045, 54.609881940116956], [45.56413656116893, 54.59996993531363], [45.512797740003805, 54.59336059131257], [45.46145891883871, 54.60437556872369], [45.42343016242015, 54.62529581779683], [45.42343016242015, 54.64950583045201], [45.414195540571036, 54.65921763140383]]],
+ [[[45.43662138730284, 55.104898704176634], [45.42901563601917, 55.120124533273774], [45.434719949481966, 55.138605239850804], [45.434719949481966, 55.17771262210125], [45.45563576551218, 55.23955424827528], [45.51648177578185, 55.26014677979171], [45.57732778605159, 55.25906322830891], [45.71423130915845, 55.23304912695652], [45.858740583549015, 55.19616666979387], [45.92529090728155, 55.16250875215417], [45.9214880316397, 55.1505587862591], [45.93289665856529, 55.11359988847581], [45.9614182258792, 55.0831374526055], [45.97662972844658, 55.05809736763755], [45.95761535023735, 55.04175843428952], [45.90437509125131, 55.04066893502264], [45.8359233296979, 55.03631064162323], [45.77507731942819, 54.97087932776642], [45.71993562262125, 54.895501313613], [45.67239967709801, 54.87800217134784], [45.61535654247019, 54.86049542746377], [45.541200467453976, 54.861589821691744], [45.49366452193073, 54.86596710160848], [45.46514295461676, 54.871438033295604], [45.446128576407496, 54.88675269257847], [45.41570557127264, 54.946861098207414], [45.39098687960055, 54.983974129519574], [45.37387393921223, 54.989429036573775], [45.36626818792849, 55.01669245332666], [45.43662138730284, 55.104898704176634]]],
+ [[[45.45183288987029, 55.47626370270599], [45.49176308410979, 55.4913477099847], [45.55451053220048, 55.4978105169423], [45.619159418112034, 55.4978105169423], [45.77317588160729, 55.49673345609218], [45.91958659381876, 55.4999645502743], [46.0526872412838, 55.4999645502743], [46.11543468937441, 55.50319537935865], [46.15726632143486, 55.50857950542422], [46.17818213746509, 55.512886276139845], [46.20860514259995, 55.516116044947864], [46.22381664516736, 55.51719257565216], [46.23712670991386, 55.512886276139845], [46.25804252594407, 55.50211846578518], [46.27895834197431, 55.4999645502743], [46.290366968899896, 55.50319537935865], [46.31508566057196, 55.501041522757156], [46.32269141185566, 55.49457924602195], [46.324592849676606, 55.48165151105687], [46.34741010352772, 55.47626370270599], [46.36452304391611, 55.47303066412304], [46.387340297767224, 55.44931361064656], [46.387340297767224, 55.43205588426267], [46.37783310866257, 55.419107636589054], [46.35501585481146, 55.41479061047995], [46.32649428749748, 55.41479061047995], [46.29226840672077, 55.40831418664397], [46.26564827722777, 55.3996773033324], [46.23712670991386, 55.39319840208278], [46.24092958555575, 55.38455821522198], [46.231422396451066, 55.37483574819556], [46.21430945606268, 55.37159439478916], [46.21811233170457, 55.360787965501586], [46.235225272092954, 55.35322170918683], [46.22952095863016, 55.347816355013826], [46.21621089388362, 55.34457278833409], [46.20290082913716, 55.34241026296729], [46.197196515674364, 55.33375898078453], [46.206703704778974, 55.328350970020686], [46.14775913233021, 55.32726927930199], [46.01655992268615, 55.328350970020686], [45.881557837400194, 55.329432631216754], [45.78648594635378, 55.328350970020686], [45.69711836877013, 55.326187559060436], [45.602046477723654, 55.329432631216754], [45.518383213602796, 55.345654006743004], [45.46134007897497, 55.376996502943776], [45.43852282512379, 55.3996773033324], [45.41950844691453, 55.43205588426267], 
[45.43281851166099, 55.45578331258055], [45.442325700765636, 55.462251953497045], [45.45183288987029, 55.47626370270599]]],
+ [[[45.82378243152113, 53.48023730966899], [45.89793850653737, 53.454203775885986], [45.901741382179225, 53.437216781276504], [45.92671655422351, 53.422931700764714], [45.9492773277024, 53.41002345934542], [45.97779889501638, 53.39755404005196], [45.92284297579025, 53.38033348827569], [45.89793850653737, 53.351044911157295], [45.86561406358154, 53.3623932842564], [45.78195079946068, 53.388483075917584], [45.793359426386274, 53.434951335417594], [45.82378243152113, 53.48023730966899]]],
+ [[[45.82568386934208, 53.51756188725778], [45.84850112319319, 53.529996112155224], [45.88462844179084, 53.526605321881966], [45.87702269050713, 53.51303944595875], [45.852303998835076, 53.508516521970755], [45.82568386934208, 53.51756188725778]]],
+ [[[45.93977013859775, 53.50060024345099], [45.96258739244893, 53.5119087602148], [45.97779889501638, 53.49833817804499], [45.97779889501638, 53.47684253680264], [45.94357301423963, 53.47910574889259], [45.93977013859775, 53.50060024345099]]],
+ [[[45.9492965597708, 54.090368199123965], [45.92077499245682, 54.07782019541437], [45.89772005887806, 54.052712804386886], [45.904850450706576, 54.04294471921074], [45.90247365343037, 54.027590231586025], [45.8787056806688, 54.00943670702502], [45.8787056806688, 53.9884804875402], [45.84067692425021, 53.985685527840424], [45.82403934331712, 53.96891183041817], [45.78601058689852, 53.96331909714526], [45.75748901958461, 53.95213137911647], [45.721837060442155, 53.93114631539704], [45.726590654994496, 53.91155073082679], [45.70496179978145, 53.9186668578003], [45.659327292079155, 53.95448571084171], [45.59467840616759, 54.02603120137782], [45.54143814718156, 54.09410862898948], [45.59277696834665, 54.12866260541873], [45.55308445383476, 54.16279350943384], [45.51980929196851, 54.19895870469953], [45.47465014372147, 54.30033045396596], [45.44137498185518, 54.38623146017149], [45.42236060364593, 54.463665243954296], [45.42711419819826, 54.502327287291436], [45.47227334644526, 54.51750594814878], [45.58398281842488, 54.52716215993759], [45.74560503320382, 54.52302406305532], [45.82641614059328, 54.51750594814878], [45.85731450518333, 54.499566924857525], [45.866821694288014, 54.45952071404453], [45.8787056806688, 54.41252003660878], [45.89058966704958, 54.36131095006535], [45.885836072497305, 54.33498955569901], [45.885836072497305, 54.30033045396596], [45.89534326160192, 54.24620394033667], [45.91435763981122, 54.203129576728806], [45.91911123436356, 54.151659393115814], [45.94050240984896, 54.12938217370117], [45.95476319350595, 54.104305975962525], [45.9492965597708, 54.090368199123965]]],
+ [[[46.234192026367616, 54.16000940554767], [46.21993124271062, 54.16000940554767], [46.20471974014321, 54.16613230478749], [46.17905032956063, 54.16613230478749], [46.160986670261856, 54.16724546185651], [46.14007085423158, 54.16557571502185], [46.12485935166421, 54.16557571502185], [46.099189941081626, 54.165019117768836], [46.091584189797956, 54.16613230478749], [46.06211190357352, 54.179488213050135], [46.046900401006084, 54.201738478082284], [46.046900401006084, 54.222865138053436], [46.06116118466307, 54.24398099332199], [46.104894254544426, 54.257311757201066], [46.14007085423158, 54.2595331323122], [46.198064707769966, 54.25453487002356], [46.23894562091992, 54.23786961989473], [46.252255685666384, 54.21730653834845], [46.260812155860584, 54.1917273408981], [46.252255685666384, 54.16668888706564], [46.234192026367616, 54.16000940554767]]],
+ [[[46.33124788204985, 55.219764533316386], [46.420377779905856, 55.16888965365803], [46.41740678331065, 55.15361451594558], [46.438203759477055, 55.13833352672479], [46.45008774585791, 55.104354817242196], [46.459000735643485, 55.090755238045915], [46.4679137254291, 55.080552517121006], [46.46197173223866, 55.06354219737519], [46.47979771180988, 55.04312026670258], [46.5303046539283, 55.048226725900676], [46.545159636904316, 55.034608054483066], [46.557043623285104, 55.017578203502254], [46.5481306334995, 55.00565300207421], [46.57189860626113, 54.993724254126505], [46.574869602856296, 54.98520154824179], [46.55407262668994, 54.971561454113804], [46.568927609665955, 54.956210808870814], [46.58675358923715, 54.9425608661256], [46.62240554837954, 54.935734156343024], [46.63428953476032, 54.930613363369766], [46.6788544836884, 54.920369821318964], [46.72936142580678, 54.90841572548345], [46.76204238835399, 54.898166531431436], [46.77986836792518, 54.88449688062878], [46.776897371330016, 54.86911297804433], [46.73233242240201, 54.84688141723136], [46.68776747347394, 54.83319436502209], [46.64914451773635, 54.82634909811931], [46.64320252454594, 54.816079021488605], [46.65211551433152, 54.804093964334825], [46.66994149390275, 54.788679380626334], [46.723419432616396, 54.771545170671786], [46.785810361115594, 54.74925984515498], [46.871969262376425, 54.70636895680192], [46.91059221811409, 54.6926342839597], [46.91653421130443, 54.68233022758204], [46.89870823173324, 54.65655864025839], [46.79472335090121, 54.622171080113866], [46.84523029301962, 54.60840791161699], [46.88682424535245, 54.57914572108539], [46.967041153422876, 54.53779859245591], [46.98783812958924, 54.50158545820803], [46.975954143208455, 54.4169632164789], [46.94030218406606, 54.3079059605136], [46.90762122151885, 54.22288336181468], [46.901203868873175, 54.159053476328744], [46.79246539348885, 54.124594488403964], [46.58675358923715, 54.17595921498701], [46.52864089583497, 54.178880494360314], 
[46.48740434600494, 54.18565643548332], [46.37355487356549, 54.213294313639246], [46.32720732668032, 54.22788545958779], [46.28620757366652, 54.24767948293351], [46.17212130441081, 54.2768323191566], [46.05268724128373, 54.27891387584709], [45.99564410665591, 54.29452219943443], [46.06873062289782, 54.36312858184558], [46.191729881939175, 54.446134470144024], [46.230233997812995, 54.48260385631579], [46.227381841081595, 54.50620873152116], [46.1916110420754, 54.5205582476063], [46.17972705569455, 54.55158562675182], [46.22132100802738, 54.58431116485817], [46.331247882049894, 54.61701043740178], [46.33124788204985, 54.61701043740178], [46.274798946741, 54.63764908130225], [46.17378506250421, 54.79553103240727], [46.120307123790596, 54.88107874307137], [46.063858188481746, 54.981791959121566], [46.00146725998254, 55.129841559414444], [45.9925542701969, 55.17398006609327], [45.9925542701969, 55.204508892280316], [46.02523523274411, 55.2163748961572], [46.11733612719536, 55.22993171255057], [46.23914698759857, 55.22993171255057], [46.33124788204985, 55.219764533316386]]],
+ [[[46.84021978667119, 53.096690923925735], [46.89666872198005, 53.04492428696752], [47.000653602812044, 53.08955439749843], [47.057102538120866, 53.10739349441831], [47.09869649045369, 53.114527062106], [47.14029044278653, 53.11274378112609], [47.17594240192895, 53.08777008095032], [47.199710374690554, 53.06456723323225], [47.16108741895296, 53.05742537972639], [47.13731944619129, 53.05742537972639], [47.11355147342972, 53.050282342190485], [47.1046384836441, 53.04135188009843], [47.101667487048935, 53.02527238534767], [47.101667487048935, 53.00561152660287], [47.101667487048935, 52.99667181111702], [47.14029044278653, 52.9752089408365], [47.14920343257214, 52.962684008134445], [47.09869649045369, 52.9304603710004], [46.99768260621688, 52.882079899643955], [46.96500164366965, 52.87849401153252], [46.869929752623236, 52.882079899643955], [46.8194228105048, 52.90179698414577], [46.80753882412398, 52.92866946509722], [46.82833580029041, 52.9752089408365], [46.837248790076025, 53.0038237315919], [46.8194228105048, 53.02527238534767], [46.79565483774319, 53.04849639779178], [46.77188686498156, 53.07349288520233], [46.79565483774319, 53.08063207495259], [46.84021978667119, 53.096690923925735]]],
+ [[[47.07778067442362, 54.54598380728752], [47.07778067442362, 54.57699186198983], [47.14611359611323, 54.589044182570056], [47.21741751439804, 54.554599521164114], [47.22038851099328, 54.5046030458759], [47.12531661994684, 54.457999546254726], [47.08669366420924, 54.50115275585485], [47.06886768463798, 54.523574434122104], [47.07778067442362, 54.54598380728752]]],
+ [[[47.12056302539448, 53.671671482417985], [47.126505018584886, 53.69630501641493], [47.14730199475136, 53.7174080069329], [47.182953953893744, 53.73850041493003], [47.248315878988144, 53.743771863697454], [47.313677804082616, 53.73147078794884], [47.30773581089221, 53.70158175628954], [47.263170861964205, 53.65934931146511], [47.197808936869734, 53.639978619682935], [47.14730199475136, 53.624123252280036], [47.09679505263291, 53.624123252280036], [47.09679505263291, 53.64350122555498], [47.12056302539448, 53.671671482417985]]],
+ [[[47.12828761654207, 53.92610690943748], [47.140171602922855, 53.89985756443028], [47.12531661994684, 53.88585116667325], [47.095606653994786, 53.882348833735186], [47.02727373230521, 53.89110411593832], [47.000534762948405, 53.966323866111715], [47.05401270166202, 53.982050381897864], [47.11343263356604, 53.95933239815573], [47.12828761654207, 53.92610690943748]]],
+ [[[47.71488118429858, 53.547440261412255], [47.72438837340322, 53.57849705576769], [47.75053314344099, 53.60247987223423], [47.79569229168806, 53.62926809111617], [47.82421385900201, 53.63913315049595], [47.9074017636676, 53.630677526543494], [47.988212871057094, 53.60247987223423], [48.026241627475684, 53.57849705576769], [48.01673443837104, 53.54602804711638], [47.99058966833323, 53.51070737355606], [47.94780731736237, 53.47394261989025], [47.94543052008616, 53.39748318063712], [47.94780731736237, 53.31662668981753], [47.94543052008616, 53.2612154107886], [47.92403934460073, 53.23846178514429], [47.874126601801386, 53.22281164752865], [47.847981831763576, 53.22281164752865], [47.83847464265892, 53.228503268552934], [47.82659065627814, 53.24415132601518], [47.82896745355428, 53.264058763207785], [47.82896745355428, 53.29674372981299], [47.83134425083045, 53.32088611953456], [47.79569229168806, 53.35494625525029], [47.72914196795556, 53.4017345366877], [47.69111321153693, 53.44280908364267], [47.688736414260795, 53.471113241671794], [47.688736414260795, 53.49939853623131], [47.69586680608928, 53.51070737355606], [47.70062040064165, 53.526252098949975], [47.71488118429858, 53.547440261412255]]],
+ [[[48.35423965158571, 54.624010155021], [48.35423965158571, 54.65289635381298], [48.38276121889965, 54.69206623520562], [48.4112827862136, 54.71678564468656], [48.44336954944171, 54.71266678924705], [48.454065137184465, 54.696187182997086], [48.43623915761327, 54.66733175850209], [48.421978373956286, 54.65702127853178], [48.41484798212784, 54.63432901042865], [48.4112827862136, 54.621946069788], [48.38276121889965, 54.61162407272107], [48.35423965158571, 54.624010155021]]],
+ [[[49.388582212882945, 53.56544186354657], [49.360060645569, 53.57955540821719], [49.329162280978885, 53.59507486687517], [49.33629267280736, 53.61199868618614], [49.37907502377829, 53.61763845242989], [49.407596591092236, 53.62045805304135], [49.436118158406146, 53.6091785205424], [49.455132536615444, 53.59648544416233], [49.45988613116775, 53.58378855305856], [49.438494955682316, 53.56544186354657], [49.41235018564458, 53.55979512660715], [49.388582212882945, 53.56544186354657]]]
+ ]
+ }
+}
+]
+}
\ No newline at end of file
diff --git a/tests/data/dataset_with_media/languages.csv b/tests/data/dataset_with_media/languages.csv
new file mode 100644
index 0000000..a4251ae
--- /dev/null
+++ b/tests/data/dataset_with_media/languages.csv
@@ -0,0 +1,3 @@
+ID,Name,Speaker_Area
+1,Erzya,3
+2,Erzya,4
\ No newline at end of file
diff --git a/tests/data/dataset_with_media/media.csv b/tests/data/dataset_with_media/media.csv
index dc061ed..2b0e33c 100644
--- a/tests/data/dataset_with_media/media.csv
+++ b/tests/data/dataset_with_media/media.csv
@@ -1,3 +1,5 @@
ID,Name,Description,Media_Type,Download_URL
1,x,y,text/plain,"data:text/plain;base64,SGVsbG8sIFdvcmxkIQ=="
-2,y,x,text/plain;charset=UTF-8,"data:;base64,w6TDtsO8"
\ No newline at end of file
+2,y,x,text/plain;charset=UTF-8,"data:;base64,w6TDtsO8"
+3,z,,application/geo+json,erzya.geojson
+4,z,,application/geo+json,erzya2.geojson
\ No newline at end of file
diff --git a/tests/data/dataset_with_media/metadata.json b/tests/data/dataset_with_media/metadata.json
index 93fdad9..4b67d3a 100644
--- a/tests/data/dataset_with_media/metadata.json
+++ b/tests/data/dataset_with_media/metadata.json
@@ -4,6 +4,35 @@
"dialect": {"commentPrefix": null},
"rdf:ID": "dswm",
"tables": [
+ {
+ "url": "languages.csv",
+ "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#LanguageTable",
+ "tableSchema": {
+ "columns": [
+ {
+ "name": "ID",
+ "required": true,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#id",
+ "datatype": {
+ "base": "string",
+ "format": "[a-zA-Z0-9_\\-]+"
+ }
+ },
+ {
+ "name": "Name",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#name",
+ "datatype": "string"
+ },
+ {
+ "name": "Speaker_Area",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#speakerArea",
+ "datatype": "string"
+ }
+ ]
+ }
+},
{
"url": "media.csv",
"dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#MediaTable",
diff --git a/tests/data/textcorpus/languages.csv b/tests/data/textcorpus/languages.csv
new file mode 100644
index 0000000..b653048
--- /dev/null
+++ b/tests/data/textcorpus/languages.csv
@@ -0,0 +1,2 @@
+ID,Name,Glottocode
+l1,Tsez,dido1241
\ No newline at end of file
diff --git a/tests/data/textcorpus/lines.csv b/tests/data/textcorpus/lines.csv
new file mode 100644
index 0000000..881a4e2
--- /dev/null
+++ b/tests/data/textcorpus/lines.csv
@@ -0,0 +1,4 @@
+ID,Language_ID,Primary_Text,Analyzed_Word,Gloss,Translated_Text,Meta_Language_ID,Comment,Text_ID,Position,Example_ID,Grammaticality_Judgement
+e1,l1,second line,der in halt,i dont know,no idea,l2,,1,1 2,,
+e2,l1,first line,der in halt,i dont know,no idea,l2,,1,1 1,,*
+e2-alt,l1,first line,,,alt,l1,,,,e2,
\ No newline at end of file
diff --git a/tests/data/textcorpus/metadata.json b/tests/data/textcorpus/metadata.json
new file mode 100644
index 0000000..8f87ce3
--- /dev/null
+++ b/tests/data/textcorpus/metadata.json
@@ -0,0 +1,177 @@
+{
+ "@context": [
+ "http://www.w3.org/ns/csvw",
+ {
+ "@language": "en"
+ }
+ ],
+ "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#TextCorpus",
+ "dialect": {
+ "commentPrefix": null
+ },
+ "tables": [
+ {
+ "url": "languages.csv",
+ "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#LanguageTable",
+ "tableSchema": {
+ "columns": [
+ {
+ "name": "ID",
+ "required": true,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#id",
+ "datatype": {
+ "base": "string",
+ "format": "[a-zA-Z0-9_\\-]+"
+ }
+ },
+ {
+ "name": "Name",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#name",
+ "datatype": "string"
+ },
+ {
+ "name": "Glottocode",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#glottocode",
+ "datatype": {
+ "base": "string",
+ "format": "[a-z0-9]{4}[1-9][0-9]{3}"
+ },
+ "valueUrl": "http://glottolog.org/resource/languoid/id/{Glottocode}"
+ }
+ ]
+ }
+ },
+ {
+ "url": "lines.csv",
+ "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#ExampleTable",
+ "tableSchema": {
+ "columns": [
+ {
+ "name": "ID",
+ "required": true,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#id",
+ "datatype": {
+ "base": "string",
+ "format": "[a-zA-Z0-9_\\-]+"
+ }
+ },
+ {
+ "name": "Text_ID",
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#contributionReference",
+ "datatype": "string"
+ },
+ {
+ "name": "Language_ID",
+ "required": true,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#languageReference",
+ "dc:extent": "singlevalued",
+ "datatype": "string"
+ },
+ {
+ "name": "Primary_Text",
+ "required": true,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#primaryText",
+ "dc:description": "The example text in the source language.",
+ "dc:extent": "singlevalued",
+ "datatype": "string"
+ },
+ {
+ "name": "Analyzed_Word",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#analyzedWord",
+ "dc:description": "The sequence of words of the primary text to be aligned with glosses",
+ "dc:extent": "multivalued",
+ "datatype": "string",
+ "separator": "\t"
+ },
+ {
+ "name": "Gloss",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#gloss",
+ "dc:description": "The sequence of glosses aligned with the words of the primary text",
+ "dc:extent": "multivalued",
+ "datatype": "string",
+ "separator": "\t"
+ },
+ {
+ "name": "Translated_Text",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#translatedText",
+ "dc:extent": "singlevalued",
+ "dc:description": "The translation of the example text in a meta language",
+ "datatype": "string"
+ },
+ {
+ "name": "Meta_Language_ID",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#metaLanguageReference",
+ "dc:extent": "singlevalued",
+ "dc:description": "References the language of the translated text",
+ "datatype": "string"
+ },
+ {
+ "name": "LGR_Conformance",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#lgrConformance",
+ "dc:extent": "singlevalued",
+ "dc:description": "The level of conformance of the example with the Leipzig Glossing Rules",
+ "datatype": {
+ "base": "string",
+ "format": "WORD_ALIGNED|MORPHEME_ALIGNED"
+ }
+ },
+ {
+ "name": "Example_ID",
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#exampleReference",
+ "dc:extent": "singlevalued",
+ "datatype": "string"
+ },
+ {
+ "name": "Position",
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#position",
+ "separator": " ",
+ "datatype": "integer"
+ },
+ {
+ "name": "Comment",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#comment",
+ "datatype": "string"
+ },
+ {
+ "name": "Grammaticality_Judgement",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#grammaticalityJudgement",
+ "datatype": "string"
+ }
+ ]
+ }
+ },
+ {
+ "url": "texts.csv",
+ "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#ContributionTable",
+ "tableSchema": {
+ "columns": [
+ {
+ "name": "ID",
+ "required": true,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#id",
+ "datatype": {
+ "base": "string",
+ "format": "[a-zA-Z0-9_\\-]+"
+ }
+ },
+ {
+ "name": "Name",
+ "required": false,
+ "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#name",
+ "datatype": "string"
+ }
+ ]
+ }
+ }
+
+ ]
+}
\ No newline at end of file
diff --git a/tests/data/textcorpus/texts.csv b/tests/data/textcorpus/texts.csv
new file mode 100644
index 0000000..6e75585
--- /dev/null
+++ b/tests/data/textcorpus/texts.csv
@@ -0,0 +1,3 @@
+ID,Name
+1,The text
+2,Text without lines
\ No newline at end of file
diff --git a/tests/test_dataset.py b/tests/test_dataset.py
index fe9dee7..086c775 100644
--- a/tests/test_dataset.py
+++ b/tests/test_dataset.py
@@ -10,8 +10,8 @@
from pycldf.terms import term_uri, TERMS
from pycldf.dataset import (
- Generic, Wordlist, StructureDataset, Dictionary, ParallelText, Dataset, GitRepository,
- make_column, get_modules, iter_datasets, SchemaError)
+ Generic, Wordlist, StructureDataset, Dictionary, ParallelText, Dataset, TextCorpus,
+ GitRepository, make_column, get_modules, iter_datasets, SchemaError)
from pycldf.sources import Sources
@@ -25,6 +25,11 @@ def ds_wl(tmp_path):
return Wordlist.in_dir(tmp_path)
+@pytest.fixture
+def ds_tc(tmp_path):
+ return TextCorpus.in_dir(tmp_path)
+
+
@pytest.fixture
def ds_wl_notables(tmp_path):
return Wordlist.in_dir(str(tmp_path), empty_tables=True)
@@ -94,8 +99,9 @@ def test_provenance(ds, tmp_path):
assert ds.properties['prov:wasDerivedFrom']['dc:created']
-def test_primary_table(ds):
+def test_primary_table(ds, ds_tc):
assert ds.primary_table is None
+ assert ds_tc.primary_table is not None
def test_components(ds):
@@ -832,7 +838,7 @@ def test_get_modules():
@pytest.mark.filterwarnings('ignore::UserWarning')
def test_iter_datasets(data, tmp_path, csvw3, caplog):
- assert len(list(iter_datasets(data))) == 10 if csvw3 else 11
+ assert len(list(iter_datasets(data))) == (11 if csvw3 else 12)
if csvw3:
assert 'Reading' in caplog.records[0].msg
@@ -938,3 +944,7 @@ def test_Dataset_set_sources(ds):
src = Sources()
ds.sources = src
assert ds.sources is src
+
+
+def test_StructureDataset(structuredataset_with_examples):
+ assert len(structuredataset_with_examples.features) == 2
diff --git a/tests/test_media.py b/tests/test_media.py
index 6cb63ff..257ceaa 100644
--- a/tests/test_media.py
+++ b/tests/test_media.py
@@ -211,3 +211,7 @@ def test_Media_validate(tmp_path):
ds['MediaTable', 'ID'].valueUrl = ''
ds.write(MediaTable=[dict(ID='123', Media_Type='text/plain')])
assert not ds.validate(log=logging.getLogger('test'))
+
+
+def test_Media_validate2(dataset_with_media):
+ assert dataset_with_media.validate()
diff --git a/tests/test_orm.py b/tests/test_orm.py
index ef12879..f5807ef 100644
--- a/tests/test_orm.py
+++ b/tests/test_orm.py
@@ -200,3 +200,34 @@ def test_columnspec(tmp_path):
v = ds.objects('ValueTable')[0]
assert v.cldf.value == '1 2 3'
assert v.typed_value == [1, 2, 3]
+
+
+def test_TextCorpus(textcorpus):
+ assert len(textcorpus.texts) == 2
+
+ e = textcorpus.get_object('ExampleTable', 'e2')
+ assert e.alternative_translations
+
+ text = e.text
+ assert text
+ assert text.sentences[0].id == 'e2'
+
+ assert textcorpus.get_text('2').sentences == []
+
+ assert len(textcorpus.sentences) == 2
+ assert textcorpus.sentences[0].cldf.primaryText == 'first line'
+
+ with pytest.raises(ValueError) as e:
+ textcorpus.validate()
+ assert 'ungrammatical' in str(e)
+
+
+def test_speakerArea(dataset_with_media):
+ lang = dataset_with_media.objects('LanguageTable')[0]
+ sa = lang.speaker_area
+ assert sa.scheme == 'file'
+ assert sa
+ assert sa.mimetype.subtype == 'geo+json'
+ assert 'properties' in lang.speaker_area_as_geojson_feature
+
+ assert dataset_with_media.objects('LanguageTable')[1].speaker_area_as_geojson_feature