This repository has been archived by the owner on Jan 10, 2023. It is now read-only.

id generation for data/metadata in CSV files #9

Open
nkrishnaswami opened this issue Aug 12, 2019 · 0 comments


We may want to indicate how to generate IDs for the triples corresponding to the rows in CSV files.

This will give a well-defined mapping from DSPL 2 datasets to triples, and may make it feasible to use dimension values and footnotes defined in CSV files across datasets.

Tentative proposal

Generate IDs that are easy to keep unique, and make no provision for ID collisions.

codeList

For each CSV row,

  • Start with the containing dimension's ID.
  • If there is no fragment, set the fragment to the dimension's URL-encoded name.
  • Append an = and the URL-encoded codeValue to the fragment.

For example, if a row's codeValue is us and its containing Dimension has @id of #country, the row's triples should be generated as if from equivalent JSON-LD with "@id": "#country=us".
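
The codeList steps above could be sketched in Python roughly as follows. The function name `code_list_row_id` and the `dimension_name` parameter are hypothetical, not part of DSPL 2, and the sketch assumes that "URL-encoded" means percent-encoding every reserved character (`safe=""`), which the proposal does not specify.

```python
from urllib.parse import quote


def code_list_row_id(dimension_id: str, dimension_name: str, code_value: str) -> str:
    """Build an @id for one codeList CSV row (illustrative sketch only)."""
    # Split the dimension's @id into the part before '#' and the fragment.
    base, _, fragment = dimension_id.partition("#")
    # With no fragment, fall back to the URL-encoded dimension name.
    if not fragment:
        fragment = quote(dimension_name, safe="")
    # Append '=' and the URL-encoded codeValue to the fragment.
    return f"{base}#{fragment}={quote(code_value, safe='')}"


print(code_list_row_id("#country", "country", "us"))
```

With the values from the example above, this yields `#country=us`.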


footnote

For each CSV row,

  • Start with the containing StatisticalDataset's @id.
  • If there is a fragment, append a / to it.
  • Append footnote= and the URL-encoded codeValue to the fragment.

For example, if the dataset's @id is the empty string, a footnote with codeValue of p would yield an ID of #footnote=p. Similarly, if the dataset @id is #my_dataset, the footnote would have @id of #my_dataset/footnote=p.
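
A minimal Python sketch of the footnote steps, under the same assumptions as before: `footnote_row_id` is a hypothetical helper name, and "URL-encoded" is taken to mean percent-encoding with no characters treated as safe.

```python
from urllib.parse import quote


def footnote_row_id(dataset_id: str, code_value: str) -> str:
    """Build an @id for one footnote CSV row (illustrative sketch only)."""
    # Split the dataset's @id into the part before '#' and the fragment.
    base, _, fragment = dataset_id.partition("#")
    # If a fragment already exists, separate it from the new entry with '/'.
    if fragment:
        fragment += "/"
    # Append 'footnote=' and the URL-encoded codeValue.
    return f"{base}#{fragment}footnote={quote(code_value, safe='')}"


print(footnote_row_id("", "p"))
print(footnote_row_id("#my_dataset", "p"))
```

The two calls correspond to the two cases in the example above: an empty dataset @id and a dataset @id of `#my_dataset`.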


observation

For each CSV row,

  • Start with the slice's @id.
  • If there is a fragment, append a / to it.
  • Sort the dimension values by dimension name.
  • For each dimension value, append the URL-encoded name, = and the URL-encoded codeValue to the fragment, separating the entries with /.
  • Sort the measure values by measure name.
  • For each measure value, append the URL-encoded measure name to the fragment, separating entries with /.

For example, an observation in a slice with an @id of #europe_unemployment_slice with dimensions

  • gender of m,
  • country of uk, and
  • month of 2010-10

and measures

  • unemployment_rate and
  • unemployment

would have an @id of #europe_unemployment_slice/country=uk/gender=m/month=2010-10/unemployment/unemployment_rate
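
The observation steps might be sketched in Python like this. `observation_row_id` and its parameters are hypothetical names. The sketch assumes plain code-point ordering for the sort (the proposal does not name a collation) and percent-encoding with no safe characters.

```python
from urllib.parse import quote


def observation_row_id(slice_id: str, dimension_values: dict, measure_names: list) -> str:
    """Build an @id for one observation CSV row (illustrative sketch only)."""
    # Split the slice's @id into the part before '#' and the fragment.
    base, _, fragment = slice_id.partition("#")
    # Dimension entries: name=codeValue, sorted by dimension name.
    parts = [
        f"{quote(name, safe='')}={quote(value, safe='')}"
        for name, value in sorted(dimension_values.items())
    ]
    # Measure entries: name only, sorted by measure name.
    parts += [quote(name, safe="") for name in sorted(measure_names)]
    # Separate an existing fragment from the new entries with '/'.
    prefix = fragment + "/" if fragment else ""
    return f"{base}#{prefix}{'/'.join(parts)}"


print(observation_row_id(
    "#europe_unemployment_slice",
    {"gender": "m", "country": "uk", "month": "2010-10"},
    ["unemployment_rate", "unemployment"],
))
```

Called with the dimensions and measures from the example above, this reproduces the @id shown there.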
