Add @source to correspDesc and use @sameAs in sourceDesc/bibl #9

Open
StefanDumont opened this issue Mar 9, 2017 · 7 comments

@StefanDumont
Contributor

Since version 3.1.0 of the TEI P5 Guidelines, the attribute @source is global and therefore also available in correspDesc. We should use it in correspDesc to specify the source of the correspondence description, i.e. the scholarly edition. This information is already encoded in teiHeader//sourceDesc/bibl of a CMIF file. But where letters from multiple publications are gathered in one CMIF file, we already use @source to specify which publication (i.e. bibl) a correspDesc element belongs to. At the latest when we want to aggregate multiple CMIF files from different sources, it is very useful to have the source publication noted in the correspDesc element itself. correspSearch, for example, marks every correspDesc element at import time with an attribute @cs:source, which points to teiHeader//sourceDesc/bibl/@xml:id (for an example, see "The Letters of Otto Nicolai").

That's why I would like to propose:

  • We make correspDesc/@source mandatory
  • The value of correspDesc/@source should be
    • a URI from an authority file, like GND, VIAF or LC
    • OR an ISBN URI, e.g. urn:isbn:0824060970 (see section 12.4 of the ISBN User's Manual)
    • OR a URN, e.g. urn:nbn:de:kobv:83-opus4-57390
    • OR the official URL of a digital scholarly edition, e.g. http://www.weber-gesamtausgabe.de
    • OR a DOI, if the publication is available online only, e.g. http://dx.doi.org/10.18443/228
    • OR a (fixed) UUID if the publication doesn't have a URI
  • teiHeader//sourceDesc/bibl remains, but instead of an attribute @xml:id it must have an attribute @sameAs with the same value as proposed above (the specification of @xml:id doesn't allow URIs as values). This also has the advantage that the publications are identified by a standard identifier.

For publications that don't have a URI, we should consider using some other ID. One example is the accession number of the State Library Berlin for books published between 1871 and 1912, because the State Library has an official collection focus for this time period (be careful: catalogue numbers, in contrast to accession numbers, are often not permanent).
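
A minimal sketch of how this proposal could look in a CMIF file, assuming a print edition identified by the ISBN URN from the list above. The bibl text, the person names and the date are hypothetical placeholders; the @ref attributes that CMIF normally carries and the other mandatory header parts are left out, so the fragment is well-formed but not schema-valid as it stands.

```xml
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <!-- titleStmt and publicationStmt omitted for brevity -->
    <sourceDesc>
      <!-- @sameAs replaces @xml:id and carries the standard identifier -->
      <bibl type="print" sameAs="urn:isbn:0824060970">Hypothetical edition of letters</bibl>
    </sourceDesc>
  </fileDesc>
  <profileDesc>
    <!-- @source repeats the same identifier, so the letter stays
         attributable after several CMIF files have been aggregated -->
    <correspDesc source="urn:isbn:0824060970">
      <correspAction type="sent">
        <persName>Hypothetical Sender</persName>
        <date when="1850-01-01"/>
      </correspAction>
      <correspAction type="received">
        <persName>Hypothetical Recipient</persName>
      </correspAction>
    </correspDesc>
  </profileDesc>
</teiHeader>
```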

@StefanDumont
Contributor Author

@rettinghaus I've checked the CMIF files gathered by correspSearch, and it seems that you have the most diverse and "difficult" use cases for this feature. What do you think?

StefanDumont added this to the CMIF 2.0 milestone Mar 9, 2017
@rettinghaus
Contributor

Makes sense. But why not turn this around, since <bibl> is also a member of att.global.linking? We could store a URI or whatever there and keep using @source as before (without breaking compatibility).
On the other hand, @source should be used "to point to an external source". As <bibl> usually describes an external source, this seems OK to me. Or is it really meant to point somewhere outside the document?

@StefanDumont
Contributor Author

StefanDumont commented Feb 23, 2018

After discussing this topic with a colleague, the suggestion arose to simply use a UUID in bibl/@xml:id and refer to it with correspDesc/@source. This would also correspond to the examples provided in the TEI Guidelines. Further identifiers for the bibliographic entry (ISBN, DNB-URI etc.) could be noted in bibl/@sameAs.

@StefanDumont
Contributor Author

This has the advantage that we can separate the "linking" between bibl and correspDesc from the presence of an ISBN, GND-URI or similar.
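
A minimal sketch of this variant, under the same caveats as any illustration here: the UUID is an arbitrary example value (chosen to start with a letter), and the bibl text, person names and date are hypothetical placeholders. The link between correspDesc and bibl runs through @xml:id, while the ISBN sits independently in @sameAs.

```xml
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <!-- titleStmt and publicationStmt omitted for brevity -->
    <sourceDesc>
      <!-- @xml:id: local anchor (UUID); @sameAs: external identifier -->
      <bibl type="print"
            xml:id="f47ac10b-58cc-4372-a567-0e02b2c3d479"
            sameAs="urn:isbn:0824060970">Hypothetical edition of letters</bibl>
    </sourceDesc>
  </fileDesc>
  <profileDesc>
    <!-- @source points to the bibl via a fragment reference -->
    <correspDesc source="#f47ac10b-58cc-4372-a567-0e02b2c3d479">
      <correspAction type="sent">
        <persName>Hypothetical Sender</persName>
        <date when="1850-01-01"/>
      </correspAction>
    </correspDesc>
  </profileDesc>
</teiHeader>
```

If a publication has no ISBN or similar identifier, @sameAs can simply be left out without affecting the link.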

@pdaengeli

"After discussing this topic with a colleague, the suggestion arose to simply use a UUID in bibl/@xml:id and refer to it with correspDesc/@source. This would also correspond to the examples provided in the TEI Guidelines. Further identifiers for the bibliographic entry (ISBN, DNB-URI etc.) could be noted in bibl/@sameAs."

While this seems to be working for many cases (I had a look at CMIF data linked at https://correspsearch.net/data.xql?id=cmiFiles), there is a problem with the recommendation/requirement "to simply use a UUID in bibl/@xml:id":

According to the xml:id specification,

"The normalized value of the [ @xml:id ] attribute is an NCName according to the Namespaces in XML Recommendation which has the same version as the document in which this attribute occurs (NCName for XML 1.0, or NCName for XML 1.1)."

NCNames must begin with a NameStartChar, which cannot be a digit. This collides with the many valid UUIDs that start with [0-9]. Take for instance the first example from https://en.wikipedia.org/wiki/Universally_unique_identifier#Format: 123e4567-e89b-12d3-a456-426614174000.

Perhaps this should be considered for CMIF v2.

A possible way out could be to say that bibl/@xml:id should/may start with a prefix (e.g. uuid_). Processing applications could handle this ambiguity relatively easily (while staying backwards compatible).
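
A minimal sketch of the prefix idea, using the Wikipedia example UUID quoted above. The prefix uuid_ is the one suggested here; the bibl text is a hypothetical placeholder, and how a processing application strips or ignores the prefix is left open.

```xml
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <sourceDesc>
      <!-- Not allowed: xml:id="123e4567-e89b-12d3-a456-426614174000"
           (an NCName must not start with a digit) -->
      <bibl xml:id="uuid_123e4567-e89b-12d3-a456-426614174000">Hypothetical edition of letters</bibl>
    </sourceDesc>
  </fileDesc>
  <profileDesc>
    <!-- The reference has to include the prefix as well -->
    <correspDesc source="#uuid_123e4567-e89b-12d3-a456-426614174000"/>
  </profileDesc>
</teiHeader>
```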

@zed-g

zed-g commented Jul 6, 2020

Yeah, the fact that an xml:id can't start with a number is really unfortunate. But rather than using a prefix, I'm generally fond of just generating a UUID (Version 4) until you get one starting with a letter. In any case, it should definitely be noted in the documentation.

@pdaengeli

From my point of view, namespace name-based UUIDs (versions 3/5) are best aligned with the recommendation "to use a 'static' UUID, i.e. which is not re-generated every time the CMIF data is updated". They allow a UUID to be generated deterministically from a meaningful string.

For now I'll recursively generate a name-based UUID until I get a suitable one.

For instance, like this:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:foo="foo" 
    exclude-result-prefixes="xs" expand-text="true" version="3.0">
    
    <xsl:template match="/">{foo:generateUUID('https://example.org/my-meaningful-string')}</xsl:template>
    
    <!-- Returns a name-based (version 5) UUID for $input that is usable as an xml:id. -->
    <xsl:function name="foo:generateUUID">
        <xsl:param name="input" as="xs:string"/>
        <!-- Request a v5 UUID in the URL namespace from the uuidtools.com web API
             and parse the JSON response. -->
        <xsl:variable name="currentUUID" select="unparsed-text('https://www.uuidtools.com/api/generate/v5/namespace/ns:url/name/' || 
            $input => encode-for-uri()) => parse-json()" as="xs:string"/>
        <xsl:choose>
            <!-- xs:ID requires an NCName, so a UUID with a leading digit is rejected. -->
            <xsl:when test="$currentUUID castable as xs:ID">{$currentUUID}</xsl:when>
            <!-- Retry, using the rejected UUID as the new name. -->
            <xsl:otherwise>{foo:generateUUID($currentUUID)}</xsl:otherwise>
        </xsl:choose>
    </xsl:function>
    
</xsl:transform>

4 participants