Update encoding of author-assigned keywords to match DHQ-assigned keywords #58

Open
juliaflanders opened this issue Dec 10, 2023 · 2 comments
Labels
encoding update Global update to DHQ article encoding

Comments

@juliaflanders
Contributor

juliaflanders commented Dec 10, 2023

The current encoding of author-assigned keywords uses <list type="simple">, whereas the encoding of DHQ-assigned keywords uses <term>. We should change the encoding of author-assigned keywords to use <term> so that we are using the same element in both cases. The encoding will still differ in that DHQ-assigned keywords use @corresp to point to the DHQ keyword taxonomy, whereas the author-assigned keywords don't have any external reference to point to.

We should also update our (currently placeholder) encoding of project keywords to match this practice. There aren't yet any project keywords, but our metadata includes a placeholder for them, and we might as well make it match the others while we're at it. Because the project keywords will point to a project registry, the encoding will be analogous to the DHQ-assigned keywords.

Here are samples of the target encoding for each case:

DHQ-assigned:

<keywords scheme="#dhq_keywords">
    <term corresp="[pointer-to-keyword goes here]"/>
</keywords>

Author-assigned:

<keywords scheme="#authorial_keywords">
    <term>[keyword goes here]</term>
</keywords>

Project:

<keywords scheme="#project_keywords">
    <term corresp="[pointer-to-keyword goes here]"/>
</keywords>
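
For illustration, the author-assigned change could be applied across existing articles with an XSLT identity transform. This is a minimal sketch, not the project's actual migration script: it assumes articles are in the TEI namespace and that the current encoding is a <list type="simple"> of <item> children directly inside <keywords scheme="#authorial_keywords">.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

    <!-- Identity template: copy everything not matched below unchanged. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Drop the <list> wrapper in author-assigned keywords (assumed current encoding),
         processing only its <item> children. -->
    <xsl:template match="tei:keywords[@scheme='#authorial_keywords']/tei:list[@type='simple']">
        <xsl:apply-templates select="tei:item"/>
    </xsl:template>

    <!-- Turn each <item> into a <term> with the same content. -->
    <xsl:template match="tei:keywords[@scheme='#authorial_keywords']/tei:list[@type='simple']/tei:item">
        <term>
            <xsl:apply-templates/>
        </term>
    </xsl:template>

</xsl:stylesheet>
```

Whitespace between the original items is not carried over, so the output may need reindenting. An empty placeholder <item> would become an empty <term>, which may or may not be what the template should contain.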

juliaflanders added the “encoding update” label Dec 10, 2023
@sydb
Contributor

sydb commented Dec 10, 2023

@juliaflanders
I made a slight tweak to the original problem statement, above, in that @corresp holds a pointer to the keyword, not the keyword itself. But

  1. I am wondering if it is better to use @ref instead of @corresp; I think its semantics match what we want here significantly more precisely. But in truth the Guidelines (incorrectly IMHO) say it is for “named entities”. (I feel a TEI ticket coming on …)
  2. The keywords themselves are stored in the file common/xml/taxonomy.xml. Should the pointers be
    a) direct to that file (e.g., ../../common/xml/taxonomy.xml#code_studies)
    b) direct to the <category> (e.g., #code_studies, which presupposes something like <xi:include href="../../common/xml/taxonomy.xml"/> in the article itself)
    c) indirectly (e.g., k:code_studies, which presupposes a proper <prefixDef> in the <teiHeader> of each article)

For 2, I do not think it makes much difference, except that (b) runs the slight but real risk that someone has used an ID of code_studies for something else in the article. In any case, for the most part it would be quite easy to change from one to the other, should we change our minds.
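
To make the three options concrete, here is how a DHQ-assigned keyword might look under each, using the code_studies example above. The k prefix comes from option (c); the matchPattern and the placement of <listPrefixDef> inside <encodingDesc> are assumptions for this sketch, and @corresp could be swapped for @ref depending on how point 1 is settled.

```xml
<!-- (a) pointer direct to the taxonomy file -->
<keywords scheme="#dhq_keywords">
    <term corresp="../../common/xml/taxonomy.xml#code_studies"/>
</keywords>

<!-- (b) bare fragment pointer; the taxonomy must be included in the article -->
<keywords scheme="#dhq_keywords">
    <term corresp="#code_studies"/>
</keywords>

<!-- (c) private URI scheme, resolved through a prefix definition -->
<keywords scheme="#dhq_keywords">
    <term corresp="k:code_studies"/>
</keywords>

<!-- Needed for (c) only; assumed to sit in <encodingDesc> in the <teiHeader> -->
<listPrefixDef>
    <prefixDef ident="k"
               matchPattern="([A-Za-z0-9_]+)"
               replacementPattern="../../common/xml/taxonomy.xml#$1">
        <p>Pointers of the form k:xxx refer to the category with that ID
           in the DHQ keyword taxonomy.</p>
    </prefixDef>
</listPrefixDef>
```

Under this sketch, (c) expands to exactly the pointer used in (a), which is part of why switching between the options later would be mostly mechanical.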

@juliaflanders
Contributor Author

I don't have strong feelings about @corresp vs. @ref, so I'm glad to leave that choice to you based on your reading of the Guidelines. I can see the logic of saying in effect that @ref is to <term> as @ref is to <name> in that it anchors a referencing string to a definition/authority, even if the specific thing in question isn't a named entity.

For the storage of the keywords, if we go with #2b, presumably validation would catch ID conflicts of the kind you were concerned about? I think that would be the simplest approach. But any would work and I don't feel strongly; the differences are ultimately just something we would build into our template, so they are equally easy to encode.
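
A sketch of what #2b might look like in the article template. It assumes that common/xml/taxonomy.xml has a <taxonomy> root element whose <category> elements carry the xml:id values, and that the include sits inside <classDecl>; neither is confirmed in this thread.

```xml
<teiHeader xmlns="http://www.tei-c.org/ns/1.0"
           xmlns:xi="http://www.w3.org/2001/XInclude">
    <!-- fileDesc and the rest of the header omitted for brevity -->
    <encodingDesc>
        <classDecl>
            <!-- pulls the shared taxonomy into the article (assumed location) -->
            <xi:include href="../../common/xml/taxonomy.xml"/>
        </classDecl>
    </encodingDesc>
    <profileDesc>
        <textClass>
            <keywords scheme="#dhq_keywords">
                <!-- resolves against the included <category xml:id="code_studies"> -->
                <term corresp="#code_studies"/>
            </keywords>
        </textClass>
    </profileDesc>
</teiHeader>
```

Once the include is resolved, a clash between a taxonomy ID and an article-internal xml:id appears as a duplicate ID, which validation should be able to flag provided it runs on the resolved document rather than on the unexpanded source.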
