How to encode compact/compressed headwords #227

Open
daliboris opened this issue Dec 5, 2024 · 0 comments

In Czech printed dictionaries it is not uncommon to compress several variants of a lemma into a single headword, for example:

byte(d)lný, i.e. bytedlný and bytelný (residing) (also used as a running head)

[Image: bytedlny]

(j)hercě, i.e. jhercě and hercě (player; see also the uncompressed headwords jhráč and hráč, which could be compressed as (j)hráč)

[Image: jhrac]

It would be possible to use the @expand attribute of the <orth> element, with a space as the lemma separator:

<form type="lemma">
  <orth expand="jhercě hercě">(j)hercě</orth>
</form>

This does not work, however, in the case of

leleti (sě), i.e. leleti and leleti sě (sway and sway itself)

[Image: leleti-se]

where the space is itself part of the reflexive variant of the lemma, so it cannot also serve as the lemma separator:

<form type="lemma">
   <orth expand="leleti leleti sě">leleti (sě)</orth>
</form>
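
To make the ambiguity concrete, here is a minimal Python sketch. It assumes a hypothetical consumer (`split_expand` is an illustrative name, not part of any TEI tooling) that recovers the lemmas by splitting the @expand value on whitespace:

```python
def split_expand(value: str) -> list[str]:
    """Hypothetical consumer: treat whitespace in @expand as the lemma separator."""
    return value.split()


# Single-word variants are recovered correctly.
print(split_expand("jhercě hercě"))

# A multi-word variant is torn apart: three tokens come back instead of two lemmas.
print(split_expand("leleti leleti sě"))
```

With a space as the only separator, the reflexive lemma leleti sě cannot be distinguished from two separate lemmas.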

What are other possible solutions (valid in TEI Lex-0 v. 0.9.3)? For example:

<form type="lemma">
  <orth type="abbreviated">leleti (sě)</orth>
  <orth type="expanded" resp="#boris">leleti</orth>
  <orth type="expanded" resp="#boris">leleti sě</orth>
</form>

or

<form type="lemma" subtype="abbreviated">
  <orth type="abbreviated">leleti (sě)</orth>
</form>
<form type="lemma" subtype="expanded" resp="#boris">
  <orth type="expanded">leleti</orth>
</form>
<form type="lemma" subtype="expanded" resp="#boris">
  <orth type="expanded">leleti sě</orth>
</form>

What would you recommend?

In the last example, compressed variants also occur in the inflected forms (1st and 2nd person: -eju, -éš (sě)) and in the definition (vlnit (se)), so the chosen encoding should also be applicable to <form type="inflected"> and <def> elements.
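
For illustration only, the expansion that an editor performs by hand on such compressed strings can be sketched in Python. This assumes that every parenthesised segment is optional and that parentheses are not nested. The function name `expand_headword` and the ordering of the results (variant with the optional segment first) are choices made for this sketch, not anything prescribed by TEI Lex-0.

```python
import itertools
import re

# Matches one non-nested parenthesised segment and captures its content.
_OPTIONAL = re.compile(r"\(([^()]*)\)")


def expand_headword(compressed: str) -> list[str]:
    """Return all variants of a headword whose optional parts are in parentheses."""
    # re.split with a capturing group alternates: literal, optional, literal, ...
    parts = _OPTIONAL.split(compressed)
    optional_count = len(parts) // 2
    variants: list[str] = []
    for keep in itertools.product((True, False), repeat=optional_count):
        pieces = [
            part
            for i, part in enumerate(parts)
            if i % 2 == 0 or keep[i // 2]  # literals always, optionals if kept
        ]
        # Collapse whitespace left behind when an optional word is dropped.
        text = " ".join("".join(pieces).split())
        if text not in variants:
            variants.append(text)
    return variants


for sample in ["byte(d)lný", "(j)hercě", "leleti (sě)", "-éš (sě)", "vlnit (se)"]:
    print(sample, "->", expand_headword(sample))
```

The same routine handles the lemma, the inflected form and the definition, which is why a single encoding pattern for all three would be convenient.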
