The Chinese LPP conllulex file has examples of sentence-level and token-level alignments to the English LPP file. The script scripts/generate_alignments_from_conllulex.py generates alignments for the Chinese-English language pair, taking the Chinese conllulex file as input.
The changes proposed here would expand the alignments to support (a) multiple language pairs and (b) 1-many sentence alignments.
With English as an example, add a new metadata field en_sent_id_2 (and en_sent_id_3 if needed) to support 1-many sentence alignments. Do the same for other languages (e.g. hi_sent_id, hi_sent_id_2, hi_sent_id_3 for hi-zh alignment), and add corresponding fields for the sentence text. The prefix for these fields could match the language's slug value in Xposition ('en', 'zh', 'he', 'hi', 'ko', 'de').
Alternatively, UD notation may offer a list data structure that could be used in the metadata field (more research needed).
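To make the suffixed-field proposal concrete, here is a minimal sketch of how a reader might collect the aligned sentence ids per language. It assumes the metadata is written as CoNLL-U style comment lines (`# key = value`); the sentence id values and the helper name `aligned_sent_ids` are hypothetical and only serve the illustration.

```python
import re
from collections import defaultdict

# Matches "<slug>_sent_id" with an optional numeric suffix, e.g. "en_sent_id_2".
# Two-letter slugs are an assumption based on the Xposition slugs listed above.
FIELD_RE = re.compile(r"^(?P<lang>[a-z]{2})_sent_id(?:_(?P<n>\d+))?$")


def aligned_sent_ids(comment_lines):
    """Group aligned sentence ids by language slug, ordered by field suffix."""
    found = defaultdict(list)
    for line in comment_lines:
        if not line.startswith("#") or "=" not in line:
            continue
        key, value = line[1:].split("=", 1)
        match = FIELD_RE.match(key.strip())
        if match:
            # An unsuffixed field (e.g. en_sent_id) counts as position 1.
            position = int(match.group("n") or 1)
            found[match.group("lang")].append((position, value.strip()))
    return {lang: [v for _, v in sorted(pairs)] for lang, pairs in found.items()}


# Hypothetical header of one Chinese sentence aligned to two English sentences.
header = [
    "# sent_id = zh_lpp-0001",
    "# en_sent_id = en_lpp-0001",
    "# en_sent_id_2 = en_lpp-0002",
    "# hi_sent_id = hi_lpp-0001",
]
print(aligned_sent_ids(header))
```

A reader written this way does not need to know in advance how many suffixed fields a sentence carries, so en_sent_id_3 would be picked up without further changes.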
In the misc column of the conllulex file, replicate the existing token-level annotations for the other languages, separated by a new delimiter that marks the boundary between languages. The order of languages follows the order of the language tags in the metadata.
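The misc-column proposal could be read back along these lines. This is only a sketch: the delimiter `;;` is a placeholder since the issue does not pick one, the function name `split_misc_by_language` is hypothetical, and each per-language annotation is treated as an opaque string because its internal format is not restated here.

```python
def split_misc_by_language(misc, langs, delimiter=";;"):
    """Map each language slug to its segment of a token's misc value.

    `langs` is the list of language slugs in the order they appear in the
    sentence metadata; `delimiter` is a placeholder, not a decided value.
    """
    segments = misc.split(delimiter)
    if len(segments) != len(langs):
        raise ValueError(
            f"expected {len(langs)} segments for {langs}, got {len(segments)}"
        )
    return dict(zip(langs, segments))


# Placeholder annotation payloads for one token aligned to English and Hindi.
token_misc = "EN_ANNOTATION;;HI_ANNOTATION"
print(split_misc_by_language(token_misc, ["en", "hi"]))
```

Checking the segment count against the metadata order catches tokens where one language's annotation was omitted rather than left as an explicit empty placeholder.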
Also see #234