Commit

resolve merge conflict
Edward Garrett committed Dec 5, 2017
2 parents 0084c74 + 5138cbb commit 230c858
Showing 273 changed files with 34,060 additions and 33,741 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,5 +1,5 @@
 scala.txt
-minimal-dependencies.cg
+tidc2upos.cg
 output.txt
 temp.txt
 examples.txt
17 changes: 7 additions & 10 deletions README.md
@@ -130,35 +130,32 @@ file advances the break forward until it immediately precedes the following sentence
 Subsequent derived formats, including Niceline CG, also follow this practice,
 which makes the texts easier for later tools to process.

-## Niceline CG with Universal Dependencies
-The file `minimal-dependencies.txt` is a script that uses the
+## Niceline CG with Universal POS tags
+The file `tidc2upos.txt` is a script that uses the
 [Constraint Grammar](http://visl.sdu.dk/constraint_grammar.html) formalism to add
 universal POS tags and universal features to our text corpus.
 Note that CG3 does not directly implement feature-value tags,
 so the universal features just come out as complex tags, for example
 `NumType=Card|NumForm=Digit`.

-In addition, the script adds minimal syntactic dependencies to the texts.
-We add only those dependencies that can be inferred from the
-POS tags. For example, if a case marker follows a count noun, then it depends on that count
-noun. We know this because case markers are not free-standing words in Tibetan. (Complications may arise
-in regards to nominal compounds but we ignore that now.)
+In addition, the script adds self-referential dependency tags for each word.
+This makes it easier to convert the resulting Niceline file to CoNLL-U format.

 For the output of this process, see `mila-vislcg-UD.txt` and similarly named files.

 To apply the grammar to the input yourself, first install VISL CG3 and then compile the grammar:

-`cg-comp minimal-dependencies.txt minimal-dependencies.cg`
+`cg-comp tidc2upos.txt tidc2upos.cg`

 Then, apply the grammar to the input, specifying an output file:

-`vislcg3 -g minimal-dependencies.cg -I mila-vislcg.txt -O mila-vislcg-UD.txt`
+`vislcg3 -g tidc2upos.cg -I mila-vislcg.txt -O mila-vislcg-UD.txt`

 Alternatively, the output can be piped to a different format using the
 [cg-conv](http://beta.visl.sdu.dk/cg3/chunked/cmdreference.html#cg-conv) tool.
 This repository, for example, includes the output of the following command:

-`vislcg3 -g minimal-dependencies.cg -I mila-vislcg.txt | cg-conv -N > mila-niceline-UD.txt`
+`vislcg3 -g tidc2upos.cg -I mila-vislcg.txt | cg-conv -N > mila-niceline-UD.txt`

 The resulting file - in [Niceline CG format](http://beta.visl.sdu.dk/cg3/chunked/streamformats.html#stream-niceline) -
 is compact and easily converted to other formats.
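
The compile-then-apply steps described in the README hunk above can be scripted. The following is a minimal sketch, not part of the commit: the function names `build_commands` and `run_pipeline` are hypothetical, the file names and flags are copied from the README text, and it assumes `cg-comp` and `vislcg3` from VISL CG3 are installed and on `PATH`.

```python
import shutil
import subprocess


def build_commands(grammar_src="tidc2upos.txt",
                   grammar_bin="tidc2upos.cg",
                   infile="mila-vislcg.txt",
                   outfile="mila-vislcg-UD.txt"):
    """Return the two argv lists quoted in the README (hypothetical helper)."""
    # Step 1: compile the textual grammar to a binary grammar.
    compile_cmd = ["cg-comp", grammar_src, grammar_bin]
    # Step 2: apply the compiled grammar to the input, writing an output file.
    apply_cmd = ["vislcg3", "-g", grammar_bin, "-I", infile, "-O", outfile]
    return compile_cmd, apply_cmd


def run_pipeline(**kwargs):
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_commands(**kwargs):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(
                f"{cmd[0]} not found on PATH; install VISL CG3 first")
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline()
```

Building the argument lists separately from running them keeps the commands inspectable without the tools installed.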
1 change: 1 addition & 0 deletions brat-config/annotation.conf
@@ -24,6 +24,7 @@ arg2-lvc Arg1:VERB, Arg2:NOUN
 arg3 Arg1:VERB, Arg2:ADJ|DET|NOUN|NUM|PRON|PROPN|VERB
 argcl Arg1:VERB, Arg2:ADJ|DET|NOUN|NUM|PRON|PROPN|VERB
 aux Arg1:AUX|VERB, Arg2:ADJ|AUX|DET|NOUN|NUM|PRON|PROPN|VERB
+aux-lvc Arg1:AUX|VERB, Arg2:AUX|VERB
 cop Arg1:ADJ|DET|NOUN|NUM|PRON|PROPN|VERB, Arg2:AUX|VERB
 obl Arg1:VERB, Arg2:ADJ|DET|NOUN|NUM|PRON|PROPN|VERB
 obl-arg Arg1:VERB, Arg2:ADJ|DET|NOUN|NUM|PRON|PROPN|VERB
2 changes: 2 additions & 0 deletions brat-config/visual.conf
@@ -18,6 +18,7 @@ X | ?
 arg1-lvc | arg1:lvc
 arg2-hon | arg2:hon
 arg2-lvc | arg2:lvc
+aux-lvc | aux:lvc
 obl-arg | obl:arg
 obl-adv | obl:adv

@@ -32,6 +33,7 @@ arg2-lvc color:#003399, dashArray:-
 arg3 color:#004DB3, dashArray:-
 argcl color:#004DB3, dashArray:-
 aux color:#000000, dashArray:3-3
+aux-lvc color:#000000, dashArray:3-3
 cop color:#000000, dashArray:3-3
 obl color:#006633, dashArray:-
 obl-arg color:#006633, dashArray:-
148 changes: 74 additions & 74 deletions classical-tibetan/mila/brat-annotations/011b.ann
@@ -629,194 +629,194 @@ A604a Case T604 Impf
 N604 Reference T604 Nonverbs:0121 ཅིང་√cv
 T605 VERB 996 1000 འདུག
 N605 Reference T605 Hill:0875 འདུག་
-T606 PRON 1000 1004 ངེད་
+T606 PRON 1001 1005 ངེད་
 A606a PronType T606 Prs
 N606 Reference T606 Nonverbs:0204 ངེད་√p
-T607 NOUN 1004 1013 མིང་སྲིང་
+T607 NOUN 1005 1014 མིང་སྲིང་
 A607a Number T607 Sing
-T608 NUM 1013 1018 གཉིས་
+T608 NUM 1014 1019 གཉིས་
 A608a NumType T608 Card
 A608b NumForm T608 Word
-T609 ADP 1018 1023 ཀྱིས་
+T609 ADP 1019 1024 ཀྱིས་
 A609a Case T609 Agn
 N609 Reference T609 Nonverbs:0026 གྱིས་√case
-T610 PART 1023 1026 ནི་
+T610 PART 1024 1027 ནི་
 N610 Reference T610 Nonverbs:0343 ནི་√cl
-T611 NOUN 1026 1033 རམ་མདའ་
+T611 NOUN 1027 1034 རམ་མདའ་
 A611a Number T611 Sing
 N611 Reference T611 Nonverbs:0630 རམ་མདའ་
-T612 VERB 1033 1038 ངུ་བ་
+T612 VERB 1034 1039 ངུ་བ་
 A612a Tense T612 Fut/Pres
 A612b VerbForm T612 Vnoun
 N612 Reference T612 Hill:0407 ངུ་
-T613 VERB 1038 1044 མིན་པ་
+T613 VERB 1039 1045 མིན་པ་
 A613a Tense T613 Past/Pres
 A613b VerbForm T613 Vnoun
-T614 PART 1044 1046 མ་
+T614 PART 1045 1047 མ་
 A614a Polarity T614 Neg
 N614 Reference T614 Nonverbs:0339 མ་√neg
-T615 VERB 1046 1051 བྱུང་
+T615 VERB 1047 1052 བྱུང་
 A615a Tense T615 Past
 N615 Reference T615 Hill:1243 འབྱུང་
-T616 PUNCT 1051 1052 །
+T616 PUNCT 1052 1053
 N616 Reference T616 Nonverbs:0529 །
-T618 NOUN 1053 1058 ཨ་ཞང་
+T618 NOUN 1054 1059 ཨ་ཞང་
 A618a Number T618 Sing
-T619 ADP 1058 1062 གིས་
+T619 ADP 1059 1063 གིས་
 A619a Case T619 Agn
 N619 Reference T619 Nonverbs:0026 གྱིས་√case
-T620 PART 1062 1065 ནི་
+T620 PART 1063 1066 ནི་
 N620 Reference T620 Nonverbs:0343 ནི་√cl
-T621 NOUN 1065 1070 ཨ་ཁུ་
+T621 NOUN 1066 1071 ཨ་ཁུ་
 A621a Number T621 Sing
-T622 ADP 1070 1072 ལ་
+T622 ADP 1071 1073 ལ་
 A622a Case T622 All
 N622 Reference T622 Nonverbs:0028 ལ་√case
-T623 NOUN 1072 1075 བུ་
+T623 NOUN 1073 1076 བུ་
 A623a Number T623 Sing
-T624 ADJ 1075 1081 མང་པོ་
-T625 VERB 1081 1086 ཡོད་པ
+T624 ADJ 1076 1082 མང་པོ་
+T625 VERB 1082 1087 ཡོད་པ
 A625a VerbForm T625 Vnoun
 N625 Reference T625 Hill:1632 ཡོད་
-T626 ADP 1086 1088 ས་
+T626 ADP 1087 1089 ས་
 A626a Case T626 Agn
 N626 Reference T626 Nonverbs:0026 གྱིས་√case
-T627 VERB 1088 1093 རྒོལ་
+T627 VERB 1089 1094 རྒོལ་
 A627a Tense T627 Pres
 #627l AnnotatorNotes T627 [རྒལ་][རྒོལ་]
-T628 PART 1093 1095 མ་
+T628 PART 1094 1096 མ་
 A628a Polarity T628 Neg
 N628 Reference T628 Nonverbs:0339 མ་√neg
-T629 VERB 1095 1098 ནུས
+T629 VERB 1096 1099 ནུས
 A629a VerbType T629 Aux
 N629 Reference T629 Hill:0980 ནུས་
-T630 PUNCT 1098 1099 །
+T630 PUNCT 1099 1100
 N630 Reference T630 Nonverbs:0529 །
-T632 DET 1100 1104 གཞན་
+T632 DET 1101 1105 གཞན་
 N632 Reference T632 Nonverbs:0271 གཞན་√d
-T633 NOUN 1104 1111 ཡུལ་མི་
+T633 NOUN 1105 1112 ཡུལ་མི་
 A633a Number T633 Sing
-T634 PRON 1111 1115 ངེད་
+T634 PRON 1112 1116 ངེད་
 A634a PronType T634 Prs
 N634 Reference T634 Nonverbs:0204 ངེད་√p
-T635 PRON 1115 1118 རང་
+T635 PRON 1116 1119 རང་
 A635a Reflex T635 Yes
 N635 Reference T635 Nonverbs:0230 རང་√p
-T636 ADP 1118 1120 ལ་
+T636 ADP 1119 1121 ལ་
 A636a Case T636 All
 N636 Reference T636 Nonverbs:0028 ལ་√case
-T637 VERB 1120 1126 དཀར་བ་
+T637 VERB 1121 1127 དཀར་བ་
 A637a Tense T637 Fut/Pres
 A637b VerbForm T637 Vnoun
-T638 DET 1126 1131 རྣམས་
+T638 DET 1127 1132 རྣམས་
 A638a Number T638 Plur
 N638 Reference T638 Nonverbs:0305 རྣམས་√d
-T639 PART 1131 1134 ནི་
+T639 PART 1132 1135 ནི་
 N639 Reference T639 Nonverbs:0343 ནི་√cl
-T640 NOUN 1134 1140 མ་སྨད་
+T640 NOUN 1135 1141 མ་སྨད་
 A640a Number T640 Sing
-T641 DET 1140 1142 ཚོ
+T641 DET 1141 1143 ཚོ
 A641a Number T641 Plur
 N641 Reference T641 Nonverbs:0324 ཚོ་√d
-T642 ADP 1142 1145 འི་
+T642 ADP 1143 1146 འི་
 A642a Case T642 Gen
 N642 Reference T642 Nonverbs:0012 གྱི་√case
-T643 NOUN 1145 1150 སྙིང་
+T643 NOUN 1146 1151 སྙིང་
 A643a Number T643 Sing
-T644 VERB 1150 1156 རྗེ་བ་
+T644 VERB 1151 1157 རྗེ་བ་
 A644a Tense T644 Pres
 A644b VerbForm T644 Vnoun
 N644 Reference T644 Hill:0587 རྗེ་
-T645 ADP 1156 1158 ལ་
+T645 ADP 1157 1159 ལ་
 A645a Case T645 All
 N645 Reference T645 Nonverbs:0028 ལ་√case
-T646 VERB 1158 1162 ཟེར་
+T646 VERB 1159 1163 ཟེར་
 #646l AnnotatorNotes T646 [ཟེར་√1][ཟེར་√2]
-T647 PART 1162 1165 མི་
+T647 PART 1163 1166 མི་
 A647a Polarity T647 Neg
 N647 Reference T647 Nonverbs:0337 མི་√neg
-T648 VERB 1165 1170 ངུ་བ་
+T648 VERB 1166 1171 ངུ་བ་
 A648a Tense T648 Fut/Pres
 A648b VerbForm T648 Vnoun
 N648 Reference T648 Hill:0407 ངུ་
-T649 PART 1170 1173 མི་
+T649 PART 1171 1174 མི་
 A649a Polarity T649 Neg
 N649 Reference T649 Nonverbs:0337 མི་√neg
-T650 VERB 1173 1177 འདུག
+T650 VERB 1174 1178 འདུག
 A650a Tense T650 Fut/Pres
 N650 Reference T650 Hill:0875 འདུག་
-T651 DET 1177 1181 གཞན་
+T651 DET 1179 1183 གཞན་
 N651 Reference T651 Nonverbs:0271 གཞན་√d
-T652 DET 1181 1186 རྣམས་
+T652 DET 1183 1188 རྣམས་
 A652a Number T652 Plur
 N652 Reference T652 Nonverbs:0305 རྣམས་√d
-T653 PART 1186 1190 ཀྱང་
+T653 PART 1188 1192 ཀྱང་
 N653 Reference T653 Nonverbs:0352 འང་√cl
-T654 NOUN 1190 1199 ཤུགས་རིང་
+T654 NOUN 1192 1201 ཤུགས་རིང་
 A654a Number T654 Sing
-T655 ADV 1199 1205 ནར་ནར་
+T655 ADV 1201 1207 ནར་ནར་
 A655a AdvType T655 Mim
 N655 Reference T655 Nonverbs:0400 ནར་ནར་
-T656 VERB 1205 1209 འདུག
+T656 VERB 1207 1211 འདུག
 N656 Reference T656 Hill:0875 འདུག་
-T657 NOUN 1209 1214 ཨ་ཁུ་
+T657 NOUN 1212 1217 ཨ་ཁུ་
 A657a Number T657 Sing
-T658 ADP 1214 1217 དང་
+T658 ADP 1217 1220 དང་
 A658a Case T658 Com
 N658 Reference T658 Nonverbs:0031 དང་√case
-T659 NOUN 1217 1222 ཨ་ནེ་
+T659 NOUN 1220 1225 ཨ་ནེ་
 A659a Number T659 Sing
-T660 PART 1222 1225 ནི་
+T660 PART 1225 1228 ནི་
 N660 Reference T660 Nonverbs:0343 ནི་√cl
-T661 PRON 1225 1229 ངེད་
+T661 PRON 1228 1232 ངེད་
 A661a PronType T661 Prs
 N661 Reference T661 Nonverbs:0204 ངེད་√p
-T662 ADP 1229 1231 ལ་
+T662 ADP 1232 1234 ལ་
 A662a Case T662 All
 N662 Reference T662 Nonverbs:0028 ལ་√case
-T663 NOUN 1231 1235 ནོར་
+T663 NOUN 1234 1238 ནོར་
 A663a Number T663 Coll
-T664 VERB 1235 1240 དགོས་
+T664 VERB 1238 1243 དགོས་
 N664 Reference T664 Hill:0248 དགོས་
-T665 VERB 1240 1244 ཟེར་
+T665 VERB 1243 1247 ཟེར་
 #665l AnnotatorNotes T665 [ཟེར་√1][ཟེར་√2]
-T666 NOUN 1244 1248 ནོར་
+T666 NOUN 1247 1251 ནོར་
 A666a Number T666 Coll
-T667 PRON 1248 1253 ཁྱེད་
+T667 PRON 1251 1256 ཁྱེད་
 A667a PronType T667 Prs
 N667 Reference T667 Nonverbs:0227 ཁྱེད་√p
-T668 PRON 1253 1256 རང་
+T668 PRON 1256 1259 རང་
 A668a Reflex T668 Yes
 N668 Reference T668 Nonverbs:0230 རང་√p
-T669 ADP 1256 1258 ལ་
+T669 ADP 1259 1261 ལ་
 A669a Case T669 All
 N669 Reference T669 Nonverbs:0028 ལ་√case
-T670 VERB 1258 1264 འདུག་པ
+T670 VERB 1261 1267 འདུག་པ
 A670a VerbForm T670 Vnoun
 N670 Reference T670 Hill:0875 འདུག་
-T671 PUNCT 1264 1265
+T671 PUNCT 1267 1268
 N671 Reference T671 Nonverbs:0529 །
-T673 NOUN 1266 1283 ཡུལ་མི་ཁྱིམ་མཚེས་
+T673 NOUN 1269 1286 ཡུལ་མི་ཁྱིམ་མཚེས་
 A673a Number T673 Sing
-T674 ADP 1283 1285 ལ་
+T674 ADP 1286 1288 ལ་
 A674a Case T674 All
 N674 Reference T674 Nonverbs:0028 ལ་√case
-T675 NOUN 1285 1290 ཤ་ཆང་
+T675 NOUN 1288 1293 ཤ་ཆང་
 A675a Number T675 Sing
-T676 ADJ 1290 1298 ཕངས་མེད་
-T677 VERB 1298 1303 བཤིག་
+T676 ADJ 1293 1301 ཕངས་མེད་
+T677 VERB 1301 1306 བཤིག་
 A677a Tense T677 Past
 N677 Reference T677 Hill:0557 འཇིག་√2
-T678 SCONJ 1303 1306 ནས་
+T678 SCONJ 1306 1309 ནས་
 A678a Case T678 Ela
 N678 Reference T678 Nonverbs:0074 ནས་√cv
-T679 NOUN 1306 1314 སྟོན་མོ་
+T679 NOUN 1309 1317 སྟོན་མོ་
 A679a Number T679 Sing
-T680 VERB 1314 1323 བཤམ་རྒྱུ་
+T680 VERB 1317 1326 བཤམ་རྒྱུ་
 A680a Tense T680 Fut
 A680b VerbForm T680 Vnoun
-T681 VERB 1323 1329 འདུག་པ
+T681 VERB 1326 1332 འདུག་པ
 A681a VerbForm T681 Vnoun
 N681 Reference T681 Hill:0875 འདུག་
-T682 PUNCT 1329 1330
+T682 PUNCT 1332 1333
 N682 Reference T682 Nonverbs:0529 །
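
Every change in the `.ann` hunk above is an offset shift on a text-bound (`T…`) annotation, so a quick consistency check is to confirm that each span length still equals the length of its covered text. The following is a minimal sketch, not part of the commit: the helper names are hypothetical, it assumes brat's standoff layout of ID, type, start offset, end offset, and covered text, and it assumes offsets count Unicode code points. Discontinuous spans (offsets joined with `;`) are not handled, and lines whose covered text is missing or whitespace-only, such as some `PUNCT` entries above, will be reported as mismatches.

```python
import re

# Text-bound annotation: ID, type, start, end, optional covered text.
# Whitespace-tolerant, since tabs may have been flattened to spaces.
TEXT_BOUND = re.compile(r"^(T\d+)\s+(\S+)\s+(\d+)\s+(\d+)(?:\s+(.*))?$")


def parse_text_bound(line):
    """Parse one 'T' line; return None for attribute, note, or other lines."""
    m = TEXT_BOUND.match(line.rstrip("\n"))
    if m is None:
        return None
    ann_id, label, start, end, text = m.groups()
    return {
        "id": ann_id,
        "type": label,
        "start": int(start),
        "end": int(end),
        "text": text or "",
    }


def span_matches_text(ann):
    """True when the span length equals the number of code points in the text."""
    return ann["end"] - ann["start"] == len(ann["text"])


def find_mismatches(lines):
    """Return the IDs of text-bound annotations whose span and text disagree."""
    bad = []
    for line in lines:
        ann = parse_text_bound(line)
        if ann is not None and not span_matches_text(ann):
            bad.append(ann["id"])
    return bad
```

Running `find_mismatches` over the lines of an `.ann` file gives a short list of annotations to inspect by hand after an offset shift like the one in this commit.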