Development of this project has moved to jonorthwash/ud-annotatrix in the notatrix/
folder. Please file any issues against this project in that repository. See jonorthwash/ud-annotatrix#436 for more details.
An experimental notational system for UD Annotatrix that combines the CoNLL-U and CG3 markup formats into one backend with the functionality of both.
For basic usage, just reference the main file from a CDN in an HTML script tag.
For example:
<script type="text/javascript" src=""></script>
const text = 'this is a test';
const sent = new nx.Sentence(text);
const conllu = sent.to('conllu');
Or, just clone the repository!
$ cd ~/src
$ git clone notatrix
Then, you can test it out directly in the browser by including a path to notatrix/build/notatrix.js in a script tag of an HTML document. All of the notatrix methods will be available on a global nx object.
For example:
<script type="text/javascript" src="file:///home/keggsmurph21/src/notatrix/build/notatrix.js"></script>
const text = 'this is a test';
const sent = new nx.Sentence(text);
const conllu = sent.to('conllu');
Alternatively, you can use this package in Node.js. To install the package and all its dependencies:
$ cd ~/src/some/existing/project
$ npm install notatrix
$ node # NOTE: this command opens the Node.js REPL
Then notatrix is available as a module via
> const nx = require('notatrix');
The basic unit of notatrix is the notatrix.Sentence. Instances of this class hold format-agnostic information about sentences, and can be constructed from any of the following input formats: brackets, CG3, CoNLL-U, params, plain text, or SD.
const nx = require('notatrix');
const brackets = '[root [nsubj I] have [obj [amod [advmod too] many] commitments] [advmod right now] [punct .]]';
const sent = new nx.Sentence(brackets);
const nx = require('notatrix');
const cg3 = `# sent_id = mst-0001
# text = Peşreve başlamalı.
"peşrev" Noun @obl #1->2
"başla" Verb SpaceAfter=No @root #2->0
"." Punc @punct #3->2`;
const sent = new nx.Sentence(cg3);
const nx = require('notatrix');
const conllu = `# sent_id = chapID01:paragID1:sentID1
# text = Кечаень сыргозтизь налкставтыця карвот .
# text[eng] = Kechai was awoken by annoying flies.
1 Кечаень Кечай N N Sem/Ant_Mal|Prop|SP|Gen|Indef 2 obj _ Кечаень
2 сыргозтизь сыргозтемс V V TV|Ind|Prt1|ScPl3|OcSg3 0 root _ сыргозтизь
3 налкставтыця налкставтомс PRC Prc V|TV|PrcPrsL|Sg|Nom|Indef 4 amod _ налкставтыця
4 карвот карво N N Sem/Ani|N|Pl|Nom|Indef 2 nsubj _ карвот
5 . . CLB CLB CLB 2 punct _ .`;
const sent = new nx.Sentence(conllu);
const nx = require('notatrix');
const params = [
  { form: 'hello' },
  { form: 'world' }
];
const sent = new nx.Sentence(params);
const nx = require('notatrix');
const text = 'this is my test string';
const sent = new nx.Sentence(text);
const nx = require('notatrix');
const sd = `He says that you like to swim
ccomp(says, like)
mark(like, that)`;
const sent = new nx.Sentence(sd);
const nx = require('notatrix');
const conllu = `# text = He boued e tebr Mona er gegin.
# text[eng] = Mona eats her food here in the kitchen.
# labels = press_1986 ch_syntax p_197 to_check
1 He he det _ pos|f|sp 2 det _ _
2 boued boued n _ m|sg 4 obj _ _
3 e e vpart _ obj 4 aux _ _
4 tebr debriñ vblex _ pri|p3|sg 0 root _ _
5 Mona Mona np _ ant|f|sg 4 nsubj _ _
6-7 er _ _ _ _ _ _ _ _
6 _ e pr _ _ 8 case _ _
7 _ an det _ def|sp 8 det _ _
8 gegin kegin n _ f|sg 4 obl _ _
9 . . sent _ _ 4 punct _ _`;
const sent = new nx.Sentence(conllu);
console.log(sent.comments.length); // expected 3
console.log(sent.tokens.length); // expected 8, only counts top-level tokens
console.log(sent.size); // expected 10, counts all tokens
Some interesting properties here are notatrix.Sentence.comments, notatrix.Sentence.tokens, and notatrix.Sentence.size. For more information about the syntax of working with notatrix.Sentence objects, check out the API Documentation.
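To make the difference between the tokens count and the size count concrete, here is a minimal standalone sketch. It does not use notatrix at all: countConlluTokens is a hypothetical helper written for illustration, and it only assumes that a multiword token (an ID range such as 6-7) counts once at the top level while the tokens it covers count only toward the total.

```javascript
// Hypothetical helper, not part of notatrix: counts comment lines,
// top-level tokens, and all tokens in a CoNLL-U string.
function countConlluTokens(conllu) {
  const counts = { comments: 0, topLevel: 0, all: 0 };
  let coveredUntil = 0; // last ID covered by the most recent multiword range

  for (const line of conllu.split('\n')) {
    if (line.trim() === '') continue;
    if (line.startsWith('#')) {
      counts.comments += 1;
      continue;
    }

    const id = line.trim().split(/\s+/)[0];
    const range = id.match(/^(\d+)-(\d+)$/);

    if (range) {
      // Multiword token such as "6-7": a top-level token in its own right.
      coveredUntil = Number(range[2]);
      counts.topLevel += 1;
      counts.all += 1;
    } else if (/^\d+$/.test(id)) {
      // Plain token: top-level unless it falls inside the preceding range.
      if (Number(id) > coveredUntil) counts.topLevel += 1;
      counts.all += 1;
    }
  }
  return counts;
}

const sample = `# text = He boued e tebr Mona er gegin.
# text[eng] = Mona eats her food here in the kitchen.
# labels = press_1986 ch_syntax p_197 to_check
1 He he det _ pos|f|sp 2 det _ _
2 boued boued n _ m|sg 4 obj _ _
3 e e vpart _ obj 4 aux _ _
4 tebr debriñ vblex _ pri|p3|sg 0 root _ _
5 Mona Mona np _ ant|f|sg 4 nsubj _ _
6-7 er _ _ _ _ _ _ _ _
6 _ e pr _ _ 8 case _ _
7 _ an det _ def|sp 8 det _ _
8 gegin kegin n _ f|sg 4 obl _ _
9 . . sent _ _ 4 punct _ _`;

console.log(countConlluTokens(sample)); // { comments: 3, topLevel: 8, all: 10 }
```

The multiword token er (6-7) is why the two counts differ: it is one top-level token, and the two tokens it covers add to the total only.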
Once we have a notatrix.Sentence, we can output it to any supported format. Since not all formats support all of the information we might want to encode, we get both an output string and a loss array listing the fields we were unable to encode. To output to a specific format, we can call sent.to(format), where format is one of apertium stream (coming soon!), brackets, cg3, conllu, notatrix serial, params, plain text, or sd.
For example:
const nx = require('notatrix');
const conllu = `# this is my first comment
# here is another comment
1 hello hello _ _ _ 0 root _ _
2 , , PUNCT _ _ 1 punct _ _
3 world world _ _ _ 1 _ _ _`;
const sent = new nx.Sentence(conllu);
//const toApertiumStream = sent.to('apertium stream');
const toBrackets = sent.to('brackets');
/* expected:
{
  output: '[root hello [punct ,] [_ world]]',
  loss: [ 'comments', 'lemma', 'upostag' ]
}
*/
const toCG3 = sent.to('cg3');
/* expected:
{
  output: '# this is my first comment\n# here is another comment\n"<hello>"\n\t"hello" @root #1->0\n"<,>"\n\t"," PUNCT @punct #2->1\n"<world>"\n\t"world" #3->1',
  loss: []
}
*/
const toConllu = sent.to('conllu');
/* expected:
{
  output: '# this is my first comment\n# here is another comment\n1\thello\thello\t_\t_\t_\t0\troot\t_\t_\n2\t,\t,\tPUNCT\t_\t_\t1\tpunct\t_\t_\n3\tworld\tworld\t_\t_\t_\t1\t_\t_\t_',
  loss: []
}
*/
const toSerial = sent.to('notatrix serial');
/* expected:
{
  output: { ... },
  loss: []
}
*/
const toParams = sent.to('params');
/* expected:
{
  output: [
    { form: 'hello', lemma: 'hello', head: '0' },
    { form: ',', lemma: ',', upostag: 'PUNCT', head: '1' },
    { form: 'world', lemma: 'world', head: '1' }
  ],
  loss: [ 'comments' ]
}
*/
const toPlainText = sent.to('plain text');
/* expected:
{
  output: 'hello, world',
  loss: [ 'comments', 'lemma', 'heads', 'upostag' ]
}
*/
const toSD = sent.to('sd');
/* expected:
{
  output: '# this is my first comment\n# here is another comment\nhello, world\nroot(ROOT, hello)\npunct(hello, ,)\n_(hello, world)',
  loss: [ 'lemma', 'upostag' ]
}
*/
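Because each conversion reports what it could not encode, a caller may want to refuse a conversion that silently drops fields it cares about. The sketch below is one possible way to do that, not something notatrix provides: convertWithoutLoss is a hypothetical helper, and it assumes the conversion method is named to and returns an object with output and loss properties, as the expected values above suggest. A stand-in object replaces a real notatrix.Sentence so the sketch runs without the package installed.

```javascript
// Hypothetical helper, not part of notatrix.
// Assumes sentence.to(format) returns an object shaped like { output, loss }.
function convertWithoutLoss(sentence, format, allowedLoss = []) {
  const { output, loss } = sentence.to(format);

  // Any lost field the caller did not explicitly allow is treated as an error.
  const unexpected = loss.filter((field) => !allowedLoss.includes(field));
  if (unexpected.length > 0) {
    throw new Error(
      `Converting to "${format}" would drop: ${unexpected.join(', ')}`
    );
  }
  return output;
}

// Stand-in for a notatrix.Sentence (assumption, for illustration only);
// it returns the values listed in the expected output above.
const fakeSentence = {
  to(format) {
    if (format === 'plain text') {
      return {
        output: 'hello, world',
        loss: ['comments', 'lemma', 'heads', 'upostag'],
      };
    }
    return { output: '[root hello [punct ,] [_ world]]', loss: ['comments', 'lemma', 'upostag'] };
  },
};

// Succeeds: every lost field is explicitly allowed.
const text = convertWithoutLoss(fakeSentence, 'plain text', [
  'comments', 'lemma', 'heads', 'upostag',
]);
console.log(text); // hello, world

// Throws: 'lemma' and 'upostag' would be dropped without permission.
try {
  convertWithoutLoss(fakeSentence, 'brackets', ['comments']);
} catch (err) {
  console.log(err.message);
}
```

Passing an explicit allowedLoss list keeps the decision about acceptable information loss at the call site instead of hiding it inside the helper.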
Feel free to submit GitHub issues for any bugs or feature requests! To get started, clone the repository, install the dependencies, and run tests:
$ cd ~/src
$ git clone notatrix
$ cd notatrix
$ npm install
$ npm test
If you plan on submitting a pull request, make sure that all the tests pass and that the project still compiles!
$ npm test
$ npm run build
$ cd ~/src/notatrix
$ node
> const nx = require('.');
- UD Annotatrix is a client-side, browser-only, language-independent tool for editing dependency trees
- Notatrix Utils is a collection of utilities for working with the notatrix format, including a database, a basic server, a web scraper, and more (coming soon)