An SVM-based morphological analyzer for word segmentation and part-of-speech tagging.
The old version (Python 2 and YamCha) is here.
$ pip install visvmtagger
$ python
>>> from visvmtagger import Tagger
>>> t = Tagger()
>>> t.tokenize("Tôi là sinh viên .")
[Tôi(B-PP), là(B-VB), sinh(B-NN), viên(I-NN), .(B-SB)]
>>> t.tokenize("Tôi là sinh viên .")[0].surface # pos is also available
'Tôi'
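The tags in the output above appear to follow a B-/I- scheme, where a B- tag starts a word and an I- tag continues it (so `sinh` and `viên` would form the single word `sinh viên`). This reading is an assumption based on the sample output, not something the package documents here. Under that assumption, a minimal sketch for merging syllables into words could look like the following. It works on plain `(surface, tag)` tuples rather than on visvmtagger's own token objects, and `group_words` is a hypothetical helper, not part of the package.

```python
def group_words(tagged):
    """Merge (syllable, tag) pairs into (word, pos) pairs.

    Assumes tags look like "B-NN" (word start) or "I-NN" (word continuation).
    """
    words = []
    for surface, tag in tagged:
        prefix, _, pos = tag.partition("-")
        if prefix == "I" and words:
            # Continuation: append this syllable to the previous word.
            prev_surface, prev_pos = words[-1]
            words[-1] = (prev_surface + " " + surface, prev_pos)
        else:
            # "B" (or anything unexpected) starts a new word.
            words.append((surface, pos))
    return words


# Tuples copied by hand from the sample output above.
tagged = [("Tôi", "B-PP"), ("là", "B-VB"), ("sinh", "B-NN"),
          ("viên", "I-NN"), (".", "B-SB")]
words = group_words(tagged)
print(words)
# [('Tôi', 'PP'), ('là', 'VB'), ('sinh viên', 'NN'), ('.', 'SB')]
```

To use this with real tagger output, the tuples would have to be built from each token's `surface` and its tag, whichever attribute the package exposes for it.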
Please see main() in visvmtagger/train.py.
MIT