kanjirz50/restore-tonemark

Vietnamese Diacritics Restoration

This is a Vietnamese Diacritics Restoration tool based on SVMs.

Usage

In both the "train" and "predict" directories, put the LIBLINEAR library, "liblinear.so.3", under the "src" directory.

Train

Make corpus

# make a corpus with the tone marks removed
% cat corpus.txt | python stdin2delete_tonemark.py > resource/viet_corpus_no_tonemark.txt
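The step above turns a normal Vietnamese corpus into an unmarked copy. As a rough illustration only, and not the actual contents of stdin2delete_tonemark.py, the sketch below shows one common way to do this in Python: decompose each character with Unicode NFD, drop the combining marks, and map "đ"/"Đ" by hand because they have no decomposition. The function name `strip_diacritics` is hypothetical, and the sketch assumes that all diacritics (not only the tone marks) are removed, as the "Toi la sinh vien" example in the Predict section suggests.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with Vietnamese diacritics removed (hypothetical helper)."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # "đ" and "Đ" are standalone letters with no decomposition, so map them directly.
    return stripped.replace("đ", "d").replace("Đ", "D")


print(strip_diacritics("Tôi là sinh viên"))  # prints: Toi la sinh vien
```

Applying such a function to every line read from standard input would give a filter with the same shape as the command above.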

Training

First, edit config.ini.

% emacs config.ini
[settings]
path1 = /Users/takahashi/restore-tonemark/train/resource/VNTQcorpus_small.txt
path2 = /Users/takahashi/restore-tonemark/train/resource/VNTQcorpus_small_no_tone_mark.txt
preserve_dir_path = /Users/takahashi/restore-tonemark/train/models
window_size = 2

# training
% cd train
% python train.py
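
For reference, a file with the layout shown above can be read with Python's standard `configparser` module. This is a minimal sketch of how such settings might be loaded, not a copy of what train.py does. The helper name `load_settings` and the placeholder paths are assumptions made for illustration. The key names are taken from the config.ini example.

```python
import configparser

# Placeholder paths for illustration only; use your own locations.
SAMPLE_CONFIG = """
[settings]
path1 = /path/to/corpus.txt
path2 = /path/to/corpus_no_tone_mark.txt
preserve_dir_path = /path/to/models
window_size = 2
"""


def load_settings(text: str) -> dict:
    """Parse the [settings] section into a plain dict (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["settings"]
    return {
        "path1": section["path1"],
        "path2": section["path2"],
        "preserve_dir_path": section["preserve_dir_path"],
        # window_size is stored as text in the file, so convert it to int.
        "window_size": section.getint("window_size"),
    }


settings = load_settings(SAMPLE_CONFIG)
print(settings["window_size"])
```

To read from the file on disk instead of a string, `parser.read("config.ini")` can be used in place of `read_string`.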

Predict

% cd predict
% echo "Toi la sinh vien" | python predict.py # or: python predict.py < input.txt
