This is a Vietnamese Diacritics Restoration tool based on SVMs.
In both the "train" and "predict" directories, put the LIBLINEAR library file, "liblinear.so.3", under the "src" directory.
# make a corpus without tone marks
% cat corpus.txt | python stdin2delete_tonemark.py > resource/viet_corpus_no_tonemark.txt
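The step above removes the marks from each line of the corpus. A minimal sketch of one way to do this with the Python standard library is shown here. It is not the code of stdin2delete_tonemark.py; the function name strip_tone_marks is hypothetical, and the sketch assumes that all diacritics are removed (so "Tôi" becomes "Toi"), which matches the sample input used for prediction.

```python
import unicodedata


def strip_tone_marks(text):
    """Return text with Vietnamese diacritics removed (illustrative helper)."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Combining marks have Unicode category "Mn"; drop them.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # The letter "đ" has no decomposition, so it is mapped by hand.
    stripped = stripped.replace("đ", "d").replace("Đ", "D")
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    print(strip_tone_marks("Tôi là sinh viên"))
```

To use it as a filter in the same way as the command above, the function could be applied to each line read from sys.stdin.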
First, edit config.ini so that the paths match your environment.
% emacs config.ini
[settings]
path1 = /Users/takahashi/restore-tonemark/train/resource/VNTQcorpus_small.txt
path2 = /Users/takahashi/restore-tonemark/train/resource/VNTQcorpus_small_no_tone_mark.txt
preserve_dir_path = /Users/takahashi/restore-tonemark/train/models
window_size = 2
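A file in this format can be read with Python's standard configparser module. The sketch below only illustrates that; how train.py actually loads config.ini is not shown in this README. The function name load_settings and the placeholder paths are assumptions made for the example.

```python
import configparser

# Placeholder values; replace them with the paths on your machine.
SAMPLE_CONFIG = """\
[settings]
path1 = /path/to/corpus_with_tone_marks.txt
path2 = /path/to/corpus_without_tone_marks.txt
preserve_dir_path = /path/to/models
window_size = 2
"""


def load_settings(config_text):
    """Parse the [settings] section into a plain dict (illustrative helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["settings"]
    return {
        "path1": section["path1"],
        "path2": section["path2"],
        "preserve_dir_path": section["preserve_dir_path"],
        # window_size is stored as text in the file, so convert it to int.
        "window_size": section.getint("window_size"),
    }


settings = load_settings(SAMPLE_CONFIG)
```

To read the real file instead of the sample string, parser.read("config.ini") can be used in place of read_string.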
# training
% cd train
% python train.py
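The window_size setting suggests that each syllable is classified using its neighbours as context. The sketch below shows one common way to build such context features for a linear SVM. It is an assumption about the general technique, not the feature extraction used by train.py; the function name window_features, the "<PAD>" token, and the feature string format are all hypothetical.

```python
def window_features(syllables, index, window_size=2):
    """Collect the syllables around position index as string features."""
    features = []
    for offset in range(-window_size, window_size + 1):
        pos = index + offset
        # Positions outside the sentence get a padding token.
        if 0 <= pos < len(syllables):
            token = syllables[pos]
        else:
            token = "<PAD>"
        # Prefix each token with its relative offset, e.g. "-1:toi".
        features.append(f"{offset:+d}:{token}")
    return features


sentence = ["toi", "la", "sinh", "vien"]
print(window_features(sentence, 1, window_size=2))
```

With window_size = 2, each syllable yields five features: two to the left, the syllable itself, and two to the right.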
# prediction
% cd ../predict
% echo "Toi la sinh vien" | python predict.py   # or: python predict.py < input.txt