Train the aligner on inuktitut #2

Open
cmesher opened this issue Jun 4, 2012 · 3 comments

@cmesher
Owner

cmesher commented Jun 4, 2012

./align.py -t inuktitut_data inuktitut_data

@cmesher
Owner Author

cmesher commented Jun 12, 2012

Put two small files into the training dir

$ ls inuktitut_train/
ConversationInuit9-11_extract.lab ConversationInuit9-11_extract.wav
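
Before training, it may be worth checking that every recording has a transcript. The following is only a sketch: it assumes the aligner expects one .lab file per .wav file with the same base name, as the listing above suggests, and `unpaired_files` is a hypothetical helper, not part of align.py.

```python
from pathlib import Path


def unpaired_files(train_dir):
    """Return base names that have a .wav without a .lab, or the reverse.

    Assumption: training data comes in <name>.wav / <name>.lab pairs.
    """
    directory = Path(train_dir)
    wavs = {p.stem for p in directory.glob("*.wav")}
    labs = {p.stem for p in directory.glob("*.lab")}
    # Symmetric difference: names present in exactly one of the two sets.
    return sorted(wavs ^ labs)


if __name__ == "__main__":
    missing = unpaired_files("inuktitut_train")
    if missing:
        print("Unpaired files:", ", ".join(missing))
    else:
        print("All .wav files have a matching .lab file.")
```

An empty result means every file in the directory has a partner.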

Train aligner

$ ./align.py -t inuktitut_train inuktitut_train/
Initializing...
Training...
Modeling silence...
More training...
Realigning...
More training...
Final aligning...
Making TextGrids...
Alignment complete.

Run aligner

$ ./align_ex.sh inuktitut_train/ConversationInuit9-11_extract.wav inuktitut_train/ConversationInuit9-11_extract.lab
Initializing...
Aligning...
ERROR [+5010] InitSource: Cannot open source file AH
ERROR [+7010] LoadHMMSet: Can't find file
ERROR [+3228] Initialise: LoadHMMSet failed
FATAL ERROR - Terminating program HVite
Making TextGrids...
Alignment complete.
mv: rename .dat/ConversationInuit9-11_extract.TextGrid to ./ConversationInuit9-11_extract.TextGrid: No such file or directory
Output is in ConversationInuit9-11_extract.TextGrid.

@cmesher
Owner Author

cmesher commented Jun 12, 2012

How to adjust pruning (HTK Book, p. 43):

(phones0.mlf) The -t option sets the pruning thresholds to be used during training. Pruning limits the range of state alignments that the forward-backward algorithm includes in its summation and it can reduce the amount of computation required by an order of magnitude. For most training files, a very tight pruning threshold can be set, however, some training files will provide poorer acoustic matching and in consequence a wider pruning beam is needed. HERest deals with this by having an auto-incrementing pruning threshold. In the above example, pruning is normally 250.0. If re-estimation fails on any particular file, the threshold is increased by 150.0 and the file is reprocessed. This is repeated until either the file is successfully processed or the pruning limit of 1000.0 is exceeded. At this point it is safe to assume that there is a serious problem with the training file and hence the fault should be fixed (typically it will be an incorrect transcription) or the training file should be discarded.
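
The retry behaviour described in that passage can be sketched roughly as below. This is an illustration of the schedule only, not HERest's actual code: `pruning_schedule`, `reestimate_with_pruning`, and the `try_file` callback are hypothetical names, and the sketch assumes the limit itself (1000.0) is still tried before giving up.

```python
def pruning_schedule(initial=250.0, increment=150.0, limit=1000.0):
    """Yield successive pruning thresholds until the limit is exceeded."""
    threshold = initial
    while threshold <= limit:
        yield threshold
        threshold += increment


def reestimate_with_pruning(try_file, initial=250.0, increment=150.0, limit=1000.0):
    """Return the first threshold at which try_file(threshold) succeeds.

    try_file is a stand-in for re-estimation on one training file; it
    should return True on success. Returns None if every threshold up to
    the limit fails, in which case the file should be fixed or discarded.
    """
    for threshold in pruning_schedule(initial, increment, limit):
        if try_file(threshold):
            return threshold
    return None


if __name__ == "__main__":
    # With the values quoted above, the thresholds tried are:
    print(list(pruning_schedule()))
```

A file that only aligns with a wide beam is picked up on a later pass, while a file that fails at every threshold returns `None`, which matches the advice to check its transcription.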

@cmesher
Owner Author

cmesher commented Jun 12, 2012

./align.py -t sample_english_data sample_english_data
