Useful functions for E2E Speech Recognition training with PyTorch and CUDA.
Here is a simple use case with Reinforcement Learning and RNN-T loss:
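import torch
import torch_edit_distance
# Token ids for the blank symbol and the word separator
blank = torch.tensor([0], dtype=torch.int).cuda()
space = torch.tensor([1], dtype=torch.int).cuda()
# Sample hypotheses (model is assumed to be defined elsewhere)
xs = model.greedy_decode(xs, sampled=True)
# Strip blank tokens from the sampled hypotheses
torch_edit_distance.remove_blank(xs, xn, blank)
# Reward each hypothesis with 1 - WER against the references ys
rewards = 1 - torch_edit_distance.compute_wer(xs, ys, xn, yn, blank, space)
# rnnt_loss is assumed to be provided by a separate RNN-T loss package
nll = rnnt_loss(zs, ys, xn, yn)
loss = nll * rewards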
Levenshtein edit distance with detailed statistics for ins/del/sub operations.
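
To make the per-operation statistics concrete, here is a minimal plain-Python sketch of an edit distance that also counts insertions, deletions, and substitutions. It is not the package's CUDA implementation: the name `edit_stats` is hypothetical, it works on Python lists rather than padded tensors with lengths, and when several minimum-cost alignments exist the split between operation types depends on tie-breaking.

```python
def edit_stats(hyp, ref):
    """Return (insertions, deletions, substitutions) of one
    minimum-cost alignment between hypothesis and reference."""
    # Each cell holds (cost, ins, del, sub) for a pair of prefixes.
    # Empty hypothesis: every reference token counts as a deletion.
    prev = [(j, 0, j, 0) for j in range(len(ref) + 1)]
    for i in range(1, len(hyp) + 1):
        # Empty reference: every hypothesis token counts as an insertion.
        curr = [(i, i, 0, 0)]
        for j in range(1, len(ref) + 1):
            c, a, d, s = prev[j - 1]
            if hyp[i - 1] == ref[j - 1]:
                diag = (c, a, d, s)          # match, no cost
            else:
                diag = (c + 1, a, d, s + 1)  # substitution
            c, a, d, s = prev[j]
            ins = (c + 1, a + 1, d, s)       # extra hypothesis token
            c, a, d, s = curr[j - 1]
            dele = (c + 1, a, d + 1, s)      # missing reference token
            # Keep the cheapest option; ties prefer diag, then ins.
            curr.append(min(diag, ins, dele, key=lambda t: t[0]))
        prev = curr
    _, n_ins, n_del, n_sub = prev[-1]
    return n_ins, n_del, n_sub


if __name__ == "__main__":
    print(edit_stats([1, 2, 3], [1, 3]))
```

Dividing the sum of the three counts by the reference length gives the conventional error rate, which is the quantity the reward in the use case above is built from.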
Merge repeated tokens, useful for CTC-based models.
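
As an illustration of the merging step, a rough plain-Python equivalent might look like the sketch below. The name `collapse_repeated_ref` is hypothetical and the function takes a single Python list, whereas the package itself operates on batched tensors on the GPU.

```python
from itertools import groupby


def collapse_repeated_ref(tokens):
    """Keep one token from every run of identical consecutive tokens."""
    # groupby yields one (key, group) pair per run of equal neighbours.
    return [tok for tok, _ in groupby(tokens)]


if __name__ == "__main__":
    print(collapse_repeated_ref([1, 1, 0, 1, 2, 2, 2]))
```

Only adjacent duplicates are merged, so a token that reappears after a different token (such as a blank) is kept.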
Remove unnecessary blank tokens, useful for CTC, RNN-T, RNA models.
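
The effect of blank removal can be sketched in plain Python as follows. This is only an illustration under assumptions: `remove_blank_ref` is a hypothetical name, the blank id defaults to 0 as in the use case above, and the input is a single Python list instead of a padded batch with lengths.

```python
def remove_blank_ref(tokens, blank=0):
    """Return the tokens with every occurrence of the blank id dropped."""
    return [tok for tok in tokens if tok != blank]


if __name__ == "__main__":
    print(remove_blank_ref([0, 1, 0, 2, 0]))
```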
Remove leading, trailing and repeated middle separators.
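
A minimal plain-Python sketch of this separator clean-up is shown below. The name `strip_separator_ref` is hypothetical, the separator id defaults to 1 as in the use case above, and the function handles one Python list rather than a batch of tensors.

```python
def strip_separator_ref(tokens, sep=1):
    """Drop leading and trailing separators and collapse repeated
    separators in the middle into a single one."""
    out = []
    for tok in tokens:
        # Skip a separator at the start or directly after another one.
        if tok == sep and (not out or out[-1] == sep):
            continue
        out.append(tok)
    # At most one trailing separator can remain at this point.
    if out and out[-1] == sep:
        out.pop()
    return out


if __name__ == "__main__":
    print(strip_separator_ref([1, 1, 5, 6, 1, 1, 7, 1]))
```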
- C++11 compiler (tested with GCC 9.4.0).
- Python: 3.5, 3.6, 3.7, 3.8, 3.9 (tested with version 3.8).
- PyTorch >= 1.5.0 (tested with version 1.13.1+cu116).
- CUDA Toolkit (tested with version 11.2).
No precompiled version of the package is available. The setup instructions below compile it locally from source.
pip install torch_edit_distance
git clone https://github.com/1ytic/pytorch-edit-distance
cd pytorch-edit-distance
python setup.py install
python -m torch_edit_distance.test