- This package provides offline automatic speech recognition (ASR) using the NVIDIA NeMo toolkit.
- /nemo_node : node for ASR
- exit : type 'n' to shut down the node.
- /speech_recognition : (String) Speech recognition result
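A minimal sketch of a consumer for the recognition result, assuming the topic carries `std_msgs/String` messages as listed above. The node name `asr_listener` and the helper `normalize_result` are hypothetical and not part of this package. `rospy` is imported inside `listen()` so the helper can be used without a ROS environment.

```python
def normalize_result(text):
    """Trim and collapse whitespace in a recognized sentence (hypothetical helper)."""
    return " ".join(text.split())


def listen(topic="/speech_recognition"):
    """Subscribe to the ASR result topic and log each recognized sentence."""
    # Imported here so that the helper above works without ROS installed.
    import rospy
    from std_msgs.msg import String

    def on_result(msg):
        # std_msgs/String carries the text in its `data` field.
        rospy.loginfo("ASR result: %s", normalize_result(msg.data))

    rospy.init_node("asr_listener", anonymous=True)  # hypothetical node name
    rospy.Subscriber(topic, String, on_result)
    rospy.spin()  # block until the node is shut down
```

Call `listen()` from a sourced ROS workspace while /nemo_node is running. If the node was launched with a different `speech_channel`, pass that topic name instead.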
- Tested on Python 3.8.10
$ apt-get install sox libsndfile1 ffmpeg portaudio19-dev
$ apt-get install build-essential
$ pip install -r requirements.txt
- Installation guide: https://github.com/NVIDIA/NeMo
$ pip install "nemo_toolkit[all]"
$ roslaunch nemo_asr nemo_asr.launch \
lang:=ko \
frame:=5 \
speech_channel:=speech_recognition
- lang : {"en", "ko"}
- frame : recording duration in seconds for each voice command
- speech_channel : name of the topic on which the recognition result is published
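The launch arguments above can also be assembled from Python, for example when starting the node from a test script. This is only a sketch: `build_launch_command` and `launch` are hypothetical helpers, and the check that `frame` is a positive number is an assumption, since the package does not document the accepted range.

```python
import subprocess

SUPPORTED_LANGS = ("en", "ko")


def build_launch_command(lang="ko", frame=5, speech_channel="speech_recognition"):
    """Return the roslaunch argument list for nemo_asr with basic validation."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"lang must be one of {SUPPORTED_LANGS}, got {lang!r}")
    if frame <= 0:
        # Assumption: the recording duration must be positive.
        raise ValueError("frame must be a positive number of seconds")
    return [
        "roslaunch", "nemo_asr", "nemo_asr.launch",
        f"lang:={lang}",
        f"frame:={frame}",
        f"speech_channel:={speech_channel}",
    ]


def launch(**kwargs):
    """Start the node and block until roslaunch exits."""
    return subprocess.run(build_launch_command(**kwargs), check=True)
```

For example, `launch(lang="en", frame=3)` runs the same command as the shell invocation with English and a 3-second recording window.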
[INPUT] 'y' : record for 5 seconds / 'l' : language / 'c' : cli input / 'n' : shutdown
- press 'c' to enable command-line input (instead of STT)
- press 'l' to change language
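The key handling described above can be sketched as a small dispatch function. This is an illustration, not the package's actual implementation: `handle_key` and `prompt` are hypothetical names, and the sketch assumes that 'l' toggles between the two supported languages.

```python
def handle_key(key, state):
    """Return (action, new_state) for one key press.

    `state` is a dict with "lang" ("en" or "ko") and "frame" (seconds).
    """
    key = key.strip().lower()
    if key == "y":
        return "record", state
    if key == "l":
        # Assumption: 'l' toggles between the two supported languages.
        new_lang = "en" if state["lang"] == "ko" else "ko"
        return "language", {**state, "lang": new_lang}
    if key == "c":
        return "cli_input", state
    if key == "n":
        return "shutdown", state
    return "ignore", state


def prompt(state):
    """Build the input prompt, using the configured recording duration."""
    return ("[INPUT] 'y' : record for {frame} seconds / 'l' : language / "
            "'c' : cli input / 'n' : shutdown").format(**state)
```

Keeping the dispatch free of ROS and audio calls makes it easy to test; the caller decides what "record" or "cli_input" actually does.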
- This package currently uses Conformer-CTC models - https://arxiv.org/abs/2005.08100
- Currently, English and Korean are supported.
- To change the model, edit "src/utils/agent.py"
- "cwwojin/stt_kr_conformer_ctc_medium" - https://huggingface.co/cwwojin/stt_kr_conformer_ctc_medium
- This model is trained on the KsponSpeech dataset - https://aihub.or.kr/
- Preprocessing & training scripts using KsponSpeech can be found at - https://github.com/rirolab/Co-op/tree/main/Woojin%20Choi/03_nemo_KsponSpeech_train
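A minimal sketch of loading one of these models with NeMo and transcribing a file, which may help when editing "src/utils/agent.py". It does not reproduce that file. Only the Korean model name comes from this README; the English entry `stt_en_conformer_ctc_medium` is an assumed stand-in, and `pick_model_name`, `load_model` and `transcribe_file` are hypothetical helpers. Loading a Hugging Face model by its "user/name" identifier assumes a NeMo version with Hugging Face Hub support.

```python
# Assumption: the English entry is a stand-in; only the Korean model name
# is taken from this README.
MODEL_NAMES = {
    "ko": "cwwojin/stt_kr_conformer_ctc_medium",
    "en": "stt_en_conformer_ctc_medium",
}


def pick_model_name(lang):
    """Map a language code to a pretrained model name."""
    try:
        return MODEL_NAMES[lang]
    except KeyError:
        raise ValueError(f"unsupported language: {lang!r}") from None


def load_model(lang="ko"):
    """Download (on first use) and return the pretrained ASR model."""
    # Imported here so the name lookup works without NeMo installed.
    import nemo.collections.asr as nemo_asr

    return nemo_asr.models.ASRModel.from_pretrained(model_name=pick_model_name(lang))


def transcribe_file(model, wav_path):
    """Transcribe one audio file.

    The element type of the result (plain string or hypothesis object)
    depends on the installed NeMo version.
    """
    return model.transcribe([wav_path])[0]
```

For example, `transcribe_file(load_model("ko"), "command.wav")` transcribes a recorded voice command, where `command.wav` is a placeholder path.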
- Woojin Choi / [email protected]