NeMo-ASR : ROS Package for Speech Recognition using Nvidia NeMo

1. Intro

This package performs offline automatic speech recognition (ASR) using the NVIDIA NeMo toolkit.

Nodes

/nemo_node : node for ASR
exit : type 'n' to shut down the node.

Publishing Topics

/speech_recognition : (String) Speech recognition result
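
A minimal sketch of a listener for this topic. It assumes that "(String)" above means std_msgs/String and that rospy is available in a sourced ROS 1 environment. The node name asr_listener and the helper format_result are hypothetical names chosen for illustration; they are not part of this package.

```python
def format_result(msg):
    """Return the recognized text carried by a String-like message, stripped."""
    return msg.data.strip()


def on_result(msg):
    # Callback invoked for every message published on /speech_recognition.
    print("ASR result: " + format_result(msg))


def main():
    # ROS imports are kept inside main() so the helpers above can be
    # imported and tested without a ROS installation.
    import rospy
    from std_msgs.msg import String  # assumption: "(String)" is std_msgs/String

    rospy.init_node("asr_listener")  # hypothetical node name
    rospy.Subscriber("/speech_recognition", String, on_result)
    rospy.spin()  # block until the node is shut down
```

Calling main() from a script while /nemo_node is running should print each recognition result as it is published.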

2. Prerequisites

Tested on Python 3.8.10.

Dependencies

$ apt-get install sox libsndfile1 ffmpeg portaudio19-dev
$ apt-get install build-essential

$ pip install -r requirements.txt

Nvidia NeMo

Installation guide: https://github.com/NVIDIA/NeMo

$ pip install "nemo_toolkit[all]"

3. Usage

start NeMo node

$ roslaunch nemo_asr nemo_asr.launch \
    lang:=ko \
    frame:=5 \
    speech_channel:=speech_recognition

lang : recognition language, one of {"en", "ko"}
frame : recording duration in seconds for each voice command
speech_channel : name of the topic on which recognition results are published
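
The launch arguments above can also be assembled programmatically, for example when starting the node from a test script. The sketch below is only an illustration: build_launch_command is a hypothetical helper, and its default values are taken from the example command in this README, not from the launch file itself.

```python
import shlex

SUPPORTED_LANGS = ("en", "ko")  # values listed in this README


def build_launch_command(lang="ko", frame=5, speech_channel="speech_recognition"):
    """Return the roslaunch argument list for the given settings."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError("lang must be one of %s, got %r" % (SUPPORTED_LANGS, lang))
    if frame <= 0:
        raise ValueError("frame must be a positive number of seconds")
    return [
        "roslaunch", "nemo_asr", "nemo_asr.launch",
        "lang:=%s" % lang,
        "frame:=%s" % frame,
        "speech_channel:=%s" % speech_channel,
    ]


# Build the command for English with 3-second recordings and print it.
cmd = build_launch_command(lang="en", frame=3)
print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) in a sourced ROS environment to start the node.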

Interface

[INPUT] 'y' : record for 5 seconds / 'l' : language / 'c' : cli input / 'n' : shutdown

Press 'c' to type a command on the command line instead of using speech-to-text (STT).
Press 'l' to change the language.

4. Pre-trained ASR models

This package currently uses Conformer-CTC models (Conformer paper: https://arxiv.org/abs/2005.08100).

Changing models / languages

Currently, English and Korean are supported.
To change the model, edit "src/utils/agent.py".

Korean ASR model

"cwwojin/stt_kr_conformer_ctc_medium" - https://huggingface.co/cwwojin/stt_kr_conformer_ctc_medium
This model was trained on the KsponSpeech dataset - https://aihub.or.kr/
Preprocessing and training scripts for KsponSpeech are available at https://github.com/rirolab/Co-op/tree/main/Woojin%20Choi/03_nemo_KsponSpeech_train
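
For reference, a pretrained model such as the one above can be loaded directly with NeMo. This is a minimal sketch and may differ from how src/utils/agent.py loads the model. It assumes a NeMo version whose ASRModel.from_pretrained accepts Hugging Face model IDs; transcribe_files is a hypothetical helper name.

```python
KOREAN_MODEL = "cwwojin/stt_kr_conformer_ctc_medium"  # Hugging Face ID from this README


def transcribe_files(paths, model_name=KOREAN_MODEL):
    """Transcribe audio files with a pretrained NeMo ASR model."""
    # Imported lazily: NeMo is a heavy dependency, and loading the model
    # may download its checkpoint on first use.
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.ASRModel.from_pretrained(model_name=model_name)
    # Depending on the NeMo version, the result is a list of strings or a
    # list of hypothesis objects; it is returned unchanged here.
    return model.transcribe(list(paths))
```

For example, transcribe_files(["command.wav"]) would transcribe a single recording, where command.wav is a hypothetical file name. NeMo ASR models typically expect 16 kHz mono audio.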

Author

Woojin Choi / [email protected]
