ROS Package for Speech Recognition with Nvidia NeMo

cwwojin/nemo_asr

NeMo-ASR : ROS Package for Speech Recognition using Nvidia NeMo

1. Intro

  • This package performs offline automatic speech recognition (ASR) using the Nvidia NeMo toolkit.

Nodes

  • /nemo_node : node that performs ASR
  • exit : type 'n' to shut down the node

Publishing Topics

  • /speech_recognition : (String) Speech recognition result
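
A minimal sketch of a listener for the result topic, assuming that "(String)" above means `std_msgs/String` and that the node publishes under the default name `/speech_recognition`. The node name `asr_listener` and the helper `clean_transcript` are hypothetical and not part of this package.

```python
def clean_transcript(text):
    """Collapse runs of whitespace and trim the ends of a recognition result."""
    return " ".join(text.split())


def listen(topic="/speech_recognition"):
    """Log every recognition result published on `topic` until shutdown."""
    # Imported lazily so the helper above is usable without a ROS install.
    import rospy
    from std_msgs.msg import String  # assumption: "(String)" is std_msgs/String

    def on_result(msg):
        rospy.loginfo("heard: %s", clean_transcript(msg.data))

    rospy.init_node("asr_listener", anonymous=True)  # hypothetical node name
    rospy.Subscriber(topic, String, on_result)
    rospy.spin()  # block until the node is shut down
```

Calling `listen()` from a script in a sourced ROS workspace, while /nemo_node is running, should log each result as it arrives.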

2. Prerequisites

  • Tested with Python 3.8.10

Dependencies

$ apt-get install sox libsndfile1 ffmpeg portaudio19-dev
$ apt-get install build-essential
$ pip install -r requirements.txt

Nvidia NeMo

$ pip install "nemo_toolkit[all]"
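
After installing, a quick check along these lines can confirm that the pieces are visible from Python. It is a sketch under two assumptions: that the `sox` and `ffmpeg` packages put executables of the same name on the PATH, and that `nemo_toolkit` is importable as `nemo`. The function name `missing_prerequisites` is hypothetical.

```python
import importlib.util
import shutil


def missing_prerequisites(tools=("sox", "ffmpeg"), modules=("nemo",)):
    """Return the names of command-line tools and Python modules that cannot be found."""
    # shutil.which returns None when an executable is not on the PATH.
    missing = [tool for tool in tools if shutil.which(tool) is None]
    # find_spec returns None when a top-level module is not installed.
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing


if __name__ == "__main__":
    print(missing_prerequisites() or "all prerequisites found")
```

Libraries such as libsndfile1 and portaudio19-dev have no executable to look up, so this check does not cover them.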

3. Usage

Start the NeMo node

$ roslaunch nemo_asr nemo_asr.launch \
    lang:=ko \
    frame:=5 \
    speech_channel:=speech_recognition

  • lang : recognition language, one of {"en", "ko"}
  • frame : recording time (in seconds) for each voice command
  • speech_channel : name of the topic the recognition result is published on
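
The launch arguments above can also be assembled and validated from Python before starting the node. This is only a sketch: `build_launch_command` and `launch` are hypothetical helpers, and their defaults mirror the example command rather than whatever defaults the launch file itself defines.

```python
import subprocess

SUPPORTED_LANGS = ("en", "ko")


def build_launch_command(lang="ko", frame=5, speech_channel="speech_recognition"):
    """Return the roslaunch argv for nemo_asr.launch after checking the arguments."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"lang must be one of {SUPPORTED_LANGS}, got {lang!r}")
    if frame <= 0:
        raise ValueError("frame must be a positive number of seconds")
    return [
        "roslaunch", "nemo_asr", "nemo_asr.launch",
        f"lang:={lang}",
        f"frame:={frame}",
        f"speech_channel:={speech_channel}",
    ]


def launch(**kwargs):
    """Run roslaunch in the foreground; requires a sourced ROS environment."""
    return subprocess.run(build_launch_command(**kwargs), check=True)
```

For example, `launch(lang="en", frame=3)` would start the node in English with 3-second recordings.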

Interface

[INPUT] 'y' : record for 5 seconds / 'l' : language / 'c' : cli input / 'n' : shutdown  
  • Press 'c' to enter a command on the command line (instead of using speech-to-text)
  • Press 'l' to change the language
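
The key handling described above can be pictured as a small dispatch table. The sketch below is illustrative only and is not taken from the package source: the action names, `parse_key`, and `actions` are hypothetical, and accepting upper-case keys is an added assumption.

```python
# Keys from the [INPUT] prompt mapped to hypothetical action names.
ACTIONS = {
    "y": "record",
    "l": "language",
    "c": "cli",
    "n": "shutdown",
}


def parse_key(raw):
    """Map one line of user input to an action name, or None if unrecognized."""
    return ACTIONS.get(raw.strip().lower())


def actions(lines):
    """Yield an action name for each recognized input line, stopping after 'shutdown'."""
    for raw in lines:
        action = parse_key(raw)
        if action is None:
            continue  # ignore unrecognized keys
        yield action
        if action == "shutdown":
            return
```

Feeding it lines read from standard input gives a loop that ends as soon as 'n' is typed.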

4. Pre-trained ASR models

Changing models / languages

  • Currently, English and Korean are supported.
  • To change the model, edit "src/utils/agent.py".
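
For orientation, loading a pretrained NeMo model by name typically looks like the sketch below. It does not reproduce the contents of "src/utils/agent.py". The table `MODEL_NAMES` and both helpers are hypothetical, and `QuartzNet15x5Base-En` is used only as an example English checkpoint. No Korean entry is filled in, because the model name is not given here.

```python
# Assumption: one pretrained checkpoint name per language.
# Only an English example is filled in; add the Korean model name as needed.
MODEL_NAMES = {
    "en": "QuartzNet15x5Base-En",
}


def model_name_for(lang):
    """Look up the pretrained model name configured for `lang`."""
    try:
        return MODEL_NAMES[lang]
    except KeyError:
        raise ValueError(f"no model configured for language {lang!r}") from None


def load_model(lang):
    """Download (on first use) and return the pretrained NeMo model for `lang`."""
    # Imported lazily: NeMo is heavy and only needed when a model is loaded.
    import nemo.collections.asr as nemo_asr

    return nemo_asr.models.ASRModel.from_pretrained(model_name=model_name_for(lang))
```

A loaded model can then be applied to recorded audio with `load_model("en").transcribe(["command.wav"])`, where `command.wav` is a placeholder file name. The exact return type of `transcribe` depends on the installed NeMo version.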

Korean ASR model

Author
