About Korean ASR #3648

eesungkim · 2022-02-11T17:05:38Z

Hi guys,

Thank you for sharing a great tool for conversational AI.

I'm going to start a discussion on Korean ASR here. @okuchaiev

eesungkim · 2022-02-11T17:08:25Z

Author

First, I'm sharing one of the models below.

Conformer-Transducer-BPE-Small.nemo [Link]

Model Overview

This collection contains small-size versions of Conformer-Transducer trained on KsponSpeech, an open-domain Korean dialog corpus.

Model Architecture

The Conformer-Transducer model is an autoregressive variant of the Conformer model [1] for Automatic Speech Recognition that uses Transducer loss/decoding. You can find more details on this model here: [Conformer-Transducer Model].

Training

The NeMo toolkit [3] was used to train the models for several hundred epochs. These models were trained with this [base config].

The tokenizers for these models were built from the text transcripts.

Datasets

All the models in this collection are trained on the KsponSpeech dataset [Download].

Performance

The available models in this collection are listed below. Performance of the ASR models is reported in terms of Character Error Rate (CER%) and Word Error Rate (WER%) with mAES decoding.

Version	Tokenizer	eval_clean CER	eval_other CER	eval_clean WER	eval_other WER
v1.5.1	SentencePiece Char	7.97%	8.85%	20.94%	25.18%
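
For readers less familiar with the two metrics in the table, here is a minimal sketch of how CER and WER are commonly computed from edit distance. It is not the evaluation script behind the numbers above; the function names are hypothetical, and stripping spaces before computing CER is an assumption (some setups count them).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from the reference
                            curr[j - 1] + 1,      # insertion into the reference
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref_text, hyp_text):
    """Word error rate over space-delimited words."""
    ref_words = ref_text.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_text.split()) / len(ref_words)


def cer(ref_text, hyp_text):
    """Character error rate, ignoring spaces (an assumption of this sketch)."""
    ref_chars = ref_text.replace(" ", "")
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_text.replace(" ", "")) / len(ref_chars)


# Made-up example sentence: one dropped syllable ("에").
reference = "나는 학교에 갑니다"
hypothesis = "나는 학교 갑니다"
print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
```

In this example a single missing syllable costs one of eight characters but one of three words, which is one reason Korean WER tends to sit well above CER.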

15 replies

eesungkim Mar 24, 2022
Author

Hi, @duckyngo. Sorry, I don't have a medium model.

titu1994 Mar 25, 2022
Maintainer

So, reading Tables 2 and 3, the scores here with greedy decoding on the clean sets are better than their "joint" method? That's pretty impressive, and you didn't need SSL either, so we could potentially improve these scores further. Really impressive!

lifefeel Mar 28, 2022

Really awesome. Thank you for your great contribution. As a Korean, I would also like to participate in your work. Send me a DM if you need help.

eesungkim Mar 30, 2022
Author

@titu1994 That's right!

I think we can get better performance by applying other decoding methods or a language model.

In addition, I plan to apply SSL models such as wav2vec 2.0 and HuBERT to Korean. Are there any plans to develop SSL models within NeMo? In my experience, the SSL model within NeMo is currently unstable. Is that right?

titu1994 Mar 30, 2022
Maintainer

@sam1373 has recently added quite good SSL support to NeMo, and there's also a tutorial that will be released in NeMo 1.8 in the coming weeks. I think it currently supports contrastive-learning-based SSL, but there are experiments ongoing to add HuBERT style as well.

titu1994 · 2022-04-21T01:32:23Z

Maintainer

@eesungkim With the NeMo 1.8.1 release (soon™), we will support Hugging Face Hub for external contributions (starting with ASR support). See #4030 for more details.

If you would like, you can upload a public checkpoint for Korean ASR to Hugging Face and add the links here so that others can use it easily.

When naming the model, please try to follow the current convention for Conformer models: stt_{2_char_lang_id}_conformer_{ctc/transducer}_{small/medium/large}

So, for example, stt_kr_conformer_transducer_{small/medium} would be the appropriate name for a 30M-parameter Conformer Transducer for the Korean language.
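
As a rough illustration of that naming pattern, here is a small sketch that assembles and validates a model name. The helper `nemo_model_name` is hypothetical (it is not part of NeMo), and the allowed values are taken only from the pattern quoted above.

```python
ALLOWED_DECODERS = {"ctc", "transducer"}
ALLOWED_SIZES = {"small", "medium", "large"}


def nemo_model_name(lang_id, decoder, size):
    """Build a name of the form stt_{lang}_conformer_{decoder}_{size}.

    Hypothetical helper for illustration; it only checks the fields
    named in the pattern from this thread.
    """
    if len(lang_id) != 2 or not lang_id.isalpha():
        raise ValueError("lang_id must be a 2-character language id")
    if decoder not in ALLOWED_DECODERS:
        raise ValueError(f"decoder must be one of {sorted(ALLOWED_DECODERS)}")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {sorted(ALLOWED_SIZES)}")
    return f"stt_{lang_id.lower()}_conformer_{decoder}_{size}"


# The language id "kr" follows the example given in this thread.
print(nemo_model_name("kr", "transducer", "small"))
```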

0 replies

okuchaiev · 2022-06-03T23:22:54Z

Collaborator

@eesungkim It is now very easy to publish your model on Hugging Face Hub. @titu1994 prepared a great tutorial on how to do this: #4333. I would encourage you to publish your model (under your name/org) on HF Hub.

0 replies
