Howto perform an ASR Neural Rescoring for Italian #2572

ftamburin · 2021-07-28T12:52:11Z


I was successful in rescoring the ASR output (provided by the NVIDIA STT_it_QuartzNet15x5 model) using Beam Search & KenLM and training the latter on a large Italian corpus (say 'it_corpus.txt').

Now I would like to apply a neural rescorer, but I cannot find a way to do so because no trained NeMo LM for Italian is available.
Ideally, I would like to use one of the many Italian Transformer models provided by Hugging Face, but I cannot work out how: some doc pages state that HF models can be loaded, but not how to do it, and the standard script 'eval_neural_rescoring.py' requires a '.nemo' LM.
Is it possible to load HF models into NeMo or convert them into a .nemo LM?
Alternatively, how can I train an LM from scratch on 'it_corpus.txt' using the 'transformer_lm.py' script?
I did not find any example in the docs.
Thanks!

Answered by VahidooX


VahidooX · 2021-07-28T20:28:03Z · Collaborator

We do not currently support using HF pretrained models as neural rescorers, but I think it should not be that hard to write a script that loads an HF model and rescores sentences. Just keep in mind that if you are referring to BERT-like models in HF, you can generally use them as neural rescorers for ASR models, but they are not computationally efficient for that purpose. It is better to use a GPT-style LM like a regular Transformer LM. I have explained a little more about why here:
#2313
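
To give a rough idea of what such a rescoring script could look like, here is a minimal sketch in plain Python. It is not a NeMo or Hugging Face API: `Hypothesis`, `rescore_nbest`, and `toy_lm_log_prob` are hypothetical names, and the score combination (acoustic score plus `alpha` times LM log-probability plus `beta` times word count) is an assumed, commonly used form. The toy unigram scorer only stands in for a real LM; with a GPT-style model you would replace it with the sum of the token log-probabilities, which a left-to-right LM yields in a single forward pass, whereas a masked (BERT-like) LM needs one pass per masked position.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    text: str
    acoustic_score: float  # log-domain score from the ASR beam search


def rescore_nbest(
    nbest: List[Hypothesis],
    lm_log_prob: Callable[[str], float],
    alpha: float = 1.0,
    beta: float = 0.0,
) -> List[Hypothesis]:
    """Return the hypotheses sorted from best to worst combined score."""

    def combined(h: Hypothesis) -> float:
        # Assumed combination: acoustic + alpha * LM + beta * length (in words)
        return h.acoustic_score + alpha * lm_log_prob(h.text) + beta * len(h.text.split())

    return sorted(nbest, key=combined, reverse=True)


# Toy stand-in for a real LM: fixed unigram log-probabilities (made-up values).
_TOY_UNIGRAMS = {"oggi": -2.0, "è": -1.5, "e": -3.5, "una": -1.5, "bella": -2.5, "giornata": -3.0}


def toy_lm_log_prob(sentence: str) -> float:
    # Unknown words get a low default log-probability.
    return sum(_TOY_UNIGRAMS.get(word, -10.0) for word in sentence.split())


nbest = [
    Hypothesis("oggi e una bella giornata", acoustic_score=-4.0),
    Hypothesis("oggi è una bella giornata", acoustic_score=-4.5),
]
best = rescore_nbest(nbest, toy_lm_log_prob, alpha=0.5)[0]
print(best.text)
```

With `alpha=0.0` the ranking falls back to the acoustic scores alone, so `alpha` and `beta` would normally be tuned on a development set.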

The other thing is that those pretrained models are trained on different types/domains of data, while ASR models are generally trained on lowercase text with no punctuation. So a model you train yourself from scratch is likely to give better results if you have a large enough text corpus.
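
As an illustration of matching the LM training text to that ASR style, a corpus could be normalized along these lines before training. This is only a sketch: `normalize_line` is a hypothetical helper, and keeping accented letters, digits, and apostrophes is an assumption that should be checked against the actual vocabulary of your ASR model (numbers, for example, may need to be spelled out, which this does not do).

```python
import re
import unicodedata

# Anything that is not a letter/digit/underscore, whitespace, or apostrophe
_NOT_ALLOWED = re.compile(r"[^\w\s']")


def normalize_line(line: str) -> str:
    """Lowercase a line and strip punctuation, keeping accents and apostrophes."""
    line = unicodedata.normalize("NFC", line).lower()
    line = line.replace("’", "'")       # unify typographic apostrophes
    line = _NOT_ALLOWED.sub(" ", line)  # drop punctuation and symbols
    line = line.replace("_", " ")       # \w also matches underscores
    return " ".join(line.split())       # collapse repeated whitespace


print(normalize_line("Ciao, come stai? L’università è aperta!"))
# prints: ciao come stai l'università è aperta
```

Applied line by line to a file such as it_corpus.txt, this would produce text closer to what the ASR model emits.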

We are going to add more documentation on training Transformer LM models soon. In the meantime, you may try to train one as follows:

1-Train a yttm tokenizer with

import youtokentome as yttm

# Learn a 4096-token BPE vocabulary from the plain-text training corpus
# and save the tokenizer model to the given path.
yttm.BPE.train(data="train.txt", vocab_size=4096, model="tokenizer_model_file.yttm")

2-Specify the train_ds and validation_ds paths (train.txt in this example) and set the tokenizer path to the file created in step 1. If you use transformer_lm_config.yaml as a base, you should replace the train_ds params that correspond to a tarred dataset with plain-text ones, as is done in validation_ds and test_ds. You may need to tune the learning rate policy based on your batch size and the size of the model. The following config results in a model of ~200M parameters.

python examples/nlp/language_modeling/transformer_lm.py \
	--config-path=conf \
	--config-name=transformer_lm_config \
	trainer.gpus=-1 \
	model.tokenizer_name=yttm \
	model.tokenizer_model=tokenizer_model_file.yttm \
	model.train_ds.use_tarred_dataset=false \
	+model.train_ds.file_name=train.txt \
	+model.train_ds.shuffle=True \
	model.train_ds.tokens_in_batch=4096 \
	model.encoder.hidden_size=1024 \
	model.encoder.inner_size=4096 \
	model.encoder.num_attention_heads=16 \
	model.encoder.num_layers=16 \
	model.encoder.pre_ln=true \
	model.optim.name=adamw \
	model.optim.lr=2.0 \
	model.optim.betas=[0.9,0.98] \
	model.optim.weight_decay=1e-3 \
	model.optim.sched.warmup_steps=20000 \
	~model.optim.sched.warmup_ratio \
	model.optim.sched.name=NoamAnnealing \
	+model.optim.sched.d_model=1024 \
	+model.optim.sched.min_lr=1e-6 \
	+model.optim.eps=1e-8 \
	+trainer.gradient_clip_val=0
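
Two of the numbers above can be sanity-checked with a few lines of Python. The sketch below assumes the standard Noam schedule (inverse-square-root decay after a linear warmup, scaled by d_model^-0.5 and the base lr, with min_lr applied as a floor after warmup) and a rough parameter count that ignores biases, layer norms, and positional embeddings. `noam_lr` and `approx_params` are hypothetical helpers, not NeMo functions, and the vocabulary size of 4096 is taken from step 1.

```python
def noam_lr(step, base_lr=2.0, d_model=1024, warmup_steps=20000, min_lr=1e-6):
    """Learning rate at a given step under an assumed Noam schedule."""
    step = max(step, 1)
    mult = d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
    lr = base_lr * mult
    if step > warmup_steps:
        lr = max(lr, min_lr)  # floor applied only after warmup
    return lr


def approx_params(hidden=1024, inner=4096, layers=16, vocab=4096):
    """Rough weight count: attention + feed-forward per layer, plus embeddings."""
    attention = 4 * hidden * hidden  # Q, K, V and output projections
    feed_forward = 2 * hidden * inner
    return layers * (attention + feed_forward) + vocab * hidden


print(f"peak lr at end of warmup: {noam_lr(20000):.2e}")  # about 4.4e-04
print(f"approx. parameters: {approx_params() / 1e6:.0f}M")
```

Under these assumptions, lr=2.0 is only a scale factor: the effective learning rate peaks at about 4.4e-4 at step 20000 and decays afterwards, and the parameter estimate comes out at roughly 205M, in line with the ~200M figure.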

4 replies

ftamburin Aug 1, 2021
Author

Many thanks! It seems to work...

muntasir2000 Aug 28, 2021

When rescoring the output of a subword-based Citrinet model, do I need to use the same tokenizer for training the LM too?

titu1994 Aug 28, 2021
Maintainer

Yes, that would be best. The scripts we provide build an LM using the tokenizer of the model provided to them.

Dreahim Jan 31, 2024

Hello, how can I use a Transformer LM for inference with an ASR model?
