Issue with WER evaluation when using a pre-trained model #7893
I'm using a pre-trained model. It loads correctly, and I can play any wav file. Now I need to evaluate the model's WER on this dataset without training. How do I do that? I can't import GreedyCTCDecoder from NeMo's library.
If you just need transcription, you can simply use model.transcribe() and pass a list of files. If you have a whole manifest of files, you can use the transcribe_speech.py script in the ASR examples.

For your code, though: WER decoding for CTC models lives under asr/metrics/wer.py - https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/asr/metrics/wer.py#L993

That is character-level decoding; for subword models, use CTCBPEDecoding in ctc_bpe.py.
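To make the evaluation loop concrete, here is a minimal sketch. The NeMo calls are shown commented out since they need `nemo_toolkit[asr]` installed, and the model name and wav paths are placeholders; the WER computation itself is a plain word-level Levenshtein distance with no dependencies, so you can sanity-check it against NeMo's reported numbers.

```python
# Sketch of offline WER evaluation against a pre-trained CTC model.
# Assumptions: the model name and file paths below are placeholders.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            curr[j] = min(prev[j] + 1,              # deletion
                          curr[j - 1] + 1,          # insertion
                          prev[j - 1] + (r != h))   # substitution
        prev = curr
    return prev[-1] / max(len(ref), 1)

if __name__ == "__main__":
    # Hypothetical NeMo usage (requires `pip install nemo_toolkit[asr]`):
    # import nemo.collections.asr as nemo_asr
    # model = nemo_asr.models.EncDecCTCModel.from_pretrained("QuartzNet15x5Base-En")
    # hypotheses = model.transcribe(["sample1.wav", "sample2.wav"])
    references = ["the cat sat on the mat"]
    hypotheses = ["the cat sat on mat"]  # stand-in for model output
    for ref, hyp in zip(references, hypotheses):
        print(f"WER = {word_error_rate(ref, hyp):.3f}")
```

One deleted word out of six reference words gives a WER of 1/6, so the example prints `WER = 0.167`.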
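Since the original question was about GreedyCTCDecoder, here is what greedy CTC decoding does in isolation, independent of NeMo's classes: collapse consecutive repeated token IDs, then drop the blank symbol. The vocabulary and blank index below are made-up placeholders; a real model's decoding config supplies its own.

```python
# Greedy CTC decoding sketch: per-frame argmax IDs in, token IDs out.
# Assumption: `blank_id` is whatever index the model reserves for the
# CTC blank symbol (placeholder vocabulary used below).

def ctc_greedy_decode(frame_ids, blank_id):
    """Collapse consecutive duplicates, then remove blank tokens."""
    out, prev = [], None
    for t in frame_ids:
        if t != prev and t != blank_id:
            out.append(t)
        prev = t
    return out

if __name__ == "__main__":
    vocab = ["a", "b", "c", "_"]       # "_" = blank, at index 3 (made up)
    ids = [0, 0, 3, 0, 1, 1, 3, 2]     # per-frame argmax over the logits
    print("".join(vocab[i] for i in ctc_greedy_decode(ids, blank_id=3)))
```

Note the blank between the two `0` frames keeps the repeated "a" from being collapsed, so the example prints `aabc`.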