From 84fccc48faf71f8f12f0673b1f34e9cdda60232c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Lui=CC=81s=20Arteiro?=
Date: Tue, 19 Dec 2023 19:31:07 +0000
Subject: [PATCH] feat: Adding WER.

---
 _comparison/metrics.ipynb    | 51 +++++++++++++++++++++++++++++++++++-
 _comparison/requirements.txt |  2 ++
 2 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/_comparison/metrics.ipynb b/_comparison/metrics.ipynb
index b6b759d..75dc6cd 100644
--- a/_comparison/metrics.ipynb
+++ b/_comparison/metrics.ipynb
@@ -330,7 +330,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 28,
+   "execution_count": 29,
    "metadata": {},
    "outputs": [
     {
@@ -379,6 +379,55 @@
     "\n",
     "Before all of this, we download [`punkt`](https://www.nltk.org/api/nltk.tokenize.punkt.html), a tokenizer model. It is used to divide a text into a list of sentences by using an unsupervised algorithm to build a model for abbreviation words, collocations, and words that start sentences."
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Word Error Rate\n",
+    "\n",
+    "The [**Word Error Rate (WER)**](https://en.wikipedia.org/wiki/Word_error_rate) is a common metric for evaluating the performance of a speech recognition or machine translation system. It compares a reference text to a hypothesis text, and it is calculated as the number of substitutions, insertions, and deletions needed to change the hypothesis into the reference, divided by the number of words in the reference."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We'll now focus on adding the `Word Error Rate` to our `df` dataframe."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 31,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import jiwer\n",
+    "\n",
+    "# Function to calculate WER for a single row\n",
+    "def calculate_wer(row):\n",
+    "    # Assuming 'original_caption' is the reference and 'prediction' is the hypothesis\n",
+    "    reference = row['original_caption']\n",
+    "    hypothesis = row['prediction']\n",
+    "    # Calculate WER using jiwer\n",
+    "    wer_score = jiwer.wer(reference, hypothesis)\n",
+    "    return wer_score\n",
+    "\n",
+    "# Apply the calculate_wer function to each row in df\n",
+    "df['Word_error_rate'] = df.apply(calculate_wer, axis=1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here's how we can interpret the WER score:\n",
+    "\n",
+    "- **`WER = 0`**: The hypothesis (the generated text) matches the reference (the target text) perfectly. There are no errors at all.\n",
+    "- **`0 < WER < 1`**: The hypothesis has errors, but the total number of substitutions, insertions, and deletions is smaller than the number of words in the reference. Note that WER alone does not tell us how many words are correct, since insertions also count as errors.\n",
+    "- **`WER = 1`**: The number of errors is equal to the number of words in the reference. This could mean, for example, that every word is substituted, or that the hypothesis is empty and every reference word counts as a deletion.\n",
+    "- **`WER > 1`**: The number of errors exceeds the number of words in the reference. Because insertions count as errors, this can happen when the hypothesis is longer than the reference and contains many incorrect words."
+   ]
   }
  ],
  "metadata": {
diff --git a/_comparison/requirements.txt b/_comparison/requirements.txt
index d4e3f50..7bd7172 100644
--- a/_comparison/requirements.txt
+++ b/_comparison/requirements.txt
@@ -37,6 +37,7 @@ importlib_metadata=7.0.0=hd8ed1ab_0
 ipykernel=6.26.0=pyh3cd1d5f_0
 ipython=8.18.1=pyh707e725_3
 jedi=0.19.1=pyhd8ed1ab_0
+jiwer=3.0.3=pypi_0
 joblib=1.3.2=pyhd8ed1ab_0
 jpeg=9e=h80987f9_1
 jupyter_client=8.6.0=pyhd8ed1ab_0
@@ -102,6 +103,7 @@ python-dateutil=2.8.2=pyhd8ed1ab_0
 python-tzdata=2023.3=pyhd3eb1b0_0
 pytz=2023.3.post1=py312hca03da5_0
 pyzmq=25.1.0=py312h313beb8_0
+rapidfuzz=3.5.2=pypi_0
 readline=8.2=h1a28f6b_0
 regex=2023.10.3=py312h80987f9_0
 requests=2.31.0=py312hca03da5_0
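
Note (outside the diff, not part of the patch): the WER computed by `jiwer.wer` above is word-level Levenshtein distance divided by the number of reference words. A minimal pure-Python sketch of that definition, for intuition only — the function name and example strings are illustrative, not from the patch:

```python
# Sketch of WER: word-level edit distance / number of reference words.
# Assumes a non-empty reference; not a replacement for jiwer.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over word sequences
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

word_error_rate("the cat sat", "the cat sat")       # 0.0  (perfect match)
word_error_rate("the cat sat", "the dog sat")       # 0.333... (1 substitution / 3 words)
word_error_rate("hi", "hi there everyone")          # 2.0  (2 insertions / 1 word, so WER > 1)
```

The last call shows why WER can exceed 1: insertions are errors but do not add to the reference length.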