feat: Adding WER.

dwyl · Dec 19, 2023 · 84fccc4
1 parent 94ff0ec

Showing 2 changed files with 52 additions and 1 deletion.
diff --git a/_comparison/metrics.ipynb b/_comparison/metrics.ipynb
@@ -330,7 +330,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 28,
+   "execution_count": 29,
    "metadata": {},
    "outputs": [
     {
@@ -379,6 +379,55 @@
     "\n",
     "Before all of this, we download [`punkt`](https://www.nltk.org/api/nltk.tokenize.punkt.html), a tokenizer model. It is used to divide a text into a list of sentences by using an unsupervised algorithm to build a model for abbreviation words, collocations, and words that start sentences."
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Word Error Rate\n",
+    "\n",
+    "The [**Word Error Rate (WER)**](https://en.wikipedia.org/wiki/Word_error_rate) is a common metric for evaluating speech recognition and machine translation systems. It compares a hypothesis text (the system's output) against a reference text (the ground truth) and is computed as WER = (S + D + I) / N. Here S + D + I is the minimum number of word edits (substitutions S, deletions D and insertions I) needed to align the hypothesis with the reference, and N is the number of words in the reference."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next, we compute the WER for every row and store it in a new `Word_error_rate` column of our `df` dataframe."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 31,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import jiwer\n",
+    "\n",
+    "# Function to calculate WER for a single row\n",
+    "def calculate_wer(row):\n",
+    "    # Assuming 'original_caption' is the reference and 'prediction' is the hypothesis\n",
+    "    reference = row['original_caption']\n",
+    "    hypothesis = row['prediction']\n",
+    "    # Calculate WER using jiwer\n",
+    "    wer_score = jiwer.wer(reference, hypothesis)\n",
+    "    return wer_score\n",
+    "\n",
+    "# Apply the calculate_wer function to each row in df\n",
+    "df['Word_error_rate'] = df.apply(calculate_wer, axis=1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Here's how we can interpret the WER score:\n",
+    "\n",
+    "- **`WER = 0`**: This means that the hypothesis (the generated text) matches the reference (the target text) perfectly. There are no errors at all.\n",
+    "- **`0 < WER < 1`**: The hypothesis contains errors, but fewer errors than there are words in the reference. Lower is better: a WER of 0.25 means one error for every four reference words. A score below 1 does not by itself mean that most words are correct; a WER of 0.8, for instance, is still a poor result.\n",
+    "- **`WER = 1`**: The number of errors equals the number of words in the reference. This happens, for example, when every reference word is replaced by a wrong word, or when the hypothesis is empty (every reference word is deleted). It can also arise from a mix of error types, so a score of 1 does not necessarily mean that no word is correct.\n",
+    "- **`WER > 1`**: The number of errors exceeds the number of words in the reference. This can only happen when the hypothesis is longer than the reference. If the hypothesis had at most as many words, substituting and deleting words would never take more than N edits."
+   ]
   }
  ],
  "metadata": {

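Here is a worked example of the WER definition added to the notebook. The sentences are our own illustration, not taken from the dataset. S, D and I count the substituted, deleted and inserted words in a minimum-cost word alignment. N is the number of reference words and H is the number of hypothesis words.

```latex
\begin{aligned}
&\text{reference: the cat sat on the mat} \quad (N = 6) \\
&\text{hypothesis: the cat sit on mat} \quad (H = 5) \\
&S = 1 \ (\text{sat} \to \text{sit}), \quad D = 1 \ (\text{the}), \quad I = 0 \\
&\mathrm{WER} = \frac{S + D + I}{N} = \frac{1 + 1 + 0}{6} = \frac{1}{3} \approx 0.33 \\
&S + D + I \le \max(N, H) \;\Longrightarrow\; \mathrm{WER} \le \max\left(1, \frac{H}{N}\right)
\end{aligned}
```

The last line explains why the `WER > 1` case requires a hypothesis longer than the reference. One way to align the two texts is to substitute min(N, H) words and then delete or insert the remaining |N - H| words. That always works, so the minimum number of edits is at most max(N, H).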
diff --git a/_comparison/requirements.txt b/_comparison/requirements.txt
@@ -37,6 +37,7 @@ importlib_metadata=7.0.0=hd8ed1ab_0
 ipykernel=6.26.0=pyh3cd1d5f_0
 ipython=8.18.1=pyh707e725_3
 jedi=0.19.1=pyhd8ed1ab_0
+jiwer=3.0.3=pypi_0
 joblib=1.3.2=pyhd8ed1ab_0
 jpeg=9e=h80987f9_1
 jupyter_client=8.6.0=pyhd8ed1ab_0
@@ -102,6 +103,7 @@ python-dateutil=2.8.2=pyhd8ed1ab_0
 python-tzdata=2023.3=pyhd3eb1b0_0
 pytz=2023.3.post1=py312hca03da5_0
 pyzmq=25.1.0=py312h313beb8_0
+rapidfuzz=3.5.2=pypi_0
 readline=8.2=h1a28f6b_0
 regex=2023.10.3=py312h80987f9_0
 requests=2.31.0=py312hca03da5_0
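The WER interpretation in the notebook can be checked without jiwer. Below is a minimal, dependency-free sketch that computes word-level WER as a Levenshtein (edit-distance) problem over words.

A few assumptions apply:
- The name `word_error_rate` is hypothetical; it is not part of jiwer or of this repository.
- Words are assumed to be separated by whitespace and are compared exactly.
- The sketch illustrates the definition; it is not meant to reproduce `jiwer.wer`'s preprocessing in every edge case.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word-level edit distance divided by the number of reference words.

    Illustrative sketch: words are split on whitespace and compared exactly,
    so case and punctuation are NOT normalised ("Cat" != "cat").
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        # N = 0 would mean dividing by zero; raising here is our own choice.
        raise ValueError("reference must contain at least one word")

    # Classic Levenshtein dynamic programming over words, keeping only the
    # previous row. Before processing reference word i, prev[j] is the minimum
    # number of edits turning the first i - 1 reference words into the first
    # j hypothesis words.
    prev = list(range(len(hyp) + 1))  # zero reference words: j insertions
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # zero hypothesis words: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref_word
                curr[j - 1] + 1,     # insert hyp_word
                prev[j - 1] + cost,  # match (cost 0) or substitute (cost 1)
            )
        prev = curr
    # prev[-1] is now S + D + I for the whole texts.
    return prev[-1] / len(ref)


if __name__ == "__main__":
    examples = [
        ("the cat sat on the mat", "the cat sat on the mat"),  # WER = 0
        ("the cat sat on the mat", "the cat sit on mat"),      # 0 < WER < 1
        ("the cat sat on the mat", ""),                        # WER = 1
        ("the cat", "a dog barked loudly"),                    # WER > 1
    ]
    for ref, hyp in examples:
        print(f"{word_error_rate(ref, hyp):.3f}  ref={ref!r} hyp={hyp!r}")
```

Running it prints 0.000, 0.333, 1.000 and 2.000, one example for each of the four WER ranges listed in the notebook. Keeping only two rows of the edit-distance table means memory grows with the hypothesis length rather than with the product of both lengths. That is more than enough for caption-length texts.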