feat: Adding WER.
LuchoTurtle committed Dec 19, 2023
1 parent 94ff0ec commit 84fccc4
Showing 2 changed files with 52 additions and 1 deletion.
51 changes: 50 additions & 1 deletion _comparison/metrics.ipynb
@@ -330,7 +330,7 @@
},
{
"cell_type": "code",
"execution_count": 28,
"execution_count": 29,
"metadata": {},
"outputs": [
{
@@ -379,6 +379,55 @@
"\n",
"Before all of this, we download [`punkt`](https://www.nltk.org/api/nltk.tokenize.punkt.html), a tokenizer model. It is used to divide a text into a list of sentences by using an unsupervised algorithm to build a model for abbreviation words, collocations, and words that start sentences."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Word Error Rate\n",
"\n",
"The [**Word Error Rate (WER)**](https://en.wikipedia.org/wiki/Word_error_rate) is a common metric for evaluating the output of a speech recognition or machine translation system. It compares a hypothesis text against a reference text and is calculated as the minimum number of word-level substitutions, insertions, and deletions needed to turn one into the other, divided by the number of words in the reference."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll now add the WER to our `df` dataframe as a new `Word_error_rate` column."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [],
"source": [
"import jiwer\n",
"\n",
"# Compute the WER for a single row of the dataframe\n",
"def calculate_wer(row):\n",
"    # 'original_caption' is treated as the reference and 'prediction' as the hypothesis\n",
"    reference = row['original_caption']\n",
"    hypothesis = row['prediction']\n",
"    # jiwer.wer takes the reference first, then the hypothesis\n",
"    wer_score = jiwer.wer(reference, hypothesis)\n",
"    return wer_score\n",
"\n",
"# Apply the calculate_wer function to each row in df\n",
"df['Word_error_rate'] = df.apply(calculate_wer, axis=1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's how we can interpret the WER score:\n",
"\n",
"- **`WER = 0`**: This means that the hypothesis (the generated text) matches the reference (the target text) perfectly. There are no errors at all.\n",
"- **`0 < WER < 1`**: The hypothesis has errors, but the number of errors is less than the number of words in the reference. The closer the score is to 0, the better. Note that a score below 1 does not guarantee that most of the words are correct.\n",
"- **`WER = 1`**: The number of errors is equal to the number of words in the reference. This happens, for example, when every word is replaced by a wrong one, or when the hypothesis is empty and every reference word counts as a deletion.\n",
"- **`WER > 1`**: The number of errors exceeds the number of words in the reference. This can only happen when the hypothesis is longer than the reference, since every extra word counts as an insertion."
]
}
],
"metadata": {
Expand Down
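The WER definition and the four score ranges described in the notebook cells above can be illustrated with a minimal from-scratch sketch. This is not how `jiwer` is implemented and it is not part of the commit; `word_error_rate` is a hypothetical helper that assumes plain whitespace tokenization and case-sensitive matching, and it is only meant to make the formula WER = (S + D + I) / N concrete.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level edit distance divided by reference length."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One example per range from the interpretation list.
print(word_error_rate("a cat sits", "a cat sits"))      # 0.0
print(word_error_rate("a b c d", "a x y z"))            # 0.75
print(word_error_rate("hello world", "goodbye moon"))   # 1.0
print(word_error_rate("hello", "well hi there"))        # 3.0
```

The second example scores below 1 even though only one of the four reference words is correct, and the last one exceeds 1 because the two extra hypothesis words are counted as insertions on top of the substitution.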
2 changes: 2 additions & 0 deletions _comparison/requirements.txt
@@ -37,6 +37,7 @@ importlib_metadata=7.0.0=hd8ed1ab_0
ipykernel=6.26.0=pyh3cd1d5f_0
ipython=8.18.1=pyh707e725_3
jedi=0.19.1=pyhd8ed1ab_0
jiwer=3.0.3=pypi_0
joblib=1.3.2=pyhd8ed1ab_0
jpeg=9e=h80987f9_1
jupyter_client=8.6.0=pyhd8ed1ab_0
@@ -102,6 +103,7 @@ python-dateutil=2.8.2=pyhd8ed1ab_0
python-tzdata=2023.3=pyhd3eb1b0_0
pytz=2023.3.post1=py312hca03da5_0
pyzmq=25.1.0=py312h313beb8_0
rapidfuzz=3.5.2=pypi_0
readline=8.2=h1a28f6b_0
regex=2023.10.3=py312h80987f9_0
requests=2.31.0=py312hca03da5_0
Expand Down
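The two added pins use the `name=version=build` layout of a conda export, and a `pypi_0` build string usually marks a package that was installed with pip inside the conda environment (`rapidfuzz` is pulled in as a dependency of `jiwer`). A small sketch of reading such lines follows; `parse_conda_pin` is a hypothetical helper, not something the repository provides, and it assumes every line has exactly three `=`-separated fields.

```python
def parse_conda_pin(line: str) -> tuple[str, str, str]:
    """Split a 'name=version=build' line into its three fields."""
    name, version, build = line.strip().split("=")
    return name, version, build


# Lines copied from the requirements.txt diff.
pins = [
    "jiwer=3.0.3=pypi_0",
    "joblib=1.3.2=pyhd8ed1ab_0",
    "rapidfuzz=3.5.2=pypi_0",
]

# Keep the names of packages whose build string points at PyPI.
pip_installed = [
    name
    for name, _version, build in map(parse_conda_pin, pins)
    if build.startswith("pypi")
]
print(pip_installed)  # ['jiwer', 'rapidfuzz']
```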
