# Evaluation Error Categories

This page describes the evaluation error types displayed in the web app.

## NER false negatives

| Error type | Numerator | Denominator | Example |
|---|---|---|---|
| All | A ground truth mention span is not linked to an entity. | All ground truth entity mentions. | (Any of the examples below.) |
| Lowercased | Undetected lowercased ground truth mention. | All lowercased ground truth mentions. | Not predicting "countess". |
| Partially included | A part of the ground truth mention is linked to an entity. By "part" we mean a subsequence of whitespace-split words. | All ground truth mentions consisting of multiple words, i.e. all ground truth mentions containing a whitespace. | Only predicting "Spanish-American" where the ground truth is "Spanish-American War". |
| Partial overlap | Undetected mention that overlaps with a predicted mention. | All ground truth mentions that are not lowercased. | Predicting "The Americans" where the ground truth is "Americans". |
| Other | Undetected mentions which are none of the above types. | All ground truth mentions that are not lowercased. | Not predicting "Rudolf Senti". |
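
To make the categories above concrete, here is a minimal sketch (not the actual web app code) of how an undetected ground truth mention could be assigned to one of these categories. The span representation, the function name and the simplified "part" check are assumptions for illustration.

```python
def fn_category(gt_span, gt_text, predicted_spans):
    """Classify an undetected ground truth mention (a NER false negative).

    Spans are (start, end) character offsets, predicted_spans the spans of all
    predicted mentions. Word-boundary alignment of the "part" check is omitted
    for brevity.
    """
    if gt_text.islower():
        return "lowercased"
    gt_start, gt_end = gt_span
    for p_start, p_end in predicted_spans:
        if not (p_start < gt_end and gt_start < p_end):
            continue  # this prediction does not overlap the ground truth mention
        # A part of the ground truth mention was linked to an entity.
        if gt_start <= p_start and p_end <= gt_end and " " in gt_text:
            return "partially included"
        # Any other overlap between ground truth and prediction.
        return "partial overlap"
    return "other"


# Ground truth "Spanish-American War" at offsets (10, 30); only
# "Spanish-American" at offsets (10, 26) was predicted.
print(fn_category((10, 30), "Spanish-American War", [(10, 26)]))  # partially included
```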

## NER false positives

| Error type | Numerator | Denominator | Example | Comment |
|---|---|---|---|---|
| All | A mention is predicted whose span is not linked in the ground truth. | No denominator, absolute numbers are reported. | (Any of the examples below.) | |
| Lowercased | The predicted mention is lowercased and does not overlap with a ground truth mention. | No denominator, absolute numbers are reported. | Predicting "passenger trains". | These are usually abstract entities. |
| Ground truth entity unknown | The predicted mention is capitalized and the ground truth is "Unknown". | No denominator, absolute numbers are reported. | Linking "Sean Kelly" to Q156707 where the ground truth is "Unknown" (because the correct Sean Kelly is not contained in Wikidata). | |
| Other | NER false positives which are none of the above types. | No denominator, absolute numbers are reported. | Linking "Eastern", which is not linked in the ground truth. | |
| Wrong span | The predicted mention overlaps with a ground truth mention of the same entity, but the spans do not match exactly. | All predicted mentions. | Linking only the part "Louis IX" to Q213610, where the ground truth links "Louis IX of Hesse-Darmstadt" to Q213610. | |
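
The false positive categories can be sketched in the same spirit. The data model below (spans as character offsets, ground truth annotations as span/entity pairs with "Unknown" marking entities missing from Wikidata) is an assumption for illustration, not the evaluator's actual representation.

```python
def fp_category(pred_span, pred_text, pred_entity, gt_mentions):
    """Classify a predicted mention that counts as a NER false positive.

    gt_mentions is a list of ((start, end), entity_id) ground truth annotations,
    with entity_id == "Unknown" for entities not contained in Wikidata.
    """
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    overlapping = [(span, ent) for span, ent in gt_mentions if overlaps(pred_span, span)]
    # Overlaps a ground truth mention of the same entity, but with a different span.
    if any(ent == pred_entity and span != pred_span for span, ent in overlapping):
        return "wrong span"
    if pred_text.islower() and not overlapping:
        return "lowercased"
    if not pred_text.islower() and any(ent == "Unknown" for _, ent in overlapping):
        return "ground truth entity unknown"
    return "other"


# "Sean Kelly" predicted as Q156707 where the ground truth at that span is "Unknown".
print(fp_category((0, 10), "Sean Kelly", "Q156707", [((0, 10), "Unknown")]))
```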

## Disambiguation errors

| Error type | Numerator | Denominator | Example | Comment |
|---|---|---|---|---|
| All | A ground truth span was detected, but linked to the wrong entity. | All NER true positives. | (Any of the examples below.) | |
| Demonym | The mention text is a demonym, i.e. it is contained in a list of demonyms from Wikidata. | NER true positives where the mention text is a demonym. | "Spanish" is linked to "Spain" (the country) but the ground truth is "Spanish" (the language). | Defined by the mention text instead of the ground truth or predicted entity because demonym is not a type in Wikidata. |
| Metonymy | The most popular entity for the given mention text is a location, the ground truth is not a location, and the prediction is a location (not necessarily the same location as the ground truth). | NER true positives where the most popular candidate is a location but the ground truth is not a location. | "Japan" is linked to "Japan" (the country), but the ground truth is "Japan national football team". | The most popular entity for a mention text is the entity corresponding to the Wikipedia article that is most often linked by a hyperlink with the mention text as link text in Wikipedia. |
| Partial name | The mention text is a part of the ground truth entity name. | NER true positives where the mention text is a part of the ground truth entity name. | Linking "Ray" to "Ray Charles" instead of "Ray Dixon". | By entity name we mean the entity's label in Wikidata. |
| Rare | The most popular entity for the given mention text was predicted instead of the less popular ground truth entity. | NER true positives where the most popular candidate is not the correct entity. | Linking "Michael Jordan" to the basketball player instead of the less popular computer scientist (Q3308285). | |
| Other | Disambiguation errors that do not fall into the categories above. | None | Linking "Bombardier" to "Bombardier Transportation" (Q688893) instead of "Bombardier" (Q333498). | |
| Wrong candidates | The ground truth entity is not in the candidate set returned by the linker for the mention. | All NER true positives. | The ground truth entity "USS John R. Pierce" (Q7870570) is not in the candidate set for "John R. Pierce". | This error type is only applicable for linkers which return candidate sets. |
| Multiple candidates | The candidate set for the mention contains multiple candidate entities, one of which is the ground truth entity, and the linker chose a wrong entity. | NER true positives where the linker returned a candidate set with more than one entity and the ground truth is contained in the candidate set. | The linker produced the ground truth entity "Bombardier" (Q333498) as a candidate for "Bombardier", but predicted "Bombardier Transportation" (Q688893). | Same as above. |
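
A single disambiguation error can fall into more than one of these categories (and "Other" means that none apply), so a classification sketch naturally returns a set of labels. Everything below — the helper lookups for demonyms, popularity, location types and candidate sets, as well as the concrete labels — is a simplified assumption for illustration, not the actual evaluator code.

```python
def disambiguation_categories(mention_text, predicted, ground_truth, gt_label,
                              candidates, most_popular, is_location, demonyms):
    """Return the error categories that apply to one disambiguation error.

    predicted != ground_truth is assumed. gt_label is the Wikidata label of the
    ground truth entity, candidates the linker's candidate set (or None if the
    linker does not return candidates), most_popular the most popular entity for
    the mention text, is_location a type-check callable, demonyms a set of
    demonym strings. All of these lookups are assumed to be precomputed.
    """
    labels = set()
    if candidates is not None:
        if ground_truth not in candidates:
            labels.add("wrong candidates")
        elif len(candidates) > 1:
            labels.add("multiple candidates")
    if mention_text in demonyms:
        labels.add("demonym")
    if is_location(most_popular) and not is_location(ground_truth) and is_location(predicted):
        labels.add("metonymy")
    if mention_text != gt_label and mention_text in gt_label:
        labels.add("partial name")
    if most_popular != ground_truth and predicted == most_popular:
        labels.add("rare")
    return labels or {"other"}


# "Michael Jordan" linked to the basketball player (Q41421, the most popular
# entity for the mention text) although the ground truth is the computer
# scientist (Q3308285); the ground truth label is assumed for illustration.
print(disambiguation_categories(
    "Michael Jordan", predicted="Q41421", ground_truth="Q3308285",
    gt_label="Michael I. Jordan", candidates=None, most_popular="Q41421",
    is_location=lambda e: False, demonyms=set()))  # {'rare'}
```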

**Why are only the NER true positives counted in the denominators, instead of all ground truth mentions?** We want the percentage to reflect how often the disambiguation went wrong, so we only count detected mentions. Counting all ground truth mentions would also include the NER false negatives, which we treat as a separate error category (see above).
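
A toy calculation (numbers made up) illustrates the difference:

```python
ground_truth_mentions = 100   # all ground truth mentions
ner_true_positives = 80       # mentions the linker detected (20 NER false negatives)
disambiguation_errors = 10    # detected mentions linked to the wrong entity

# Reported: how often disambiguation went wrong among the detected mentions.
print(disambiguation_errors / ner_true_positives)     # 0.125

# Dividing by all ground truth mentions instead would mix the 20 NER false
# negatives into a number that is meant to measure disambiguation quality.
print(disambiguation_errors / ground_truth_mentions)  # 0.1
```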

**Does that mean a linker that only detects the mentions it is able to disambiguate correctly has no disambiguation errors?** Yes.
