Many chemical compounds appear to have their labels mixed together in languages other than English (es, fr, ar, ...). For instance, http://dbpedia.org/resource/Cholesterol has more than 900 labels in Spanish, including many that clearly do not belong to it, such as "Cocaina".
Using the following query, many resources with more than 900 labels in Spanish are detected:
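PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Count Spanish labels per resource and keep those with more than 900
SELECT ?concept (COUNT(?label) AS ?count)
FROM <http://dbpedia.org>
WHERE {
  ?concept rdfs:label ?label .
  FILTER(LANG(?label) = 'es')
}
GROUP BY ?concept
HAVING (COUNT(?label) > 900)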
Example DBpedia resource URL(s)
http://dbpedia.org/resource/Cholesterol
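To look at the labels themselves rather than just their count, a query along the following lines could list the Spanish labels attached to the example resource. This is only a sketch: it assumes the public endpoint and the same `http://dbpedia.org` graph as the query above, and the `LIMIT` value is an arbitrary choice.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# List the Spanish labels attached to the example resource
SELECT ?label
FROM <http://dbpedia.org>
WHERE {
  <http://dbpedia.org/resource/Cholesterol> rdfs:label ?label .
  FILTER(LANG(?label) = 'es')
}
ORDER BY ?label
LIMIT 1000  # arbitrary cap, chosen for illustration
```

Swapping `'es'` for another language tag (for example `'fr'` or `'ar'`) should show whether the same mixing occurs there.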
Other
Lowering the threshold to more than 100 labels also returns many other kinds of resources, including people. Their labels also appear to be incorrect, for example: https://dbpedia.org/page/Alexandra_of_Denmark
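For reference, the lower-threshold variant might look like the sketch below. It is the same query as above with the threshold changed to 100 and the results sorted by label count. The sorting is an addition for readability and not part of the original query, and the public endpoint may truncate or time out on a result set of this size.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Resources with more than 100 Spanish labels, largest counts first
SELECT ?concept (COUNT(?label) AS ?count)
FROM <http://dbpedia.org>
WHERE {
  ?concept rdfs:label ?label .
  FILTER(LANG(?label) = 'es')
}
GROUP BY ?concept
HAVING (COUNT(?label) > 100)
ORDER BY DESC(?count)
```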
This is an example of a corruption that entered the release-workflow at some point in the recent past.
We've also seen the chemical label problem.
In an earlier release, both the synonyms and the language labels were more accurate than in recent releases.
Similarly, we reported image corruption.
While some problems have been corrected, many images are just plain wrong.
Again, these problems did not exist in earlier releases, but unfortunately I don't have screenshots of the correct data to contrast with the incorrect data.
The bottom line: the quality of DBpedia data has degraded.
New releases may have more items, but the fidelity of older items has been degraded during transitions.
How can we help restore higher quality data from previous releases?
Issue validity
The version is currently available from https://dbpedia.org/sparql