Home
On this page we provide additional material related to our K-CAP 2017 submission "Induction of Property-Specific Dependency Graph Patterns for Relation Detection via Frequent Subgraph Mining", such as the patterns that we found, training data, the gold standard, evaluation material, and evaluation results.
Abstract of our paper: Relation detection is the task of detecting whether within a text a certain relation, e.g., that someone is the author of something, is expressed. This task is challenging because, due to the variability of natural language, there can be hundreds of ways in which a relation can be expressed. In this paper we present an approach that mines frequent relation-specific graph patterns from dependency parses of example sentences collected via distant supervision. Frequent Subgraph Mining is applied to derive relation-specific dependency graph patterns that can be applied for relation detection. We show the feasibility of our approach using the Japanese Wikipedia as a text source and DBpedia as a data source. An automatic evaluation as well as an evaluation incorporating experts show that the patterns detect relations in Japanese texts with high precision. The patterns show some degree of generalization from training data to test data, but the approach is not immune to errors introduced by distant supervision.
Excerpt from file fusilli-author-train-10-80-0.01-0.2-5-100-0.7
30 313 1 false 0
30 313 0.9 false 0
30 313 0.8 false 0
30 313 0.7 false 1
30 313 0.6 false 1
30 313 0.5 true 0
200 313 0.7 false 1
200 313 0.6 false 1
200 313 0.5 true 0
200 313 0.7 false 1
200 313 0.6 false 1
200 313 0.5 true 0
Meaning: This is an excerpt from the protocol for the property dbo:author (see the filename of the protocol). For example, the first line states that, for a timeout of 30 minutes (first column), the cluster with ID 313 (second column), and a tau value of 1 (third column), no timeout occurred ("false" in the fourth column) and no patterns were induced ("0" in the fifth column). With the higher timeout of 200 minutes, frequent subgraph mining is no longer applied to that cluster with tau=1, since no timeout occurred for that cluster and tau value at the smaller timeout.
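The column layout described above can be read programmatically. The following is a minimal sketch, assuming that each protocol entry sits on its own line and that the five columns are separated by whitespace. The names `ProtocolEntry` and `parse_protocol_line` are hypothetical and not part of the published material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtocolEntry:
    """One protocol line (hypothetical container, not from the original tooling)."""
    timeout_minutes: int   # first column: timeout value in minutes
    cluster_id: int        # second column: cluster ID
    tau: float             # third column: tau value
    timed_out: bool        # fourth column: whether a timeout occurred
    pattern_count: int     # fifth column: number of induced patterns


def parse_protocol_line(line: str) -> ProtocolEntry:
    """Parse one whitespace-separated protocol line into a ProtocolEntry."""
    timeout, cluster, tau, timed_out, patterns = line.split()
    return ProtocolEntry(
        timeout_minutes=int(timeout),
        cluster_id=int(cluster),
        tau=float(tau),
        timed_out=(timed_out == "true"),
        pattern_count=int(patterns),
    )


# The first six lines of the excerpt shown above (timeout of 30 minutes).
excerpt = """\
30 313 1 false 0
30 313 0.9 false 0
30 313 0.8 false 0
30 313 0.7 false 1
30 313 0.6 false 1
30 313 0.5 true 0
"""

entries = [parse_protocol_line(l) for l in excerpt.splitlines() if l.strip()]

# Tau values for which mining finished in time and induced at least one pattern.
productive_taus = [e.tau for e in entries if not e.timed_out and e.pattern_count > 0]
print(productive_taus)
```

Keeping the timeout flag as a boolean makes it straightforward to select the cluster and tau combinations that would be retried with a higher timeout.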
Links to protocol files for each property: author, bandMember, crosses, foundingYear, languageFamily, locationCity, nationality, occupation, parent, yearOfConstruction.
cnt_available_clusters: 1427
cnt_clusters_processed: 1386
rel_clusters_processed: 97.12 %
cnt_clusters_where_we_found_patterns: 250
cnt_clusters_where_we_found_no_patterns: 1136
total_pattern_cnt: 859
For each property, for how many clusters did we find patterns:
author -> 28 / 174 (16.09)
bandMember -> 22 / 62 (35.48)
crosses -> 1 / 9 (11.11)
foundingYear -> 65 / 309 (21.03)
languageFamily -> 9 / 55 (16.36)
locationCity -> 12 / 75 (16)
nationality -> 9 / 134 (6.71)
occupation -> 63 / 242 (26.03)
parent -> 40 / 316 (12.65)
yearOfConstruction -> 1 / 10 (10)
For each property, the number of patterns that we found:
author: 94
bandMember: 65
crosses: 1
foundingYear: 262
languageFamily: 35
locationCity: 38
nationality: 20
occupation: 225
parent: 118
yearOfConstruction: 1
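The per-property figures can be cross-checked against the overall statistics. The sketch below copies the numbers from this page into two dictionaries (the variable names are illustrative) and recomputes the totals and the share of clusters for which patterns were found.

```python
# (clusters with patterns, clusters processed) per property, copied from this page.
clusters = {
    "author": (28, 174),
    "bandMember": (22, 62),
    "crosses": (1, 9),
    "foundingYear": (65, 309),
    "languageFamily": (9, 55),
    "locationCity": (12, 75),
    "nationality": (9, 134),
    "occupation": (63, 242),
    "parent": (40, 316),
    "yearOfConstruction": (1, 10),
}

# Number of patterns found per property, copied from this page.
patterns = {
    "author": 94,
    "bandMember": 65,
    "crosses": 1,
    "foundingYear": 262,
    "languageFamily": 35,
    "locationCity": 38,
    "nationality": 20,
    "occupation": 225,
    "parent": 118,
    "yearOfConstruction": 1,
}

CNT_AVAILABLE_CLUSTERS = 1427

found = sum(with_patterns for with_patterns, _ in clusters.values())
processed = sum(total for _, total in clusters.values())
total_patterns = sum(patterns.values())
rel_processed = 100 * processed / CNT_AVAILABLE_CLUSTERS

print(f"clusters with patterns: {found}")
print(f"clusters processed: {processed}")
print(f"total patterns: {total_patterns}")

# Share of clusters with patterns per property, in percent.
for name, (with_patterns, total) in clusters.items():
    print(f"{name}: {100 * with_patterns / total:.2f} %")
```

The sums agree with the overall statistics (250 clusters with patterns, 1386 clusters processed, 859 patterns). The percentages printed here are rounded, so a few may differ in the last digit from the values listed on this page, which appear to be truncated.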
Each file is related to one cluster. Each file may contain multiple patterns.
author: 1 2 3 4 5 6 7 8
bandMember: 1 2 3 4 5 6
foundingYear: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
languageFamily: 1 2
locationCity: 1
nationality: 1
occupation: 1 2 3 4 5 6 7 8 9
parent: 1 2 3 4 5 6 7 8 9
yearOfConstruction: 1
author: 1 2 3 4 5 6 7 8
bandMember: 1 2 3 4 5 6
foundingYear: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
languageFamily: 1 2
locationCity: 1
nationality: 1
occupation: 1 2 3 4 5 6 7 8 9
parent: 1 2 3 4 5 6 7 8 9
yearOfConstruction: 1
author: 1 2 3 4 5 6 7 8
bandMember: 1 2 3 4 5 6
foundingYear: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
languageFamily: 1 2
locationCity: 1
nationality: 1
occupation: 1 2 3 4 5 6 7 8 9
parent: 1 2 3 4 5 6 7 8 9
yearOfConstruction: 1
Warning: some images can be up to 30 MB in size.
author: 1 2 3 4 5 6 7 8
bandMember: 1 2 3 4 5 6
foundingYear: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
languageFamily: 1 2
locationCity: 1
nationality: 1
occupation: 1 2 3 4 5 6 7 8 9
parent: 1 2 3 4 5 6 7 8 9
yearOfConstruction: 1
Warning: some images can be up to 7 MB in size.
author: 1 2 3 4 5 6 7 8
bandMember: 1 2 3 4 5 6
foundingYear: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
languageFamily: 1 2
locationCity: 1
nationality: 1
occupation: 1 2 3 4 5 6 7 8 9
parent: 1 2 3 4 5 6 7 8 9
yearOfConstruction: 1
- EvaluationGuidelines.pdf
- EnglishSentences.pdf
- Sample evaluation sheet: Empty-Sheet-1-a.pdf
- Data annotated by evaluators
Sheet2-a.tsv Sheet2-b.tsv Sheet2-c.tsv
Sheet3-a.tsv Sheet3-b.tsv Sheet3-c.tsv
Sheet4-a.tsv Sheet4-b.tsv Sheet4-c.tsv
Sheet5-a.tsv Sheet5-b.tsv Sheet5-c.tsv
Sheet6-a.tsv Sheet6-b.tsv Sheet6-c.tsv
Sheet7-a.tsv Sheet7-b.tsv Sheet7-c.tsv
Sheet8-a.tsv Sheet8-b.tsv Sheet8-c.tsv
Sheet9-a.tsv Sheet9-b.tsv Sheet9-c.tsv
Sheet10-a.tsv Sheet10-b.tsv Sheet10-c.tsv
- Files for the evaluation of inter-annotator agreement:
- Compilation of all answers for all properties and all evaluators: compiled.yml