Who Knows What in German Dramas? A Composite Annotation Scheme for Knowledge Transfer

This repository contains the code and data needed to reproduce the results reported in a publication submitted to the Journal of Computational Literary Studies.

Contents of this repository:

`data`

This folder contains the annotated plays reported in the article. The plays are provided in the native format of the annotation tool (CorefAnnotator) and as CSV and TEI/XML files exported from it. The CSV files are used for the analysis. The TEI files are used to determine how many annotations occur per 1000 tokens in the texts, as presented in Section 5.1.

`section-4`: Calculating Inter-Annotator Agreement

This folder contains the code needed to calculate inter-annotator agreement with Gamma.

With bash on a Unix system, you can run it with `python3 iaa.py ../data/round-2/V1/csv/guenderode-udohla_0?.csv` to compare the two annotations of Günderrode's Udohla; the shell expands the `?` wildcard to the files of both annotators. The output is a single line formatted as a row of a LaTeX table.

To generate an entire table, you can use the following command:

# Compare each first annotation (*01.csv) with its second annotation (*02.csv)
for i in ../data/round-2/V1/csv/*01.csv
do
    python3 iaa.py "$i" "${i%01.csv}02.csv"
done

This iterates over all files ending in `01.csv` in data/round-2/V1/csv and calls the Python script once per play, passing the versions by the two annotators as arguments.
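For readers who prefer to drive the comparison from Python, the pairing logic of the loop above can be sketched as follows. This is a minimal sketch, not part of the repository: `pair_annotation_files` and `build_commands` are hypothetical helper names, and the code assumes that the two annotators' files differ only in the trailing `01.csv` / `02.csv` suffix.

```python
import sys
from pathlib import Path


def pair_annotation_files(csv_dir):
    """Return (first, second) path pairs for files ending in 01.csv / 02.csv.

    Hypothetical helper: assumes the annotator is encoded only in the suffix.
    """
    pairs = []
    for first in sorted(Path(csv_dir).glob("*01.csv")):
        # Swap only the trailing suffix, so an "01" elsewhere in the name is untouched.
        second = first.with_name(first.name[: -len("01.csv")] + "02.csv")
        # Skip plays for which no second annotation exists.
        if second.exists():
            pairs.append((first, second))
    return pairs


def build_commands(csv_dir, script="iaa.py"):
    """Build one argument list per play, mirroring `python3 iaa.py A B`."""
    return [
        [sys.executable, script, str(first), str(second)]
        for first, second in pair_annotation_files(csv_dir)
    ]
```

Each returned argument list could then be passed to `subprocess.run(cmd, check=True)` from within the `section-4` folder, for example with `build_commands("../data/round-2/V1/csv")`.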

Performance

The script uses the pygamma-agreement library, which in turn relies on a highly optimized library for integer linear programming. Please follow the pygamma-agreement installation instructions to use the CBC solver.

`section-5`: Analysing Annotated Knowledge Transfers

Python script (Python version 3.10.1)

The Python script can be run using the command

$ python3 annotations_per_x_tokens.py ../data --xtokens 1000

No further packages need to be installed.
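The normalisation behind the `--xtokens` option can be illustrated with a small worked example. This is a sketch under the assumption that the script reports the annotation count scaled to a window of x tokens; the function below is hypothetical and is not taken from `annotations_per_x_tokens.py`, and the numbers are illustrative only.

```python
def annotations_per_x_tokens(n_annotations, n_tokens, x=1000):
    """Scale an annotation count to a rate per x tokens (hypothetical helper)."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    # Multiply before dividing to keep the intermediate value exact for integers.
    return n_annotations * x / n_tokens


# Illustrative values: 42 annotations in a text of 12000 tokens.
print(annotations_per_x_tokens(42, 12000))  # 3.5
```

It needs only the standard library, in line with the note above that no further packages are required.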

R scripts (R version 4.1.2)

To install the packages needed for the R scripts, issue the following command in an R console:

> install.packages(c("DramaAnalysis", "ggplot2", "igraph", "kableExtra", "knitr", "reshape2", "tidyverse"))

All R scripts can be run either in RStudio or from the console with `Rscript $PATH_TO_R_SCRIPT`. After running the scripts, the generated plots can be found in the folder `plots`.
