TDDiscourse

This repository contains the TDDiscourse dataset, created as part of the paper: "TDDiscourse: A Dataset for Discourse-Level Temporal Ordering of Events" (Naik et al., 2019).

TDDiscourse is a dataset for temporal ordering of events, which specifically focuses on event pairs that are more than one sentence apart in a document.

TDDiscourse was created by augmenting TimeBank-Dense (Cassidy et al., 2014), a corpus of English news articles containing annotations for events and temporal relations based on the TimeML annotation scheme (Sauri et al., 2006).

TimeBank-Dense focuses mainly on event pairs in the same or adjacent sentences, though it does include labels for some event pairs that are more than one sentence apart. TDDiscourse was created to address this gap and to shift the focus towards discourse-level temporal ordering, which turns out to be a harder task.

This dataset is divided into the following subsets:

1. TDDMan

TDDMan is the manually annotated subset of TDDiscourse, created according to the guidelines described in (Naik et al., 2019). The TDDMan sub-folder contains the train, dev, and test splits for this subset. It also contains a file with extra annotations for a small sample (n=107) of the test set. These extra annotations label each event pair with the phenomena required to deduce the correct temporal relation for the pair. A complete list of the phenomena and some preliminary phenomenon distribution statistics can be found in Section 6 of (Naik et al., 2019).

2. TDDAuto

TDDAuto is the automatically generated subset of TDDiscourse, created by the heuristic algorithm described in (Naik et al., 2019). The TDDAuto sub-folder contains the train, dev, and test splits for this subset. It also contains a file with extra phenomenon annotations for a small sample (n=110) of the test set, similar to the one in TDDMan.
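This README does not describe the layout of the split files, so the following is only a minimal sketch of how they might be read. It assumes, without confirmation from the source, that each line holds one event pair as four tab-separated fields: a document id, two event ids, and one of the label codes listed under Temporal Relation Labels. The names `EventPair` and `parse_pairs` are hypothetical and are not part of the dataset.

```python
from typing import Iterable, List, NamedTuple

# Label codes taken from the Temporal Relation Labels section of this README.
VALID_LABELS = {"a", "b", "i", "ii", "s"}


class EventPair(NamedTuple):
    doc_id: str
    event1: str
    event2: str
    label: str


def parse_pairs(lines: Iterable[str]) -> List[EventPair]:
    """Parse tab-separated lines into EventPair records.

    ASSUMPTION: each line is "doc_id<TAB>event1<TAB>event2<TAB>label".
    Check the actual files and adjust if the layout differs.
    """
    pairs = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 fields, got {len(fields)}")
        pair = EventPair(*fields)
        if pair.label not in VALID_LABELS:
            raise ValueError(f"line {line_no}: unknown label {pair.label!r}")
        pairs.append(pair)
    return pairs
```

Because `parse_pairs` accepts any iterable of strings, an open file handle for one of the split files can be passed to it directly. Rejecting unknown labels early makes a mismatch between the assumed and the actual layout visible right away.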

Dataset Statistics:

The following table gives a brief overview of the train, dev, and test splits for TDDiscourse:

| Subset  | Train | Dev  | Test |
|---------|-------|------|------|
| TDDMan  | 4000  | 650  | 1500 |
| TDDAuto | 32607 | 1435 | 4258 |
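As a quick sanity check when loading the data, the split sizes above can be summed per subset and compared with the number of pairs actually read. The sketch below only restates the numbers from the table; the names `SPLIT_SIZES`, `subset_total`, and `split_fraction` are illustrative.

```python
# Split sizes copied from the statistics table in this README.
SPLIT_SIZES = {
    "TDDMan": {"train": 4000, "dev": 650, "test": 1500},
    "TDDAuto": {"train": 32607, "dev": 1435, "test": 4258},
}


def subset_total(subset: str) -> int:
    """Total number of event pairs across train, dev, and test."""
    return sum(SPLIT_SIZES[subset].values())


def split_fraction(subset: str, split: str) -> float:
    """Share of a subset's event pairs that falls into the given split."""
    return SPLIT_SIZES[subset][split] / subset_total(subset)


if __name__ == "__main__":
    for name in SPLIT_SIZES:
        print(name, subset_total(name), round(split_fraction(name, "train"), 3))
```

With these numbers, TDDMan totals 6150 event pairs and TDDAuto totals 38300.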

Temporal Relation Labels:

TDDiscourse uses the following temporal relations:

| Label | Relation     |
|-------|--------------|
| a     | After        |
| b     | Before       |
| i     | Includes     |
| ii    | Is Included  |
| s     | Simultaneous |
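When working with the label codes programmatically, a small lookup table is often convenient. The sketch below maps each code from the table above to its relation name. It also adds an inverse mapping for the case where the two events of a pair are swapped (if X is after Y, then Y is before X). The inverse mapping and the helper names are illustrative additions, not something shipped with the dataset.

```python
# Label codes and relation names copied from the table above.
LABEL_TO_RELATION = {
    "a": "After",
    "b": "Before",
    "i": "Includes",
    "ii": "Is Included",
    "s": "Simultaneous",
}

# Label that holds when the two events of a pair are swapped.
# Illustrative helper, not part of the dataset itself.
INVERSE_LABEL = {"a": "b", "b": "a", "i": "ii", "ii": "i", "s": "s"}


def relation_name(label: str) -> str:
    """Return the relation name for a label code, e.g. "ii" -> "Is Included"."""
    return LABEL_TO_RELATION[label]


def invert(label: str) -> str:
    """Return the label for the same event pair taken in the opposite order."""
    return INVERSE_LABEL[label]
```

For example, `relation_name("ii")` returns `"Is Included"` and `invert("a")` returns `"b"`.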

For any questions or to report any issues with the data, please reach out to: [email protected].
