Corpora

A number of annotated corpora can be used for research on temporal expressions and/or event ordering. Those that can serve as gold-standard corpora are:

Corpus | Short Description | TIMEX version
MUC-6 | The corpus from the 6th Message Understanding Conference, available from the LDC under catalogue number LDC2003T13. | MUC-6 TIMEX
MUC-7 | The corpus from the 7th Message Understanding Conference, available from the LDC under catalogue number LDC2001T02. | MUC-7 TIMEX
TIDES | This corpus consists of two parts: (1) 95 Spanish dialogs (part of the Enthusiast corpus) and their English translations; (2) 193 documents from the TDT-2 corpus. Only the first part is available on the MITRE website on TIMEX2. | TIMEX2 2001 v.1.0.2 (June 2001)
ACE-2004 Dev | The development corpus used in the Automatic Content Extraction (ACE) evaluations in 2004, available from the LDC under catalogue number LDC2005T07. | TIMEX2 2003 v.1.3 (April 2004)
ACE-2004 Eval | The corpus used for the official evaluation in the ACE 2004 TERN task. | TIMEX2 2003 v.1.3 (April 2004)
ACE-2005 Dev | The development corpus used in the Automatic Content Extraction (ACE) evaluations in 2005, available from the LDC under catalogue number LDC2006T06. | TIMEX2 (April 2005)
ACE-2005 Eval | The evaluation corpus used in the Automatic Content Extraction (ACE) evaluations in 2005. This corpus is not yet publicly available. | TIMEX2 (April 2005)
ACE-2007 Dev | The development corpus, consisting of selected domains in Arabic and Spanish only, used in the Automatic Content Extraction (ACE) evaluations in 2007. This corpus is not yet publicly available. | TIMEX2 (April 2005)
ACE-2007 Eval | The evaluation corpus used in the Automatic Content Extraction (ACE) evaluations in 2007. This corpus is not yet publicly available. | TIMEX2 (April 2005)
TimeBank 1.1 | Version 1.1 of the TimeBank corpus, available for download from the MITRE website. See the release notes. | TIMEX3 (TimeML 1.1)
TimeBank 1.2 | Version 1.2 of the TimeBank corpus, available from the LDC under catalogue number LDC2006T08. | TIMEX3 (TimeML 1.2.1)
WikiWars | A corpus of English Wikipedia articles about wars. | TIMEX2 (Sep 2005)
WikiWarsDE | A German version of WikiWars created from the corresponding German Wikipedia articles. | TIMEX2 (Sep 2005)
ModeS TimeBank 1.0 | A corpus of Modern Spanish (17th and 18th centuries) annotated with temporal and event information in TimeML markup and with spatial information following the SpatialML scheme. | TIMEX3 (TimeML)
French TimeBank | A corpus annotated according to the ISO-TimeML temporal annotation standard. Events and temporal expressions appearing in the texts are annotated, as are the temporal, aspectual and modal subordination relations that hold among these entities. | TIMEX3 (ISO-TimeML)
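
Because the corpora above use different TIMEX versions, code that reads them usually has to cope with more than one tag name and attribute set. The following is a minimal Python sketch, assuming inline XML-style annotation. The two sample sentences are made up for illustration and are not taken from any of the listed corpora, and the function name extract_timexes is hypothetical. Real corpus files may use other layouts (for example stand-off annotation), and TIMEX2 allows nested tags, which this simple regular expression does not handle.

```python
import re

# Hypothetical sample sentences, written for illustration only.
TIMEX3_SAMPLE = (
    'The meeting was held on '
    '<TIMEX3 tid="t1" type="DATE" value="1998-01-08">January 8, 1998</TIMEX3>.'
)
TIMEX2_SAMPLE = (
    'The meeting was held on '
    '<TIMEX2 VAL="1998-01-08">January 8, 1998</TIMEX2>.'
)

# Matches <TIMEX>, <TIMEX2> or <TIMEX3> elements that contain no nested TIMEX tag.
TAG_RE = re.compile(r'<(TIMEX[23]?)\b([^>]*)>(.*?)</\1>', re.DOTALL)
# Matches attribute="value" pairs inside the opening tag.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def extract_timexes(text):
    """Return a list of (tag name, attribute dict, surface text) tuples."""
    results = []
    for match in TAG_RE.finditer(text):
        tag, raw_attrs, surface = match.groups()
        results.append((tag, dict(ATTR_RE.findall(raw_attrs)), surface))
    return results


if __name__ == "__main__":
    for sample in (TIMEX3_SAMPLE, TIMEX2_SAMPLE):
        for tag, attrs, surface in extract_timexes(sample):
            print(tag, attrs, repr(surface))
```

For anything beyond a quick inspection, a proper XML parser is likely a better choice than regular expressions, since it copes with nesting and character escaping.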