This corpus contains the English training data prepared for the 2004 Time Expression Recognition and Normalization (TERN) evaluation. The evaluation was held in August 2004, followed by the corresponding workshop in September 2004. Evaluation participants received this data for training purposes; the corpus is now publicly available and distributed by the LDC. The corpus consists of 862 documents containing a total of about 306k words and nearly 9k TIMEX2 expressions. The documents are divided into three subsets:
- ACE2002: this data was originally prepared for the ACE 2002 Relation Detection and Characterization (RDC) evaluation; it was then re-annotated with TIMEX2 tags by two annotators, and the annotations were reconciled.
- ACE2003: this contains the training data used in the ACE 2003 evaluation. For the release contained in this corpus, the files were doubly-annotated for TIMEX2 tags and reconciled.
- ACE2004: this contains the data prepared for the ACE 2004 evaluation. All of the files were doubly-annotated and reconciled.
The corpus is available from the LDC under catalogue number LDC2005T07.
The tables below show, for each corpus subset, the domains and the numbers of documents, words, and TIMEX2 expressions. The word counts are those provided by the corpus developers (an informal analysis indicates that our own word counts are slightly different).
ACE2002 Subset
Domain | Domain Code | #Docs | #Words | #TIMEX2 | Comments |
---|---|---|---|---|---|
Broadcast News | BN | 85 | 17922 | 628 | |
Newspaper | NP | 17 | 14682 | 337 | |
Newswire | NW | 78 | 34134 | 926 | |
Total | | 180 | 66738 | 1891 | |
ACE2003 Subset
Domain | Domain Code | #Docs | #Words | #TIMEX2 | Comments |
---|---|---|---|---|---|
Broadcast News | BN | 147 | 34681 | 1050 | |
Newswire | NW | 102 | 58592 | 1547 | |
Total | | 249 | 93273 | 2597 | |
ACE2004 Subset
Domain | Domain Code | #Docs | #Words | #TIMEX2 | Comments |
---|---|---|---|---|---|
Arabic Treebank (translated) | AT | 58 | 13466 | 526 | No document creation date available |
Broadcast News | BN | 222 | 61621 | 1848 | |
Chinese Treebank (translated) | CT | 37 | 12522 | 365 | No document creation date available |
Newswire | NW | 116 | 58543 | 1711 | |
Total | | 433 | 146152 | 4450 | |
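
The per-subset totals can be cross-checked against the corpus-wide figures quoted in the introduction (862 documents, about 306k words, nearly 9k TIMEX2 expressions). The following is a minimal sketch of that arithmetic in Python; the names `SUBSET_TOTALS` and `corpus_totals` are illustrative and not part of the corpus or its tooling, and the numbers are copied by hand from the "Total" rows of the tables above.

```python
# Per-subset totals copied from the "Total" rows of the tables above:
# (documents, words, TIMEX2 expressions).
# The dictionary and function names are hypothetical, chosen for this sketch.
SUBSET_TOTALS = {
    "ACE2002": (180, 66738, 1891),
    "ACE2003": (249, 93273, 2597),
    "ACE2004": (433, 146152, 4450),
}


def corpus_totals(subsets):
    """Sum documents, words, and TIMEX2 counts across all subsets."""
    docs = sum(counts[0] for counts in subsets.values())
    words = sum(counts[1] for counts in subsets.values())
    timex2 = sum(counts[2] for counts in subsets.values())
    return docs, words, timex2


docs, words, timex2 = corpus_totals(SUBSET_TOTALS)
print(docs, words, timex2)  # 862 306163 8938
```

The sums agree with the rounded figures given in the introduction. They rely on the developer-provided word counts, so a recount with a different tokenizer may give slightly different word totals.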