ACE 2005 Evaluation Corpus
This corpus contains the data used for the ACE 2005 evaluation exercise. Each training data file was annotated independently by two annotators. Discrepancies between the two versions of each file were then adjudicated by a senior annotator or team leader, resulting in a gold-standard file. After adjudication, TIMEX2 values were normalized (for English only).
The distribution of files across domains in the corpus is as follows:
ACE 2005 Evaluation corpus
Domain | Domain Code | #Docs | #Words | #TIMEX2 | Comments |
---|---|---|---|---|---|
Broadcast Conversation | BC | 9 | 7499 (15%) | 142 | |
Broadcast News | BN | 74 | 10049 (20%) | 322 | |
Conversational Telephone Speech | CTS | 6 | 7531 (15%) | 70 | |
Newswire | NW | 34 | 10410 (20%) | 305 | |
Usenet Newsgroups | UN | 13 | 7503 (15%) | 167 | |
Weblog | WL | 19 | 7299 (15%) | 148 | |
Total | | 155 | 50291 (100%) | 1154 | |
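As a quick consistency check on the table, the following minimal Python sketch recomputes the totals and each domain's share of the word count. The figures are copied from the table; the names `CORPUS`, `totals`, and `word_shares` are illustrative and not part of any ACE tooling.

```python
# Per-domain figures copied from the table: (documents, words, TIMEX2 expressions).
CORPUS = {
    "BC": (9, 7499, 142),
    "BN": (74, 10049, 322),
    "CTS": (6, 7531, 70),
    "NW": (34, 10410, 305),
    "UN": (13, 7503, 167),
    "WL": (19, 7299, 148),
}


def totals(corpus):
    """Return (documents, words, TIMEX2) summed over all domains."""
    docs = sum(d for d, _, _ in corpus.values())
    words = sum(w for _, w, _ in corpus.values())
    timex = sum(t for _, _, t in corpus.values())
    return docs, words, timex


def word_shares(corpus):
    """Return each domain's share of the total word count, in percent."""
    total_words = sum(w for _, w, _ in corpus.values())
    return {code: 100 * w / total_words for code, (_, w, _) in corpus.items()}


if __name__ == "__main__":
    print(totals(CORPUS))
    for code, share in word_shares(CORPUS).items():
        print(f"{code}: {share:.1f}%")
```

The recomputed totals agree with the Total row. The recomputed shares are close to the listed percentages but not identical (NW works out to roughly 20.7%), so the percentages in the table are probably best read as approximate figures.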
page revision: 1, last edited: 11 Jan 2008 21:23