ACE 2005 Evaluation Corpus

This corpus contains the data used for the ACE 2005 evaluation exercise. Each training data file was annotated independently by two annotators. Discrepancies between the two versions of each file were then adjudicated by a senior annotator or team leader, resulting in a gold standard file. After adjudication, TIMEX2 values were normalized (for English only).

The distribution of files across domains in the corpus is as follows:

ACE 2005 Evaluation corpus

Domain                           Domain Code  #Docs  #Words        #TIMEX2  Comments
Broadcast Conversation           BC               9   7499 (15%)       142
Broadcast News                   BN              74  10049 (20%)       322
Conversational Telephone Speech  CTS              6   7531 (15%)        70
Newswire                         NW              34  10410 (20%)       305
Usenet Newsgroups                UN              13   7503 (15%)       167
Weblog                           WL              19   7299 (15%)       148
Total                                           155  50291 (100%)     1154
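
The totals and word shares in the table can be rechecked with a few lines of Python. This is only a minimal sketch: the counts are copied from the table, while the names `COUNTS`, `column_totals` and `word_shares` are made up for illustration and are not part of any ACE tooling.

```python
# Per-domain counts copied from the table: (documents, words, TIMEX2 expressions).
COUNTS = {
    "BC": (9, 7499, 142),
    "BN": (74, 10049, 322),
    "CTS": (6, 7531, 70),
    "NW": (34, 10410, 305),
    "UN": (13, 7503, 167),
    "WL": (19, 7299, 148),
}


def column_totals(counts):
    """Sum each column (docs, words, TIMEX2) over all domains."""
    docs, words, timex2 = (sum(column) for column in zip(*counts.values()))
    return docs, words, timex2


def word_shares(counts):
    """Return each domain's share of the total word count, in percent (1 decimal)."""
    total_words = sum(words for _, words, _ in counts.values())
    return {
        code: round(100 * words / total_words, 1)
        for code, (_, words, _) in counts.items()
    }


if __name__ == "__main__":
    print(column_totals(COUNTS))
    print(word_shares(COUNTS))
```

The column sums reproduce the Total row (155 documents, 50291 words, 1154 TIMEX2 expressions). The computed word shares are close to the listed percentages but not identical: Newswire, for example, works out to about 20.7% rather than 20%, so the percentages in the table are best read as approximate.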