ACE 2007 Evaluation Corpus

This corpus contains the data used for the ACE 2007 evaluation exercise. The BC, BN, CTS and UN domain components of this data set are identical to those in the ACE 2005 Evaluation data; only the Newswire (NW) and Weblog (WL) data are new.

The distribution of files across domains in the corpus is as follows:

ACE 2007 Evaluation corpus

Domain                           Domain Code  #Docs  #Words  #TIMEX2  Comments
Broadcast Conversation           BC               9    7499      142
Broadcast News                   BN              74   10049      322
Conversational Telephone Speech  CTS              6    7531       70
Newswire                         NW             106              898
Usenet Newsgroups                UN              13    7503      167
Weblog                           WL              46              433
Total                                           254             2032
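
As a quick sanity check, the per-domain document and TIMEX2 counts in the table can be summed and compared with the Total row. The sketch below only re-enters the figures from the table by hand; the variable names are illustrative and are not part of the corpus distribution. Word counts are left out because the table gives none for the NW and WL domains.

```python
# Per-domain counts copied from the table: (number of documents, number of TIMEX2 expressions).
counts = {
    "BC": (9, 142),
    "BN": (74, 322),
    "CTS": (6, 70),
    "NW": (106, 898),
    "UN": (13, 167),
    "WL": (46, 433),
}

# Sum each column across the six domains.
total_docs = sum(docs for docs, _ in counts.values())
total_timex2 = sum(timex2 for _, timex2 in counts.values())

print(total_docs, total_timex2)  # 254 2032
```

Both sums agree with the Total row (254 documents, 2032 TIMEX2 expressions).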